* [PATCH 01/18] Initial support for -mevex512
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
@ 2023-09-21 7:19 ` Hu, Lin1
2023-10-07 6:34 ` [PATCH v2 " Haochen Jiang
2023-09-21 7:19 ` [PATCH 02/18] [PATCH 1/5] Push evex512 target for 512 bit intrins Hu, Lin1
` (18 subsequent siblings)
19 siblings, 1 reply; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:19 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): New.
(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
(ix86_handle_option): Handle EVEX512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc: (isa2_opts): Ditto.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Set EVEX512 target if it is not
explicitly set when AVX512 is enabled. Disable
AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.
---
gcc/common/config/i386/i386-common.cc | 15 +++++++++++++++
gcc/config/i386/i386-c.cc | 2 ++
gcc/config/i386/i386-options.cc | 19 ++++++++++++++++++-
gcc/config/i386/i386.opt | 4 ++++
4 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..8cc59e08d06 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3. If not see
#define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
#define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
#define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_SET OPTION_MASK_ISA2_EVEX512
/* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2. */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3. If not see
#define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
#define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
#define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
/* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same
as -mno-sse4.1. */
@@ -1341,6 +1343,19 @@ ix86_handle_option (struct gcc_options *opts,
}
return true;
+ case OPT_mevex512:
+ if (value)
+ {
+ opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512_SET;
+ opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_SET;
+ }
+ else
+ {
+ opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_EVEX512_UNSET;
+ opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_UNSET;
+ }
+ return true;
+
case OPT_mfma:
if (value)
{
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..93154efa7ff 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -707,6 +707,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
def_or_undef (parse_in, "__SHA512__");
if (isa_flag2 & OPTION_MASK_ISA2_SM4)
def_or_undef (parse_in, "__SM4__");
+ if (isa_flag2 & OPTION_MASK_ISA2_EVEX512)
+ def_or_undef (parse_in, "__EVEX512__");
if (TARGET_IAMCU)
{
def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..a1a7a92da9f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -250,7 +250,8 @@ static struct ix86_target_opts isa2_opts[] =
{ "-mavxvnniint16", OPTION_MASK_ISA2_AVXVNNIINT16 },
{ "-msm3", OPTION_MASK_ISA2_SM3 },
{ "-msha512", OPTION_MASK_ISA2_SHA512 },
- { "-msm4", OPTION_MASK_ISA2_SM4 }
+ { "-msm4", OPTION_MASK_ISA2_SM4 },
+ { "-mevex512", OPTION_MASK_ISA2_EVEX512 }
};
static struct ix86_target_opts isa_opts[] =
{
@@ -1109,6 +1110,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
IX86_ATTR_ISA ("sm3", OPT_msm3),
IX86_ATTR_ISA ("sha512", OPT_msha512),
IX86_ATTR_ISA ("sm4", OPT_msm4),
+ IX86_ATTR_ISA ("evex512", OPT_mevex512),
/* enum options */
IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
@@ -2559,6 +2561,21 @@ ix86_option_override_internal (bool main_args_p,
&= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
& ~opts->x_ix86_isa_flags_explicit);
+ /* Set EVEX512 target if it is not explicitly set
+ when AVX512 is enabled. */
+ if (TARGET_AVX512F_P(opts->x_ix86_isa_flags)
+ && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
+ opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
+
+ /* Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512. */
+ if (!TARGET_EVEX512_P(opts->x_ix86_isa_flags2))
+ {
+ opts->x_ix86_isa_flags
+ &= ~(OPTION_MASK_ISA_AVX512PF | OPTION_MASK_ISA_AVX512ER);
+ opts->x_ix86_isa_flags2
+ &= ~(OPTION_MASK_ISA2_AVX5124FMAPS | OPTION_MASK_ISA2_AVX5124VNNIW);
+ }
+
/* Validate -mpreferred-stack-boundary= value or default it to
PREFERRED_STACK_BOUNDARY_DEFAULT. */
ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..6d8601b1f75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,7 @@ Enable vectorization for gather instruction.
mscatter
Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
Enable vectorization for scatter instruction.
+
+mevex512
+Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Support 512 bit vector built-in functions and code generation.
--
2.31.1
* [PATCH v2 01/18] Initial support for -mevex512
2023-09-21 7:19 ` [PATCH 01/18] Initial support for -mevex512 Hu, Lin1
@ 2023-10-07 6:34 ` Haochen Jiang
0 siblings, 0 replies; 25+ messages in thread
From: Haochen Jiang @ 2023-10-07 6:34 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, zingaburga
Hi all,
Sorry for the delay in revising this patch; I have just come back from vacation.
I have slightly revised this patch to address the __EVEX256__ request with the following change:
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..9c44bd7fb63 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -546,7 +546,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
if (isa_flag & OPTION_MASK_ISA_AVX512BW)
def_or_undef (parse_in, "__AVX512BW__");
if (isa_flag & OPTION_MASK_ISA_AVX512VL)
- def_or_undef (parse_in, "__AVX512VL__");
+ {
+ def_or_undef (parse_in, "__AVX512VL__");
+ def_or_undef (parse_in, "__EVEX256__");
+ }
if (isa_flag & OPTION_MASK_ISA_AVX512VBMI)
def_or_undef (parse_in, "__AVX512VBMI__");
if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
Please see if this meets the need. If there are no concerns, I will commit all 18
patches on Monday or Tuesday.
Thx,
Haochen
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): New.
(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
(ix86_handle_option): Handle EVEX512.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__
when AVX512VL is set.
* config/i386/i386-options.cc: (isa2_opts): Handle EVEX512.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Set EVEX512 target if it is not
explicitly set when AVX512 is enabled. Disable
AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.
---
gcc/common/config/i386/i386-common.cc | 15 +++++++++++++++
gcc/config/i386/i386-c.cc | 7 ++++++-
gcc/config/i386/i386-options.cc | 19 ++++++++++++++++++-
gcc/config/i386/i386.opt | 4 ++++
4 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..8cc59e08d06 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3. If not see
#define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
#define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
#define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_SET OPTION_MASK_ISA2_EVEX512
/* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2. */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3. If not see
#define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
#define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
#define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
/* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same
as -mno-sse4.1. */
@@ -1341,6 +1343,19 @@ ix86_handle_option (struct gcc_options *opts,
}
return true;
+ case OPT_mevex512:
+ if (value)
+ {
+ opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512_SET;
+ opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_SET;
+ }
+ else
+ {
+ opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_EVEX512_UNSET;
+ opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_EVEX512_UNSET;
+ }
+ return true;
+
case OPT_mfma:
if (value)
{
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 47768fa0940..9c44bd7fb63 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -546,7 +546,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
if (isa_flag & OPTION_MASK_ISA_AVX512BW)
def_or_undef (parse_in, "__AVX512BW__");
if (isa_flag & OPTION_MASK_ISA_AVX512VL)
- def_or_undef (parse_in, "__AVX512VL__");
+ {
+ def_or_undef (parse_in, "__AVX512VL__");
+ def_or_undef (parse_in, "__EVEX256__");
+ }
if (isa_flag & OPTION_MASK_ISA_AVX512VBMI)
def_or_undef (parse_in, "__AVX512VBMI__");
if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
@@ -707,6 +710,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
def_or_undef (parse_in, "__SHA512__");
if (isa_flag2 & OPTION_MASK_ISA2_SM4)
def_or_undef (parse_in, "__SM4__");
+ if (isa_flag2 & OPTION_MASK_ISA2_EVEX512)
+ def_or_undef (parse_in, "__EVEX512__");
if (TARGET_IAMCU)
{
def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..a1a7a92da9f 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -250,7 +250,8 @@ static struct ix86_target_opts isa2_opts[] =
{ "-mavxvnniint16", OPTION_MASK_ISA2_AVXVNNIINT16 },
{ "-msm3", OPTION_MASK_ISA2_SM3 },
{ "-msha512", OPTION_MASK_ISA2_SHA512 },
- { "-msm4", OPTION_MASK_ISA2_SM4 }
+ { "-msm4", OPTION_MASK_ISA2_SM4 },
+ { "-mevex512", OPTION_MASK_ISA2_EVEX512 }
};
static struct ix86_target_opts isa_opts[] =
{
@@ -1109,6 +1110,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
IX86_ATTR_ISA ("sm3", OPT_msm3),
IX86_ATTR_ISA ("sha512", OPT_msha512),
IX86_ATTR_ISA ("sm4", OPT_msm4),
+ IX86_ATTR_ISA ("evex512", OPT_mevex512),
/* enum options */
IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
@@ -2559,6 +2561,21 @@ ix86_option_override_internal (bool main_args_p,
&= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
& ~opts->x_ix86_isa_flags_explicit);
+ /* Set EVEX512 target if it is not explicitly set
+ when AVX512 is enabled. */
+ if (TARGET_AVX512F_P(opts->x_ix86_isa_flags)
+ && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_EVEX512))
+ opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_EVEX512;
+
+ /* Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512. */
+ if (!TARGET_EVEX512_P(opts->x_ix86_isa_flags2))
+ {
+ opts->x_ix86_isa_flags
+ &= ~(OPTION_MASK_ISA_AVX512PF | OPTION_MASK_ISA_AVX512ER);
+ opts->x_ix86_isa_flags2
+ &= ~(OPTION_MASK_ISA2_AVX5124FMAPS | OPTION_MASK_ISA2_AVX5124VNNIW);
+ }
+
/* Validate -mpreferred-stack-boundary= value or default it to
PREFERRED_STACK_BOUNDARY_DEFAULT. */
ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..6d8601b1f75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,7 @@ Enable vectorization for gather instruction.
mscatter
Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
Enable vectorization for scatter instruction.
+
+mevex512
+Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Support 512 bit vector built-in functions and code generation.
--
2.31.1
* [PATCH 02/18] [PATCH 1/5] Push evex512 target for 512 bit intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
2023-09-21 7:19 ` [PATCH 01/18] Initial support for -mevex512 Hu, Lin1
@ 2023-09-21 7:19 ` Hu, Lin1
2023-09-21 7:19 ` [PATCH 03/18] [PATCH 2/5] " Hu, Lin1
` (17 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:19 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/avx512fintrin.h: Add evex512 target for 512 bit intrins.
---
gcc/config/i386/avx512fintrin.h | 19663 +++++++++++++++---------------
1 file changed, 9871 insertions(+), 9792 deletions(-)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 517e7878d8c..85bf72d9fae 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -34,4457 +34,4421 @@
#define __DISABLE_AVX512F__
#endif /* __AVX512F__ */
-/* Internal data types for implementing the intrinsics. */
-typedef double __v8df __attribute__ ((__vector_size__ (64)));
-typedef float __v16sf __attribute__ ((__vector_size__ (64)));
-typedef long long __v8di __attribute__ ((__vector_size__ (64)));
-typedef unsigned long long __v8du __attribute__ ((__vector_size__ (64)));
-typedef int __v16si __attribute__ ((__vector_size__ (64)));
-typedef unsigned int __v16su __attribute__ ((__vector_size__ (64)));
-typedef short __v32hi __attribute__ ((__vector_size__ (64)));
-typedef unsigned short __v32hu __attribute__ ((__vector_size__ (64)));
-typedef char __v64qi __attribute__ ((__vector_size__ (64)));
-typedef unsigned char __v64qu __attribute__ ((__vector_size__ (64)));
-
-/* The Intel API is flexible enough that we must allow aliasing with other
- vector types, and their scalar components. */
-typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
-typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
-typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
-
-/* Unaligned version of the same type. */
-typedef float __m512_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-typedef long long __m512i_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-typedef double __m512d_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
-
typedef unsigned char __mmask8;
typedef unsigned short __mmask16;
+typedef unsigned int __mmask32;
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_int2mask (int __M)
+/* Constants for mantissa extraction */
+typedef enum
{
- return (__mmask16) __M;
-}
+ _MM_MANT_NORM_1_2, /* interval [1, 2) */
+ _MM_MANT_NORM_p5_2, /* interval [0.5, 2) */
+ _MM_MANT_NORM_p5_1, /* interval [0.5, 1) */
+ _MM_MANT_NORM_p75_1p5 /* interval [0.75, 1.5) */
+} _MM_MANTISSA_NORM_ENUM;
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2int (__mmask16 __M)
+typedef enum
{
- return (int) __M;
-}
+ _MM_MANT_SIGN_src, /* sign = sign(SRC) */
+ _MM_MANT_SIGN_zero, /* sign = 0 */
+ _MM_MANT_SIGN_nan /* DEST = NaN if sign(SRC) = 1 */
+} _MM_MANTISSA_SIGN_ENUM;
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi64 (long long __A, long long __B, long long __C,
- long long __D, long long __E, long long __F,
- long long __G, long long __H)
+_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
{
- return __extension__ (__m512i) (__v8di)
- { __H, __G, __F, __E, __D, __C, __B, __A };
+ return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-/* Create the vector [A B C D E F G H I J K L M N O P]. */
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi32 (int __A, int __B, int __C, int __D,
- int __E, int __F, int __G, int __H,
- int __I, int __J, int __K, int __L,
- int __M, int __N, int __O, int __P)
-{
- return __extension__ (__m512i)(__v16si)
- { __P, __O, __N, __M, __L, __K, __J, __I,
- __H, __G, __F, __E, __D, __C, __B, __A };
-}
-
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi16 (short __q31, short __q30, short __q29, short __q28,
- short __q27, short __q26, short __q25, short __q24,
- short __q23, short __q22, short __q21, short __q20,
- short __q19, short __q18, short __q17, short __q16,
- short __q15, short __q14, short __q13, short __q12,
- short __q11, short __q10, short __q09, short __q08,
- short __q07, short __q06, short __q05, short __q04,
- short __q03, short __q02, short __q01, short __q00)
+_mm_mask_add_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- return __extension__ (__m512i)(__v32hi){
- __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
- __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
- __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
- __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31
- };
+ return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_epi8 (char __q63, char __q62, char __q61, char __q60,
- char __q59, char __q58, char __q57, char __q56,
- char __q55, char __q54, char __q53, char __q52,
- char __q51, char __q50, char __q49, char __q48,
- char __q47, char __q46, char __q45, char __q44,
- char __q43, char __q42, char __q41, char __q40,
- char __q39, char __q38, char __q37, char __q36,
- char __q35, char __q34, char __q33, char __q32,
- char __q31, char __q30, char __q29, char __q28,
- char __q27, char __q26, char __q25, char __q24,
- char __q23, char __q22, char __q21, char __q20,
- char __q19, char __q18, char __q17, char __q16,
- char __q15, char __q14, char __q13, char __q12,
- char __q11, char __q10, char __q09, char __q08,
- char __q07, char __q06, char __q05, char __q04,
- char __q03, char __q02, char __q01, char __q00)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return __extension__ (__m512i)(__v64qi){
- __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
- __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
- __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
- __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31,
- __q32, __q33, __q34, __q35, __q36, __q37, __q38, __q39,
- __q40, __q41, __q42, __q43, __q44, __q45, __q46, __q47,
- __q48, __q49, __q50, __q51, __q52, __q53, __q54, __q55,
- __q56, __q57, __q58, __q59, __q60, __q61, __q62, __q63
- };
+ return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_pd (double __A, double __B, double __C, double __D,
- double __E, double __F, double __G, double __H)
+_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return __extension__ (__m512d)
- { __H, __G, __F, __E, __D, __C, __B, __A };
+ return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_ps (float __A, float __B, float __C, float __D,
- float __E, float __F, float __G, float __H,
- float __I, float __J, float __K, float __L,
- float __M, float __N, float __O, float __P)
+_mm_mask_add_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return __extension__ (__m512)
- { __P, __O, __N, __M, __L, __K, __J, __I,
- __H, __G, __F, __E, __D, __C, __B, __A };
+ return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7) \
- _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7, \
- e8,e9,e10,e11,e12,e13,e14,e15) \
- _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_pd(e0,e1,e2,e3,e4,e5,e6,e7) \
- _mm512_set_pd(e7,e6,e5,e4,e3,e2,e1,e0)
-
-#define _mm512_setr_ps(e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15) \
- _mm512_set_ps(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_ps (void)
+_mm_maskz_add_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
- __m512 __Y = __Y;
-#pragma GCC diagnostic pop
- return __Y;
+ return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-#define _mm512_undefined _mm512_undefined_ps
-
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_pd (void)
+_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
{
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
- __m512d __Y = __Y;
-#pragma GCC diagnostic pop
- return __Y;
+ return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_epi32 (void)
+_mm_mask_sub_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
- __m512i __Y = __Y;
-#pragma GCC diagnostic pop
- return __Y;
+ return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-#define _mm512_undefined_si512 _mm512_undefined_epi32
-
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi8 (char __A)
+_mm_maskz_sub_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return __extension__ (__m512i)(__v64qi)
- { __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A };
+ return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi16 (short __A)
+_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return __extension__ (__m512i)(__v32hi)
- { __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A };
+ return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_pd (double __A)
+_mm_mask_sub_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return __extension__ (__m512d)(__v8df)
- { __A, __A, __A, __A, __A, __A, __A, __A };
+ return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_ps (float __A)
+_mm_maskz_sub_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return __extension__ (__m512)(__v16sf)
- { __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A };
+ return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-/* Create the vector [A B C D A B C D A B C D A B C D]. */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
-{
- return __extension__ (__m512i)(__v16si)
- { __D, __C, __B, __A, __D, __C, __B, __A,
- __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#else
+#define _mm_add_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_addsd_round(A, B, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_epi64 (long long __A, long long __B, long long __C,
- long long __D)
-{
- return __extension__ (__m512i) (__v8di)
- { __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_mask_add_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_addsd_mask_round(A, B, W, U, C)
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_pd (double __A, double __B, double __C, double __D)
-{
- return __extension__ (__m512d)
- { __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_maskz_add_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_addsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set4_ps (float __A, float __B, float __C, float __D)
-{
- return __extension__ (__m512)
- { __D, __C, __B, __A, __D, __C, __B, __A,
- __D, __C, __B, __A, __D, __C, __B, __A };
-}
+#define _mm_add_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_addss_round(A, B, C)
-#define _mm512_setr4_epi64(e0,e1,e2,e3) \
- _mm512_set4_epi64(e3,e2,e1,e0)
+#define _mm_mask_add_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_addss_mask_round(A, B, W, U, C)
-#define _mm512_setr4_epi32(e0,e1,e2,e3) \
- _mm512_set4_epi32(e3,e2,e1,e0)
+#define _mm_maskz_add_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_addss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-#define _mm512_setr4_pd(e0,e1,e2,e3) \
- _mm512_set4_pd(e3,e2,e1,e0)
+#define _mm_sub_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_subsd_round(A, B, C)
-#define _mm512_setr4_ps(e0,e1,e2,e3) \
- _mm512_set4_ps(e3,e2,e1,e0)
+#define _mm_mask_sub_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_subsd_mask_round(A, B, W, U, C)
-extern __inline __m512
+#define _mm_maskz_sub_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_subsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_sub_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_subss_round(A, B, C)
+
+#define _mm_mask_sub_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_subss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_sub_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_subss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#endif
+
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_ps (void)
+_mm_rcp14_sd (__m128d __A, __m128d __B)
{
- return __extension__ (__m512){ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+ return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B,
+ (__v2df) __A);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero (void)
+_mm_mask_rcp14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return _mm512_setzero_ps ();
+ return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_pd (void)
+_mm_maskz_rcp14_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return __extension__ (__m512d) { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+ return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df) _mm_setzero_ps (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_epi32 (void)
+_mm_rcp14_ss (__m128 __A, __m128 __B)
{
- return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+ return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B,
+ (__v4sf) __A);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_si512 (void)
+_mm_mask_rcp14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+ return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm_maskz_rcp14_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_pd (__mmask8 __U, __m512d __A)
+_mm_rsqrt14_sd (__m128d __A, __m128d __B)
{
- return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __B,
+ (__v2df) __A);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm_mask_rsqrt14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_ps (__mmask16 __U, __m512 __A)
+_mm_maskz_rsqrt14_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df) _mm_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_pd (void const *__P)
+_mm_rsqrt14_ss (__m128 __A, __m128 __B)
{
- return *(__m512d *) __P;
+ return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __B,
+ (__v4sf) __A);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm_mask_rsqrt14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_pd (__mmask8 __U, void const *__P)
+_mm_maskz_rsqrt14_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) __U);
}
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_pd (void *__P, __m512d __A)
+_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
{
- *(__m512d *) __P = __A;
+ return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline void
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm_mask_sqrt_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- __builtin_ia32_storeapd512_mask ((__v8df *) __P, (__v8df) __A,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_ps (void const *__P)
+_mm_maskz_sqrt_round_sd (__mmask8 __U, __m128d __A, __m128d __B, const int __R)
{
- return *(__m512 *) __P;
+ return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
+ (__v2df) __A,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_ps (__mmask16 __U, void const *__P)
+_mm_mask_sqrt_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_ps (void *__P, __m512 __A)
+_mm_maskz_sqrt_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
{
- *(__m512 *) __P = __A;
+ return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
+ (__v4sf) __A,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
{
- __builtin_ia32_storeaps512_mask ((__v16sf *) __P, (__v16sf) __A,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm_mask_mul_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_epi64 (__mmask8 __U, __m512i __A)
+_mm_maskz_mul_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_epi64 (void const *__P)
+_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return *(__m512i *) __P;
+ return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm_mask_mul_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_epi64 (__mmask8 __U, void const *__P)
+_mm_maskz_mul_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_epi64 (void *__P, __m512i __A)
+_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
{
- *(__m512i *) __P = __A;
+ return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline void
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm_mask_div_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- __builtin_ia32_movdqa64store512_mask ((__v8di *) __P, (__v8di) __A,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mov_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm_maskz_div_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mov_epi32 (__mmask16 __U, __m512i __A)
+_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_si512 (void const *__P)
+_mm_mask_div_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return *(__m512i *) __P;
+ return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_epi32 (void const *__P)
+_mm_maskz_div_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return *(__m512i *) __P;
+ return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_load_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_load_epi32 (__mmask16 __U, void const *__P)
+_mm_mask_scalef_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_si512 (void *__P, __m512i __A)
+_mm_maskz_scalef_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- *(__m512i *) __P = __A;
+ return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_epi32 (void *__P, __m512i __A)
+_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
{
- *(__m512i *) __P = __A;
+ return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline void
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_store_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm_mask_scalef_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- __builtin_ia32_movdqa32store512_mask ((__v16si *) __P, (__v16si) __A,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullo_epi32 (__m512i __A, __m512i __B)
+_mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
{
- return (__m512i) ((__v16su) __A * (__v16su) __B);
+ return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
+#else
+#define _mm_sqrt_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
+ (__v2df) _mm_setzero_pd (), -1, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mullo_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
-}
+#define _mm_mask_sqrt_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, W, U, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullo_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W, __M);
-}
+#define _mm_maskz_sqrt_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
+ (__v2df) _mm_setzero_pd (), U, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullox_epi64 (__m512i __A, __m512i __B)
-{
- return (__m512i) ((__v8du) __A * (__v8du) __B);
-}
+#define _mm_sqrt_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
+ (__v4sf) _mm_setzero_ps (), -1, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullox_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
-{
- return _mm512_mask_mov_epi64 (__W, __M, _mm512_mullox_epi64 (__A, __B));
-}
+#define _mm_mask_sqrt_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_sqrtss_mask_round (B, A, W, U, C)
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sllv_epi32 (__m512i __X, __m512i __Y)
+#define _mm_maskz_sqrt_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
+ (__v4sf) _mm_setzero_ps (), U, C)
+
+#define _mm_mul_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_mulsd_round(A, B, C)
+
+#define _mm_mask_mul_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_mulsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_mul_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_mulsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_mul_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_mulss_round(A, B, C)
+
+#define _mm_mask_mul_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_mulss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_mul_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_mulss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_div_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_divsd_round(A, B, C)
+
+#define _mm_mask_div_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_divsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_div_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_divsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_div_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_divss_round(A, B, C)
+
+#define _mm_mask_div_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_divss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_div_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_divss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_scalef_round_sd(A, B, C) \
+ ((__m128d) \
+ __builtin_ia32_scalefsd_mask_round ((A), (B), \
+ (__v2df) _mm_undefined_pd (), \
+ -1, (C)))
+
+#define _mm_scalef_round_ss(A, B, C) \
+ ((__m128) \
+ __builtin_ia32_scalefss_mask_round ((A), (B), \
+ (__v4sf) _mm_undefined_ps (), \
+ -1, (C)))
+
+#define _mm_mask_scalef_round_sd(W, U, A, B, C) \
+ ((__m128d) \
+ __builtin_ia32_scalefsd_mask_round ((A), (B), (W), (U), (C)))
+
+#define _mm_mask_scalef_round_ss(W, U, A, B, C) \
+ ((__m128) \
+ __builtin_ia32_scalefss_mask_round ((A), (B), (W), (U), (C)))
+
+#define _mm_maskz_scalef_round_sd(U, A, B, C) \
+ ((__m128d) \
+ __builtin_ia32_scalefsd_mask_round ((A), (B), \
+ (__v2df) _mm_setzero_pd (), \
+ (U), (C)))
+
+#define _mm_maskz_scalef_round_ss(U, A, B, C) \
+ ((__m128) \
+ __builtin_ia32_scalefss_mask_round ((A), (B), \
+ (__v4sf) _mm_setzero_ps (), \
+ (U), (C)))
+#endif
+
+#define _mm_mask_sqrt_sd(W, U, A, B) \
+ _mm_mask_sqrt_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_sqrt_sd(U, A, B) \
+ _mm_maskz_sqrt_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_sqrt_ss(W, U, A, B) \
+ _mm_mask_sqrt_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_sqrt_ss(U, A, B) \
+ _mm_maskz_sqrt_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_scalef_sd(W, U, A, B) \
+ _mm_mask_scalef_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_scalef_sd(U, A, B) \
+ _mm_maskz_scalef_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_scalef_ss(W, U, A, B) \
+ _mm_mask_scalef_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_scalef_ss(U, A, B) \
+ _mm_maskz_scalef_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu32_sd (__m128d __A, unsigned __B)
{
- return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_cvtusi2sd32 ((__v2df) __A, __B);
}
-extern __inline __m512i
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sllv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu64_sd (__m128d __A, unsigned long long __B, const int __R)
{
- return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sllv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundi64_sd (__m128d __A, long long __B, const int __R)
{
- return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srav_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundsi64_sd (__m128d __A, long long __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
}
+#else
+#define _mm_cvt_roundu64_sd(A, B, C) \
+ (__m128d)__builtin_ia32_cvtusi2sd64(A, B, C)
-extern __inline __m512i
+#define _mm_cvt_roundi64_sd(A, B, C) \
+ (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+
+#define _mm_cvt_roundsi64_sd(A, B, C) \
+ (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+#endif
+
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srav_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu32_ss (__m128 __A, unsigned __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srav_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundsi32_ss (__m128 __A, int __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srlv_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundi32_ss (__m128 __A, int __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
}
+#else
+#define _mm_cvt_roundu32_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtusi2ss32(A, B, C)
-extern __inline __m512i
+#define _mm_cvt_roundi32_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
+
+#define _mm_cvt_roundsi32_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
+#endif
+
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srlv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundu64_ss (__m128 __A, unsigned long long __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srlv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+_mm_cvt_roundsi64_ss (__m128 __A, long long __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_epi64 (__m512i __A, __m512i __B)
+_mm_cvt_roundi64_ss (__m128 __A, long long __B, const int __R)
{
- return (__m512i) ((__v8du) __A + (__v8du) __B);
+ return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
}
+#else
+#define _mm_cvt_roundu64_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtusi2ss64(A, B, C)
-extern __inline __m512i
+#define _mm_cvt_roundi64_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+
+#define _mm_cvt_roundsi64_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+#endif
+
+#endif
+
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm_mask_load_ss (__m128 __W, __mmask8 __U, const float *__P)
{
- return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) __W, __U);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_load_ss (__mmask8 __U, const float *__P)
{
- return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) _mm_setzero_ps (),
+ __U);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_epi64 (__m512i __A, __m512i __B)
+_mm_mask_load_sd (__m128d __W, __mmask8 __U, const double *__P)
{
- return (__m512i) ((__v8du) __A - (__v8du) __B);
+ return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) __W, __U);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_load_sd (__mmask8 __U, const double *__P)
{
- return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) _mm_setzero_pd (),
+ __U);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_mask_move_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
+ (__v4sf) __W, __U);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sllv_epi64 (__m512i __X, __m512i __Y)
+_mm_maskz_move_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
+ (__v4sf) _mm_setzero_ps (), __U);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sllv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_mask_move_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
+ (__v2df) __W, __U);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sllv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_maskz_move_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
+ (__v2df) _mm_setzero_pd (),
+ __U);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srav_epi64 (__m512i __X, __m512i __Y)
+_mm_mask_store_ss (float *__P, __mmask8 __U, __m128 __A)
{
- return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ __builtin_ia32_storess_mask (__P, (__v4sf) __A, (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srav_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_mask_store_sd (double *__P, __mmask8 __U, __m128d __A)
{
- return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di) __W,
- (__mmask8) __U);
+ __builtin_ia32_storesd_mask (__P, (__v2df) __A, (__mmask8) __U);
}
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srav_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_fixupimm_round_sd (__m128d __A, __m128d __B, __m128i __C,
+ const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C, __imm,
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srlv_epi64 (__m512i __X, __m512i __Y)
+_mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B,
+ __m128i __C, const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C, __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srlv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
+_mm_maskz_fixupimm_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ __m128i __C, const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C,
+ __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srlv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
+_mm_fixupimm_round_ss (__m128 __A, __m128 __B, __m128i __C,
+ const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
- (__v8di) __Y,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_epi32 (__m512i __A, __m512i __B)
+_mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B,
+ __m128i __C, const int __imm, const int __R)
{
- return (__m512i) ((__v16su) __A + (__v16su) __B);
+ return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm_maskz_fixupimm_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ __m128i __C, const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
-}
+#else
+#define _mm_fixupimm_round_sd(X, Y, Z, C, R) \
+ ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(-1), (R)))
-extern __inline __m512i
+#define _mm_mask_fixupimm_round_sd(X, U, Y, Z, C, R) \
+ ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#define _mm_maskz_fixupimm_round_sd(U, X, Y, Z, C, R) \
+ ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#define _mm_fixupimm_round_ss(X, Y, Z, C, R) \
+ ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(-1), (R)))
+
+#define _mm_mask_fixupimm_round_ss(X, U, Y, Z, C, R) \
+ ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#define _mm_maskz_fixupimm_round_ss(U, X, Y, Z, C, R) \
+ ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#endif
+
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_epi32 (__m512i __X, __m512i __Y)
+_mm_cvt_roundss_u64 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_epi32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_si64 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di) __W, __M);
+ return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_epi32 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_i64 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_epi32 (__m512i __A, __m512i __B)
+_mm_cvtt_roundss_u64 (__m128 __A, const int __R)
{
- return (__m512i) ((__v16su) __A - (__v16su) __B);
+ return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm_cvtt_roundss_i64 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm_cvtt_roundss_si64 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
}
+#else
+#define _mm_cvt_roundss_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvtss2usi64(A, B))
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_epu32 (__m512i __X, __m512i __Y)
-{
- return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+#define _mm_cvt_roundss_si64(A, B) \
+ ((long long)__builtin_ia32_vcvtss2si64(A, B))
+
+#define _mm_cvt_roundss_i64(A, B) \
+ ((long long)__builtin_ia32_vcvtss2si64(A, B))
+
+#define _mm_cvtt_roundss_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvttss2usi64(A, B))
+
+#define _mm_cvtt_roundss_i64(A, B) \
+ ((long long)__builtin_ia32_vcvttss2si64(A, B))
+
+#define _mm_cvtt_roundss_si64(A, B) \
+ ((long long)__builtin_ia32_vcvttss2si64(A, B))
+#endif
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundss_u32 (__m128 __A, const int __R)
+{
+ return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_epu32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_si32 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di) __W, __M);
+ return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_epu32 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm_cvt_roundss_i32 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
- (__v16si) __Y,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_slli_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundss_u32 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_slli_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
- unsigned int __B)
+_mm_cvtt_roundss_i32 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_slli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundss_si32 (__m128 __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
}
#else
-#define _mm512_slli_epi64(X, C) \
- ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_undefined_epi32 (), \
- (__mmask8)-1))
+#define _mm_cvt_roundss_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvtss2usi32(A, B))
-#define _mm512_mask_slli_epi64(W, U, X, C) \
- ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(U)))
+#define _mm_cvt_roundss_si32(A, B) \
+ ((int)__builtin_ia32_vcvtss2si32(A, B))
-#define _mm512_maskz_slli_epi64(U, X, C) \
- ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
+#define _mm_cvt_roundss_i32(A, B) \
+ ((int)__builtin_ia32_vcvtss2si32(A, B))
+
+#define _mm_cvtt_roundss_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvttss2usi32(A, B))
+
+#define _mm_cvtt_roundss_si32(A, B) \
+ ((int)__builtin_ia32_vcvttss2si32(A, B))
+
+#define _mm_cvtt_roundss_i32(A, B) \
+ ((int)__builtin_ia32_vcvttss2si32(A, B))
#endif
-extern __inline __m512i
+#ifdef __x86_64__
+#ifdef __OPTIMIZE__
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sll_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_u64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sll_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_si64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sll_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_i64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srli_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_u64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srli_epi64 (__m512i __W, __mmask8 __U,
- __m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_si64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_i64 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
}
#else
-#define _mm512_srli_epi64(X, C) \
- ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_undefined_epi32 (), \
- (__mmask8)-1))
+#define _mm_cvt_roundsd_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvtsd2usi64(A, B))
-#define _mm512_mask_srli_epi64(W, U, X, C) \
- ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(U)))
+#define _mm_cvt_roundsd_si64(A, B) \
+ ((long long)__builtin_ia32_vcvtsd2si64(A, B))
-#define _mm512_maskz_srli_epi64(U, X, C) \
- ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
+#define _mm_cvt_roundsd_i64(A, B) \
+ ((long long)__builtin_ia32_vcvtsd2si64(A, B))
+
+#define _mm_cvtt_roundsd_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvttsd2usi64(A, B))
+
+#define _mm_cvtt_roundsd_si64(A, B) \
+ ((long long)__builtin_ia32_vcvttsd2si64(A, B))
+
+#define _mm_cvtt_roundsd_i64(A, B) \
+ ((long long)__builtin_ia32_vcvttsd2si64(A, B))
+#endif
#endif
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srl_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_u32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srl_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_si32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srl_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_cvt_roundsd_i32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srai_epi64 (__m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_u32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srai_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
- unsigned int __B)
+_mm_cvtt_roundsd_i32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
}
-extern __inline __m512i
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srai_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
+_mm_cvtt_roundsd_si32 (__m128d __A, const int __R)
{
- return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
}
-#else
-#define _mm512_srai_epi64(X, C) \
- ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_undefined_epi32 (), \
- (__mmask8)-1))
-
-#define _mm512_mask_srai_epi64(W, U, X, C) \
- ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(U)))
-
-#define _mm512_maskz_srai_epi64(U, X, C) \
- ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
- (unsigned int)(C), \
- (__v8di)(__m512i)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
-#endif
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sra_epi64 (__m512i __A, __m128i __B)
+_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sra_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
+_mm_mask_cvt_roundsd_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
+ (__v2df) __B,
+ (__v4sf) __W,
+ __U,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sra_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
+_mm_maskz_cvt_roundsd_ss (__mmask8 __U, __m128 __A,
+ __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
- (__v2di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
+ (__v2df) __B,
+ _mm_setzero_ps (),
+ __U,
+ __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_slli_epi32 (__m512i __A, unsigned int __B)
+_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_slli_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- unsigned int __B)
+_mm_mask_cvt_roundss_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
+ (__v4sf) __B,
+ (__v2df) __W,
+ __U,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_slli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
+_mm_maskz_cvt_roundss_sd (__mmask8 __U, __m128d __A,
+ __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
+ (__v4sf) __B,
+ _mm_setzero_pd (),
+ __U,
+ __R);
}
-#else
-#define _mm512_slli_epi32(X, C) \
- ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_undefined_epi32 (), \
- (__mmask16)-1))
-
-#define _mm512_mask_slli_epi32(W, U, X, C) \
- ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-
-#define _mm512_maskz_slli_epi32(U, X, C) \
- ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_setzero_si512 (), \
- (__mmask16)(U)))
-#endif
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sll_epi32 (__m512i __A, __m128i __B)
+_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sll_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+_mm_mask_getexp_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sll_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
+_mm_maskz_getexp_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srli_epi32 (__m512i __A, unsigned int __B)
+_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srli_epi32 (__m512i __W, __mmask16 __U,
- __m512i __A, unsigned int __B)
+_mm_mask_getexp_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
+_mm_maskz_getexp_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-#else
-#define _mm512_srli_epi32(X, C) \
- ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_undefined_epi32 (),\
- (__mmask16)-1))
-
-#define _mm512_mask_srli_epi32(W, U, X, C) \
- ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-
-#define _mm512_maskz_srli_epi32(U, X, C) \
- ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_setzero_si512 (), \
- (__mmask16)(U)))
-#endif
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srl_epi32 (__m512i __A, __m128i __B)
+_mm_getmant_round_sd (__m128d __A, __m128d __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srl_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+_mm_mask_getmant_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ (__v2df) __W,
+ __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srl_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
+_mm_maskz_getmant_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srai_epi32 (__m512i __A, unsigned int __B)
-{
- return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_srai_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- unsigned int __B)
-{
- return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
- (__v16si) __W,
- (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_srai_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
-{
- return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
-}
-#else
-#define _mm512_srai_epi32(X, C) \
- ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_undefined_epi32 (),\
- (__mmask16)-1))
-
-#define _mm512_mask_srai_epi32(W, U, X, C) \
- ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-
-#define _mm512_maskz_srai_epi32(U, X, C) \
- ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
- (unsigned int)(C), \
- (__v16si)(__m512i)_mm512_setzero_si512 (), \
- (__mmask16)(U)))
-#endif
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sra_epi32 (__m512i __A, __m128i __B)
-{
- return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sra_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
-{
- return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si) __W,
- (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
-{
- return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
- (__v4si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ (__v2df)
+ _mm_setzero_pd(),
+ __U, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_getmant_round_ss (__m128 __A, __m128 __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ __R);
}
-extern __inline __m128d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm_mask_getmant_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ (__v4sf) __W,
+ __U, __R);
}
-extern __inline __m128d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_maskz_getmant_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ (__v4sf)
+ _mm_setzero_ps(),
+ __U, __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm,
+ const int __R)
{
- return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __imm,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C,
+ __m128 __D, const int __imm, const int __R)
{
- return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+ (__v4sf) __D, __imm,
+ (__v4sf) __A,
+ (__mmask8) __B,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C,
+ const int __imm, const int __R)
{
- return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+ (__v4sf) __C, __imm,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __A,
+ __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
+ const int __R)
{
- return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+ (__v2df) __B, __imm,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1,
+ __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C,
+ __m128d __D, const int __imm, const int __R)
{
- return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+ (__v2df) __D, __imm,
+ (__v2df) __A,
+ (__mmask8) __B,
+ __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C,
+ const int __imm, const int __R)
{
- return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+ (__v2df) __C, __imm,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __A,
+ __R);
}
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
-{
- return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
-}
+#else
+#define _mm_cvt_roundsd_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvtsd2usi32(A, B))
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
-{
- return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
-}
+#define _mm_cvt_roundsd_si32(A, B) \
+ ((int)__builtin_ia32_vcvtsd2si32(A, B))
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
-{
- return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
-}
+#define _mm_cvt_roundsd_i32(A, B) \
+ ((int)__builtin_ia32_vcvtsd2si32(A, B))
-#else
-#define _mm_add_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_addsd_round(A, B, C)
+#define _mm_cvtt_roundsd_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvttsd2usi32(A, B))
-#define _mm_mask_add_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_addsd_mask_round(A, B, W, U, C)
+#define _mm_cvtt_roundsd_si32(A, B) \
+ ((int)__builtin_ia32_vcvttsd2si32(A, B))
-#define _mm_maskz_add_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_addsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+#define _mm_cvtt_roundsd_i32(A, B) \
+ ((int)__builtin_ia32_vcvttsd2si32(A, B))
-#define _mm_add_round_ss(A, B, C) \
- (__m128)__builtin_ia32_addss_round(A, B, C)
+#define _mm_cvt_roundsd_ss(A, B, C) \
+ (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
-#define _mm_mask_add_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_addss_mask_round(A, B, W, U, C)
+#define _mm_mask_cvt_roundsd_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), (W), (U), (C))
-#define _mm_maskz_add_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_addss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+#define _mm_maskz_cvt_roundsd_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), _mm_setzero_ps (), \
+ (U), (C))
-#define _mm_sub_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_subsd_round(A, B, C)
+#define _mm_cvt_roundss_sd(A, B, C) \
+ (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
-#define _mm_mask_sub_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_subsd_mask_round(A, B, W, U, C)
+#define _mm_mask_cvt_roundss_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), (W), (U), (C))
-#define _mm_maskz_sub_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_subsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+#define _mm_maskz_cvt_roundss_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), _mm_setzero_pd (), \
+ (U), (C))
-#define _mm_sub_round_ss(A, B, C) \
- (__m128)__builtin_ia32_subss_round(A, B, C)
+#define _mm_getmant_round_sd(X, Y, C, D, R) \
+ ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (R)))
-#define _mm_mask_sub_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_subss_mask_round(A, B, W, U, C)
+#define _mm_mask_getmant_round_sd(W, U, X, Y, C, D, R) \
+ ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v2df)(__m128d)(W), \
+ (__mmask8)(U),\
+ (R)))
-#define _mm_maskz_sub_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_subss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+#define _mm_maskz_getmant_round_sd(U, X, Y, C, D, R) \
+ ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v2df)(__m128d)_mm_setzero_pd(), \
+ (__mmask8)(U),\
+ (R)))
+
+#define _mm_getmant_round_ss(X, Y, C, D, R) \
+ ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (R)))
+
+#define _mm_mask_getmant_round_ss(W, U, X, Y, C, D, R) \
+ ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v4sf)(__m128)(W), \
+ (__mmask8)(U),\
+ (R)))
+
+#define _mm_maskz_getmant_round_ss(U, X, Y, C, D, R) \
+ ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v4sf)(__m128)_mm_setzero_ps(), \
+ (__mmask8)(U),\
+ (R)))
+
+#define _mm_getexp_round_ss(A, B, R) \
+ ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
+
+#define _mm_mask_getexp_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_getexp_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_getexp_round_sd(A, B, R) \
+ ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
+
+#define _mm_mask_getexp_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_getexp_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_roundscale_round_ss(A, B, I, R) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
+ (__v4sf) (__m128) (B), \
+ (int) (I), \
+ (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) (-1), \
+ (int) (R)))
+#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B), \
+ (__v4sf) (__m128) (C), \
+ (int) (I), \
+ (__v4sf) (__m128) (A), \
+ (__mmask8) (U), \
+ (int) (R)))
+#define _mm_maskz_roundscale_round_ss(U, A, B, I, R) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
+ (__v4sf) (__m128) (B), \
+ (int) (I), \
+ (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) (U), \
+ (int) (R)))
+#define _mm_roundscale_round_sd(A, B, I, R) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
+ (__v2df) (__m128d) (B), \
+ (int) (I), \
+ (__v2df) _mm_setzero_pd (), \
+ (__mmask8) (-1), \
+ (int) (R)))
+#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B), \
+ (__v2df) (__m128d) (C), \
+ (int) (I), \
+ (__v2df) (__m128d) (A), \
+ (__mmask8) (U), \
+ (int) (R)))
+#define _mm_maskz_roundscale_round_sd(U, A, B, I, R) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
+ (__v2df) (__m128d) (B), \
+ (int) (I), \
+ (__v2df) _mm_setzero_pd (), \
+ (__mmask8) (U), \
+ (int) (R)))
#endif
-/* Constant helper to represent the ternary logic operations among
- vector A, B and C. */
-typedef enum
-{
- _MM_TERNLOG_A = 0xF0,
- _MM_TERNLOG_B = 0xCC,
- _MM_TERNLOG_C = 0xAA
-} _MM_TERNLOG_ENUM;
+#define _mm_mask_cvtss_sd(W, U, A, B) \
+ _mm_mask_cvt_roundss_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_cvtss_sd(U, A, B) \
+ _mm_maskz_cvt_roundss_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_mask_cvtsd_ss(W, U, A, B) \
+ _mm_mask_cvt_roundsd_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_cvtsd_ss(U, A, B) \
+ _mm_maskz_cvt_roundsd_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C,
- const int __imm)
+_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
{
- return (__m512i)
- __builtin_ia32_pternlogq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __C,
- (unsigned char) __imm,
- (__mmask8) -1);
+ return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
+ (__mmask8) __B);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ternarylogic_epi64 (__m512i __A, __mmask8 __U, __m512i __B,
- __m512i __C, const int __imm)
+_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
{
- return (__m512i)
- __builtin_ia32_pternlogq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __C,
- (unsigned char) __imm,
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
+ (__mmask8) __B);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ternarylogic_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
- __m512i __C, const int __imm)
+_mm_cmp_round_sd_mask (__m128d __X, __m128d __Y, const int __P, const int __R)
{
- return (__m512i)
- __builtin_ia32_pternlogq512_maskz ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __C,
- (unsigned char) __imm,
- (__mmask8) __U);
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ternarylogic_epi32 (__m512i __A, __m512i __B, __m512i __C,
- const int __imm)
+_mm_mask_cmp_round_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y,
+ const int __P, const int __R)
{
- return (__m512i)
- __builtin_ia32_pternlogd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __C,
- (unsigned char) __imm,
- (__mmask16) -1);
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) __M, __R);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ternarylogic_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
- __m512i __C, const int __imm)
+_mm_cmp_round_ss_mask (__m128 __X, __m128 __Y, const int __P, const int __R)
{
- return (__m512i)
- __builtin_ia32_pternlogd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __C,
- (unsigned char) __imm,
- (__mmask16) __U);
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ternarylogic_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
- __m512i __C, const int __imm)
+_mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
+ const int __P, const int __R)
{
- return (__m512i)
- __builtin_ia32_pternlogd512_maskz ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __C,
- (unsigned char) __imm,
- (__mmask16) __U);
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) __M, __R);
}
+
#else
-#define _mm512_ternarylogic_epi64(A, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A), \
- (__v8di) (__m512i) (B), \
- (__v8di) (__m512i) (C), \
- (unsigned char) (I), \
- (__mmask8) -1))
-#define _mm512_mask_ternarylogic_epi64(A, U, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A), \
- (__v8di) (__m512i) (B), \
- (__v8di) (__m512i) (C), \
- (unsigned char)(I), \
- (__mmask8) (U)))
-#define _mm512_maskz_ternarylogic_epi64(U, A, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogq512_maskz ((__v8di) (__m512i) (A), \
- (__v8di) (__m512i) (B), \
- (__v8di) (__m512i) (C), \
- (unsigned char) (I), \
- (__mmask8) (U)))
-#define _mm512_ternarylogic_epi32(A, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A), \
- (__v16si) (__m512i) (B), \
- (__v16si) (__m512i) (C), \
- (unsigned char) (I), \
- (__mmask16) -1))
-#define _mm512_mask_ternarylogic_epi32(A, U, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A), \
- (__v16si) (__m512i) (B), \
- (__v16si) (__m512i) (C), \
- (unsigned char) (I), \
- (__mmask16) (U)))
-#define _mm512_maskz_ternarylogic_epi32(U, A, B, C, I) \
- ((__m512i) \
- __builtin_ia32_pternlogd512_maskz ((__v16si) (__m512i) (A), \
- (__v16si) (__m512i) (B), \
- (__v16si) (__m512i) (C), \
- (unsigned char) (I), \
- (__mmask16) (U)))
-#endif
+#define _kshiftli_mask16(X, Y) \
+ ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp14_pd (__m512d __A)
-{
- return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
-}
+#define _kshiftri_mask16(X, Y) \
+ ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp14_pd (__m512d __W, __mmask8 __U, __m512d __A)
-{
- return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
-}
+#define _mm_cmp_round_sd_mask(X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (int)(P),\
+ (__mmask8)-1, R))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp14_pd (__mmask8 __U, __m512d __A)
-{
- return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
-}
+#define _mm_mask_cmp_round_sd_mask(M, X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (int)(P),\
+ (M), R))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp14_ps (__m512 __A)
-{
- return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
-}
+#define _mm_cmp_round_ss_mask(X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (int)(P), \
+ (__mmask8)-1, R))
-extern __inline __m512
+#define _mm_mask_cmp_round_ss_mask(M, X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (int)(P), \
+ (M), R))
+
+#endif
+
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp14_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__CF)
{
- return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
+ return (unsigned char) __builtin_ia32_kortestzhi (__A, __B);
}
-extern __inline __m512
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
{
- return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m128d
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp14_sd (__m128d __A, __m128d __B)
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
{
- return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B,
- (__v2df) __A);
+ return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m128d
+extern __inline unsigned int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_cvtmask16_u32 (__mmask16 __A)
{
- return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
- (__v2df) __A,
- (__v2df) __W,
- (__mmask8) __U);
+ return (unsigned int) __builtin_ia32_kmovw ((__mmask16 ) __A);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp14_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_cvtu32_mask16 (unsigned int __A)
{
- return (__m128d) __builtin_ia32_rcp14sd_mask ((__v2df) __B,
- (__v2df) __A,
- (__v2df) _mm_setzero_ps (),
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_kmovw ((__mmask16 ) __A);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp14_ss (__m128 __A, __m128 __B)
+_load_mask16 (__mmask16 *__A)
{
- return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B,
- (__v4sf) __A);
+ return (__mmask16) __builtin_ia32_kmovw (*(__mmask16 *) __A);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
{
- return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf) __W,
- (__mmask8) __U);
+ *(__mmask16 *) __A = __builtin_ia32_kmovw (__B);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp14_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_kand_mask16 (__mmask16 __A, __mmask16 __B)
{
- return (__m128) __builtin_ia32_rcp14ss_mask ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf) _mm_setzero_ps (),
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt14_pd (__m512d __A)
+_kandn_mask16 (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt14_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_kor_mask16 (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt14_pd (__mmask8 __U, __m512d __A)
+_kxnor_mask16 (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt14_ps (__m512 __A)
+_kxor_mask16 (__mmask16 __A, __mmask16 __B)
{
- return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt14_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_knot_mask16 (__mmask16 __A)
{
- return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
{
- return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
}
+#ifdef __OPTIMIZE__
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt14_sd (__m128d __A, __m128d __B)
+_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
{
- return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __B,
- (__v2df) __A);
+ return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt14_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm_mask_max_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
- (__v2df) __A,
+ return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
(__v2df) __W,
- (__mmask8) __U);
+ (__mmask8) __U, __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt14_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm_maskz_max_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m128d) __builtin_ia32_rsqrt14sd_mask ((__v2df) __B,
- (__v2df) __A,
- (__v2df) _mm_setzero_pd (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt14_ss (__m128 __A, __m128 __B)
+_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __B,
- (__v4sf) __A);
+ return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt14_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
-{
- return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
- (__v4sf) __A,
+_mm_mask_max_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
+{
+ return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
(__v4sf) __W,
- (__mmask8) __U);
+ (__mmask8) __U, __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt14_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm_maskz_max_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m128) __builtin_ia32_rsqrt14ss_mask ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf) _mm_setzero_ps (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_pd (__m512d __A, const int __R)
+_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm_mask_min_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, const int __R)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_pd (__mmask8 __U, __m512d __A, const int __R)
+_mm_maskz_min_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_ps (__m512 __A, const int __R)
+_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_ps (__m512 __W, __mmask16 __U, __m512 __A, const int __R)
+_mm_mask_min_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, const int __R)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
+_mm_maskz_min_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
+#else
+#define _mm_max_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_maxsd_round(A, B, C)
+
+#define _mm_mask_max_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_maxsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_max_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_maxsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_max_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_maxss_round(A, B, C)
+
+#define _mm_mask_max_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_maxss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_max_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_maxss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#define _mm_min_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_minsd_round(A, B, C)
+
+#define _mm_mask_min_round_sd(W, U, A, B, C) \
+ (__m128d)__builtin_ia32_minsd_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_min_round_sd(U, A, B, C) \
+ (__m128d)__builtin_ia32_minsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+
+#define _mm_min_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_minss_round(A, B, C)
+
+#define _mm_mask_min_round_ss(W, U, A, B, C) \
+ (__m128)__builtin_ia32_minss_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_min_round_ss(U, A, B, C) \
+ (__m128)__builtin_ia32_minss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
+
+#endif
+
+#ifdef __OPTIMIZE__
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
{
- return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
- (__v2df) __A,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ __R);
}
-extern __inline __m128d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
{
- return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
- (__v2df) __A,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ __R);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_sd (__mmask8 __U, __m128d __A, __m128d __B, const int __R)
+_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
{
- return (__m128d) __builtin_ia32_sqrtsd_mask_round ((__v2df) __B,
- (__v2df) __A,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+ (__v2df) __A,
+ -(__v2df) __B,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
{
- return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+ (__v4sf) __A,
+ -(__v4sf) __B,
+ __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+ return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
{
- return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+ return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+ -(__v2df) __A,
+ -(__v2df) __B,
+ __R);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
+_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
{
- return (__m128) __builtin_ia32_sqrtss_mask_round ((__v4sf) __B,
- (__v4sf) __A,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+ -(__v4sf) __A,
+ -(__v4sf) __B,
+ __R);
}
#else
-#define _mm512_sqrt_round_pd(A, C) \
- (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, C)
+#define _mm_fmadd_round_sd(A, B, C, R) \
+ (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
-#define _mm512_mask_sqrt_round_pd(W, U, A, C) \
- (__m512d)__builtin_ia32_sqrtpd512_mask(A, W, U, C)
+#define _mm_fmadd_round_ss(A, B, C, R) \
+ (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
-#define _mm512_maskz_sqrt_round_pd(U, A, C) \
- (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), U, C)
+#define _mm_fmsub_round_sd(A, B, C, R) \
+ (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
-#define _mm512_sqrt_round_ps(A, C) \
- (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_undefined_ps(), -1, C)
+#define _mm_fmsub_round_ss(A, B, C, R) \
+ (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
-#define _mm512_mask_sqrt_round_ps(W, U, A, C) \
- (__m512)__builtin_ia32_sqrtps512_mask(A, W, U, C)
+#define _mm_fnmadd_round_sd(A, B, C, R) \
+ (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
-#define _mm512_maskz_sqrt_round_ps(U, A, C) \
- (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+#define _mm_fnmadd_round_ss(A, B, C, R) \
+ (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
-#define _mm_sqrt_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
- (__v2df) _mm_setzero_pd (), -1, C)
+#define _mm_fnmsub_round_sd(A, B, C, R) \
+ (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
-#define _mm_mask_sqrt_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, W, U, C)
-
-#define _mm_maskz_sqrt_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_sqrtsd_mask_round (B, A, \
- (__v2df) _mm_setzero_pd (), U, C)
-
-#define _mm_sqrt_round_ss(A, B, C) \
- (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
- (__v4sf) _mm_setzero_ps (), -1, C)
-
-#define _mm_mask_sqrt_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_sqrtss_mask_round (B, A, W, U, C)
-
-#define _mm_maskz_sqrt_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_sqrtss_mask_round (B, A, \
- (__v4sf) _mm_setzero_ps (), U, C)
+#define _mm_fnmsub_round_ss(A, B, C, R) \
+ (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
#endif
-#define _mm_mask_sqrt_sd(W, U, A, B) \
- _mm_mask_sqrt_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_sqrt_sd(U, A, B) \
- _mm_maskz_sqrt_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_mask_sqrt_ss(W, U, A, B) \
- _mm_mask_sqrt_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_sqrt_ss(U, A, B) \
- _mm_maskz_sqrt_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi8_epi32 (__m128i __A)
+_mm_mask_fmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
+_mm_mask_fmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi8_epi32 (__mmask16 __U, __m128i __A)
+_mm_mask3_fmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi8_epi64 (__m128i __A)
+_mm_mask3_fmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_maskz_fmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi8_epi64 (__mmask8 __U, __m128i __A)
+_mm_maskz_fmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_epi32 (__m256i __A)
+_mm_mask_fmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ (__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
+_mm_mask_fmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_epi32 (__mmask16 __U, __m256i __A)
+_mm_mask3_fmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_epi64 (__m128i __A)
+_mm_mask3_fmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_maskz_fmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ (__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_epi64 (__mmask8 __U, __m128i __A)
+_mm_maskz_fmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ (__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi64 (__m256i __X)
+_mm_mask_fnmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
+_mm_mask_fnmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi64 (__mmask8 __U, __m256i __X)
+_mm_mask3_fnmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu8_epi32 (__m128i __A)
+_mm_mask3_fnmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
+_mm_maskz_fnmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu8_epi32 (__mmask16 __U, __m128i __A)
+_mm_maskz_fnmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu8_epi64 (__m128i __A)
-{
- return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_mask_fnmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ -(__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu8_epi64 (__mmask8 __U, __m128i __A)
+_mm_mask_fnmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ -(__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu16_epi32 (__m256i __A)
+_mm_mask3_fnmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
+_mm_mask3_fnmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_epi32 (__mmask16 __U, __m256i __A)
+_mm_maskz_fnmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
{
- return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ -(__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu16_epi64 (__m128i __A)
+_mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
{
- return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ -(__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
+_mm_mask_fmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_epi64 (__mmask8 __U, __m128i __A)
+_mm_mask_fmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_epi64 (__m256i __X)
+_mm_mask3_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
+_mm_mask3_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_epi64 (__mmask8 __U, __m256i __X)
+_mm_maskz_fmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_mask_fmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ (__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_mask_fmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ (__v4sf) __A,
+ -(__v4sf) __B,
(__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+ (__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_mask3_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+ (__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ (__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ (__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_mask_fnmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_mask_fnmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
(__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_mask3_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fnmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-#else
-#define _mm512_add_round_pd(A, B, C) \
- (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_add_round_pd(W, U, A, B, C) \
- (__m512d)__builtin_ia32_addpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_add_round_pd(U, A, B, C) \
- (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_add_round_ps(A, B, C) \
- (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_add_round_ps(W, U, A, B, C) \
- (__m512)__builtin_ia32_addps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_add_round_ps(U, A, B, C) \
- (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm512_sub_round_pd(A, B, C) \
- (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_sub_round_pd(W, U, A, B, C) \
- (__m512d)__builtin_ia32_subpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_sub_round_pd(U, A, B, C) \
- (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_sub_round_ps(A, B, C) \
- (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_sub_round_ps(W, U, A, B, C) \
- (__m512)__builtin_ia32_subps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_sub_round_ps(U, A, B, C) \
- (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-#endif
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_maskz_fnmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_mask_fnmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
+ -(__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_mask_fnmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
+ -(__v4sf) __A,
+ -(__v4sf) __B,
(__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_mask3_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
+ -(__v2df) __A,
+ (__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_mask3_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
+ const int __R)
{
- return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
+ -(__v4sf) __A,
+ (__v4sf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_fnmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
+ -(__v2df) __A,
+ -(__v2df) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_pd (__m512d __M, __m512d __V, const int __R)
+_mm_maskz_fnmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
- (__v8df) __V,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
+ -(__v4sf) __A,
+ -(__v4sf) __B,
+ (__mmask8) __U, __R);
}
+#else
+#define _mm_mask_fmadd_round_sd(A, U, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, C, U, R)
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_pd (__m512d __W, __mmask8 __U, __m512d __M,
- __m512d __V, const int __R)
-{
- return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
- (__v8df) __V,
- (__v8df) __W,
- (__mmask8) __U, __R);
-}
+#define _mm_mask_fmadd_round_ss(A, U, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask (A, B, C, U, R)
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_pd (__mmask8 __U, __m512d __M, __m512d __V,
- const int __R)
-{
- return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
- (__v8df) __V,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
-}
+#define _mm_mask3_fmadd_round_sd(A, B, C, U, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, B, C, U, R)
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_ps (__m512 __A, __m512 __B, const int __R)
-{
- return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
-}
+#define _mm_mask3_fmadd_round_ss(A, B, C, U, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask3 (A, B, C, U, R)
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
-{
- return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
-}
+#define _mm_maskz_fmadd_round_sd(U, A, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, C, U, R)
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
-{
- return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+#define _mm_maskz_fmadd_round_ss(U, A, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, C, U, R)
+
+#define _mm_mask_fmsub_round_sd(A, U, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, -(C), U, R)
+
+#define _mm_mask_fmsub_round_ss(A, U, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask (A, B, -(C), U, R)
+
+#define _mm_mask3_fmsub_round_sd(A, B, C, U, R) \
+ (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, B, C, U, R)
+
+#define _mm_mask3_fmsub_round_ss(A, B, C, U, R) \
+ (__m128) __builtin_ia32_vfmsubss3_mask3 (A, B, C, U, R)
+
+#define _mm_maskz_fmsub_round_sd(U, A, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, -(C), U, R)
+
+#define _mm_maskz_fmsub_round_ss(U, A, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, -(C), U, R)
+
+#define _mm_mask_fnmadd_round_sd(A, U, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), C, U, R)
+
+#define _mm_mask_fnmadd_round_ss(A, U, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmadd_round_sd(A, B, C, U, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmadd_round_ss(A, B, C, U, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask3 (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmadd_round_sd(U, A, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmadd_round_ss(U, A, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), C, U, R)
+
+#define _mm_mask_fnmsub_round_sd(A, U, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), -(C), U, R)
+
+#define _mm_mask_fnmsub_round_ss(A, U, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), -(C), U, R)
+
+#define _mm_mask3_fnmsub_round_sd(A, B, C, U, R) \
+ (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, -(B), C, U, R)
+
+#define _mm_mask3_fnmsub_round_ss(A, B, C, U, R) \
+ (__m128) __builtin_ia32_vfmsubss3_mask3 (A, -(B), C, U, R)
+
+#define _mm_maskz_fnmsub_round_sd(U, A, B, C, R) \
+ (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), -(C), U, R)
+
+#define _mm_maskz_fnmsub_round_ss(U, A, B, C, R) \
+ (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), -(C), U, R)
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
+{
+ return __builtin_ia32_vcomiss ((__v4sf) __A, (__v4sf) __B, __P, __R);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comi_round_sd (__m128d __A, __m128d __B, const int __P, const int __R)
+{
+ return __builtin_ia32_vcomisd ((__v2df) __A, (__v2df) __B, __P, __R);
}
+#else
+#define _mm_comi_round_ss(A, B, C, D)\
+__builtin_ia32_vcomiss(A, B, C, D)
+#define _mm_comi_round_sd(A, B, C, D)\
+__builtin_ia32_vcomisd(A, B, C, D)
+#endif
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_mask_add_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm_maskz_add_sd (__mmask8 __U, __m128d __A, __m128d __B)
+{
+ return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_add_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+{
+ return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_ss (__mmask8 __U, __m128 __A, __m128 __B)
+{
+ return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+{
+ return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_sd (__mmask8 __U, __m128d __A, __m128d __B)
+{
+ return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+{
+ return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_ss (__mmask8 __U, __m128 __A, __m128 __B)
+{
+ return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mul_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B)
{
return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
(__v2df) __B,
(__v2df) __W,
- (__mmask8) __U, __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_maskz_mul_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
(__v2df) __B,
(__v2df)
_mm_setzero_pd (),
- (__mmask8) __U, __R);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
-{
- return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm_mask_mul_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B)
{
return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
(__v4sf) __B,
(__v4sf) __W,
- (__mmask8) __U, __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm_maskz_mul_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
(__v4sf) __B,
(__v4sf)
_mm_setzero_ps (),
- (__mmask8) __U, __R);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
-{
- return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm_mask_div_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B)
{
return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
(__v2df) __B,
(__v2df) __W,
- (__mmask8) __U, __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_maskz_div_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
(__v2df) __B,
(__v2df)
_mm_setzero_pd (),
- (__mmask8) __U, __R);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
-{
- return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm_mask_div_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B)
{
return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
(__v4sf) __B,
(__v4sf) __W,
- (__mmask8) __U, __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm_maskz_div_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
(__v4sf) __B,
(__v4sf)
_mm_setzero_ps (),
- (__mmask8) __U, __R);
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_mul_round_pd(A, B, C) \
- (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_mul_round_pd(W, U, A, B, C) \
- (__m512d)__builtin_ia32_mulpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_mul_round_pd(U, A, B, C) \
- (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_mul_round_ps(A, B, C) \
- (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_mul_round_ps(W, U, A, B, C) \
- (__m512)__builtin_ia32_mulps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_mul_round_ps(U, A, B, C) \
- (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm512_div_round_pd(A, B, C) \
- (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-
-#define _mm512_mask_div_round_pd(W, U, A, B, C) \
- (__m512d)__builtin_ia32_divpd512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_div_round_pd(U, A, B, C) \
- (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-
-#define _mm512_div_round_ps(A, B, C) \
- (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-
-#define _mm512_mask_div_round_ps(W, U, A, B, C) \
- (__m512)__builtin_ia32_divps512_mask(A, B, W, U, C)
-
-#define _mm512_maskz_div_round_ps(U, A, B, C) \
- (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-
-#define _mm_mul_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_mulsd_round(A, B, C)
-
-#define _mm_mask_mul_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_mulsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_mul_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_mulsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_mul_round_ss(A, B, C) \
- (__m128)__builtin_ia32_mulss_round(A, B, C)
-
-#define _mm_mask_mul_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_mulss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_mul_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_mulss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_div_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_divsd_round(A, B, C)
-
-#define _mm_mask_div_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_divsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_div_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_divsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_div_round_ss(A, B, C) \
- (__m128)__builtin_ia32_divss_round(A, B, C)
-
-#define _mm_mask_div_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_divss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_div_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_divss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#endif
-
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_mask_max_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_maskz_max_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_mask_max_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_maskz_max_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_mask_min_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_maskz_min_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_mask_min_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_maskz_min_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_scalef_sd (__m128d __A, __m128d __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_scalef_ss (__m128 __A, __m128 __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+#ifdef __x86_64__
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_cvtu64_ss (__m128 __A, unsigned long long __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+_mm_cvtu64_sd (__m128d __A, unsigned long long __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_max_round_pd(A, B, R) \
- (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
-
-#define _mm512_mask_max_round_pd(W, U, A, B, R) \
- (__m512d)__builtin_ia32_maxpd512_mask(A, B, W, U, R)
+#endif
-#define _mm512_maskz_max_round_pd(U, A, B, R) \
- (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu32_ss (__m128 __A, unsigned __B)
+{
+ return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_max_round_ps(A, B, R) \
- (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_undefined_pd(), -1, R)
+#ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fixupimm_sd (__m128d __A, __m128d __B, __m128i __C, const int __imm)
+{
+ return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C, __imm,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_mask_max_round_ps(W, U, A, B, R) \
- (__m512)__builtin_ia32_maxps512_mask(A, B, W, U, R)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B,
+ __m128i __C, const int __imm)
+{
+ return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C, __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_maskz_max_round_ps(U, A, B, R) \
- (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fixupimm_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ __m128i __C, const int __imm)
+{
+ return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
+ (__v2df) __B,
+ (__v2di) __C,
+ __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_min_round_pd(A, B, R) \
- (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fixupimm_ss (__m128 __A, __m128 __B, __m128i __C, const int __imm)
+{
+ return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_mask_min_round_pd(W, U, A, B, R) \
- (__m512d)__builtin_ia32_minpd512_mask(A, B, W, U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B,
+ __m128i __C, const int __imm)
+{
+ return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_maskz_min_round_pd(U, A, B, R) \
- (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fixupimm_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ __m128i __C, const int __imm)
+{
+ return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4si) __C, __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm512_min_round_ps(A, B, R) \
- (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
+#else
+#define _mm_fixupimm_sd(X, Y, Z, C) \
+ ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_min_round_ps(W, U, A, B, R) \
- (__m512)__builtin_ia32_minps512_mask(A, B, W, U, R)
+#define _mm_mask_fixupimm_sd(X, U, Y, Z, C) \
+ ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_fixupimm_sd(U, X, Y, Z, C) \
+ ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_fixupimm_ss(X, Y, Z, C) \
+ ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_fixupimm_ss(X, U, Y, Z, C) \
+ ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_fixupimm_ss(U, X, Y, Z, C) \
+ ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-#define _mm512_maskz_min_round_ps(U, A, B, R) \
- (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
#endif
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+#ifdef __x86_64__
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_pd (__m512d __A, __m512d __B, const int __R)
+_mm_cvtss_u64 (__m128 __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf)
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __R)
+_mm_cvttss_u64 (__m128 __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf)
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- const int __R)
+_mm_cvttss_i64 (__m128 __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
+#endif /* __x86_64__ */
-extern __inline __m512
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_ps (__m512 __A, __m512 __B, const int __R)
+_mm_cvtss_u32 (__m128 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __R)
+_mm_cvttss_u32 (__m128 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- const int __R)
+_mm_cvttss_i32 (__m128 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm_cvtsd_i32 (__m128d __A)
{
- return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1, __R);
+ return (int) __builtin_ia32_cvtsd2si ((__v2df) __A);
}
-extern __inline __m128d
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_cvtss_i32 (__m128 __A)
{
- return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (int) __builtin_ia32_cvtss2si ((__v4sf) __A);
}
extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm_cvti32_sd (__m128d __A, int __B)
{
- return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_cvtsi2sd ((__v2df) __A, __B);
}
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm_cvti32_ss (__m128 __A, int __B)
{
- return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B);
}
-extern __inline __m128
+#ifdef __x86_64__
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm_cvtsd_u64 (__m128d __A)
{
- return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df)
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
+_mm_cvttsd_u64 (__m128d __A)
{
- return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df)
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_scalef_round_pd(A, B, C) \
- ((__m512d) \
- __builtin_ia32_scalefpd512_mask((A), (B), \
- (__v8df) _mm512_undefined_pd(), \
- -1, (C)))
-#define _mm512_mask_scalef_round_pd(W, U, A, B, C) \
- ((__m512d) __builtin_ia32_scalefpd512_mask((A), (B), (W), (U), (C)))
-
-#define _mm512_maskz_scalef_round_pd(U, A, B, C) \
- ((__m512d) \
- __builtin_ia32_scalefpd512_mask((A), (B), \
- (__v8df) _mm512_setzero_pd(), \
- (U), (C)))
-
-#define _mm512_scalef_round_ps(A, B, C) \
- ((__m512) \
- __builtin_ia32_scalefps512_mask((A), (B), \
- (__v16sf) _mm512_undefined_ps(), \
- -1, (C)))
-
-#define _mm512_mask_scalef_round_ps(W, U, A, B, C) \
- ((__m512) __builtin_ia32_scalefps512_mask((A), (B), (W), (U), (C)))
-
-#define _mm512_maskz_scalef_round_ps(U, A, B, C) \
- ((__m512) \
- __builtin_ia32_scalefps512_mask((A), (B), \
- (__v16sf) _mm512_setzero_ps(), \
- (U), (C)))
-
-#define _mm_scalef_round_sd(A, B, C) \
- ((__m128d) \
- __builtin_ia32_scalefsd_mask_round ((A), (B), \
- (__v2df) _mm_undefined_pd (), \
- -1, (C)))
-
-#define _mm_scalef_round_ss(A, B, C) \
- ((__m128) \
- __builtin_ia32_scalefss_mask_round ((A), (B), \
- (__v4sf) _mm_undefined_ps (), \
- -1, (C)))
-
-#define _mm_mask_scalef_round_sd(W, U, A, B, C) \
- ((__m128d) \
- __builtin_ia32_scalefsd_mask_round ((A), (B), (W), (U), (C)))
-
-#define _mm_mask_scalef_round_ss(W, U, A, B, C) \
- ((__m128) \
- __builtin_ia32_scalefss_mask_round ((A), (B), (W), (U), (C)))
-
-#define _mm_maskz_scalef_round_sd(U, A, B, C) \
- ((__m128d) \
- __builtin_ia32_scalefsd_mask_round ((A), (B), \
- (__v2df) _mm_setzero_pd (), \
- (U), (C)))
-
-#define _mm_maskz_scalef_round_ss(U, A, B, C) \
- ((__m128) \
- __builtin_ia32_scalefss_mask_round ((A), (B), \
- (__v4sf) _mm_setzero_ps (), \
- (U), (C)))
-#endif
-
-#define _mm_mask_scalef_sd(W, U, A, B) \
- _mm_mask_scalef_round_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_scalef_sd(U, A, B) \
- _mm_maskz_scalef_round_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_mask_scalef_ss(W, U, A, B) \
- _mm_mask_scalef_round_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_scalef_ss(U, A, B) \
- _mm_maskz_scalef_round_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_cvttsd_i64 (__m128d __A)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1, __R);
+ return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
+_mm_cvtsd_i64 (__m128d __A)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (long long) __builtin_ia32_cvtsd2si64 ((__v2df) __A);
}
-extern __inline __m512d
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
+_mm_cvtss_i64 (__m128 __A)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (long long) __builtin_ia32_cvtss2si64 ((__v4sf) __A);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
+_mm_cvti64_sd (__m128d __A, long long __B)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_cvtsi642sd ((__v2df) __A, __B);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_cvti64_ss (__m128 __A, long long __B)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1, __R);
+ return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B);
}
+#endif /* __x86_64__ */
-extern __inline __m512
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
+_mm_cvtsd_u32 (__m128d __A)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
+_mm_cvttsd_u32 (__m128d __A)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
+_mm_cvttsd_i32 (__m128d __A)
{
- return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+#ifdef __OPTIMIZE__
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_getexp_ss (__m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1, __R);
+ return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+ (__v4sf) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
+_mm_mask_getexp_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
+_mm_maskz_getexp_ss (__mmask8 __U, __m128 __A, __m128 __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
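As background for the getexp wrappers above: for a normal, nonzero input, VGETEXPSS/VGETEXPSD return the unbiased binary exponent floor(log2(|x|)) of the source as a floating-point value. A minimal plain-C sketch of that semantics (the function name is invented here; the real instruction additionally defines results for 0, Inf, NaN and denormals):

```c
#include <assert.h>

/* Plain-C sketch of what _mm_getexp_sd computes for the low element:
   the unbiased binary exponent floor(log2(|x|)), returned as a double.
   Assumes a normal, nonzero input; the hardware also covers 0, Inf,
   NaN and denormal special cases.  */
static double
getexp_sketch (double __x)
{
  double __m = __x < 0 ? -__x : __x;
  double __e = 0.0;
  while (__m >= 2.0) { __m *= 0.5; __e += 1.0; }
  while (__m < 1.0)  { __m *= 2.0; __e -= 1.0; }
  return __e;
}
```

For example, getexp_sketch (8.0) yields 3.0 and getexp_sketch (0.75) yields -1.0.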
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
+_mm_getexp_sd (__m128d __A, __m128d __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+ (__v2df) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_mask_getexp_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1, __R);
+ return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
+_mm_maskz_getexp_sd (__mmask8 __U, __m128d __A, __m128d __B)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
+_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
+_mm_mask_getmant_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
-}
-
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
-{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1, __R);
+ return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ (__v2df) __W,
+ __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
+_mm_maskz_getmant_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
+ (__v2df) __B,
+ (__D << 2) | __C,
+ (__v2df)
+ _mm_setzero_pd(),
+ __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
+_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
+_mm_mask_getmant_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ (__v4sf) __W,
+ __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1, __R);
+ return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
+ (__v4sf) __B,
+ (__D << 2) | __C,
+ (__v4sf)
+ _mm_setzero_ps(),
+ __U,
+ _MM_FROUND_CUR_DIRECTION);
}
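The getmant wrappers above pack their two enum operands into one 4-bit immediate: the interval selector `__C` occupies imm[1:0] and the sign selector `__D` occupies imm[3:2], hence the `(__D << 2) | __C` expression. A plain-C sketch of the operation itself for the `_MM_MANT_NORM_1_2` / `_MM_MANT_SIGN_src` combination (function name invented; special cases omitted):

```c
#include <assert.h>

/* Plain-C sketch of the low-element getmant operation for the
   _MM_MANT_NORM_1_2 / _MM_MANT_SIGN_src selectors: the magnitude is
   scaled by a power of two into [1.0, 2.0) and the source sign is
   preserved.  Assumes a normal, nonzero input; the hardware also
   handles 0, Inf, NaN and the other interval/sign selectors.  */
static double
getmant_1_2_sketch (double __x)
{
  double __m = __x < 0 ? -__x : __x;
  while (__m >= 2.0) __m *= 0.5;
  while (__m < 1.0)  __m *= 2.0;
  return __x < 0 ? -__m : __m;
}
```

For example, 6.0 = 1.5 * 2^2, so getmant_1_2_sketch (6.0) yields 1.5.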
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
+_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __imm,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
-{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
-}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
+_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D,
+ const int __imm)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
+ (__v4sf) __D, __imm,
+ (__v4sf) __A,
+ (__mmask8) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C,
+ const int __imm)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) -1, __R);
+ return (__m128)
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
+ (__v4sf) __C, __imm,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
+_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
+ (__v2df) __B, __imm,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
+_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
+ const int __imm)
{
- return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
+ (__v2df) __D, __imm,
+ (__v2df) __A,
+ (__mmask8) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
+_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
+ const int __imm)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) __U, __R);
+ return (__m128d)
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
+ (__v2df) __C, __imm,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __A,
+ _MM_FROUND_CUR_DIRECTION);
}
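The roundscale wrappers above pass an 8-bit immediate whose high nibble M selects the scale: the result is 2^-M * round(2^M * x), i.e. the input rounded to M fractional bits, with the low nibble choosing the rounding mode. A plain-C sketch for a truncating rounding mode (function name invented; assumes the scaled value fits in a long long):

```c
#include <assert.h>

/* Plain-C sketch of the low-element roundscale operation with a
   truncating rounding mode: imm[7:4] = M selects rounding to M
   fractional bits, computed as 2^-M * trunc(2^M * x).  Assumes the
   scaled value fits in a long long.  */
static double
roundscale_trunc_sketch (double __x, int __m)
{
  double __scale = 1.0;
  for (int __i = 0; __i < __m; __i++)
    __scale *= 2.0;
  return (double) (long long) (__x * __scale) / __scale;
}
```

For example, with M = 2 the value 2.6875 is truncated to quarters: 2.6875 * 4 = 10.75, truncated to 10, giving 2.5.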
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+_mm_cmp_sd_mask (__m128d __X, __m128d __Y, const int __P)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) -1, __R);
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
+_mm_mask_cmp_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y, const int __P)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) __M,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
+_mm_cmp_ss_mask (__m128 __X, __m128 __Y, const int __P)
{
- return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
+_mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) __U, __R);
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) __M,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
-{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1, __R);
-}
+#else
+#define _mm_getmant_sd(X, Y, C, D) \
+ ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
-{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
-}
+#define _mm_mask_getmant_sd(W, U, X, Y, C, D) \
+ ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v2df)(__m128d)(W), \
+ (__mmask8)(U),\
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
-{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
-}
+#define _mm_maskz_getmant_sd(U, X, Y, C, D) \
+ ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v2df)_mm_setzero_pd(), \
+ (__mmask8)(U),\
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
-{
- return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
-}
+#define _mm_getmant_ss(X, Y, C, D) \
+ ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
-{
- return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1, __R);
-}
+#define _mm_mask_getmant_ss(W, U, X, Y, C, D) \
+ ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v4sf)(__m128)(W), \
+ (__mmask8)(U),\
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
-{
- return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
-}
+#define _mm_maskz_getmant_ss(U, X, Y, C, D) \
+ ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v4sf)_mm_setzero_ps(), \
+ (__mmask8)(U),\
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
-{
- return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
-}
+#define _mm_getexp_ss(A, B) \
+ ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), \
+ _MM_FROUND_CUR_DIRECTION))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
-{
- return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
-}
+#define _mm_mask_getexp_ss(W, U, A, B) \
+ (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U,\
+ _MM_FROUND_CUR_DIRECTION)
-extern __inline __m512d
+#define _mm_maskz_getexp_ss(U, A, B) \
+ (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U,\
+ _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_getexp_sd(A, B) \
+ ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
+ _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_getexp_sd(W, U, A, B) \
+ (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U,\
+ _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_maskz_getexp_sd(U, A, B) \
+ (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U,\
+ _MM_FROUND_CUR_DIRECTION)
+
+#define _mm_roundscale_ss(A, B, I) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
+ (__v4sf) (__m128) (B), \
+ (int) (I), \
+ (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) (-1), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_ss(A, U, B, C, I) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B), \
+ (__v4sf) (__m128) (C), \
+ (int) (I), \
+ (__v4sf) (__m128) (A), \
+ (__mmask8) (U), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_ss(U, A, B, I) \
+ ((__m128) \
+ __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
+ (__v4sf) (__m128) (B), \
+ (int) (I), \
+ (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) (U), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_sd(A, B, I) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
+ (__v2df) (__m128d) (B), \
+ (int) (I), \
+ (__v2df) _mm_setzero_pd (), \
+ (__mmask8) (-1), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_roundscale_sd(A, U, B, C, I) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B), \
+ (__v2df) (__m128d) (C), \
+ (int) (I), \
+ (__v2df) (__m128d) (A), \
+ (__mmask8) (U), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_roundscale_sd(U, A, B, I) \
+ ((__m128d) \
+ __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
+ (__v2df) (__m128d) (B), \
+ (int) (I), \
+ (__v2df) _mm_setzero_pd (), \
+ (__mmask8) (U), \
+ _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_cmp_sd_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (int)(P),\
+ (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_cmp_sd_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
+ (__v2df)(__m128d)(Y), (int)(P),\
+ M,_MM_FROUND_CUR_DIRECTION))
+
+#define _mm_cmp_ss_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (int)(P), \
+ (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_cmp_ss_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
+ (__v4sf)(__m128)(Y), (int)(P), \
+ M,_MM_FROUND_CUR_DIRECTION))
+
+#endif
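The `#else` branch just closed exists because these builtins require their immediate operand to be an integer constant expression. With `__OPTIMIZE__` the always_inline wrappers collapse and the caller's literal survives to the builtin; without optimization only a macro keeps it visible as a constant. A small sketch of the difference, with `PACK_IMM` as an invented stand-in for the `(D << 2) | C` packing used by the getmant forms:

```c
#include <assert.h>

/* Sketch of why the non-__OPTIMIZE__ path uses macros: a macro keeps
   the caller's literal visible as an integer constant expression,
   while the parameter of an uninlined function does not.  PACK_IMM is
   a stand-in for the (D << 2) | C immediate packing above.  */
#define PACK_IMM(C, D) ((int) (((D) << 2) | (C)))

static int
pack_imm_fn (int __c, int __d)  /* constant only if inlined */
{
  return (__d << 2) | __c;
}
```

`PACK_IMM (3, 2)` is an integer constant expression usable as a builtin immediate at any optimization level; `pack_imm_fn (3, 2)` computes the same value but only qualifies once the call is inlined and folded.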
+
+#ifdef __DISABLE_AVX512F__
+#undef __DISABLE_AVX512F__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512F__ */
+
+#if !defined (__AVX512F__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512f,evex512")
+#define __DISABLE_AVX512F_512__
+#endif /* __AVX512F_512__ */
+
+/* Internal data types for implementing the intrinsics. */
+typedef double __v8df __attribute__ ((__vector_size__ (64)));
+typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+typedef long long __v8di __attribute__ ((__vector_size__ (64)));
+typedef unsigned long long __v8du __attribute__ ((__vector_size__ (64)));
+typedef int __v16si __attribute__ ((__vector_size__ (64)));
+typedef unsigned int __v16su __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef unsigned short __v32hu __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
+typedef unsigned char __v64qu __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+ vector types, and their scalar components. */
+typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+/* Unaligned version of the same type. */
+typedef float __m512_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
+typedef long long __m512i_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
+typedef double __m512d_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
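These 512-bit types are ordinary GNU vector extensions, so the same machinery works in user code without any intrinsics: the vectors support subscripting and elementwise operators, and GCC lowers them to whatever the target provides. A sketch using local stand-in types (the `my_*` names are invented, not the header's own):

```c
#include <assert.h>

/* Sketch of the GNU vector extension behind __m512 and friends: a
   64-byte float vector with elementwise operators and subscripting.
   my_v16sf_u mirrors the __aligned__ (1) "unaligned" variant, which
   relaxes the type's alignment requirement.  */
typedef float my_v16sf __attribute__ ((__vector_size__ (64)));
typedef float my_v16sf_u __attribute__ ((__vector_size__ (64),
					 __aligned__ (1)));

static my_v16sf
splat16 (float __x)
{
  my_v16sf __v;
  for (int __i = 0; __i < 16; __i++)
    __v[__i] = __x;
  return __v;
}

static float
sum16 (my_v16sf __v)
{
  float __s = 0.0f;
  for (int __i = 0; __i < 16; __i++)
    __s += __v[__i];
  return __s;
}
```

`splat16 (0.5f) + splat16 (0.5f)` is an elementwise vector add, so summing its sixteen lanes gives 16.0f.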
+
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
+_mm512_int2mask (int __M)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1, __R);
+ return (__mmask16) __M;
}
-extern __inline __m512d
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512d __C, const int __R)
+_mm512_mask2int (__mmask16 __M)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return (int) __M;
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
- __mmask8 __U, const int __R)
+_mm512_set_epi64 (long long __A, long long __B, long long __C,
+ long long __D, long long __E, long long __F,
+ long long __G, long long __H)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return __extension__ (__m512i) (__v8di)
+ { __H, __G, __F, __E, __D, __C, __B, __A };
}
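Note the argument order in `_mm512_set_epi64` above: the first argument lands in the highest element, which is why the initializer list is written in reverse; element 0 of the vector is the last argument. A sketch with a local stand-in type (`my_v8di` and the function name are invented):

```c
#include <assert.h>

/* Sketch of the argument order used by _mm512_set_epi64: the FIRST
   argument becomes the HIGHEST element, so the vector initializer is
   written in reverse.  my_v8di is a local stand-in type.  */
typedef long long my_v8di __attribute__ ((__vector_size__ (64)));

static my_v8di
set_epi64_sketch (long long __a, long long __b, long long __c, long long __d,
		  long long __e, long long __f, long long __g, long long __h)
{
  return (my_v8di) { __h, __g, __f, __e, __d, __c, __b, __a };
}
```

The `_mm512_setr_*` macros later in the header simply forward to the `_mm512_set_*` forms with reversed arguments, giving memory (little-endian element) order instead.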
-extern __inline __m512d
+/* Create the vector [A B C D E F G H I J K L M N O P]. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512d __C, const int __R)
+_mm512_set_epi32 (int __A, int __B, int __C, int __D,
+ int __E, int __F, int __G, int __H,
+ int __I, int __J, int __K, int __L,
+ int __M, int __N, int __O, int __P)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U, __R);
+ return __extension__ (__m512i)(__v16si)
+ { __P, __O, __N, __M, __L, __K, __J, __I,
+ __H, __G, __F, __E, __D, __C, __B, __A };
}
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi16 (short __q31, short __q30, short __q29, short __q28,
+ short __q27, short __q26, short __q25, short __q24,
+ short __q23, short __q22, short __q21, short __q20,
+ short __q19, short __q18, short __q17, short __q16,
+ short __q15, short __q14, short __q13, short __q12,
+ short __q11, short __q10, short __q09, short __q08,
+ short __q07, short __q06, short __q05, short __q04,
+ short __q03, short __q02, short __q01, short __q00)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1, __R);
+ return __extension__ (__m512i)(__v32hi){
+ __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+ __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+ __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+ __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31
+ };
}
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512 __C, const int __R)
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_epi8 (char __q63, char __q62, char __q61, char __q60,
+ char __q59, char __q58, char __q57, char __q56,
+ char __q55, char __q54, char __q53, char __q52,
+ char __q51, char __q50, char __q49, char __q48,
+ char __q47, char __q46, char __q45, char __q44,
+ char __q43, char __q42, char __q41, char __q40,
+ char __q39, char __q38, char __q37, char __q36,
+ char __q35, char __q34, char __q33, char __q32,
+ char __q31, char __q30, char __q29, char __q28,
+ char __q27, char __q26, char __q25, char __q24,
+ char __q23, char __q22, char __q21, char __q20,
+ char __q19, char __q18, char __q17, char __q16,
+ char __q15, char __q14, char __q13, char __q12,
+ char __q11, char __q10, char __q09, char __q08,
+ char __q07, char __q06, char __q05, char __q04,
+ char __q03, char __q02, char __q01, char __q00)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return __extension__ (__m512i)(__v64qi){
+ __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07,
+ __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15,
+ __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23,
+ __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31,
+ __q32, __q33, __q34, __q35, __q36, __q37, __q38, __q39,
+ __q40, __q41, __q42, __q43, __q44, __q45, __q46, __q47,
+ __q48, __q49, __q50, __q51, __q52, __q53, __q54, __q55,
+ __q56, __q57, __q58, __q59, __q60, __q61, __q62, __q63
+ };
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
- __mmask16 __U, const int __R)
+_mm512_set_pd (double __A, double __B, double __C, double __D,
+ double __E, double __F, double __G, double __H)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return __extension__ (__m512d)
+ { __H, __G, __F, __E, __D, __C, __B, __A };
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512 __C, const int __R)
+_mm512_set_ps (float __A, float __B, float __C, float __D,
+ float __E, float __F, float __G, float __H,
+ float __I, float __J, float __K, float __L,
+ float __M, float __N, float __O, float __P)
{
- return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U, __R);
+ return __extension__ (__m512)
+ { __P, __O, __N, __M, __L, __K, __J, __I,
+ __H, __G, __F, __E, __D, __C, __B, __A };
}
-#else
-#define _mm512_fmadd_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmadd_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmadd_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfmaddpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmadd_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmadd_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmadd_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmadd_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfmaddps512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmadd_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
-
-#define _mm512_fmsub_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmsub_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmsub_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmsub_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmsub_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmsub_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmsub_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmsub_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
-#define _mm512_fmaddsub_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
-
-#define _mm512_mask_fmaddsub_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, U, R)
-
-#define _mm512_mask3_fmaddsub_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_mask3(A, B, C, U, R)
-
-#define _mm512_maskz_fmaddsub_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, C, U, R)
-
-#define _mm512_fmaddsub_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, -1, R)
+#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7) \
+ _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0)
-#define _mm512_mask_fmaddsub_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, U, R)
+#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7, \
+ e8,e9,e10,e11,e12,e13,e14,e15) \
+ _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-#define _mm512_mask3_fmaddsub_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_mask3(A, B, C, U, R)
+#define _mm512_setr_pd(e0,e1,e2,e3,e4,e5,e6,e7) \
+ _mm512_set_pd(e7,e6,e5,e4,e3,e2,e1,e0)
-#define _mm512_maskz_fmaddsub_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, C, U, R)
+#define _mm512_setr_ps(e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15) \
+ _mm512_set_ps(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
-#define _mm512_fmsubadd_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_ps (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+ __m512 __Y = __Y;
+#pragma GCC diagnostic pop
+ return __Y;
+}
-#define _mm512_mask_fmsubadd_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), U, R)
+#define _mm512_undefined _mm512_undefined_ps
-#define _mm512_mask3_fmsubadd_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfmsubaddpd512_mask3(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_pd (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+ __m512d __Y = __Y;
+#pragma GCC diagnostic pop
+ return __Y;
+}
-#define _mm512_maskz_fmsubadd_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, -(C), U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_epi32 (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+ __m512i __Y = __Y;
+#pragma GCC diagnostic pop
+ return __Y;
+}
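The `_mm512_undefined_*` functions above use deliberate self-initialization to produce an indeterminate value cheaply: reading `__Y` in its own initializer tells GCC no particular contents are required, and the pragma pair silences `-Winit-self` just for that declaration. The same idiom in miniature (scalar stand-in, name invented; callers must not rely on the value):

```c
#include <assert.h>

/* Sketch of the self-initialization idiom behind _mm512_undefined_ps:
   initializing __y from itself yields an intentionally indeterminate
   value, and the diagnostic push/pop keeps -Winit-self quiet only for
   this declaration.  The returned contents are unspecified.  */
static int
undefined_int (void)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Winit-self"
  int __y = __y;
#pragma GCC diagnostic pop
  return __y;
}
```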
-#define _mm512_fmsubadd_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), -1, R)
+#define _mm512_undefined_si512 _mm512_undefined_epi32
-#define _mm512_mask_fmsubadd_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi8 (char __A)
+{
+ return __extension__ (__m512i)(__v64qi)
+ { __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A };
+}
-#define _mm512_mask3_fmsubadd_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfmsubaddps512_mask3(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi16 (short __A)
+{
+ return __extension__ (__m512i)(__v32hi)
+ { __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A };
+}
-#define _mm512_maskz_fmsubadd_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pd (double __A)
+{
+ return __extension__ (__m512d)(__v8df)
+ { __A, __A, __A, __A, __A, __A, __A, __A };
+}
-#define _mm512_fnmadd_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ps (float __A)
+{
+ return __extension__ (__m512)(__v16sf)
+ { __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A };
+}
-#define _mm512_mask_fnmadd_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R)
+/* Create the vector [A B C D A B C D A B C D A B C D]. */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
+{
+ return __extension__ (__m512i)(__v16si)
+ { __D, __C, __B, __A, __D, __C, __B, __A,
+ __D, __C, __B, __A, __D, __C, __B, __A };
+}
-#define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfnmaddpd512_mask3(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi64 (long long __A, long long __B, long long __C,
+ long long __D)
+{
+ return __extension__ (__m512i) (__v8di)
+ { __D, __C, __B, __A, __D, __C, __B, __A };
+}
-#define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfnmaddpd512_maskz(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_pd (double __A, double __B, double __C, double __D)
+{
+ return __extension__ (__m512d)
+ { __D, __C, __B, __A, __D, __C, __B, __A };
+}
-#define _mm512_fnmadd_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, -1, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_ps (float __A, float __B, float __C, float __D)
+{
+ return __extension__ (__m512)
+ { __D, __C, __B, __A, __D, __C, __B, __A,
+ __D, __C, __B, __A, __D, __C, __B, __A };
+}
-#define _mm512_mask_fnmadd_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R)
+#define _mm512_setr4_epi64(e0,e1,e2,e3) \
+ _mm512_set4_epi64(e3,e2,e1,e0)
-#define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfnmaddps512_mask3(A, B, C, U, R)
+#define _mm512_setr4_epi32(e0,e1,e2,e3) \
+ _mm512_set4_epi32(e3,e2,e1,e0)
-#define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
+#define _mm512_setr4_pd(e0,e1,e2,e3) \
+ _mm512_set4_pd(e3,e2,e1,e0)
-#define _mm512_fnmsub_round_pd(A, B, C, R) \
- (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, -1, R)
+#define _mm512_setr4_ps(e0,e1,e2,e3) \
+ _mm512_set4_ps(e3,e2,e1,e0)
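A note on the element ordering above: `_mm512_set4_*` lists its arguments from the top of each 4-element group down, so memory-order element 0 of each group receives `D`, while the `_mm512_setr4_*` macros reverse the arguments to give memory order. A minimal scalar sketch of that convention (the `model_*` helper names are illustrative, not part of the header):

```c
#include <assert.h>

/* Scalar model of _mm512_set4_epi32: each 4-element group is filled
   { D, C, B, A } in memory order, repeated four times.  */
static void model_set4_epi32 (int *lanes16, int A, int B, int C, int D)
{
  for (int g = 0; g < 4; g++)
    {
      lanes16[4 * g + 0] = D;
      lanes16[4 * g + 1] = C;
      lanes16[4 * g + 2] = B;
      lanes16[4 * g + 3] = A;
    }
}

/* setr4(e0,e1,e2,e3) is defined as set4(e3,e2,e1,e0), so e0 lands in
   memory-order element 0.  */
static void model_setr4_epi32 (int *lanes16, int e0, int e1, int e2, int e3)
{
  model_set4_epi32 (lanes16, e3, e2, e1, e0);
}
```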
-#define _mm512_mask_fnmsub_round_pd(A, U, B, C, R) \
- (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ps (void)
+{
+ return __extension__ (__m512){ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+}
-#define _mm512_mask3_fnmsub_round_pd(A, B, C, U, R) \
- (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero (void)
+{
+ return _mm512_setzero_ps ();
+}
-#define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R) \
- (__m512d)__builtin_ia32_vfnmsubpd512_maskz(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_pd (void)
+{
+ return __extension__ (__m512d) { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
+}
-#define _mm512_fnmsub_round_ps(A, B, C, R) \
- (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, -1, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_epi32 (void)
+{
+ return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
-#define _mm512_mask_fnmsub_round_ps(A, U, B, C, R) \
- (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R)
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_si512 (void)
+{
+ return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
-#define _mm512_mask3_fnmsub_round_ps(A, B, C, U, R) \
- (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mov_pd (__m512d __W, __mmask8 __U, __m512d __A)
+{
+ return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
+}
-#define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R) \
- (__m512)__builtin_ia32_vfnmsubps512_maskz(A, B, C, U, R)
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mov_pd (__mmask8 __U, __m512d __A)
+{
+ return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
+}
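The `_mm512_mask_*` / `_mm512_maskz_*` pairs introduced here follow the usual AVX-512 write-mask convention: bit i of the mask selects element i of the source, and unselected elements either keep the old destination value (merge masking) or become zero (zero masking). A plain scalar sketch of those semantics, with illustrative `model_*` names:

```c
#include <assert.h>
#include <stdint.h>

/* Merge masking, as in _mm512_mask_mov_pd: unselected lanes keep the
   previous destination value __W.  */
static void model_mask_mov (double *dst, const double *old,
                            uint8_t mask, const double *src)
{
  for (int i = 0; i < 8; i++)
    dst[i] = ((mask >> i) & 1) ? src[i] : old[i];
}

/* Zero masking, as in _mm512_maskz_mov_pd: unselected lanes become 0.  */
static void model_maskz_mov (double *dst, uint8_t mask, const double *src)
{
  for (int i = 0; i < 8; i++)
    dst[i] = ((mask >> i) & 1) ? src[i] : 0.0;
}
```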
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_epi64 (__m512i __A)
+_mm512_mask_mov_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_mov_ps (__mmask16 __U, __m512 __A)
{
- return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
+_mm512_load_pd (void const *__P)
{
- return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return *(__m512d *) __P;
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_epi32 (__m512i __A)
+_mm512_mask_load_pd (__m512d __W, __mmask8 __U, void const *__P)
{
- return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_maskz_load_pd (__mmask8 __U, void const *__P)
{
- return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_store_pd (void *__P, __m512d __A)
+{
+ *(__m512d *) __P = __A;
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_abs_epi32 (__mmask16 __U, __m512i __A)
+_mm512_mask_store_pd (void *__P, __mmask8 __U, __m512d __A)
{
- return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ __builtin_ia32_storeapd512_mask ((__v8df *) __P, (__v8df) __A,
+ (__mmask8) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastss_ps (__m128 __A)
+_mm512_load_ps (void const *__P)
{
- return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return *(__m512 *) __P;
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastss_ps (__m512 __O, __mmask16 __M, __m128 __A)
+_mm512_mask_load_ps (__m512 __W, __mmask16 __U, void const *__P)
{
- return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
- (__v16sf) __O, __M);
+ return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastss_ps (__mmask16 __M, __m128 __A)
+_mm512_maskz_load_ps (__mmask16 __U, void const *__P)
{
- return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- __M);
+ return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastsd_pd (__m128d __A)
+_mm512_store_ps (void *__P, __m512 __A)
{
- return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ *(__m512 *) __P = __A;
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastsd_pd (__m512d __O, __mmask8 __M, __m128d __A)
+_mm512_mask_store_ps (void *__P, __mmask16 __U, __m512 __A)
{
- return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
- (__v8df) __O, __M);
+ __builtin_ia32_storeaps512_mask ((__v16sf *) __P, (__v16sf) __A,
+ (__mmask16) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastsd_pd (__mmask8 __M, __m128d __A)
+_mm512_mask_mov_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- __M);
+ return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastd_epi32 (__m128i __A)
+_mm512_maskz_mov_epi64 (__mmask8 __U, __m512i __A)
{
- return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm512_load_epi64 (void const *__P)
{
- return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
- (__v16si) __O, __M);
+ return *(__m512i *) __P;
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A)
+_mm512_mask_load_epi64 (__m512i __W, __mmask8 __U, void const *__P)
{
- return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi32 (int __A)
+_mm512_maskz_load_epi64 (__mmask8 __U, void const *__P)
{
- return (__m512i)(__v16si)
- { __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A };
+ return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A)
+_mm512_store_epi64 (void *__P, __m512i __A)
{
- return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O,
- __M);
+ *(__m512i *) __P = __A;
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_set1_epi32 (__mmask16 __M, int __A)
+_mm512_mask_store_epi64 (void *__P, __mmask8 __U, __m512i __A)
{
- return (__m512i)
- __builtin_ia32_pbroadcastd512_gpr_mask (__A,
- (__v16si) _mm512_setzero_si512 (),
- __M);
+ __builtin_ia32_movdqa64store512_mask ((__v8di *) __P, (__v8di) __A,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcastq_epi64 (__m128i __A)
+_mm512_mask_mov_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
{
- return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A)
+_mm512_maskz_mov_epi32 (__mmask16 __U, __m512i __A)
{
- return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
- (__v8di) __O, __M);
+ return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
+_mm512_load_si512 (void const *__P)
{
- return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return *(__m512i *) __P;
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_epi64 (long long __A)
+_mm512_load_epi32 (void const *__P)
{
- return (__m512i)(__v8di) { __A, __A, __A, __A, __A, __A, __A, __A };
+ return *(__m512i *) __P;
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A)
+_mm512_mask_load_epi32 (__m512i __W, __mmask16 __U, void const *__P)
{
- return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O,
- __M);
+ return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_set1_epi64 (__mmask8 __M, long long __A)
+_mm512_maskz_load_epi32 (__mmask16 __U, void const *__P)
{
- return (__m512i)
- __builtin_ia32_pbroadcastq512_gpr_mask (__A,
- (__v8di) _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x4 (__m128 __A)
+_mm512_store_si512 (void *__P, __m512i __A)
{
- return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ *(__m512i *) __P = __A;
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x4 (__m512 __O, __mmask16 __M, __m128 __A)
+_mm512_store_epi32 (void *__P, __m512i __A)
{
- return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
- (__v16sf) __O,
- __M);
+ *(__m512i *) __P = __A;
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x4 (__mmask16 __M, __m128 __A)
+_mm512_mask_store_epi32 (void *__P, __mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- __M);
+ __builtin_ia32_movdqa32store512_mask ((__v16si *) __P, (__v16si) __A,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x4 (__m128i __A)
+_mm512_mullo_epi32 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) ((__v16su) __A * (__v16su) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x4 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm512_maskz_mullo_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
- (__v16si) __O,
- __M);
+ return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x4 (__mmask16 __M, __m128i __A)
+_mm512_mask_mullo_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
}
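The unmasked `_mm512_mullo_epi32` body above is just `(__v16su) __A * (__v16su) __B`: each 32-bit lane is multiplied as an unsigned value and only the low 32 bits are kept, so overflow wraps instead of being undefined. A one-lane scalar sketch of that behavior (the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* One lane of _mm512_mullo_epi32: multiply modulo 2^32 and reinterpret
   the low 32 bits as signed, matching the unsigned vector multiply used
   in the intrinsic body.  */
static int32_t model_mullo_lane (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a * (uint32_t) b);
}
```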
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f64x4 (__m256d __A)
+_mm512_mullox_epi64 (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512i) ((__v8du) __A * (__v8du) __B);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f64x4 (__m512d __O, __mmask8 __M, __m256d __A)
+_mm512_mask_mullox_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
- (__v8df) __O,
- __M);
+ return _mm512_mask_mov_epi64 (__W, __M, _mm512_mullox_epi64 (__A, __B));
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f64x4 (__mmask8 __M, __m256d __A)
+_mm512_sllv_epi32 (__m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- __M);
+ return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sllv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
+{
+ return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si) __W,
+ (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sllv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
+{
+ return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i64x4 (__m256i __A)
+_mm512_srav_epi32 (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i64x4 (__m512i __O, __mmask8 __M, __m256i __A)
+_mm512_mask_srav_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
- (__v8di) __O,
- __M);
+ return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i64x4 (__mmask8 __M, __m256i __A)
+_mm512_maskz_srav_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
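For reference, the variable-shift intrinsics shift each lane by its own per-lane count: `sllv` is a logical left shift and `srav` an arithmetic (sign-filling) right shift, with out-of-range counts (32 or more) producing 0 for the logical shifts and all sign bits for the arithmetic one. A one-lane scalar sketch under those assumptions (helper names are illustrative; the `>>` on a negative value relies on GCC's arithmetic-shift behavior):

```c
#include <assert.h>
#include <stdint.h>

/* One lane of _mm512_sllv_epi32: logical left shift; counts >= 32
   produce 0.  */
static int32_t model_sllv_lane (int32_t x, uint32_t n)
{
  return n >= 32 ? 0 : (int32_t) ((uint32_t) x << n);
}

/* One lane of _mm512_srav_epi32: arithmetic right shift; counts >= 32
   saturate to a full sign-bit fill (same as shifting by 31).  */
static int32_t model_srav_lane (int32_t x, uint32_t n)
{
  if (n >= 32)
    n = 31;
  return x >> n;  /* GCC defines >> on signed values as arithmetic.  */
}
```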
-typedef enum
-{
- _MM_PERM_AAAA = 0x00, _MM_PERM_AAAB = 0x01, _MM_PERM_AAAC = 0x02,
- _MM_PERM_AAAD = 0x03, _MM_PERM_AABA = 0x04, _MM_PERM_AABB = 0x05,
- _MM_PERM_AABC = 0x06, _MM_PERM_AABD = 0x07, _MM_PERM_AACA = 0x08,
- _MM_PERM_AACB = 0x09, _MM_PERM_AACC = 0x0A, _MM_PERM_AACD = 0x0B,
- _MM_PERM_AADA = 0x0C, _MM_PERM_AADB = 0x0D, _MM_PERM_AADC = 0x0E,
- _MM_PERM_AADD = 0x0F, _MM_PERM_ABAA = 0x10, _MM_PERM_ABAB = 0x11,
- _MM_PERM_ABAC = 0x12, _MM_PERM_ABAD = 0x13, _MM_PERM_ABBA = 0x14,
- _MM_PERM_ABBB = 0x15, _MM_PERM_ABBC = 0x16, _MM_PERM_ABBD = 0x17,
- _MM_PERM_ABCA = 0x18, _MM_PERM_ABCB = 0x19, _MM_PERM_ABCC = 0x1A,
- _MM_PERM_ABCD = 0x1B, _MM_PERM_ABDA = 0x1C, _MM_PERM_ABDB = 0x1D,
- _MM_PERM_ABDC = 0x1E, _MM_PERM_ABDD = 0x1F, _MM_PERM_ACAA = 0x20,
- _MM_PERM_ACAB = 0x21, _MM_PERM_ACAC = 0x22, _MM_PERM_ACAD = 0x23,
- _MM_PERM_ACBA = 0x24, _MM_PERM_ACBB = 0x25, _MM_PERM_ACBC = 0x26,
- _MM_PERM_ACBD = 0x27, _MM_PERM_ACCA = 0x28, _MM_PERM_ACCB = 0x29,
- _MM_PERM_ACCC = 0x2A, _MM_PERM_ACCD = 0x2B, _MM_PERM_ACDA = 0x2C,
- _MM_PERM_ACDB = 0x2D, _MM_PERM_ACDC = 0x2E, _MM_PERM_ACDD = 0x2F,
- _MM_PERM_ADAA = 0x30, _MM_PERM_ADAB = 0x31, _MM_PERM_ADAC = 0x32,
- _MM_PERM_ADAD = 0x33, _MM_PERM_ADBA = 0x34, _MM_PERM_ADBB = 0x35,
- _MM_PERM_ADBC = 0x36, _MM_PERM_ADBD = 0x37, _MM_PERM_ADCA = 0x38,
- _MM_PERM_ADCB = 0x39, _MM_PERM_ADCC = 0x3A, _MM_PERM_ADCD = 0x3B,
- _MM_PERM_ADDA = 0x3C, _MM_PERM_ADDB = 0x3D, _MM_PERM_ADDC = 0x3E,
- _MM_PERM_ADDD = 0x3F, _MM_PERM_BAAA = 0x40, _MM_PERM_BAAB = 0x41,
- _MM_PERM_BAAC = 0x42, _MM_PERM_BAAD = 0x43, _MM_PERM_BABA = 0x44,
- _MM_PERM_BABB = 0x45, _MM_PERM_BABC = 0x46, _MM_PERM_BABD = 0x47,
- _MM_PERM_BACA = 0x48, _MM_PERM_BACB = 0x49, _MM_PERM_BACC = 0x4A,
- _MM_PERM_BACD = 0x4B, _MM_PERM_BADA = 0x4C, _MM_PERM_BADB = 0x4D,
- _MM_PERM_BADC = 0x4E, _MM_PERM_BADD = 0x4F, _MM_PERM_BBAA = 0x50,
- _MM_PERM_BBAB = 0x51, _MM_PERM_BBAC = 0x52, _MM_PERM_BBAD = 0x53,
- _MM_PERM_BBBA = 0x54, _MM_PERM_BBBB = 0x55, _MM_PERM_BBBC = 0x56,
- _MM_PERM_BBBD = 0x57, _MM_PERM_BBCA = 0x58, _MM_PERM_BBCB = 0x59,
- _MM_PERM_BBCC = 0x5A, _MM_PERM_BBCD = 0x5B, _MM_PERM_BBDA = 0x5C,
- _MM_PERM_BBDB = 0x5D, _MM_PERM_BBDC = 0x5E, _MM_PERM_BBDD = 0x5F,
- _MM_PERM_BCAA = 0x60, _MM_PERM_BCAB = 0x61, _MM_PERM_BCAC = 0x62,
- _MM_PERM_BCAD = 0x63, _MM_PERM_BCBA = 0x64, _MM_PERM_BCBB = 0x65,
- _MM_PERM_BCBC = 0x66, _MM_PERM_BCBD = 0x67, _MM_PERM_BCCA = 0x68,
- _MM_PERM_BCCB = 0x69, _MM_PERM_BCCC = 0x6A, _MM_PERM_BCCD = 0x6B,
- _MM_PERM_BCDA = 0x6C, _MM_PERM_BCDB = 0x6D, _MM_PERM_BCDC = 0x6E,
- _MM_PERM_BCDD = 0x6F, _MM_PERM_BDAA = 0x70, _MM_PERM_BDAB = 0x71,
- _MM_PERM_BDAC = 0x72, _MM_PERM_BDAD = 0x73, _MM_PERM_BDBA = 0x74,
- _MM_PERM_BDBB = 0x75, _MM_PERM_BDBC = 0x76, _MM_PERM_BDBD = 0x77,
- _MM_PERM_BDCA = 0x78, _MM_PERM_BDCB = 0x79, _MM_PERM_BDCC = 0x7A,
- _MM_PERM_BDCD = 0x7B, _MM_PERM_BDDA = 0x7C, _MM_PERM_BDDB = 0x7D,
- _MM_PERM_BDDC = 0x7E, _MM_PERM_BDDD = 0x7F, _MM_PERM_CAAA = 0x80,
- _MM_PERM_CAAB = 0x81, _MM_PERM_CAAC = 0x82, _MM_PERM_CAAD = 0x83,
- _MM_PERM_CABA = 0x84, _MM_PERM_CABB = 0x85, _MM_PERM_CABC = 0x86,
- _MM_PERM_CABD = 0x87, _MM_PERM_CACA = 0x88, _MM_PERM_CACB = 0x89,
- _MM_PERM_CACC = 0x8A, _MM_PERM_CACD = 0x8B, _MM_PERM_CADA = 0x8C,
- _MM_PERM_CADB = 0x8D, _MM_PERM_CADC = 0x8E, _MM_PERM_CADD = 0x8F,
- _MM_PERM_CBAA = 0x90, _MM_PERM_CBAB = 0x91, _MM_PERM_CBAC = 0x92,
- _MM_PERM_CBAD = 0x93, _MM_PERM_CBBA = 0x94, _MM_PERM_CBBB = 0x95,
- _MM_PERM_CBBC = 0x96, _MM_PERM_CBBD = 0x97, _MM_PERM_CBCA = 0x98,
- _MM_PERM_CBCB = 0x99, _MM_PERM_CBCC = 0x9A, _MM_PERM_CBCD = 0x9B,
- _MM_PERM_CBDA = 0x9C, _MM_PERM_CBDB = 0x9D, _MM_PERM_CBDC = 0x9E,
- _MM_PERM_CBDD = 0x9F, _MM_PERM_CCAA = 0xA0, _MM_PERM_CCAB = 0xA1,
- _MM_PERM_CCAC = 0xA2, _MM_PERM_CCAD = 0xA3, _MM_PERM_CCBA = 0xA4,
- _MM_PERM_CCBB = 0xA5, _MM_PERM_CCBC = 0xA6, _MM_PERM_CCBD = 0xA7,
- _MM_PERM_CCCA = 0xA8, _MM_PERM_CCCB = 0xA9, _MM_PERM_CCCC = 0xAA,
- _MM_PERM_CCCD = 0xAB, _MM_PERM_CCDA = 0xAC, _MM_PERM_CCDB = 0xAD,
- _MM_PERM_CCDC = 0xAE, _MM_PERM_CCDD = 0xAF, _MM_PERM_CDAA = 0xB0,
- _MM_PERM_CDAB = 0xB1, _MM_PERM_CDAC = 0xB2, _MM_PERM_CDAD = 0xB3,
- _MM_PERM_CDBA = 0xB4, _MM_PERM_CDBB = 0xB5, _MM_PERM_CDBC = 0xB6,
- _MM_PERM_CDBD = 0xB7, _MM_PERM_CDCA = 0xB8, _MM_PERM_CDCB = 0xB9,
- _MM_PERM_CDCC = 0xBA, _MM_PERM_CDCD = 0xBB, _MM_PERM_CDDA = 0xBC,
- _MM_PERM_CDDB = 0xBD, _MM_PERM_CDDC = 0xBE, _MM_PERM_CDDD = 0xBF,
- _MM_PERM_DAAA = 0xC0, _MM_PERM_DAAB = 0xC1, _MM_PERM_DAAC = 0xC2,
- _MM_PERM_DAAD = 0xC3, _MM_PERM_DABA = 0xC4, _MM_PERM_DABB = 0xC5,
- _MM_PERM_DABC = 0xC6, _MM_PERM_DABD = 0xC7, _MM_PERM_DACA = 0xC8,
- _MM_PERM_DACB = 0xC9, _MM_PERM_DACC = 0xCA, _MM_PERM_DACD = 0xCB,
- _MM_PERM_DADA = 0xCC, _MM_PERM_DADB = 0xCD, _MM_PERM_DADC = 0xCE,
- _MM_PERM_DADD = 0xCF, _MM_PERM_DBAA = 0xD0, _MM_PERM_DBAB = 0xD1,
- _MM_PERM_DBAC = 0xD2, _MM_PERM_DBAD = 0xD3, _MM_PERM_DBBA = 0xD4,
- _MM_PERM_DBBB = 0xD5, _MM_PERM_DBBC = 0xD6, _MM_PERM_DBBD = 0xD7,
- _MM_PERM_DBCA = 0xD8, _MM_PERM_DBCB = 0xD9, _MM_PERM_DBCC = 0xDA,
- _MM_PERM_DBCD = 0xDB, _MM_PERM_DBDA = 0xDC, _MM_PERM_DBDB = 0xDD,
- _MM_PERM_DBDC = 0xDE, _MM_PERM_DBDD = 0xDF, _MM_PERM_DCAA = 0xE0,
- _MM_PERM_DCAB = 0xE1, _MM_PERM_DCAC = 0xE2, _MM_PERM_DCAD = 0xE3,
- _MM_PERM_DCBA = 0xE4, _MM_PERM_DCBB = 0xE5, _MM_PERM_DCBC = 0xE6,
- _MM_PERM_DCBD = 0xE7, _MM_PERM_DCCA = 0xE8, _MM_PERM_DCCB = 0xE9,
- _MM_PERM_DCCC = 0xEA, _MM_PERM_DCCD = 0xEB, _MM_PERM_DCDA = 0xEC,
- _MM_PERM_DCDB = 0xED, _MM_PERM_DCDC = 0xEE, _MM_PERM_DCDD = 0xEF,
- _MM_PERM_DDAA = 0xF0, _MM_PERM_DDAB = 0xF1, _MM_PERM_DDAC = 0xF2,
- _MM_PERM_DDAD = 0xF3, _MM_PERM_DDBA = 0xF4, _MM_PERM_DDBB = 0xF5,
- _MM_PERM_DDBC = 0xF6, _MM_PERM_DDBD = 0xF7, _MM_PERM_DDCA = 0xF8,
- _MM_PERM_DDCB = 0xF9, _MM_PERM_DDCC = 0xFA, _MM_PERM_DDCD = 0xFB,
- _MM_PERM_DDDA = 0xFC, _MM_PERM_DDDB = 0xFD, _MM_PERM_DDDC = 0xFE,
- _MM_PERM_DDDD = 0xFF
-} _MM_PERM_ENUM;
-
-#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
+_mm512_srlv_epi32 (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
- __mask,
+ return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
(__v16si)
_mm512_undefined_epi32 (),
(__mmask16) -1);
@@ -4492,21 +4456,20 @@ _mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- _MM_PERM_ENUM __mask)
+_mm512_mask_srlv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
- __mask,
+ return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
(__v16si) __W,
(__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
+_mm512_maskz_srlv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
- __mask,
+ return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X,
+ (__v16si) __Y,
(__v16si)
_mm512_setzero_si512 (),
(__mmask16) __U);
@@ -4514,302 +4477,190 @@ _mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_i64x2 (__m512i __A, __m512i __B, const int __imm)
+_mm512_add_epi64 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i) ((__v8du) __A + (__v8du) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_i64x2 (__m512i __W, __mmask8 __U, __m512i __A,
- __m512i __B, const int __imm)
+_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_i64x2 (__mmask8 __U, __m512i __A, __m512i __B,
- const int __imm)
+_mm512_maskz_add_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_i32x4 (__m512i __A, __m512i __B, const int __imm)
+_mm512_sub_epi64 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
- (__v16si) __B,
- __imm,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) ((__v8du) __A - (__v8du) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_i32x4 (__m512i __W, __mmask16 __U, __m512i __A,
- __m512i __B, const int __imm)
+_mm512_mask_sub_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
- (__v16si) __B,
- __imm,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_i32x4 (__mmask16 __U, __m512i __A, __m512i __B,
- const int __imm)
+_mm512_maskz_sub_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
- (__v16si) __B,
- __imm,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
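As with `mullo`, the unmasked `_mm512_add_epi64` / `_mm512_sub_epi64` bodies cast to the unsigned `__v8du` type before adding or subtracting, so lane overflow wraps modulo 2^64 rather than invoking signed-overflow undefined behavior. A one-lane scalar sketch (illustrative names; the conversion back to `int64_t` relies on GCC's modular implementation-defined behavior):

```c
#include <assert.h>
#include <stdint.h>

/* One lane of _mm512_add_epi64: wrapping add via unsigned arithmetic.  */
static int64_t model_add_epi64_lane (int64_t a, int64_t b)
{
  return (int64_t) ((uint64_t) a + (uint64_t) b);
}

/* One lane of _mm512_sub_epi64: wrapping subtract via unsigned
   arithmetic.  */
static int64_t model_sub_epi64_lane (int64_t a, int64_t b)
{
  return (int64_t) ((uint64_t) a - (uint64_t) b);
}
```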
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_f64x2 (__m512d __A, __m512d __B, const int __imm)
+_mm512_sllv_epi64 (__m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
- (__v8df) __B, __imm,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_f64x2 (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B, const int __imm)
+_mm512_mask_sllv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
- (__v8df) __B, __imm,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_f64x2 (__mmask8 __U, __m512d __A, __m512d __B,
- const int __imm)
+_mm512_maskz_sllv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
- (__v8df) __B, __imm,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_f32x4 (__m512 __A, __m512 __B, const int __imm)
+_mm512_srav_epi64 (__m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
- (__v16sf) __B, __imm,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_f32x4 (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B, const int __imm)
+_mm512_mask_srav_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
- (__v16sf) __B, __imm,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_f32x4 (__mmask16 __U, __m512 __A, __m512 __B,
- const int __imm)
+_mm512_maskz_srav_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
- (__v16sf) __B, __imm,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-#else
-#define _mm512_shuffle_epi32(X, C) \
- ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
- (__v16si)(__m512i)_mm512_undefined_epi32 (),\
- (__mmask16)-1))
-
-#define _mm512_mask_shuffle_epi32(W, U, X, C) \
- ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
- (__v16si)(__m512i)(W),\
- (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_epi32(U, X, C) \
- ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
- (__v16si)(__m512i)_mm512_setzero_si512 (),\
- (__mmask16)(U)))
-
-#define _mm512_shuffle_i64x2(X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C),\
- (__v8di)(__m512i)_mm512_undefined_epi32 (),\
- (__mmask8)-1))
-
-#define _mm512_mask_shuffle_i64x2(W, U, X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C),\
- (__v8di)(__m512i)(W),\
- (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_i64x2(U, X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C),\
- (__v8di)(__m512i)_mm512_setzero_si512 (),\
- (__mmask8)(U)))
-
-#define _mm512_shuffle_i32x4(X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C),\
- (__v16si)(__m512i)_mm512_undefined_epi32 (),\
- (__mmask16)-1))
-
-#define _mm512_mask_shuffle_i32x4(W, U, X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C),\
- (__v16si)(__m512i)(W),\
- (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_i32x4(U, X, Y, C) \
- ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C),\
- (__v16si)(__m512i)_mm512_setzero_si512 (),\
- (__mmask16)(U)))
-
-#define _mm512_shuffle_f64x2(X, Y, C) \
- ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)_mm512_undefined_pd(),\
- (__mmask8)-1))
-
-#define _mm512_mask_shuffle_f64x2(W, U, X, Y, C) \
- ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)(W),\
- (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_f64x2(U, X, Y, C) \
- ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)_mm512_setzero_pd(),\
- (__mmask8)(U)))
-
-#define _mm512_shuffle_f32x4(X, Y, C) \
- ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)_mm512_undefined_ps(),\
- (__mmask16)-1))
-
-#define _mm512_mask_shuffle_f32x4(W, U, X, Y, C) \
- ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)(W),\
- (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_f32x4(U, X, Y, C) \
- ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)_mm512_setzero_ps(),\
- (__mmask16)(U)))
-#endif
-
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rolv_epi32 (__m512i __A, __m512i __B)
+_mm512_srlv_epi64 (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rolv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_srlv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rolv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_srlv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X,
+ (__v8di) __Y,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rorv_epi32 (__m512i __A, __m512i __B)
+_mm512_add_epi32 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) ((__v16su) __A + (__v16su) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rorv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_add_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rorv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_add_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rolv_epi64 (__m512i __A, __m512i __B)
+_mm512_mul_epi32 (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
- (__v8di) __B,
+ return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+ (__v16si) __Y,
(__v8di)
_mm512_undefined_epi32 (),
(__mmask8) -1);
@@ -4817,2133 +4668,2567 @@ _mm512_rolv_epi64 (__m512i __A, __m512i __B)
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rolv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_mul_epi32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v8di) __W, __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rolv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_mul_epi32 (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
- (__v8di) __B,
+ return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X,
+ (__v16si) __Y,
(__v8di)
_mm512_setzero_si512 (),
- (__mmask8) __U);
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rorv_epi64 (__m512i __A, __m512i __B)
+_mm512_sub_epi32 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i) ((__v16su) __A - (__v16su) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rorv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_sub_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rorv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_sub_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-#ifdef __OPTIMIZE__
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundpd_epi32 (__m512d __A, const int __R)
-{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1, __R);
+_mm512_mul_epu32 (__m512i __X, __m512i __Y)
+{
+ return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_mask_mul_epu32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v8di) __W, __M);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_mul_epu32 (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X,
+ (__v16si) __Y,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundpd_epu32 (__m512d __A, const int __R)
+_mm512_slli_epi64 (__m512i __A, unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_mask_slli_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+ unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_slli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
#else
-#define _mm512_cvtt_roundpd_epi32(A, B) \
- ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
-
-#define _mm512_mask_cvtt_roundpd_epi32(W, U, A, B) \
- ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)(W), U, B))
-
-#define _mm512_maskz_cvtt_roundpd_epi32(U, A, B) \
- ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
-
-#define _mm512_cvtt_roundpd_epu32(A, B) \
- ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+#define _mm512_slli_epi64(X, C) \
+ ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_undefined_epi32 (), \
+ (__mmask8)-1))
-#define _mm512_mask_cvtt_roundpd_epu32(W, U, A, B) \
- ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)(W), U, B))
+#define _mm512_mask_slli_epi64(W, U, X, C) \
+ ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(U)))
-#define _mm512_maskz_cvtt_roundpd_epu32(U, A, B) \
- ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#define _mm512_maskz_slli_epi64(U, X, C) \
+ ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
#endif
-#ifdef __OPTIMIZE__
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_epi32 (__m512d __A, const int __R)
+_mm512_sll_epi64 (__m512i __A, __m128i __B)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_mask_sll_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_sll_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_epu32 (__m512d __A, const int __R)
+_mm512_srli_epi64 (__m512i __A, unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_mask_srli_epi64 (__m512i __W, __mmask8 __U,
+ __m512i __A, unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
+_mm512_maskz_srli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
#else
-#define _mm512_cvt_roundpd_epi32(A, B) \
- ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
-
-#define _mm512_mask_cvt_roundpd_epi32(W, U, A, B) \
- ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)(W), U, B))
-
-#define _mm512_maskz_cvt_roundpd_epi32(U, A, B) \
- ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
-
-#define _mm512_cvt_roundpd_epu32(A, B) \
- ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+#define _mm512_srli_epi64(X, C) \
+ ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_undefined_epi32 (), \
+ (__mmask8)-1))
-#define _mm512_mask_cvt_roundpd_epu32(W, U, A, B) \
- ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)(W), U, B))
+#define _mm512_mask_srli_epi64(W, U, X, C) \
+ ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(U)))
-#define _mm512_maskz_cvt_roundpd_epu32(U, A, B) \
- ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#define _mm512_maskz_srli_epi64(U, X, C) \
+ ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
#endif
-#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundps_epi32 (__m512 __A, const int __R)
+_mm512_srl_epi64 (__m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1, __R);
+ return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
- const int __R)
+_mm512_mask_srl_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_srl_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
+#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundps_epu32 (__m512 __A, const int __R)
+_mm512_srai_epi64 (__m512i __A, unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1, __R);
+ return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
- const int __R)
+_mm512_mask_srai_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+ unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_srai_epi64 (__mmask8 __U, __m512i __A, unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
#else
-#define _mm512_cvtt_roundps_epi32(A, B) \
- ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
-
-#define _mm512_mask_cvtt_roundps_epi32(W, U, A, B) \
- ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)(W), U, B))
-
-#define _mm512_maskz_cvtt_roundps_epi32(U, A, B) \
- ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
-
-#define _mm512_cvtt_roundps_epu32(A, B) \
- ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+#define _mm512_srai_epi64(X, C) \
+ ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_undefined_epi32 (), \
+ (__mmask8)-1))
-#define _mm512_mask_cvtt_roundps_epu32(W, U, A, B) \
- ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)(W), U, B))
+#define _mm512_mask_srai_epi64(W, U, X, C) \
+ ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(U)))
-#define _mm512_maskz_cvtt_roundps_epu32(U, A, B) \
- ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#define _mm512_maskz_srai_epi64(U, X, C) \
+ ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v8di)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
#endif
-#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_epi32 (__m512 __A, const int __R)
+_mm512_sra_epi64 (__m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1, __R);
+ return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
- const int __R)
+_mm512_mask_sra_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_sra_epi64 (__mmask8 __U, __m512i __A, __m128i __B)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A,
+ (__v2di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
+#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_epu32 (__m512 __A, const int __R)
+_mm512_slli_epi32 (__m512i __A, unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1, __R);
+ return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
- const int __R)
+_mm512_mask_slli_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
+_mm512_maskz_slli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
#else
-#define _mm512_cvt_roundps_epi32(A, B) \
- ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
-
-#define _mm512_mask_cvt_roundps_epi32(W, U, A, B) \
- ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)(W), U, B))
+#define _mm512_slli_epi32(X, C) \
+ ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_undefined_epi32 (), \
+ (__mmask16)-1))
-#define _mm512_maskz_cvt_roundps_epi32(U, A, B) \
- ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#define _mm512_mask_slli_epi32(W, U, X, C) \
+ ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
-#define _mm512_cvt_roundps_epu32(A, B) \
- ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+#define _mm512_maskz_slli_epi32(U, X, C) \
+ ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask16)(U)))
+#endif
-#define _mm512_mask_cvt_roundps_epu32(W, U, A, B) \
- ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)(W), U, B))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sll_epi32 (__m512i __A, __m128i __B)
+{
+ return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
+}
-#define _mm512_maskz_cvt_roundps_epu32(U, A, B) \
- ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sll_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
+{
+ return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
+}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_sd (__m128d __A, unsigned __B)
+_mm512_maskz_sll_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
{
- return (__m128d) __builtin_ia32_cvtusi2sd32 ((__v2df) __A, __B);
+ return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-#ifdef __x86_64__
#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_sd (__m128d __A, unsigned long long __B, const int __R)
+_mm512_srli_epi32 (__m512i __A, unsigned int __B)
{
- return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_sd (__m128d __A, long long __B, const int __R)
+_mm512_mask_srli_epi32 (__m512i __W, __mmask16 __U,
+ __m512i __A, unsigned int __B)
{
- return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi64_sd (__m128d __A, long long __B, const int __R)
+_mm512_maskz_srli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
{
- return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
#else
-#define _mm_cvt_roundu64_sd(A, B, C) \
- (__m128d)__builtin_ia32_cvtusi2sd64(A, B, C)
-
-#define _mm_cvt_roundi64_sd(A, B, C) \
- (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
+#define _mm512_srli_epi32(X, C) \
+ ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+ (__mmask16)-1))
-#define _mm_cvt_roundsi64_sd(A, B, C) \
- (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C)
-#endif
+#define _mm512_mask_srli_epi32(W, U, X, C) \
+ ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
+#define _mm512_maskz_srli_epi32(U, X, C) \
+ ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask16)(U)))
#endif
-#ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu32_ss (__m128 __A, unsigned __B, const int __R)
+_mm512_srl_epi32 (__m512i __A, __m128i __B)
{
- return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi32_ss (__m128 __A, int __B, const int __R)
+_mm512_mask_srl_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
{
- return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi32_ss (__m128 __A, int __B, const int __R)
+_mm512_maskz_srl_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
{
- return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-#else
-#define _mm_cvt_roundu32_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtusi2ss32(A, B, C)
-
-#define _mm_cvt_roundi32_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
-
-#define _mm_cvt_roundsi32_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtsi2ss32(A, B, C)
-#endif
-#ifdef __x86_64__
#ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_ss (__m128 __A, unsigned long long __B, const int __R)
+_mm512_srai_epi32 (__m512i __A, unsigned int __B)
{
- return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsi64_ss (__m128 __A, long long __B, const int __R)
+_mm512_mask_srai_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ unsigned int __B)
{
- return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_ss (__m128 __A, long long __B, const int __R)
+_mm512_maskz_srai_epi32 (__mmask16 __U, __m512i __A, unsigned int __B)
{
- return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R);
+ return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
#else
-#define _mm_cvt_roundu64_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtusi2ss64(A, B, C)
-
-#define _mm_cvt_roundi64_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
+#define _mm512_srai_epi32(X, C) \
+ ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+ (__mmask16)-1))
-#define _mm_cvt_roundsi64_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtsi2ss64(A, B, C)
-#endif
+#define _mm512_mask_srai_epi32(W, U, X, C) \
+ ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
+#define _mm512_maskz_srai_epi32(U, X, C) \
+ ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), \
+ (unsigned int)(C), \
+ (__v16si)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask16)(U)))
#endif
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi8 (__m512i __A)
-{
- return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask16) -1);
-}
-
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_sra_epi32 (__m512i __A, __m128i __B)
{
- __builtin_ia32_pmovdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+ return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_sra_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B)
{
- return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
- (__v16qi) __O, __M);
+ return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
{
- return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A,
+ (__v4si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi32_epi8 (__m512i __A)
+/* Constant helper to represent the ternary logic operations among
+ vector A, B and C. */
+typedef enum
{
- return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask16) -1);
-}
+ _MM_TERNLOG_A = 0xF0,
+ _MM_TERNLOG_B = 0xCC,
+ _MM_TERNLOG_C = 0xAA
+} _MM_TERNLOG_ENUM;
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C,
+ const int __imm)
{
- __builtin_ia32_pmovsdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+ return (__m512i)
+ __builtin_ia32_pternlogq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __C,
+ (unsigned char) __imm,
+ (__mmask8) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_ternarylogic_epi64 (__m512i __A, __mmask8 __U, __m512i __B,
+ __m512i __C, const int __imm)
{
- return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
- (__v16qi) __O, __M);
+ return (__m512i)
+ __builtin_ia32_pternlogq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __C,
+ (unsigned char) __imm,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_maskz_ternarylogic_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
+ __m512i __C, const int __imm)
{
- return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i)
+ __builtin_ia32_pternlogq512_maskz ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __C,
+ (unsigned char) __imm,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi32_epi8 (__m512i __A)
+_mm512_ternarylogic_epi32 (__m512i __A, __m512i __B, __m512i __C,
+ const int __imm)
{
- return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask16) -1);
+ return (__m512i)
+ __builtin_ia32_pternlogd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __C,
+ (unsigned char) __imm,
+ (__mmask16) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_mask_ternarylogic_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+ __m512i __C, const int __imm)
{
- __builtin_ia32_pmovusdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
+ return (__m512i)
+ __builtin_ia32_pternlogd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __C,
+ (unsigned char) __imm,
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
+_mm512_maskz_ternarylogic_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+ __m512i __C, const int __imm)
{
- return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
- (__v16qi) __O,
- __M);
+ return (__m512i)
+ __builtin_ia32_pternlogd512_maskz ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __C,
+ (unsigned char) __imm,
+ (__mmask16) __U);
}
+#else
+#define _mm512_ternarylogic_epi64(A, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A), \
+ (__v8di) (__m512i) (B), \
+ (__v8di) (__m512i) (C), \
+ (unsigned char) (I), \
+ (__mmask8) -1))
+#define _mm512_mask_ternarylogic_epi64(A, U, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogq512_mask ((__v8di) (__m512i) (A), \
+ (__v8di) (__m512i) (B), \
+ (__v8di) (__m512i) (C), \
+ (unsigned char)(I), \
+ (__mmask8) (U)))
+#define _mm512_maskz_ternarylogic_epi64(U, A, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogq512_maskz ((__v8di) (__m512i) (A), \
+ (__v8di) (__m512i) (B), \
+ (__v8di) (__m512i) (C), \
+ (unsigned char) (I), \
+ (__mmask8) (U)))
+#define _mm512_ternarylogic_epi32(A, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A), \
+ (__v16si) (__m512i) (B), \
+ (__v16si) (__m512i) (C), \
+ (unsigned char) (I), \
+ (__mmask16) -1))
+#define _mm512_mask_ternarylogic_epi32(A, U, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogd512_mask ((__v16si) (__m512i) (A), \
+ (__v16si) (__m512i) (B), \
+ (__v16si) (__m512i) (C), \
+ (unsigned char) (I), \
+ (__mmask16) (U)))
+#define _mm512_maskz_ternarylogic_epi32(U, A, B, C, I) \
+ ((__m512i) \
+ __builtin_ia32_pternlogd512_maskz ((__v16si) (__m512i) (A), \
+ (__v16si) (__m512i) (B), \
+ (__v16si) (__m512i) (C), \
+ (unsigned char) (I), \
+ (__mmask16) (U)))
+#endif
-extern __inline __m128i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi32_epi8 (__mmask16 __M, __m512i __A)
+_mm512_rcp14_pd (__m512d __A)
{
- return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_epi16 (__m512i __A)
+_mm512_mask_rcp14_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_undefined_si256 (),
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_storeu_epi16 (void * __P, __mmask16 __M, __m512i __A)
+_mm512_maskz_rcp14_pd (__mmask8 __U, __m512d __A)
{
- __builtin_ia32_pmovdw512mem_mask ((__v16hi *) __P, (__v16si) __A, __M);
+ return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_rcp14_ps (__m512 __A)
{
- return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
- (__v16hi) __O, __M);
+ return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_mask_rcp14_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_setzero_si256 (),
- __M);
+ return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi32_epi16 (__m512i __A)
+_mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
{
- return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_undefined_si256 (),
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
+_mm512_rsqrt14_pd (__m512d __A)
{
- __builtin_ia32_pmovsdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
+ return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_mask_rsqrt14_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
- (__v16hi) __O, __M);
+ return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_maskz_rsqrt14_pd (__mmask8 __U, __m512d __A)
{
- return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_setzero_si256 (),
- __M);
+ return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi32_epi16 (__m512i __A)
+_mm512_rsqrt14_ps (__m512 __A)
{
- return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_undefined_si256 (),
+ return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
(__mmask16) -1);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
+_mm512_mask_rsqrt14_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- __builtin_ia32_pmovusdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
+ return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
+_mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
{
- return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
- (__v16hi) __O,
- __M);
+ return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m256i
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi32_epi16 (__mmask16 __M, __m512i __A)
+_mm512_sqrt_round_pd (__m512d __A, const int __R)
{
- return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
- (__v16hi)
- _mm256_setzero_si256 (),
- __M);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi32 (__m512i __A)
+_mm512_mask_sqrt_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_sqrt_round_pd (__mmask8 __U, __m512d __A, const int __R)
{
- __builtin_ia32_pmovqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_sqrt_round_ps (__m512 __A, const int __R)
{
- return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
- (__v8si) __O, __M);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi32 (__mmask8 __M, __m512i __A)
+_mm512_mask_sqrt_round_ps (__m512 __W, __mmask16 __U, __m512 __A, const int __R)
{
- return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- __M);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi32 (__m512i __A)
+_mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi32 (void *__P, __mmask8 __M, __m512i __A)
-{
- __builtin_ia32_pmovsqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
-}
+#else
+#define _mm512_sqrt_round_pd(A, C) \
+ (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, C)
-extern __inline __m256i
+#define _mm512_mask_sqrt_round_pd(W, U, A, C) \
+ (__m512d)__builtin_ia32_sqrtpd512_mask(A, W, U, C)
+
+#define _mm512_maskz_sqrt_round_pd(U, A, C) \
+ (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), U, C)
+
+#define _mm512_sqrt_round_ps(A, C) \
+ (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_undefined_ps(), -1, C)
+
+#define _mm512_mask_sqrt_round_ps(W, U, A, C) \
+ (__m512)__builtin_ia32_sqrtps512_mask(A, W, U, C)
+
+#define _mm512_maskz_sqrt_round_ps(U, A, C) \
+ (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#endif
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepi8_epi32 (__m128i __A)
{
- return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
- (__v8si) __O, __M);
+ return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi32 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
{
- return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi32 (__m512i __A)
+_mm512_maskz_cvtepi8_epi32 (__mmask16 __U, __m128i __A)
{
- return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
+_mm512_cvtepi8_epi64 (__m128i __A)
{
- __builtin_ia32_pmovusqd512mem_mask ((__v8si*) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
{
- return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
- (__v8si) __O, __M);
+ return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi32 (__mmask8 __M, __m512i __A)
-{
- return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- __M);
+_mm512_maskz_cvtepi8_epi64 (__mmask8 __U, __m128i __A)
+{
+ return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi16 (__m512i __A)
+_mm512_cvtepi16_epi32 (__m256i __A)
{
- return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
{
- __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepi16_epi32 (__mmask16 __U, __m256i __A)
{
- return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
- (__v8hi) __O, __M);
+ return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_cvtepi16_epi64 (__m128i __A)
{
- return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi16 (__m512i __A)
+_mm512_mask_cvtepi16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi16 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepi16_epi64 (__mmask8 __U, __m128i __A)
{
- __builtin_ia32_pmovsqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepi32_epi64 (__m256i __X)
{
- return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
- (__v8hi) __O, __M);
+ return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepi32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
{
- return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi16 (__m512i __A)
+_mm512_maskz_cvtepi32_epi64 (__mmask8 __U, __m256i __X)
{
- return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
+_mm512_cvtepu8_epi32 (__m128i __A)
{
- __builtin_ia32_pmovusqw512mem_mask ((__v8hi*) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu8_epi32 (__m512i __W, __mmask16 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
- (__v8hi) __O, __M);
+ return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi16 (__mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu8_epi32 (__mmask16 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
- (__v8hi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_epi8 (__m512i __A)
+_mm512_cvtepu8_epi64 (__m128i __A)
{
- return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu8_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
{
- __builtin_ia32_pmovqb512mem_mask ((unsigned long long *) __P,
- (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu8_epi64 (__mmask8 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
- (__v16qi) __O, __M);
+ return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_cvtepu16_epi32 (__m256i __A)
{
- return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsepi64_epi8 (__m512i __A)
+_mm512_mask_cvtepu16_epi32 (__m512i __W, __mmask16 __U, __m256i __A)
{
- return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu16_epi32 (__mmask16 __U, __m256i __A)
{
- __builtin_ia32_pmovsqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtsepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_cvtepu16_epi64 (__m128i __A)
{
- return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
- (__v16qi) __O, __M);
+ return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtsepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu16_epi64 (__m512i __W, __mmask8 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtusepi64_epi8 (__m512i __A)
+_mm512_maskz_cvtepu16_epi64 (__mmask8 __U, __m128i __A)
{
- return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
+_mm512_cvtepu32_epi64 (__m256i __X)
{
- __builtin_ia32_pmovusqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
+ return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtusepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
+_mm512_mask_cvtepu32_epi64 (__m512i __W, __mmask8 __U, __m256i __X)
{
- return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
- (__v16qi) __O,
- __M);
+ return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128i
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtusepi64_epi8 (__mmask8 __M, __m512i __A)
+_mm512_maskz_cvtepu32_epi64 (__mmask8 __U, __m256i __X)
{
- return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- __M);
+ return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
+#ifdef __OPTIMIZE__
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_pd (__m256i __A)
+_mm512_add_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_pd (__m512d __W, __mmask8 __U, __m256i __A)
+_mm512_mask_add_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
{
- return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_pd (__mmask8 __U, __m256i __A)
+_mm512_maskz_add_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_pd (__m256i __A)
+_mm512_add_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_pd (__m512d __W, __mmask8 __U, __m256i __A)
+_mm512_mask_add_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_pd (__mmask8 __U, __m256i __A)
+_mm512_maskz_add_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi32_ps (__m512i __A, const int __R)
+_mm512_sub_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi32_ps (__m512 __W, __mmask16 __U, __m512i __A,
- const int __R)
+_mm512_mask_sub_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi32_ps (__mmask16 __U, __m512i __A, const int __R)
+_mm512_maskz_sub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu32_ps (__m512i __A, const int __R)
+_mm512_sub_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu32_ps (__m512 __W, __mmask16 __U, __m512i __A,
- const int __R)
+_mm512_mask_sub_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu32_ps (__mmask16 __U, __m512i __A, const int __R)
+_mm512_maskz_sub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-
#else
-#define _mm512_cvt_roundepi32_ps(A, B) \
- (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_add_round_pd(A, B, C) \
+ (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-#define _mm512_mask_cvt_roundepi32_ps(W, U, A, B) \
- (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), W, U, B)
+#define _mm512_mask_add_round_pd(W, U, A, B, C) \
+ (__m512d)__builtin_ia32_addpd512_mask(A, B, W, U, C)
-#define _mm512_maskz_cvt_roundepi32_ps(U, A, B) \
- (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_maskz_add_round_pd(U, A, B, C) \
+ (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-#define _mm512_cvt_roundepu32_ps(A, B) \
- (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_add_round_ps(A, B, C) \
+ (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-#define _mm512_mask_cvt_roundepu32_ps(W, U, A, B) \
- (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), W, U, B)
+#define _mm512_mask_add_round_ps(W, U, A, B, C) \
+ (__m512)__builtin_ia32_addps512_mask(A, B, W, U, C)
-#define _mm512_maskz_cvt_roundepu32_ps(U, A, B) \
- (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_maskz_add_round_ps(U, A, B, C) \
+ (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm512_sub_round_pd(A, B, C) \
+ (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
+
+#define _mm512_mask_sub_round_pd(W, U, A, B, C) \
+ (__m512d)__builtin_ia32_subpd512_mask(A, B, W, U, C)
+
+#define _mm512_maskz_sub_round_pd(U, A, B, C) \
+ (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
+
+#define _mm512_sub_round_ps(A, B, C) \
+ (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
+
+#define _mm512_mask_sub_round_ps(W, U, A, B, C) \
+ (__m512)__builtin_ia32_subps512_mask(A, B, W, U, C)
+
+#define _mm512_maskz_sub_round_ps(U, A, B, C) \
+ (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
#endif
#ifdef __OPTIMIZE__
-extern __inline __m256d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extractf64x4_pd (__m512d __A, const int __imm)
+_mm512_mul_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
- __imm,
- (__v4df)
- _mm256_undefined_pd (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m256d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extractf64x4_pd (__m256d __W, __mmask8 __U, __m512d __A,
- const int __imm)
+_mm512_mask_mul_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
{
- return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
- __imm,
- (__v4df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m256d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extractf64x4_pd (__mmask8 __U, __m512d __A, const int __imm)
+_mm512_maskz_mul_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
{
- return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
- __imm,
- (__v4df)
- _mm256_setzero_pd (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extractf32x4_ps (__m512 __A, const int __imm)
+_mm512_mul_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
- __imm,
- (__v4sf)
- _mm_undefined_ps (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extractf32x4_ps (__m128 __W, __mmask8 __U, __m512 __A,
- const int __imm)
-{
- return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
- __imm,
- (__v4sf) __W,
- (__mmask8) __U);
+_mm512_mask_mul_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
+{
+ return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extractf32x4_ps (__mmask8 __U, __m512 __A, const int __imm)
+_mm512_maskz_mul_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
{
- return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
- __imm,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extracti64x4_epi64 (__m512i __A, const int __imm)
+_mm512_div_round_pd (__m512d __M, __m512d __V, const int __R)
{
- return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
- __imm,
- (__v4di)
- _mm256_undefined_si256 (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+ (__v8df) __V,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extracti64x4_epi64 (__m256i __W, __mmask8 __U, __m512i __A,
- const int __imm)
+_mm512_mask_div_round_pd (__m512d __W, __mmask8 __U, __m512d __M,
+ __m512d __V, const int __R)
{
- return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
- __imm,
- (__v4di) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+ (__v8df) __V,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extracti64x4_epi64 (__mmask8 __U, __m512i __A, const int __imm)
+_mm512_maskz_div_round_pd (__mmask8 __U, __m512d __M, __m512d __V,
+ const int __R)
{
- return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
- __imm,
- (__v4di)
- _mm256_setzero_si256 (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+ (__v8df) __V,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m128i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_extracti32x4_epi32 (__m512i __A, const int __imm)
+_mm512_div_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
- __imm,
- (__v4si)
- _mm_undefined_si128 (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m128i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_extracti32x4_epi32 (__m128i __W, __mmask8 __U, __m512i __A,
- const int __imm)
+_mm512_mask_div_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
{
- return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
- __imm,
- (__v4si) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m128i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_extracti32x4_epi32 (__mmask8 __U, __m512i __A, const int __imm)
+_mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
{
- return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
- __imm,
- (__v4si)
- _mm_setzero_si128 (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
+
#else
+#define _mm512_mul_round_pd(A, B, C) \
+ (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-#define _mm512_extractf64x4_pd(X, C) \
- ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
- (int) (C),\
- (__v4df)(__m256d)_mm256_undefined_pd(),\
- (__mmask8)-1))
+#define _mm512_mask_mul_round_pd(W, U, A, B, C) \
+ (__m512d)__builtin_ia32_mulpd512_mask(A, B, W, U, C)
-#define _mm512_mask_extractf64x4_pd(W, U, X, C) \
- ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
- (int) (C),\
- (__v4df)(__m256d)(W),\
- (__mmask8)(U)))
+#define _mm512_maskz_mul_round_pd(U, A, B, C) \
+ (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-#define _mm512_maskz_extractf64x4_pd(U, X, C) \
- ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
- (int) (C),\
- (__v4df)(__m256d)_mm256_setzero_pd(),\
- (__mmask8)(U)))
+#define _mm512_mul_round_ps(A, B, C) \
+ (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-#define _mm512_extractf32x4_ps(X, C) \
- ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
- (int) (C),\
- (__v4sf)(__m128)_mm_undefined_ps(),\
- (__mmask8)-1))
+#define _mm512_mask_mul_round_ps(W, U, A, B, C) \
+ (__m512)__builtin_ia32_mulps512_mask(A, B, W, U, C)
-#define _mm512_mask_extractf32x4_ps(W, U, X, C) \
- ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
- (int) (C),\
- (__v4sf)(__m128)(W),\
- (__mmask8)(U)))
+#define _mm512_maskz_mul_round_ps(U, A, B, C) \
+ (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-#define _mm512_maskz_extractf32x4_ps(U, X, C) \
- ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
- (int) (C),\
- (__v4sf)(__m128)_mm_setzero_ps(),\
- (__mmask8)(U)))
+#define _mm512_div_round_pd(A, B, C) \
+ (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C)
-#define _mm512_extracti64x4_epi64(X, C) \
- ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
- (int) (C),\
- (__v4di)(__m256i)_mm256_undefined_si256 (),\
- (__mmask8)-1))
+#define _mm512_mask_div_round_pd(W, U, A, B, C) \
+ (__m512d)__builtin_ia32_divpd512_mask(A, B, W, U, C)
-#define _mm512_mask_extracti64x4_epi64(W, U, X, C) \
- ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
- (int) (C),\
- (__v4di)(__m256i)(W),\
- (__mmask8)(U)))
+#define _mm512_maskz_div_round_pd(U, A, B, C) \
+ (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C)
-#define _mm512_maskz_extracti64x4_epi64(U, X, C) \
- ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
- (int) (C),\
- (__v4di)(__m256i)_mm256_setzero_si256 (),\
- (__mmask8)(U)))
+#define _mm512_div_round_ps(A, B, C) \
+ (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C)
-#define _mm512_extracti32x4_epi32(X, C) \
- ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
- (int) (C),\
- (__v4si)(__m128i)_mm_undefined_si128 (),\
- (__mmask8)-1))
+#define _mm512_mask_div_round_ps(W, U, A, B, C) \
+ (__m512)__builtin_ia32_divps512_mask(A, B, W, U, C)
-#define _mm512_mask_extracti32x4_epi32(W, U, X, C) \
- ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
- (int) (C),\
- (__v4si)(__m128i)(W),\
- (__mmask8)(U)))
+#define _mm512_maskz_div_round_ps(U, A, B, C) \
+ (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
-#define _mm512_maskz_extracti32x4_epi32(U, X, C) \
- ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
- (int) (C),\
- (__v4si)(__m128i)_mm_setzero_si128 (),\
- (__mmask8)(U)))
#endif
#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_inserti32x4 (__m512i __A, __m128i __B, const int __imm)
+_mm512_max_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __A,
- (__v4si) __B,
- __imm,
- (__v16si) __A, -1);
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
+{
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
+{
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_insertf32x4 (__m512 __A, __m128 __B, const int __imm)
+_mm512_max_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __A,
- (__v4sf) __B,
- __imm,
- (__v16sf) __A, -1);
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
+{
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
+{
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_inserti64x4 (__m512i __A, __m256i __B, const int __imm)
+_mm512_min_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
- (__v4di) __B,
- __imm,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_inserti64x4 (__m512i __W, __mmask8 __U, __m512i __A,
- __m256i __B, const int __imm)
+_mm512_mask_min_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
{
- return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
- (__v4di) __B,
- __imm,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_inserti64x4 (__mmask8 __U, __m512i __A, __m256i __B,
- const int __imm)
+_mm512_maskz_min_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
{
- return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
- (__v4di) __B,
- __imm,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_insertf64x4 (__m512d __A, __m256d __B, const int __imm)
+_mm512_min_round_ps (__m512 __A, __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
- (__v4df) __B,
- __imm,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_insertf64x4 (__m512d __W, __mmask8 __U, __m512d __A,
- __m256d __B, const int __imm)
+_mm512_mask_min_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
- (__v4df) __B,
- __imm,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_insertf64x4 (__mmask8 __U, __m512d __A, __m256d __B,
- const int __imm)
+_mm512_maskz_min_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
{
- return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
- (__v4df) __B,
- __imm,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
#else
-#define _mm512_insertf32x4(X, Y, C) \
- ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
- (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (X), (__mmask16)(-1)))
+#define _mm512_max_round_pd(A, B, R) \
+ (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
-#define _mm512_inserti32x4(X, Y, C) \
- ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
- (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (X), (__mmask16)(-1)))
+#define _mm512_mask_max_round_pd(W, U, A, B, R) \
+ (__m512d)__builtin_ia32_maxpd512_mask(A, B, W, U, R)
-#define _mm512_insertf64x4(X, Y, C) \
- ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
- (__v4df)(__m256d) (Y), (int) (C), \
- (__v8df)(__m512d)_mm512_undefined_pd(), \
- (__mmask8)-1))
+#define _mm512_maskz_max_round_pd(U, A, B, R) \
+ (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
-#define _mm512_mask_insertf64x4(W, U, X, Y, C) \
- ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
- (__v4df)(__m256d) (Y), (int) (C), \
- (__v8df)(__m512d)(W), \
- (__mmask8)(U)))
+#define _mm512_max_round_ps(A, B, R) \
+ (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
-#define _mm512_maskz_insertf64x4(U, X, Y, C) \
- ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
- (__v4df)(__m256d) (Y), (int) (C), \
- (__v8df)(__m512d)_mm512_setzero_pd(), \
- (__mmask8)(U)))
+#define _mm512_mask_max_round_ps(W, U, A, B, R) \
+ (__m512)__builtin_ia32_maxps512_mask(A, B, W, U, R)
-#define _mm512_inserti64x4(X, Y, C) \
- ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
- (__v4di)(__m256i) (Y), (int) (C), \
- (__v8di)(__m512i)_mm512_undefined_epi32 (), \
- (__mmask8)-1))
+#define _mm512_maskz_max_round_ps(U, A, B, R) \
+ (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
-#define _mm512_mask_inserti64x4(W, U, X, Y, C) \
- ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
- (__v4di)(__m256i) (Y), (int) (C),\
- (__v8di)(__m512i)(W),\
- (__mmask8)(U)))
+#define _mm512_min_round_pd(A, B, R) \
+ (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R)
-#define _mm512_maskz_inserti64x4(U, X, Y, C) \
- ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
- (__v4di)(__m256i) (Y), (int) (C), \
- (__v8di)(__m512i)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
+#define _mm512_mask_min_round_pd(W, U, A, B, R) \
+ (__m512d)__builtin_ia32_minpd512_mask(A, B, W, U, R)
+
+#define _mm512_maskz_min_round_pd(U, A, B, R) \
+ (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R)
+
+#define _mm512_min_round_ps(A, B, R) \
+ (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R)
+
+#define _mm512_mask_min_round_ps(W, U, A, B, R) \
+ (__m512)__builtin_ia32_minps512_mask(A, B, W, U, R)
+
+#define _mm512_maskz_min_round_ps(U, A, B, R) \
+ (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R)
#endif
+#ifdef __OPTIMIZE__
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_pd (void const *__P)
+_mm512_scalef_round_pd (__m512d __A, __m512d __B, const int __R)
{
- return *(__m512d_u *)__P;
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm512_mask_scalef_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __R)
{
- return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_pd (__mmask8 __U, void const *__P)
+_mm512_maskz_scalef_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __R)
{
- return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_pd (void *__P, __m512d __A)
+_mm512_scalef_round_ps (__m512 __A, __m512 __B, const int __R)
{
- *(__m512d_u *)__P = __A;
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm512_mask_scalef_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __R)
{
- __builtin_ia32_storeupd512_mask ((double *) __P, (__v8df) __A,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ const int __R)
+{
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_ps (void const *__P)
-{
- return *(__m512_u *)__P;
-}
+#else
+#define _mm512_scalef_round_pd(A, B, C) \
+ ((__m512d) \
+ __builtin_ia32_scalefpd512_mask((A), (B), \
+ (__v8df) _mm512_undefined_pd(), \
+ -1, (C)))
+
+#define _mm512_mask_scalef_round_pd(W, U, A, B, C) \
+ ((__m512d) __builtin_ia32_scalefpd512_mask((A), (B), (W), (U), (C)))
+
+#define _mm512_maskz_scalef_round_pd(U, A, B, C) \
+ ((__m512d) \
+ __builtin_ia32_scalefpd512_mask((A), (B), \
+ (__v8df) _mm512_setzero_pd(), \
+ (U), (C)))
+
+#define _mm512_scalef_round_ps(A, B, C) \
+ ((__m512) \
+ __builtin_ia32_scalefps512_mask((A), (B), \
+ (__v16sf) _mm512_undefined_ps(), \
+ -1, (C)))
+
+#define _mm512_mask_scalef_round_ps(W, U, A, B, C) \
+ ((__m512) __builtin_ia32_scalefps512_mask((A), (B), (W), (U), (C)))
+
+#define _mm512_maskz_scalef_round_ps(U, A, B, C) \
+ ((__m512) \
+ __builtin_ia32_scalefps512_mask((A), (B), \
+ (__v16sf) _mm512_setzero_ps(), \
+ (U), (C)))
+
+#endif
-extern __inline __m512
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm512_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1, __R);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_ps (__mmask16 __U, void const *__P)
+_mm512_mask_fmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_ps (void *__P, __m512 __A)
+_mm512_mask3_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- *(__m512_u *)__P = __A;
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm512_maskz_fmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- __builtin_ia32_storeups512_mask ((float *) __P, (__v16sf) __A,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_ss (__m128 __W, __mmask8 __U, const float *__P)
+_mm512_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) __W, __U);
+ return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1, __R);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_ss (__mmask8 __U, const float *__P)
+_mm512_mask_fmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m128) __builtin_ia32_loadss_mask (__P, (__v4sf) _mm_setzero_ps (),
- __U);
+ return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_sd (__m128d __W, __mmask8 __U, const double *__P)
+_mm512_mask3_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) __W, __U);
+ return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_sd (__mmask8 __U, const double *__P)
+_mm512_maskz_fmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m128d) __builtin_ia32_loadsd_mask (__P, (__v2df) _mm_setzero_pd (),
- __U);
+ return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
- (__v4sf) __W, __U);
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1, __R);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m128) __builtin_ia32_movess_mask ((__v4sf) __A, (__v4sf) __B,
- (__v4sf) _mm_setzero_ps (), __U);
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask3_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
- (__v2df) __W, __U);
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m128d) __builtin_ia32_movesd_mask ((__v2df) __A, (__v2df) __B,
- (__v2df) _mm_setzero_pd (),
- __U);
+ return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_ss (float *__P, __mmask8 __U, __m128 __A)
+_mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- __builtin_ia32_storess_mask (__P, (__v4sf) __A, (__mmask8) __U);
+ return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_sd (double *__P, __mmask8 __U, __m128d __A)
+_mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- __builtin_ia32_storesd_mask (__P, (__v2df) __A, (__mmask8) __U);
+ return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_epi64 (void const *__P)
+_mm512_mask3_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return *(__m512i_u *) __P;
+ return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_epi64 (__mmask8 __U, void const *__P)
+_mm512_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_epi64 (void *__P, __m512i __A)
+_mm512_mask_fmaddsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- *(__m512i_u *) __P = (__m512i_u) __A;
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm512_mask3_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- __builtin_ia32_storedqudi512_mask ((long long *) __P, (__v8di) __A,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_si512 (void const *__P)
+_mm512_maskz_fmaddsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- return *(__m512i_u *)__P;
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_epi32 (void const *__P)
+_mm512_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- return *(__m512i_u *) __P;
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_loadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm512_mask_fmaddsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
+_mm512_mask3_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_si512 (void *__P, __m512i __A)
+_mm512_maskz_fmaddsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- *(__m512i_u *)__P = __A;
+ return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_epi32 (void *__P, __m512i __A)
+_mm512_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- *(__m512i_u *) __P = (__m512i_u) __A;
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) -1, __R);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_storeu_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm512_mask_fmsubadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- __builtin_ia32_storedqusi512_mask ((int *) __P, (__v16si) __A,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutevar_pd (__m512d __A, __m512i __C)
+_mm512_mask3_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
- (__v8di) __C,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutevar_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512i __C)
+_mm512_maskz_fmsubadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
- (__v8di) __C,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutevar_pd (__mmask8 __U, __m512d __A, __m512i __C)
+_mm512_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
- (__v8di) __C,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
+ (__mmask16) -1, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutevar_ps (__m512 __A, __m512i __C)
+_mm512_mask_fmsubadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
- (__v16si) __C,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutevar_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512i __C)
+_mm512_mask3_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
- (__v16si) __C,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutevar_ps (__mmask16 __U, __m512 __A, __m512i __C)
+_mm512_maskz_fmsubadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
- (__v16si) __C,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_epi64 (__m512i __A, __m512i __I, __m512i __B)
+_mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
- /* idx */ ,
- (__v8di) __A,
- (__v8di) __B,
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_epi64 (__m512i __A, __mmask8 __U, __m512i __I,
- __m512i __B)
+_mm512_mask_fnmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
- /* idx */ ,
- (__v8di) __A,
- (__v8di) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_epi64 (__m512i __A, __m512i __I,
- __mmask8 __U, __m512i __B)
+_mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- return (__m512i) __builtin_ia32_vpermi2varq512_mask ((__v8di) __A,
- (__v8di) __I
- /* idx */ ,
- (__v8di) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_epi64 (__mmask8 __U, __m512i __A,
- __m512i __I, __m512i __B)
+_mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2varq512_maskz ((__v8di) __I
- /* idx */ ,
- (__v8di) __A,
- (__v8di) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_epi32 (__m512i __A, __m512i __I, __m512i __B)
+_mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
- /* idx */ ,
- (__v16si) __A,
- (__v16si) __B,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_epi32 (__m512i __A, __mmask16 __U,
- __m512i __I, __m512i __B)
+_mm512_mask_fnmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
- /* idx */ ,
- (__v16si) __A,
- (__v16si) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_epi32 (__m512i __A, __m512i __I,
- __mmask16 __U, __m512i __B)
+_mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return (__m512i) __builtin_ia32_vpermi2vard512_mask ((__v16si) __A,
- (__v16si) __I
- /* idx */ ,
- (__v16si) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_epi32 (__mmask16 __U, __m512i __A,
- __m512i __I, __m512i __B)
+_mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512i) __builtin_ia32_vpermt2vard512_maskz ((__v16si) __I
- /* idx */ ,
- (__v16si) __A,
- (__v16si) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_pd (__m512d __A, __m512i __I, __m512d __B)
+_mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
{
- return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
- /* idx */ ,
- (__v8df) __A,
- (__v8df) __B,
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_pd (__m512d __A, __mmask8 __U, __m512i __I,
- __m512d __B)
+_mm512_mask_fnmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
- /* idx */ ,
- (__v8df) __A,
- (__v8df) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_pd (__m512d __A, __m512i __I, __mmask8 __U,
- __m512d __B)
+_mm512_mask3_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C,
+ __mmask8 __U, const int __R)
{
- return (__m512d) __builtin_ia32_vpermi2varpd512_mask ((__v8df) __A,
- (__v8di) __I
- /* idx */ ,
- (__v8df) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_pd (__mmask8 __U, __m512d __A, __m512i __I,
- __m512d __B)
+_mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512d __C, const int __R)
{
- return (__m512d) __builtin_ia32_vpermt2varpd512_maskz ((__v8di) __I
- /* idx */ ,
- (__v8df) __A,
- (__v8df) __B,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex2var_ps (__m512 __A, __m512i __I, __m512 __B)
+_mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
{
- return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
- /* idx */ ,
- (__v16sf) __A,
- (__v16sf) __B,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex2var_ps (__m512 __A, __mmask16 __U, __m512i __I, __m512 __B)
+_mm512_mask_fnmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
- /* idx */ ,
- (__v16sf) __A,
- (__v16sf) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask2_permutex2var_ps (__m512 __A, __m512i __I, __mmask16 __U,
- __m512 __B)
+_mm512_mask3_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C,
+ __mmask16 __U, const int __R)
{
- return (__m512) __builtin_ia32_vpermi2varps512_mask ((__v16sf) __A,
- (__v16si) __I
- /* idx */ ,
- (__v16sf) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I,
- __m512 __B)
+_mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512 __C, const int __R)
{
- return (__m512) __builtin_ia32_vpermt2varps512_maskz ((__v16si) __I
- /* idx */ ,
- (__v16sf) __A,
- (__v16sf) __B,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U, __R);
}
+#else
+#define _mm512_fmadd_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, -1, R)
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+#define _mm512_mask_fmadd_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmadd_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfmaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmadd_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmadd_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmadd_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmadd_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfmaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmadd_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsub_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmsub_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmsub_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsub_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsub_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmsub_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmsub_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsub_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmaddsub_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmaddsub_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmaddsub_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmaddsub_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fmaddsub_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fmaddsub_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fmaddsub_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmaddsub_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, C, U, R)
+
+#define _mm512_fmsubadd_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), -1, R)
+
+#define _mm512_mask_fmsubadd_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), U, R)
+
+#define _mm512_mask3_fmsubadd_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfmsubaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsubadd_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, -(C), U, R)
+
+#define _mm512_fmsubadd_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), -1, R)
+
+#define _mm512_mask_fmsubadd_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), U, R)
+
+#define _mm512_mask3_fmsubadd_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfmsubaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fmsubadd_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R)
+
+#define _mm512_fnmadd_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmadd_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfnmaddpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmaddpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmadd_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmadd_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfnmaddps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmsub_round_pd(A, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmsub_round_pd(A, U, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmsub_round_pd(A, B, C, U, R) \
+ (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R) \
+ (__m512d)__builtin_ia32_vfnmsubpd512_maskz(A, B, C, U, R)
+
+#define _mm512_fnmsub_round_ps(A, B, C, R) \
+ (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, -1, R)
+
+#define _mm512_mask_fnmsub_round_ps(A, U, B, C, R) \
+ (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R)
+
+#define _mm512_mask3_fnmsub_round_ps(A, B, C, U, R) \
+ (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R)
+
+#define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R) \
+ (__m512)__builtin_ia32_vfnmsubps512_maskz(A, B, C, U, R)
+#endif
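
The fmaddsub/fmsubadd pairs above differ only in which lanes add and which subtract: vfmaddsub computes `a*b - c` in even-indexed lanes and `a*b + c` in odd-indexed lanes, and vfmsubadd swaps that pattern (which is why the fmsubadd forms can reuse the fmaddsub builtin with a negated `C`). A scalar sketch of one lane (illustrative only; `fmaddsub_lane` is a made-up helper name) looks like:

```c
/* Scalar model of one lane of vfmaddsub: even-indexed lanes compute
   a*b - c, odd-indexed lanes compute a*b + c.  vfmsubadd is the same
   with the sign of c flipped, matching the -(__v8df) __C / -(C)
   arguments passed to the fmaddsub builtins above.  */
static double fmaddsub_lane (int i, double a, double b, double c)
{
  return (i % 2 == 0) ? a * b - c : a * b + c;
}
```

So with `a = 2, b = 3, c = 1`, lane 0 produces `5.0` and lane 1 produces `7.0`.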
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permute_pd (__m512d __X, const int __C)
+_mm512_abs_epi64 (__m512i __A)
{
- return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permute_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __C)
+_mm512_mask_abs_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permute_pd (__mmask8 __U, __m512d __X, const int __C)
+_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permute_ps (__m512 __X, const int __C)
+_mm512_abs_epi32 (__m512i __A)
{
- return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permute_ps (__m512 __W, __mmask16 __U, __m512 __X, const int __C)
+_mm512_mask_abs_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permute_ps (__mmask16 __U, __m512 __X, const int __C)
+_mm512_maskz_abs_epi32 (__mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-#else
-#define _mm512_permute_pd(X, C) \
- ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
- (__v8df)(__m512d)_mm512_undefined_pd(),\
- (__mmask8)(-1)))
-
-#define _mm512_mask_permute_pd(W, U, X, C) \
- ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
- (__v8df)(__m512d)(W), \
- (__mmask8)(U)))
-
-#define _mm512_maskz_permute_pd(U, X, C) \
- ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
- (__v8df)(__m512d)_mm512_setzero_pd(), \
- (__mmask8)(U)))
-
-#define _mm512_permute_ps(X, C) \
- ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
- (__v16sf)(__m512)_mm512_undefined_ps(),\
- (__mmask16)(-1)))
-
-#define _mm512_mask_permute_ps(W, U, X, C) \
- ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
- (__v16sf)(__m512)(W), \
- (__mmask16)(U)))
-
-#define _mm512_maskz_permute_ps(U, X, C) \
- ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
- (__v16sf)(__m512)_mm512_setzero_ps(), \
- (__mmask16)(U)))
-#endif
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex_epi64 (__m512i __X, const int __I)
+_mm512_broadcastss_ps (__m128 __A)
{
- return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) (-1));
+ return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex_epi64 (__m512i __W, __mmask8 __M,
- __m512i __X, const int __I)
+_mm512_mask_broadcastss_ps (__m512 __O, __mmask16 __M, __m128 __A)
{
- return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
- (__v8di) __W,
- (__mmask8) __M);
+ return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+ (__v16sf) __O, __M);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex_epi64 (__mmask8 __M, __m512i __X, const int __I)
+_mm512_maskz_broadcastss_ps (__mmask16 __M, __m128 __A)
{
- return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __M);
+ return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutex_pd (__m512d __X, const int __M)
+_mm512_broadcastsd_pd (__m128d __A)
{
- return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+ return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
(__v8df)
_mm512_undefined_pd (),
(__mmask8) -1);
@@ -6951,1543 +7236,1580 @@ _mm512_permutex_pd (__m512d __X, const int __M)
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutex_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __M)
+_mm512_mask_broadcastsd_pd (__m512d __O, __mmask8 __M, __m128d __A)
{
- return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
+ (__v8df) __O, __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutex_pd (__mmask8 __U, __m512d __X, const int __M)
+_mm512_maskz_broadcastsd_pd (__mmask8 __M, __m128d __A)
{
- return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+ return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A,
(__v8df)
_mm512_setzero_pd (),
- (__mmask8) __U);
+ __M);
}
-#else
-#define _mm512_permutex_pd(X, M) \
- ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
- (__v8df)(__m512d)_mm512_undefined_pd(),\
- (__mmask8)-1))
-
-#define _mm512_mask_permutex_pd(W, U, X, M) \
- ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
- (__v8df)(__m512d)(W), (__mmask8)(U)))
-#define _mm512_maskz_permutex_pd(U, X, M) \
- ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
- (__v8df)(__m512d)_mm512_setzero_pd(),\
- (__mmask8)(U)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_broadcastd_epi32 (__m128i __A)
+{
+ return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
+}
-#define _mm512_permutex_epi64(X, I) \
- ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
- (int)(I), \
- (__v8di)(__m512i) \
- (_mm512_undefined_epi32 ()),\
- (__mmask8)(-1)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A)
+{
+ return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+ (__v16si) __O, __M);
+}
-#define _mm512_maskz_permutex_epi64(M, X, I) \
- ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
- (int)(I), \
- (__v8di)(__m512i) \
- (_mm512_setzero_si512 ()),\
- (__mmask8)(M)))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A)
+{
+ return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
+}
-#define _mm512_mask_permutex_epi64(W, M, X, I) \
- ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
- (int)(I), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(M)))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi32 (int __A)
+{
+ return (__m512i)(__v16si)
+ { __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A };
+}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_epi64 (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A)
{
- return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
- (__v8di) __X,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O,
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_epi64 (__m512i __X, __m512i __Y)
+_mm512_maskz_set1_epi32 (__mmask16 __M, int __A)
{
- return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
- (__v8di) __X,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i)
+ __builtin_ia32_pbroadcastd512_gpr_mask (__A,
+ (__v16si) _mm512_setzero_si512 (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_epi64 (__m512i __W, __mmask8 __M, __m512i __X,
- __m512i __Y)
+_mm512_broadcastq_epi64 (__m128i __A)
{
- return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
- (__v8di) __X,
- (__v8di) __W,
- __M);
+ return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_epi32 (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A)
{
- return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
- (__v16si) __X,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+ (__v8di) __O, __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_epi32 (__m512i __X, __m512i __Y)
+_mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
{
- return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
- (__v16si) __X,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_epi32 (__m512i __W, __mmask16 __M, __m512i __X,
- __m512i __Y)
+_mm512_set1_epi64 (long long __A)
{
- return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
- (__v16si) __X,
- (__v16si) __W,
- __M);
+ return (__m512i)(__v8di) { __A, __A, __A, __A, __A, __A, __A, __A };
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_pd (__m512i __X, __m512d __Y)
+_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A)
{
- return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
- (__v8di) __X,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O,
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_pd (__m512d __W, __mmask8 __U, __m512i __X, __m512d __Y)
+_mm512_maskz_set1_epi64 (__mmask8 __M, long long __A)
{
- return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
- (__v8di) __X,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512i)
+ __builtin_ia32_pbroadcastq512_gpr_mask (__A,
+ (__v8di) _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_pd (__mmask8 __U, __m512i __X, __m512d __Y)
+_mm512_broadcast_f32x4 (__m128 __A)
{
- return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
- (__v8di) __X,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_permutexvar_ps (__m512i __X, __m512 __Y)
+_mm512_mask_broadcast_f32x4 (__m512 __O, __mmask16 __M, __m128 __A)
{
- return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
- (__v16si) __X,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+ (__v16sf) __O,
+ __M);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_permutexvar_ps (__m512 __W, __mmask16 __U, __m512i __X, __m512 __Y)
+_mm512_maskz_broadcast_f32x4 (__mmask16 __M, __m128 __A)
{
- return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
- (__v16si) __X,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_permutexvar_ps (__mmask16 __U, __m512i __X, __m512 __Y)
+_mm512_broadcast_i32x4 (__m128i __A)
{
- return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
- (__v16si) __X,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_ps (__m512 __M, __m512 __V, const int __imm)
+_mm512_mask_broadcast_i32x4 (__m512i __O, __mmask16 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
- (__v16sf) __V, __imm,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+ (__v16si) __O,
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_ps (__m512 __W, __mmask16 __U, __m512 __M,
- __m512 __V, const int __imm)
+_mm512_maskz_broadcast_i32x4 (__mmask16 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
- (__v16sf) __V, __imm,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_ps (__mmask16 __U, __m512 __M, __m512 __V, const int __imm)
+_mm512_broadcast_f64x4 (__m256d __A)
{
- return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
- (__v16sf) __V, __imm,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_shuffle_pd (__m512d __M, __m512d __V, const int __imm)
+_mm512_mask_broadcast_f64x4 (__m512d __O, __mmask8 __M, __m256d __A)
{
- return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
- (__v8df) __V, __imm,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+ (__v8df) __O,
+ __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_shuffle_pd (__m512d __W, __mmask8 __U, __m512d __M,
- __m512d __V, const int __imm)
+_mm512_maskz_broadcast_f64x4 (__mmask8 __M, __m256d __A)
{
- return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
- (__v8df) __V, __imm,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_shuffle_pd (__mmask8 __U, __m512d __M, __m512d __V,
- const int __imm)
+_mm512_broadcast_i64x4 (__m256i __A)
{
- return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
- (__v8df) __V, __imm,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_round_pd (__m512d __A, __m512d __B, __m512i __C,
- const int __imm, const int __R)
+_mm512_mask_broadcast_i64x4 (__m512i __O, __mmask8 __M, __m256i __A)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+ (__v8di) __O,
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512i __C, const int __imm, const int __R)
+_mm512_maskz_broadcast_i64x4 (__mmask8 __M, __m256i __A)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+typedef enum
+{
+ _MM_PERM_AAAA = 0x00, _MM_PERM_AAAB = 0x01, _MM_PERM_AAAC = 0x02,
+ _MM_PERM_AAAD = 0x03, _MM_PERM_AABA = 0x04, _MM_PERM_AABB = 0x05,
+ _MM_PERM_AABC = 0x06, _MM_PERM_AABD = 0x07, _MM_PERM_AACA = 0x08,
+ _MM_PERM_AACB = 0x09, _MM_PERM_AACC = 0x0A, _MM_PERM_AACD = 0x0B,
+ _MM_PERM_AADA = 0x0C, _MM_PERM_AADB = 0x0D, _MM_PERM_AADC = 0x0E,
+ _MM_PERM_AADD = 0x0F, _MM_PERM_ABAA = 0x10, _MM_PERM_ABAB = 0x11,
+ _MM_PERM_ABAC = 0x12, _MM_PERM_ABAD = 0x13, _MM_PERM_ABBA = 0x14,
+ _MM_PERM_ABBB = 0x15, _MM_PERM_ABBC = 0x16, _MM_PERM_ABBD = 0x17,
+ _MM_PERM_ABCA = 0x18, _MM_PERM_ABCB = 0x19, _MM_PERM_ABCC = 0x1A,
+ _MM_PERM_ABCD = 0x1B, _MM_PERM_ABDA = 0x1C, _MM_PERM_ABDB = 0x1D,
+ _MM_PERM_ABDC = 0x1E, _MM_PERM_ABDD = 0x1F, _MM_PERM_ACAA = 0x20,
+ _MM_PERM_ACAB = 0x21, _MM_PERM_ACAC = 0x22, _MM_PERM_ACAD = 0x23,
+ _MM_PERM_ACBA = 0x24, _MM_PERM_ACBB = 0x25, _MM_PERM_ACBC = 0x26,
+ _MM_PERM_ACBD = 0x27, _MM_PERM_ACCA = 0x28, _MM_PERM_ACCB = 0x29,
+ _MM_PERM_ACCC = 0x2A, _MM_PERM_ACCD = 0x2B, _MM_PERM_ACDA = 0x2C,
+ _MM_PERM_ACDB = 0x2D, _MM_PERM_ACDC = 0x2E, _MM_PERM_ACDD = 0x2F,
+ _MM_PERM_ADAA = 0x30, _MM_PERM_ADAB = 0x31, _MM_PERM_ADAC = 0x32,
+ _MM_PERM_ADAD = 0x33, _MM_PERM_ADBA = 0x34, _MM_PERM_ADBB = 0x35,
+ _MM_PERM_ADBC = 0x36, _MM_PERM_ADBD = 0x37, _MM_PERM_ADCA = 0x38,
+ _MM_PERM_ADCB = 0x39, _MM_PERM_ADCC = 0x3A, _MM_PERM_ADCD = 0x3B,
+ _MM_PERM_ADDA = 0x3C, _MM_PERM_ADDB = 0x3D, _MM_PERM_ADDC = 0x3E,
+ _MM_PERM_ADDD = 0x3F, _MM_PERM_BAAA = 0x40, _MM_PERM_BAAB = 0x41,
+ _MM_PERM_BAAC = 0x42, _MM_PERM_BAAD = 0x43, _MM_PERM_BABA = 0x44,
+ _MM_PERM_BABB = 0x45, _MM_PERM_BABC = 0x46, _MM_PERM_BABD = 0x47,
+ _MM_PERM_BACA = 0x48, _MM_PERM_BACB = 0x49, _MM_PERM_BACC = 0x4A,
+ _MM_PERM_BACD = 0x4B, _MM_PERM_BADA = 0x4C, _MM_PERM_BADB = 0x4D,
+ _MM_PERM_BADC = 0x4E, _MM_PERM_BADD = 0x4F, _MM_PERM_BBAA = 0x50,
+ _MM_PERM_BBAB = 0x51, _MM_PERM_BBAC = 0x52, _MM_PERM_BBAD = 0x53,
+ _MM_PERM_BBBA = 0x54, _MM_PERM_BBBB = 0x55, _MM_PERM_BBBC = 0x56,
+ _MM_PERM_BBBD = 0x57, _MM_PERM_BBCA = 0x58, _MM_PERM_BBCB = 0x59,
+ _MM_PERM_BBCC = 0x5A, _MM_PERM_BBCD = 0x5B, _MM_PERM_BBDA = 0x5C,
+ _MM_PERM_BBDB = 0x5D, _MM_PERM_BBDC = 0x5E, _MM_PERM_BBDD = 0x5F,
+ _MM_PERM_BCAA = 0x60, _MM_PERM_BCAB = 0x61, _MM_PERM_BCAC = 0x62,
+ _MM_PERM_BCAD = 0x63, _MM_PERM_BCBA = 0x64, _MM_PERM_BCBB = 0x65,
+ _MM_PERM_BCBC = 0x66, _MM_PERM_BCBD = 0x67, _MM_PERM_BCCA = 0x68,
+ _MM_PERM_BCCB = 0x69, _MM_PERM_BCCC = 0x6A, _MM_PERM_BCCD = 0x6B,
+ _MM_PERM_BCDA = 0x6C, _MM_PERM_BCDB = 0x6D, _MM_PERM_BCDC = 0x6E,
+ _MM_PERM_BCDD = 0x6F, _MM_PERM_BDAA = 0x70, _MM_PERM_BDAB = 0x71,
+ _MM_PERM_BDAC = 0x72, _MM_PERM_BDAD = 0x73, _MM_PERM_BDBA = 0x74,
+ _MM_PERM_BDBB = 0x75, _MM_PERM_BDBC = 0x76, _MM_PERM_BDBD = 0x77,
+ _MM_PERM_BDCA = 0x78, _MM_PERM_BDCB = 0x79, _MM_PERM_BDCC = 0x7A,
+ _MM_PERM_BDCD = 0x7B, _MM_PERM_BDDA = 0x7C, _MM_PERM_BDDB = 0x7D,
+ _MM_PERM_BDDC = 0x7E, _MM_PERM_BDDD = 0x7F, _MM_PERM_CAAA = 0x80,
+ _MM_PERM_CAAB = 0x81, _MM_PERM_CAAC = 0x82, _MM_PERM_CAAD = 0x83,
+ _MM_PERM_CABA = 0x84, _MM_PERM_CABB = 0x85, _MM_PERM_CABC = 0x86,
+ _MM_PERM_CABD = 0x87, _MM_PERM_CACA = 0x88, _MM_PERM_CACB = 0x89,
+ _MM_PERM_CACC = 0x8A, _MM_PERM_CACD = 0x8B, _MM_PERM_CADA = 0x8C,
+ _MM_PERM_CADB = 0x8D, _MM_PERM_CADC = 0x8E, _MM_PERM_CADD = 0x8F,
+ _MM_PERM_CBAA = 0x90, _MM_PERM_CBAB = 0x91, _MM_PERM_CBAC = 0x92,
+ _MM_PERM_CBAD = 0x93, _MM_PERM_CBBA = 0x94, _MM_PERM_CBBB = 0x95,
+ _MM_PERM_CBBC = 0x96, _MM_PERM_CBBD = 0x97, _MM_PERM_CBCA = 0x98,
+ _MM_PERM_CBCB = 0x99, _MM_PERM_CBCC = 0x9A, _MM_PERM_CBCD = 0x9B,
+ _MM_PERM_CBDA = 0x9C, _MM_PERM_CBDB = 0x9D, _MM_PERM_CBDC = 0x9E,
+ _MM_PERM_CBDD = 0x9F, _MM_PERM_CCAA = 0xA0, _MM_PERM_CCAB = 0xA1,
+ _MM_PERM_CCAC = 0xA2, _MM_PERM_CCAD = 0xA3, _MM_PERM_CCBA = 0xA4,
+ _MM_PERM_CCBB = 0xA5, _MM_PERM_CCBC = 0xA6, _MM_PERM_CCBD = 0xA7,
+ _MM_PERM_CCCA = 0xA8, _MM_PERM_CCCB = 0xA9, _MM_PERM_CCCC = 0xAA,
+ _MM_PERM_CCCD = 0xAB, _MM_PERM_CCDA = 0xAC, _MM_PERM_CCDB = 0xAD,
+ _MM_PERM_CCDC = 0xAE, _MM_PERM_CCDD = 0xAF, _MM_PERM_CDAA = 0xB0,
+ _MM_PERM_CDAB = 0xB1, _MM_PERM_CDAC = 0xB2, _MM_PERM_CDAD = 0xB3,
+ _MM_PERM_CDBA = 0xB4, _MM_PERM_CDBB = 0xB5, _MM_PERM_CDBC = 0xB6,
+ _MM_PERM_CDBD = 0xB7, _MM_PERM_CDCA = 0xB8, _MM_PERM_CDCB = 0xB9,
+ _MM_PERM_CDCC = 0xBA, _MM_PERM_CDCD = 0xBB, _MM_PERM_CDDA = 0xBC,
+ _MM_PERM_CDDB = 0xBD, _MM_PERM_CDDC = 0xBE, _MM_PERM_CDDD = 0xBF,
+ _MM_PERM_DAAA = 0xC0, _MM_PERM_DAAB = 0xC1, _MM_PERM_DAAC = 0xC2,
+ _MM_PERM_DAAD = 0xC3, _MM_PERM_DABA = 0xC4, _MM_PERM_DABB = 0xC5,
+ _MM_PERM_DABC = 0xC6, _MM_PERM_DABD = 0xC7, _MM_PERM_DACA = 0xC8,
+ _MM_PERM_DACB = 0xC9, _MM_PERM_DACC = 0xCA, _MM_PERM_DACD = 0xCB,
+ _MM_PERM_DADA = 0xCC, _MM_PERM_DADB = 0xCD, _MM_PERM_DADC = 0xCE,
+ _MM_PERM_DADD = 0xCF, _MM_PERM_DBAA = 0xD0, _MM_PERM_DBAB = 0xD1,
+ _MM_PERM_DBAC = 0xD2, _MM_PERM_DBAD = 0xD3, _MM_PERM_DBBA = 0xD4,
+ _MM_PERM_DBBB = 0xD5, _MM_PERM_DBBC = 0xD6, _MM_PERM_DBBD = 0xD7,
+ _MM_PERM_DBCA = 0xD8, _MM_PERM_DBCB = 0xD9, _MM_PERM_DBCC = 0xDA,
+ _MM_PERM_DBCD = 0xDB, _MM_PERM_DBDA = 0xDC, _MM_PERM_DBDB = 0xDD,
+ _MM_PERM_DBDC = 0xDE, _MM_PERM_DBDD = 0xDF, _MM_PERM_DCAA = 0xE0,
+ _MM_PERM_DCAB = 0xE1, _MM_PERM_DCAC = 0xE2, _MM_PERM_DCAD = 0xE3,
+ _MM_PERM_DCBA = 0xE4, _MM_PERM_DCBB = 0xE5, _MM_PERM_DCBC = 0xE6,
+ _MM_PERM_DCBD = 0xE7, _MM_PERM_DCCA = 0xE8, _MM_PERM_DCCB = 0xE9,
+ _MM_PERM_DCCC = 0xEA, _MM_PERM_DCCD = 0xEB, _MM_PERM_DCDA = 0xEC,
+ _MM_PERM_DCDB = 0xED, _MM_PERM_DCDC = 0xEE, _MM_PERM_DCDD = 0xEF,
+ _MM_PERM_DDAA = 0xF0, _MM_PERM_DDAB = 0xF1, _MM_PERM_DDAC = 0xF2,
+ _MM_PERM_DDAD = 0xF3, _MM_PERM_DDBA = 0xF4, _MM_PERM_DDBB = 0xF5,
+ _MM_PERM_DDBC = 0xF6, _MM_PERM_DDBD = 0xF7, _MM_PERM_DDCA = 0xF8,
+ _MM_PERM_DDCB = 0xF9, _MM_PERM_DDCC = 0xFA, _MM_PERM_DDCD = 0xFB,
+ _MM_PERM_DDDA = 0xFC, _MM_PERM_DDDB = 0xFD, _MM_PERM_DDDC = 0xFE,
+ _MM_PERM_DDDD = 0xFF
+} _MM_PERM_ENUM;
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512i __C, const int __imm, const int __R)
+_mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+ __mask,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_round_ps (__m512 __A, __m512 __B, __m512i __C,
- const int __imm, const int __R)
+_mm512_mask_shuffle_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ _MM_PERM_ENUM __mask)
{
- return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) -1, __R);
+ return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+ __mask,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask)
{
- return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A,
+ __mask,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512i __C, const int __imm, const int __R)
+_mm512_shuffle_i64x2 (__m512i __A, __m512i __B, const int __imm)
{
- return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_round_sd (__m128d __A, __m128d __B, __m128i __C,
- const int __imm, const int __R)
+_mm512_mask_shuffle_i64x2 (__m512i __W, __mmask8 __U, __m512i __A,
+ __m512i __B, const int __imm)
{
- return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C, __imm,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B,
- __m128i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_i64x2 (__mmask8 __U, __m512i __A, __m512i __B,
+ const int __imm)
{
- return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C, __imm,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- __m128i __C, const int __imm, const int __R)
+_mm512_shuffle_i32x4 (__m512i __A, __m512i __B, const int __imm)
{
- return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C,
- __imm,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+ (__v16si) __B,
+ __imm,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_round_ss (__m128 __A, __m128 __B, __m128i __C,
- const int __imm, const int __R)
+_mm512_mask_shuffle_i32x4 (__m512i __W, __mmask16 __U, __m512i __A,
+ __m512i __B, const int __imm)
{
- return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+ (__v16si) __B,
+ __imm,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B,
- __m128i __C, const int __imm, const int __R)
+_mm512_maskz_shuffle_i32x4 (__mmask16 __U, __m512i __A, __m512i __B,
+ const int __imm)
{
- return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A,
+ (__v16si) __B,
+ __imm,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- __m128i __C, const int __imm, const int __R)
+_mm512_shuffle_f64x2 (__m512d __A, __m512d __B, const int __imm)
{
- return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) __U, __R);
+ return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+ (__v8df) __B, __imm,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-#else
-#define _mm512_shuffle_pd(X, Y, C) \
- ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)_mm512_undefined_pd(),\
- (__mmask8)-1))
-
-#define _mm512_mask_shuffle_pd(W, U, X, Y, C) \
- ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)(W),\
- (__mmask8)(U)))
-
-#define _mm512_maskz_shuffle_pd(U, X, Y, C) \
- ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(C),\
- (__v8df)(__m512d)_mm512_setzero_pd(),\
- (__mmask8)(U)))
-
-#define _mm512_shuffle_ps(X, Y, C) \
- ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)_mm512_undefined_ps(),\
- (__mmask16)-1))
-
-#define _mm512_mask_shuffle_ps(W, U, X, Y, C) \
- ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)(W),\
- (__mmask16)(U)))
-
-#define _mm512_maskz_shuffle_ps(U, X, Y, C) \
- ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(C),\
- (__v16sf)(__m512)_mm512_setzero_ps(),\
- (__mmask16)(U)))
-
-#define _mm512_fixupimm_round_pd(X, Y, Z, C, R) \
- ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(-1), (R)))
-
-#define _mm512_mask_fixupimm_round_pd(X, U, Y, Z, C, R) \
- ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-
-#define _mm512_maskz_fixupimm_round_pd(U, X, Y, Z, C, R) \
- ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-
-#define _mm512_fixupimm_round_ps(X, Y, Z, C, R) \
- ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(-1), (R)))
-
-#define _mm512_mask_fixupimm_round_ps(X, U, Y, Z, C, R) \
- ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(U), (R)))
-
-#define _mm512_maskz_fixupimm_round_ps(U, X, Y, Z, C, R) \
- ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(U), (R)))
-
-#define _mm_fixupimm_round_sd(X, Y, Z, C, R) \
- ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(-1), (R)))
-
-#define _mm_mask_fixupimm_round_sd(X, U, Y, Z, C, R) \
- ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-
-#define _mm_maskz_fixupimm_round_sd(U, X, Y, Z, C, R) \
- ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-
-#define _mm_fixupimm_round_ss(X, Y, Z, C, R) \
- ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(-1), (R)))
-
-#define _mm_mask_fixupimm_round_ss(X, U, Y, Z, C, R) \
- ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-
-#define _mm_maskz_fixupimm_round_ss(U, X, Y, Z, C, R) \
- ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), (R)))
-#endif
-
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movehdup_ps (__m512 __A)
+_mm512_mask_shuffle_f64x2 (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B, const int __imm)
{
- return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+ (__v8df) __B, __imm,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_movehdup_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_shuffle_f64x2 (__mmask8 __U, __m512d __A, __m512d __B,
+ const int __imm)
{
- return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A,
+ (__v8df) __B, __imm,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_movehdup_ps (__mmask16 __U, __m512 __A)
+_mm512_shuffle_f32x4 (__m512 __A, __m512 __B, const int __imm)
{
- return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+ (__v16sf) __B, __imm,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_moveldup_ps (__m512 __A)
+_mm512_mask_shuffle_f32x4 (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B, const int __imm)
{
- return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+ (__v16sf) __B, __imm,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_moveldup_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_shuffle_f32x4 (__mmask16 __U, __m512 __A, __m512 __B,
+ const int __imm)
{
- return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A,
+ (__v16sf) __B, __imm,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512
+#else
+#define _mm512_shuffle_epi32(X, C) \
+ ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+ (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+ (__mmask16)-1))
+
+#define _mm512_mask_shuffle_epi32(W, U, X, C) \
+ ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+ (__v16si)(__m512i)(W),\
+ (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_epi32(U, X, C) \
+ ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\
+ (__v16si)(__m512i)_mm512_setzero_si512 (),\
+ (__mmask16)(U)))
+
+#define _mm512_shuffle_i64x2(X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C),\
+ (__v8di)(__m512i)_mm512_undefined_epi32 (),\
+ (__mmask8)-1))
+
+#define _mm512_mask_shuffle_i64x2(W, U, X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C),\
+ (__v8di)(__m512i)(W),\
+ (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_i64x2(U, X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C),\
+ (__v8di)(__m512i)_mm512_setzero_si512 (),\
+ (__mmask8)(U)))
+
+#define _mm512_shuffle_i32x4(X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C),\
+ (__v16si)(__m512i)_mm512_undefined_epi32 (),\
+ (__mmask16)-1))
+
+#define _mm512_mask_shuffle_i32x4(W, U, X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C),\
+ (__v16si)(__m512i)(W),\
+ (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_i32x4(U, X, Y, C) \
+ ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C),\
+ (__v16si)(__m512i)_mm512_setzero_si512 (),\
+ (__mmask16)(U)))
+
+#define _mm512_shuffle_f64x2(X, Y, C) \
+ ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)_mm512_undefined_pd(),\
+ (__mmask8)-1))
+
+#define _mm512_mask_shuffle_f64x2(W, U, X, Y, C) \
+ ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)(W),\
+ (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_f64x2(U, X, Y, C) \
+ ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)_mm512_setzero_pd(),\
+ (__mmask8)(U)))
+
+#define _mm512_shuffle_f32x4(X, Y, C) \
+ ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)_mm512_undefined_ps(),\
+ (__mmask16)-1))
+
+#define _mm512_mask_shuffle_f32x4(W, U, X, Y, C) \
+ ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)(W),\
+ (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_f32x4(U, X, Y, C) \
+ ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)_mm512_setzero_ps(),\
+ (__mmask16)(U)))
+#endif
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_moveldup_ps (__mmask16 __U, __m512 __A)
+_mm512_rolv_epi32 (__m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_si512 (__m512i __A, __m512i __B)
+_mm512_mask_rolv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) ((__v16su) __A | (__v16su) __B);
+ return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_epi32 (__m512i __A, __m512i __B)
+_mm512_maskz_rolv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) ((__v16su) __A | (__v16su) __B);
+ return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_rorv_epi32 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_rorv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_rorv_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) ((__v8du) __A | (__v8du) __B);
+ return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_rolv_epi64 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_rolv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_si512 (__m512i __A, __m512i __B)
+_mm512_maskz_rolv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) ((__v16su) __A ^ (__v16su) __B);
+ return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_epi32 (__m512i __A, __m512i __B)
+_mm512_rorv_epi64 (__m512i __A, __m512i __B)
{
- return (__m512i) ((__v16su) __A ^ (__v16su) __B);
+ return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_rorv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_rorv_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_epi64 (__m512i __A, __m512i __B)
+_mm512_cvtt_roundpd_epi32 (__m512d __A, const int __R)
{
- return (__m512i) ((__v8du) __A ^ (__v8du) __B);
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U, __R);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
+{
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtt_roundpd_epu32 (__m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rol_epi32 (__m512i __A, const int __B)
+_mm512_mask_cvtt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rol_epi32 (__m512i __W, __mmask16 __U, __m512i __A, const int __B)
+_mm512_maskz_cvtt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U, __R);
}
+#else
+#define _mm512_cvtt_roundpd_epi32(A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
-extern __inline __m512i
+#define _mm512_mask_cvtt_roundpd_epi32(W, U, A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundpd_epi32(U, A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+
+#define _mm512_cvtt_roundpd_epu32(A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvtt_roundpd_epu32(W, U, A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundpd_epu32(U, A, B) \
+ ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rol_epi32 (__mmask16 __U, __m512i __A, const int __B)
+_mm512_cvt_roundpd_epi32 (__m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ror_epi32 (__m512i __A, int __B)
+_mm512_mask_cvt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ror_epi32 (__m512i __W, __mmask16 __U, __m512i __A, int __B)
+_mm512_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ror_epi32 (__mmask16 __U, __m512i __A, int __B)
+_mm512_cvt_roundpd_epu32 (__m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rol_epi64 (__m512i __A, const int __B)
+_mm512_mask_cvt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rol_epi64 (__m512i __W, __mmask8 __U, __m512i __A, const int __B)
+_mm512_maskz_cvt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U, __R);
}
+#else
+#define _mm512_cvt_roundpd_epi32(A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvt_roundpd_epi32(W, U, A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundpd_epi32(U, A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+
+#define _mm512_cvt_roundpd_epu32(A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B))
+
+#define _mm512_mask_cvt_roundpd_epu32(W, U, A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundpd_epu32(U, A, B) \
+ ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B))
+#endif
+#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rol_epi64 (__mmask8 __U, __m512i __A, const int __B)
+_mm512_cvtt_roundps_epi32 (__m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ror_epi64 (__m512i __A, int __B)
+_mm512_mask_cvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ror_epi64 (__m512i __W, __mmask8 __U, __m512i __A, int __B)
+_mm512_maskz_cvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_ror_epi64 (__mmask8 __U, __m512i __A, int __B)
+_mm512_cvtt_roundps_epu32 (__m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
-}
-
-#else
-#define _mm512_rol_epi32(A, B) \
- ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)_mm512_undefined_epi32 (), \
- (__mmask16)(-1)))
-#define _mm512_mask_rol_epi32(W, U, A, B) \
- ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-#define _mm512_maskz_rol_epi32(U, A, B) \
- ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)_mm512_setzero_si512 (), \
- (__mmask16)(U)))
-#define _mm512_ror_epi32(A, B) \
- ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)_mm512_undefined_epi32 (), \
- (__mmask16)(-1)))
-#define _mm512_mask_ror_epi32(W, U, A, B) \
- ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-#define _mm512_maskz_ror_epi32(U, A, B) \
- ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
- (int)(B), \
- (__v16si)_mm512_setzero_si512 (), \
- (__mmask16)(U)))
-#define _mm512_rol_epi64(A, B) \
- ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)_mm512_undefined_epi32 (), \
- (__mmask8)(-1)))
-#define _mm512_mask_rol_epi64(W, U, A, B) \
- ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(U)))
-#define _mm512_maskz_rol_epi64(U, A, B) \
- ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
-
-#define _mm512_ror_epi64(A, B) \
- ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)_mm512_undefined_epi32 (), \
- (__mmask8)(-1)))
-#define _mm512_mask_ror_epi64(W, U, A, B) \
- ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)(__m512i)(W), \
- (__mmask8)(U)))
-#define _mm512_maskz_ror_epi64(U, A, B) \
- ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
- (int)(B), \
- (__v8di)_mm512_setzero_si512 (), \
- (__mmask8)(U)))
-#endif
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1, __R);
+}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_si512 (__m512i __A, __m512i __B)
+_mm512_mask_cvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
+ const int __R)
{
- return (__m512i) ((__v16su) __A & (__v16su) __B);
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_epi32 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m512i) ((__v16su) __A & (__v16su) __B);
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U, __R);
}
+#else
+#define _mm512_cvtt_roundps_epi32(A, B) \
+ ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvtt_roundps_epi32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvtt_roundps_epi32(U, A, B) \
+ ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+
+#define _mm512_cvtt_roundps_epu32(A, B) \
+ ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvtt_roundps_epu32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)(W), U, B))
+#define _mm512_maskz_cvtt_roundps_epu32(U, A, B) \
+ ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#endif
+
+#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_cvt_roundps_epi32 (__m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m512i) ((__v8du) __A & (__v8du) __B);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvt_roundps_epu32 (__m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __U);
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_pd (),
- __U);
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U, __R);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_si512 (__m512i __A, __m512i __B)
+_mm512_maskz_cvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U, __R);
}
+#else
+#define _mm512_cvt_roundps_epi32(A, B) \
+ ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
-extern __inline __m512i
+#define _mm512_mask_cvt_roundps_epi32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundps_epi32(U, A, B) \
+ ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+
+#define _mm512_cvt_roundps_epu32(A, B) \
+ ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_undefined_epi32 (), -1, B))
+
+#define _mm512_mask_cvt_roundps_epu32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)(W), U, B))
+
+#define _mm512_maskz_cvt_roundps_epu32(U, A, B) \
+ ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B))
+#endif
+
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_epi32 (__m512i __A, __m512i __B)
+_mm512_cvtepi32_epi8 (__m512i __A)
{
- return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
+ return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
(__mmask16) -1);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ __builtin_ia32_pmovdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
}
-extern __inline __m512i
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+ (__v16qi) __O, __M);
}
-extern __inline __m512i
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtepi32_epi8 (__mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline __m512i
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtsepi32_epi8 (__m512i __A)
{
- return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __U);
+ return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_pd (),
- __U);
+ __builtin_ia32_pmovsdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
}
-extern __inline __mmask16
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_test_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
{
- return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
- (__v16si) __B,
- (__mmask16) -1);
+ return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+ (__v16qi) __O, __M);
}
-extern __inline __mmask16
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtsepi32_epi8 (__mmask16 __M, __m512i __A)
{
- return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
- (__v16si) __B, __U);
+ return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline __mmask8
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_test_epi64_mask (__m512i __A, __m512i __B)
+_mm512_cvtusepi32_epi8 (__m512i __A)
{
- return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
- (__v8di) __B,
- (__mmask8) -1);
+ return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A)
{
- return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
+ __builtin_ia32_pmovusdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M);
}
-extern __inline __mmask16
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A)
{
- return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
- (__v16si) __B,
- (__mmask16) -1);
+ return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+ (__v16qi) __O,
+ __M);
}
-extern __inline __mmask16
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtusepi32_epi8 (__mmask16 __M, __m512i __A)
{
- return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
- (__v16si) __B, __U);
+ return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline __mmask8
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
+_mm512_cvtepi32_epi16 (__m512i __A)
{
- return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
- (__v8di) __B,
- (__mmask8) -1);
+ return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi32_storeu_epi16 (void * __P, __mmask16 __M, __m512i __A)
{
- return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
- (__v8di) __B, __U);
+ __builtin_ia32_pmovdw512mem_mask ((__v16hi *) __P, (__v16si) __A, __M);
}
-extern __inline __m512
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_ps (__m512 __A)
+_mm512_mask_cvtepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
{
- return (__m512) _mm512_and_epi32 ((__m512i) __A,
- _mm512_set1_epi32 (0x7fffffff));
+ return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+ (__v16hi) __O, __M);
}
-extern __inline __m512
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_cvtepi32_epi16 (__mmask16 __M, __m512i __A)
{
- return (__m512) _mm512_mask_and_epi32 ((__m512i) __W, __U, (__m512i) __A,
- _mm512_set1_epi32 (0x7fffffff));
+ return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_pd (__m512d __A)
+_mm512_cvtsepi32_epi16 (__m512i __A)
{
- return (__m512d) _mm512_and_epi64 ((__m512i) __A,
- _mm512_set1_epi64 (0x7fffffffffffffffLL));
+ return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ (__mmask16) -1);
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_cvtsepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
{
- return (__m512d)
- _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
- _mm512_set1_epi64 (0x7fffffffffffffffLL));
+ __builtin_ia32_pmovsdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_epi32 (__m512i __A, __m512i __B)
+_mm512_mask_cvtsepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+ (__v16hi) __O, __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- __m512i __B)
+_mm512_maskz_cvtsepi32_epi16 (__mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_cvtusepi32_epi16 (__m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_epi64 (__m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ __builtin_ia32_pmovusdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtusepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+ (__v16hi) __O,
+ __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_cvtusepi32_epi16 (__mmask16 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_epi32 (__m512i __A, __m512i __B)
+_mm512_cvtepi64_epi32 (__m512i __A)
{
- return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- __m512i __B)
+_mm512_mask_cvtepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W,
- (__mmask16) __U);
+ __builtin_ia32_pmovqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+ (__v8si) __O, __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_epi64 (__m512i __A, __m512i __B)
+_mm512_maskz_cvtepi64_epi32 (__mmask8 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ __M);
}
-extern __inline __m512i
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+_mm512_cvtsepi64_epi32 (__m512i __A)
{
- return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_mask_cvtsepi64_storeu_epi32 (void *__P, __mmask8 __M, __m512i __A)
{
- return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ __builtin_ia32_pmovsqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M);
}
-#ifdef __x86_64__
-#ifdef __OPTIMIZE__
-extern __inline unsigned long long
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_u64 (__m128 __A, const int __R)
+_mm512_mask_cvtsepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) __A, __R);
+ return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+ (__v8si) __O, __M);
}
-extern __inline long long
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_si64 (__m128 __A, const int __R)
+_mm512_maskz_cvtsepi64_epi32 (__mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
+ return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ __M);
}
-extern __inline long long
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_i64 (__m128 __A, const int __R)
+_mm512_cvtusepi64_epi32 (__m512i __A)
{
- return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R);
+ return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1);
}
-extern __inline unsigned long long
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_u64 (__m128 __A, const int __R)
+_mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) __A, __R);
+ __builtin_ia32_pmovusqd512mem_mask ((__v8si*) __P, (__v8di) __A, __M);
}
-extern __inline long long
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_i64 (__m128 __A, const int __R)
+_mm512_mask_cvtusepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
+ return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+ (__v8si) __O, __M);
}
-extern __inline long long
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_si64 (__m128 __A, const int __R)
+_mm512_maskz_cvtusepi64_epi32 (__mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R);
+ return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ __M);
}
-#else
-#define _mm_cvt_roundss_u64(A, B) \
- ((unsigned long long)__builtin_ia32_vcvtss2usi64(A, B))
-
-#define _mm_cvt_roundss_si64(A, B) \
- ((long long)__builtin_ia32_vcvtss2si64(A, B))
-
-#define _mm_cvt_roundss_i64(A, B) \
- ((long long)__builtin_ia32_vcvtss2si64(A, B))
-
-#define _mm_cvtt_roundss_u64(A, B) \
- ((unsigned long long)__builtin_ia32_vcvttss2usi64(A, B))
-
-#define _mm_cvtt_roundss_i64(A, B) \
- ((long long)__builtin_ia32_vcvttss2si64(A, B))
-
-#define _mm_cvtt_roundss_si64(A, B) \
- ((long long)__builtin_ia32_vcvttss2si64(A, B))
-#endif
-#endif
-#ifdef __OPTIMIZE__
-extern __inline unsigned
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_u32 (__m128 __A, const int __R)
+_mm512_cvtepi64_epi16 (__m512i __A)
{
- return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_si32 (__m128 __A, const int __R)
+_mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
+ __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
}
-extern __inline int
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_i32 (__m128 __A, const int __R)
+_mm512_mask_cvtepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+ (__v8hi) __O, __M);
}
-extern __inline unsigned
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_u32 (__m128 __A, const int __R)
+_mm512_maskz_cvtepi64_epi16 (__mmask8 __M, __m512i __A)
{
- return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline int
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_i32 (__m128 __A, const int __R)
+_mm512_cvtsepi64_epi16 (__m512i __A)
{
- return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
+ return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundss_si32 (__m128 __A, const int __R)
+_mm512_mask_cvtsepi64_storeu_epi16 (void * __P, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R);
+ __builtin_ia32_pmovsqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M);
}
-#else
-#define _mm_cvt_roundss_u32(A, B) \
- ((unsigned)__builtin_ia32_vcvtss2usi32(A, B))
-
-#define _mm_cvt_roundss_si32(A, B) \
- ((int)__builtin_ia32_vcvtss2si32(A, B))
-
-#define _mm_cvt_roundss_i32(A, B) \
- ((int)__builtin_ia32_vcvtss2si32(A, B))
-
-#define _mm_cvtt_roundss_u32(A, B) \
- ((unsigned)__builtin_ia32_vcvttss2usi32(A, B))
-
-#define _mm_cvtt_roundss_si32(A, B) \
- ((int)__builtin_ia32_vcvttss2si32(A, B))
-
-#define _mm_cvtt_roundss_i32(A, B) \
- ((int)__builtin_ia32_vcvttss2si32(A, B))
-#endif
-#ifdef __x86_64__
-#ifdef __OPTIMIZE__
-extern __inline unsigned long long
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_u64 (__m128d __A, const int __R)
+_mm512_mask_cvtsepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+ (__v8hi) __O, __M);
}
-extern __inline long long
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_si64 (__m128d __A, const int __R)
+_mm512_maskz_cvtsepi64_epi16 (__mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline long long
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_i64 (__m128d __A, const int __R)
+_mm512_cvtusepi64_epi16 (__m512i __A)
{
- return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline unsigned long long
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_u64 (__m128d __A, const int __R)
+_mm512_mask_cvtusepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) __A, __R);
+ __builtin_ia32_pmovusqw512mem_mask ((__v8hi*) __P, (__v8di) __A, __M);
}
-extern __inline long long
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_si64 (__m128d __A, const int __R)
+_mm512_mask_cvtusepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+ (__v8hi) __O, __M);
}
-extern __inline long long
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_i64 (__m128d __A, const int __R)
+_mm512_maskz_cvtusepi64_epi16 (__mmask8 __M, __m512i __A)
{
- return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A,
+ (__v8hi)
+ _mm_setzero_si128 (),
+ __M);
}
-#else
-#define _mm_cvt_roundsd_u64(A, B) \
- ((unsigned long long)__builtin_ia32_vcvtsd2usi64(A, B))
-
-#define _mm_cvt_roundsd_si64(A, B) \
- ((long long)__builtin_ia32_vcvtsd2si64(A, B))
-
-#define _mm_cvt_roundsd_i64(A, B) \
- ((long long)__builtin_ia32_vcvtsd2si64(A, B))
-
-#define _mm_cvtt_roundsd_u64(A, B) \
- ((unsigned long long)__builtin_ia32_vcvttsd2usi64(A, B))
-
-#define _mm_cvtt_roundsd_si64(A, B) \
- ((long long)__builtin_ia32_vcvttsd2si64(A, B))
-
-#define _mm_cvtt_roundsd_i64(A, B) \
- ((long long)__builtin_ia32_vcvttsd2si64(A, B))
-#endif
-#endif
-#ifdef __OPTIMIZE__
-extern __inline unsigned
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_u32 (__m128d __A, const int __R)
+_mm512_cvtepi64_epi8 (__m512i __A)
{
- return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_si32 (__m128d __A, const int __R)
+_mm512_mask_cvtepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
+ __builtin_ia32_pmovqb512mem_mask ((unsigned long long *) __P,
+ (__v8di) __A, __M);
}
-extern __inline int
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_i32 (__m128d __A, const int __R)
+_mm512_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+ (__v16qi) __O, __M);
}
-extern __inline unsigned
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_u32 (__m128d __A, const int __R)
+_mm512_maskz_cvtepi64_epi8 (__mmask8 __M, __m512i __A)
{
- return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline int
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_i32 (__m128d __A, const int __R)
+_mm512_cvtsepi64_epi8 (__m512i __A)
{
- return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
+ return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsd_si32 (__m128d __A, const int __R)
+_mm512_mask_cvtsepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
{
- return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R);
+ __builtin_ia32_pmovsqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
}
-#else
-#define _mm_cvt_roundsd_u32(A, B) \
- ((unsigned)__builtin_ia32_vcvtsd2usi32(A, B))
-
-#define _mm_cvt_roundsd_si32(A, B) \
- ((int)__builtin_ia32_vcvtsd2si32(A, B))
-#define _mm_cvt_roundsd_i32(A, B) \
- ((int)__builtin_ia32_vcvtsd2si32(A, B))
-
-#define _mm_cvtt_roundsd_u32(A, B) \
- ((unsigned)__builtin_ia32_vcvttsd2usi32(A, B))
-
-#define _mm_cvtt_roundsd_si32(A, B) \
- ((int)__builtin_ia32_vcvttsd2si32(A, B))
-
-#define _mm_cvtt_roundsd_i32(A, B) \
- ((int)__builtin_ia32_vcvttsd2si32(A, B))
-#endif
-
-extern __inline __m512d
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movedup_pd (__m512d __A)
+_mm512_mask_cvtsepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+ (__v16qi) __O, __M);
}
-extern __inline __m512d
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_movedup_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_cvtsepi64_epi8 (__mmask8 __M, __m512i __A)
{
- return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_movedup_pd (__mmask8 __U, __m512d __A)
+_mm512_cvtusepi64_epi8 (__m512i __A)
{
- return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_pd (__m512d __A, __m512d __B)
+_mm512_mask_cvtusepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A)
{
- return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1);
+ __builtin_ia32_pmovusqb512mem_mask ((unsigned long long *) __P, (__v8di) __A, __M);
}
-extern __inline __m512d
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_cvtusepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A)
{
- return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+ (__v16qi) __O,
+ __M);
}
-extern __inline __m512d
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_cvtusepi64_epi8 (__mmask8 __M, __m512i __A)
{
- return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_pd (__m512d __A, __m512d __B)
+_mm512_cvtepi32_pd (__m256i __A)
{
- return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
- (__v8df) __B,
+ return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
(__v8df)
_mm512_undefined_pd (),
(__mmask8) -1);
@@ -8495,93 +8817,88 @@ _mm512_unpackhi_pd (__m512d __A, __m512d __B)
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_cvtepi32_pd (__m512d __W, __mmask8 __U, __m256i __A)
{
- return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
- (__v8df) __B,
+ return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
(__v8df) __W,
(__mmask8) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_cvtepi32_pd (__mmask8 __U, __m256i __A)
{
- return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
- (__v8df) __B,
+ return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A,
(__v8df)
_mm512_setzero_pd (),
(__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpackhi_ps (__m512 __A, __m512 __B)
+_mm512_cvtepu32_pd (__m256i __A)
{
- return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpackhi_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_cvtepu32_pd (__m512d __W, __mmask8 __U, __m256i __A)
{
- return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpackhi_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_cvtepu32_pd (__mmask8 __U, __m256i __A)
{
- return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_pd (__m256 __A, const int __R)
+_mm512_cvt_roundepi32_ps (__m512i __A, const int __R)
{
- return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_pd (__m512d __W, __mmask8 __U, __m256 __A,
- const int __R)
+_mm512_mask_cvt_roundepi32_ps (__m512 __W, __mmask16 __U, __m512i __A,
+ const int __R)
{
- return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_pd (__mmask8 __U, __m256 __A, const int __R)
+_mm512_maskz_cvt_roundepi32_ps (__mmask16 __U, __m512i __A, const int __R)
{
- return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_ps (__m256i __A, const int __R)
+_mm512_cvt_roundepu32_ps (__m512i __A, const int __R)
{
- return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
(__v16sf)
_mm512_undefined_ps (),
(__mmask16) -1, __R);
@@ -8589,2829 +8906,2850 @@ _mm512_cvt_roundph_ps (__m256i __A, const int __R)
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_ps (__m512 __W, __mmask16 __U, __m256i __A,
- const int __R)
+_mm512_mask_cvt_roundepu32_ps (__m512 __W, __mmask16 __U, __m512i __A,
+ const int __R)
{
- return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
(__v16sf) __W,
(__mmask16) __U, __R);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_ps (__mmask16 __U, __m256i __A, const int __R)
+_mm512_maskz_cvt_roundepu32_ps (__mmask16 __U, __m512i __A, const int __R)
{
- return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
(__v16sf)
_mm512_setzero_ps (),
(__mmask16) __U, __R);
}
-extern __inline __m256i
+#else
+#define _mm512_cvt_roundepi32_ps(A, B) \
+ (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundepi32_ps(W, U, A, B) \
+ (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), W, U, B)
+
+#define _mm512_maskz_cvt_roundepi32_ps(U, A, B) \
+ (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+
+#define _mm512_cvt_roundepu32_ps(A, B) \
+ (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundepu32_ps(W, U, A, B) \
+ (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), W, U, B)
+
+#define _mm512_maskz_cvt_roundepu32_ps(U, A, B) \
+ (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m256d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundps_ph (__m512 __A, const int __I)
+_mm512_extractf64x4_pd (__m512d __A, const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi)
- _mm256_undefined_si256 (),
- -1);
+ return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+ __imm,
+ (__v4df)
+ _mm256_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m256i
+extern __inline __m256d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_ph (__m512 __A, const int __I)
+_mm512_mask_extractf64x4_pd (__m256d __W, __mmask8 __U, __m512d __A,
+ const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi)
- _mm256_undefined_si256 (),
- -1);
+ return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+ __imm,
+ (__v4df) __W,
+ (__mmask8) __U);
}
-extern __inline __m256i
+extern __inline __m256d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundps_ph (__m256i __U, __mmask16 __W, __m512 __A,
- const int __I)
+_mm512_maskz_extractf64x4_pd (__mmask8 __U, __m512d __A, const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi) __U,
- (__mmask16) __W);
+ return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A,
+ __imm,
+ (__v4df)
+ _mm256_setzero_pd (),
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_extractf32x4_ps (__m512 __A, const int __imm)
+{
+ return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+ __imm,
+ (__v4sf)
+ _mm_undefined_ps (),
+ (__mmask8) -1);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_extractf32x4_ps (__m128 __W, __mmask8 __U, __m512 __A,
+ const int __imm)
+{
+ return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+ __imm,
+ (__v4sf) __W,
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_extractf32x4_ps (__mmask8 __U, __m512 __A, const int __imm)
+{
+ return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A,
+ __imm,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U);
}
extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_ph (__m256i __U, __mmask16 __W, __m512 __A, const int __I)
+_mm512_extracti64x4_epi64 (__m512i __A, const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi) __U,
- (__mmask16) __W);
+ return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+ __imm,
+ (__v4di)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1);
}
extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundps_ph (__mmask16 __W, __m512 __A, const int __I)
+_mm512_mask_extracti64x4_epi64 (__m256i __W, __mmask8 __U, __m512i __A,
+ const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi)
- _mm256_setzero_si256 (),
- (__mmask16) __W);
+ return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+ __imm,
+ (__v4di) __W,
+ (__mmask8) __U);
}
extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_ph (__mmask16 __W, __m512 __A, const int __I)
+_mm512_maskz_extracti64x4_epi64 (__mmask8 __U, __m512i __A, const int __imm)
{
- return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
- __I,
- (__v16hi)
+ return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A,
+ __imm,
+ (__v4di)
_mm256_setzero_si256 (),
- (__mmask16) __W);
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_extracti32x4_epi32 (__m512i __A, const int __imm)
+{
+ return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+ __imm,
+ (__v4si)
+ _mm_undefined_si128 (),
+ (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_extracti32x4_epi32 (__m128i __W, __mmask8 __U, __m512i __A,
+ const int __imm)
+{
+ return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+ __imm,
+ (__v4si) __W,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_extracti32x4_epi32 (__mmask8 __U, __m512i __A, const int __imm)
+{
+ return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
+ __imm,
+ (__v4si)
+ _mm_setzero_si128 (),
+ (__mmask8) __U);
}
#else
-#define _mm512_cvt_roundps_pd(A, B) \
- (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, B)
-#define _mm512_mask_cvt_roundps_pd(W, U, A, B) \
- (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)(W), U, B)
+#define _mm512_extractf64x4_pd(X, C) \
+ ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
+ (int) (C),\
+ (__v4df)(__m256d)_mm256_undefined_pd(),\
+ (__mmask8)-1))
-#define _mm512_maskz_cvt_roundps_pd(U, A, B) \
- (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_setzero_pd(), U, B)
+#define _mm512_mask_extractf64x4_pd(W, U, X, C) \
+ ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
+ (int) (C),\
+ (__v4df)(__m256d)(W),\
+ (__mmask8)(U)))
-#define _mm512_cvt_roundph_ps(A, B) \
- (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+#define _mm512_maskz_extractf64x4_pd(U, X, C) \
+ ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \
+ (int) (C),\
+ (__v4df)(__m256d)_mm256_setzero_pd(),\
+ (__mmask8)(U)))
-#define _mm512_mask_cvt_roundph_ps(W, U, A, B) \
- (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)(W), U, B)
+#define _mm512_extractf32x4_ps(X, C) \
+ ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
+ (int) (C),\
+ (__v4sf)(__m128)_mm_undefined_ps(),\
+ (__mmask8)-1))
-#define _mm512_maskz_cvt_roundph_ps(U, A, B) \
- (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+#define _mm512_mask_extractf32x4_ps(W, U, X, C) \
+ ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
+ (int) (C),\
+ (__v4sf)(__m128)(W),\
+ (__mmask8)(U)))
-#define _mm512_cvt_roundps_ph(A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)_mm256_undefined_si256 (), -1))
-#define _mm512_cvtps_ph(A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)_mm256_undefined_si256 (), -1))
-#define _mm512_mask_cvt_roundps_ph(U, W, A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)(__m256i)(U), (__mmask16) (W)))
-#define _mm512_mask_cvtps_ph(U, W, A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)(__m256i)(U), (__mmask16) (W)))
-#define _mm512_maskz_cvt_roundps_ph(W, A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
-#define _mm512_maskz_cvtps_ph(W, A, I) \
- ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
- (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#define _mm512_maskz_extractf32x4_ps(U, X, C) \
+ ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \
+ (int) (C),\
+ (__v4sf)(__m128)_mm_setzero_ps(),\
+ (__mmask8)(U)))
+
+#define _mm512_extracti64x4_epi64(X, C) \
+ ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
+ (int) (C),\
+ (__v4di)(__m256i)_mm256_undefined_si256 (),\
+ (__mmask8)-1))
+
+#define _mm512_mask_extracti64x4_epi64(W, U, X, C) \
+ ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
+ (int) (C),\
+ (__v4di)(__m256i)(W),\
+ (__mmask8)(U)))
+
+#define _mm512_maskz_extracti64x4_epi64(U, X, C) \
+ ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \
+ (int) (C),\
+ (__v4di)(__m256i)_mm256_setzero_si256 (),\
+ (__mmask8)(U)))
+
+#define _mm512_extracti32x4_epi32(X, C) \
+ ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
+ (int) (C),\
+ (__v4si)(__m128i)_mm_undefined_si128 (),\
+ (__mmask8)-1))
+
+#define _mm512_mask_extracti32x4_epi32(W, U, X, C) \
+ ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
+ (int) (C),\
+ (__v4si)(__m128i)(W),\
+ (__mmask8)(U)))
+
+#define _mm512_maskz_extracti32x4_epi32(U, X, C) \
+ ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \
+ (int) (C),\
+ (__v4si)(__m128i)_mm_setzero_si128 (),\
+ (__mmask8)(U)))
#endif
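[Editor's note, not part of the patch: the extract intrinsics above all share one shape — select a 256- or 128-bit lane of the 512-bit source by immediate, then merge-mask against W (or zero-mask). A portable scalar sketch of `_mm512_mask_extractf64x4_pd`, with a hypothetical helper name, makes the mask/merge behavior concrete; it assumes only that `__imm` selects one of the two 4-double lanes and that mask bits 0-3 choose between the extracted lane and the pass-through operand.]

```c
#include <assert.h>

/* Hypothetical scalar model of _mm512_mask_extractf64x4_pd: pick one of
   the two 4-double lanes of an 8-double source, then merge it with the
   pass-through operand w under an 8-bit mask (only bits 0-3 matter).  */
static void
model_mask_extractf64x4 (double dst[4], const double w[4], unsigned char u,
                         const double a[8], int imm)
{
  const double *lane = a + (imm & 1) * 4;    /* imm selects lane 0 or 1 */
  for (int i = 0; i < 4; i++)
    dst[i] = (u >> i) & 1 ? lane[i] : w[i];  /* mask bit set: take src */
}
```

The unmasked form corresponds to `u == 0xf` with an undefined W, and the `maskz` form to W being all zeros.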
#ifdef __OPTIMIZE__
-extern __inline __m256
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_ps (__m512d __A, const int __R)
-{
- return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
- (__v8sf)
- _mm256_undefined_ps (),
- (__mmask8) -1, __R);
-}
-
-extern __inline __m256
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_ps (__m256 __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_inserti32x4 (__m512i __A, __m128i __B, const int __imm)
{
- return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
- (__v8sf) __W,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __A,
+ (__v4si) __B,
+ __imm,
+ (__v16si) __A, -1);
}
-extern __inline __m256
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
+_mm512_insertf32x4 (__m512 __A, __m128 __B, const int __imm)
{
- return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
- (__v8sf)
- _mm256_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __A,
+ (__v4sf) __B,
+ __imm,
+ (__v16sf) __A, -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
+_mm512_inserti64x4 (__m512i __A, __m256i __B, const int __imm)
{
- return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
- (__v2df) __B,
- __R);
+ return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+ (__v4di) __B,
+ __imm,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsd_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128d __B, const int __R)
+_mm512_mask_inserti64x4 (__m512i __W, __mmask8 __U, __m512i __A,
+ __m256i __B, const int __imm)
{
- return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
- (__v2df) __B,
- (__v4sf) __W,
- __U,
- __R);
+ return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+ (__v4di) __B,
+ __imm,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsd_ss (__mmask8 __U, __m128 __A,
- __m128d __B, const int __R)
+_mm512_maskz_inserti64x4 (__mmask8 __U, __m512i __A, __m256i __B,
+ const int __imm)
{
- return (__m128) __builtin_ia32_cvtsd2ss_mask_round ((__v4sf) __A,
- (__v2df) __B,
- _mm_setzero_ps (),
- __U,
- __R);
+ return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A,
+ (__v4di) __B,
+ __imm,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
+_mm512_insertf64x4 (__m512d __A, __m256d __B, const int __imm)
{
- return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
- (__v4sf) __B,
- __R);
+ return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+ (__v4df) __B,
+ __imm,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundss_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128 __B, const int __R)
+_mm512_mask_insertf64x4 (__m512d __W, __mmask8 __U, __m512d __A,
+ __m256d __B, const int __imm)
{
- return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
- (__v4sf) __B,
- (__v2df) __W,
- __U,
- __R);
+ return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+ (__v4df) __B,
+ __imm,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundss_sd (__mmask8 __U, __m128d __A,
- __m128 __B, const int __R)
+_mm512_maskz_insertf64x4 (__mmask8 __U, __m512d __A, __m256d __B,
+ const int __imm)
{
- return (__m128d) __builtin_ia32_cvtss2sd_mask_round ((__v2df) __A,
- (__v4sf) __B,
- _mm_setzero_pd (),
- __U,
- __R);
+ return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A,
+ (__v4df) __B,
+ __imm,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
#else
-#define _mm512_cvt_roundpd_ps(A, B) \
- (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_undefined_ps(), -1, B)
-
-#define _mm512_mask_cvt_roundpd_ps(W, U, A, B) \
- (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)(W), U, B)
-
-#define _mm512_maskz_cvt_roundpd_ps(U, A, B) \
- (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+#define _mm512_insertf32x4(X, Y, C) \
+ ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
+ (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (X), (__mmask16)(-1)))
-#define _mm_cvt_roundsd_ss(A, B, C) \
- (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
+#define _mm512_inserti32x4(X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
+ (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (X), (__mmask16)(-1)))
-#define _mm_mask_cvt_roundsd_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), (W), (U), (C))
+#define _mm512_insertf64x4(X, Y, C) \
+ ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
+ (__v4df)(__m256d) (Y), (int) (C), \
+ (__v8df)(__m512d)_mm512_undefined_pd(), \
+ (__mmask8)-1))
-#define _mm_maskz_cvt_roundsd_ss(U, A, B, C) \
- (__m128)__builtin_ia32_cvtsd2ss_mask_round ((A), (B), _mm_setzero_ps (), \
- (U), (C))
+#define _mm512_mask_insertf64x4(W, U, X, Y, C) \
+ ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
+ (__v4df)(__m256d) (Y), (int) (C), \
+ (__v8df)(__m512d)(W), \
+ (__mmask8)(U)))
-#define _mm_cvt_roundss_sd(A, B, C) \
- (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
+#define _mm512_maskz_insertf64x4(U, X, Y, C) \
+ ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \
+ (__v4df)(__m256d) (Y), (int) (C), \
+ (__v8df)(__m512d)_mm512_setzero_pd(), \
+ (__mmask8)(U)))
-#define _mm_mask_cvt_roundss_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), (W), (U), (C))
+#define _mm512_inserti64x4(X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
+ (__v4di)(__m256i) (Y), (int) (C), \
+ (__v8di)(__m512i)_mm512_undefined_epi32 (), \
+ (__mmask8)-1))
-#define _mm_maskz_cvt_roundss_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_cvtss2sd_mask_round ((A), (B), _mm_setzero_pd (), \
- (U), (C))
+#define _mm512_mask_inserti64x4(W, U, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
+ (__v4di)(__m256i) (Y), (int) (C),\
+ (__v8di)(__m512i)(W),\
+ (__mmask8)(U)))
+#define _mm512_maskz_inserti64x4(U, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \
+ (__v4di)(__m256i) (Y), (int) (C), \
+ (__v8di)(__m512i)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
#endif
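[Editor's note, not part of the patch: the insert intrinsics are the inverse operation — overwrite one 256- or 128-bit lane of the 512-bit destination with the narrow source, then apply the write mask over the whole result. A scalar sketch of the zero-masking `_mm512_maskz_inserti64x4` (hypothetical helper name, illustrating assumed semantics):]

```c
#include <assert.h>

/* Hypothetical scalar model of _mm512_maskz_inserti64x4: copy a, replace
   the lane selected by imm with b, then zero elements whose mask bit in
   u is clear.  */
static void
model_maskz_inserti64x4 (long long dst[8], unsigned char u,
                         const long long a[8], const long long b[4], int imm)
{
  long long tmp[8];
  for (int i = 0; i < 8; i++)
    tmp[i] = a[i];
  for (int i = 0; i < 4; i++)
    tmp[(imm & 1) * 4 + i] = b[i];           /* overwrite selected lane */
  for (int i = 0; i < 8; i++)
    dst[i] = (u >> i) & 1 ? tmp[i] : 0;      /* zero-masking */
}
```

Note that in the `#else` macro branch above, the unmasked 32-bit inserts pass the source operand itself as the merge value with an all-ones mask, which is equivalent to this model with `u == 0xff`.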
-#define _mm_mask_cvtss_sd(W, U, A, B) \
- _mm_mask_cvt_roundss_sd ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_cvtss_sd(U, A, B) \
- _mm_maskz_cvt_roundss_sd ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_pd (void const *__P)
+{
+ return *(__m512d_u *)__P;
+}
-#define _mm_mask_cvtsd_ss(W, U, A, B) \
- _mm_mask_cvt_roundsd_ss ((W), (U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_loadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+{
+ return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
+ (__v8df) __W,
+ (__mmask8) __U);
+}
-#define _mm_maskz_cvtsd_ss(U, A, B) \
- _mm_maskz_cvt_roundsd_ss ((U), (A), (B), _MM_FROUND_CUR_DIRECTION)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_loadu_pd (__mmask8 __U, void const *__P)
+{
+ return (__m512d) __builtin_ia32_loadupd512_mask ((const double *) __P,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
+}
extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_si512 (__m512i * __P, __m512i __A)
+_mm512_storeu_pd (void *__P, __m512d __A)
{
- __builtin_ia32_movntdq512 ((__v8di *) __P, (__v8di) __A);
+ *(__m512d_u *)__P = __A;
}
extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_ps (float *__P, __m512 __A)
+_mm512_mask_storeu_pd (void *__P, __mmask8 __U, __m512d __A)
{
- __builtin_ia32_movntps512 (__P, (__v16sf) __A);
+ __builtin_ia32_storeupd512_mask ((double *) __P, (__v8df) __A,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_pd (double *__P, __m512d __A)
+_mm512_loadu_ps (void const *__P)
{
- __builtin_ia32_movntpd512 (__P, (__v8df) __A);
+ return *(__m512_u *)__P;
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_stream_load_si512 (void *__P)
+_mm512_mask_loadu_ps (__m512 __W, __mmask16 __U, void const *__P)
{
- return __builtin_ia32_movntdqa512 ((__v8di *)__P);
+ return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-/* Constants for mantissa extraction */
-typedef enum
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_loadu_ps (__mmask16 __U, void const *__P)
{
- _MM_MANT_NORM_1_2, /* interval [1, 2) */
- _MM_MANT_NORM_p5_2, /* interval [0.5, 2) */
- _MM_MANT_NORM_p5_1, /* interval [0.5, 1) */
- _MM_MANT_NORM_p75_1p5 /* interval [0.75, 1.5) */
-} _MM_MANTISSA_NORM_ENUM;
+ return (__m512) __builtin_ia32_loadups512_mask ((const float *) __P,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
+}
-typedef enum
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_storeu_ps (void *__P, __m512 __A)
{
- _MM_MANT_SIGN_src, /* sign = sign(SRC) */
- _MM_MANT_SIGN_zero, /* sign = 0 */
- _MM_MANT_SIGN_nan /* DEST = NaN if sign(SRC) = 1 */
-} _MM_MANTISSA_SIGN_ENUM;
+ *(__m512_u *)__P = __A;
+}
-#ifdef __OPTIMIZE__
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_storeu_ps (void *__P, __mmask16 __U, __m512 __A)
{
- return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ __builtin_ia32_storeups512_mask ((float *) __P, (__v16sf) __A,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm512_loadu_epi64 (void const *__P)
{
- return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return *(__m512i_u *) __P;
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_loadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
{
- return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_maskz_loadu_epi64 (__mmask8 __U, void const *__P)
{
- return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__m512i) __builtin_ia32_loaddqudi512_mask ((const long long *) __P,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm512_storeu_epi64 (void *__P, __m512i __A)
{
- return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ *(__m512i_u *) __P = (__m512i_u) __A;
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ __builtin_ia32_storedqudi512_mask ((long long *) __P, (__v8di) __A,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_ps (__m512 __A, const int __R)
+_mm512_loadu_si512 (void const *__P)
{
- return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return *(__m512i_u *)__P;
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- const int __R)
+_mm512_loadu_epi32 (void const *__P)
{
- return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U, __R);
+ return *(__m512i_u *) __P;
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_ps (__mmask16 __U, __m512 __A, const int __R)
+_mm512_mask_loadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
{
- return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_pd (__m512d __A, const int __R)
+_mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P)
{
- return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_loaddqusi512_mask ((const int *) __P,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- const int __R)
+_mm512_storeu_si512 (void *__P, __m512i __A)
{
- return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U, __R);
+ *(__m512i_u *)__P = __A;
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_pd (__mmask8 __U, __m512d __A, const int __R)
+_mm512_storeu_epi32 (void *__P, __m512i __A)
{
- return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U, __R);
+ *(__m512i_u *) __P = (__m512i_u) __A;
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_storeu_epi32 (void *__P, __mmask16 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
- (__C << 2) | __B,
- _mm512_undefined_pd (),
- (__mmask8) -1, __R);
+ __builtin_ia32_storedqusi512_mask ((int *) __P, (__v16si) __A,
+ (__mmask16) __U);
}
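[Editor's note, not part of the patch: the `loadu`/`storeu` intrinsics above work on unaligned memory via the `__m512d_u`/`__m512_u`/`__m512i_u` types, while the masked variants go through `__builtin_ia32_load*`/`store*_mask` built-ins. A scalar sketch of the merge-masked `_mm512_mask_loadu_epi32` (hypothetical helper name). One caveat the scalar model cannot capture: the real masked load suppresses faults on masked-off elements, so below we deliberately read memory only where the mask bit is set.]

```c
#include <assert.h>

/* Hypothetical scalar model of _mm512_mask_loadu_epi32: for each of 16
   elements, load from unaligned memory where the mask bit is set,
   otherwise keep the pass-through value w.  */
static void
model_mask_loadu_epi32 (int dst[16], const int w[16], unsigned short u,
                        const void *p)
{
  const int *src = (const int *) p;   /* no alignment requirement */
  for (int i = 0; i < 16; i++)
    if ((u >> i) & 1)
      dst[i] = src[i];                /* element loaded from memory */
    else
      dst[i] = w[i];                  /* masked off: pass-through */
}
```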
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_permutevar_pd (__m512d __A, __m512i __C)
{
- return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
- (__C << 2) | __B,
- (__v8df) __W, __U,
- __R);
+ return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+ (__v8di) __C,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_pd (__mmask8 __U, __m512d __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_permutevar_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512i __C)
{
- return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
- (__C << 2) | __B,
- (__v8df)
- _mm512_setzero_pd (),
- __U, __R);
+ return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+ (__v8di) __C,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_maskz_permutevar_pd (__mmask8 __U, __m512d __A, __m512i __C)
{
- return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
- (__C << 2) | __B,
- _mm512_undefined_ps (),
- (__mmask16) -1, __R);
+ return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A,
+ (__v8di) __C,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_permutevar_ps (__m512 __A, __m512i __C)
{
- return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
- (__C << 2) | __B,
- (__v16sf) __W, __U,
- __R);
+ return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+ (__v16si) __C,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm512_mask_permutevar_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512i __C)
{
- return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
- (__C << 2) | __B,
- (__v16sf)
- _mm512_setzero_ps (),
- __U, __R);
+ return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+ (__v16si) __C,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_sd (__m128d __A, __m128d __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_maskz_permutevar_ps (__mmask16 __U, __m512 __A, __m512i __C)
{
- return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- __R);
+ return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A,
+ (__v16si) __C,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_permutex2var_epi64 (__m512i __A, __m512i __I, __m512i __B)
{
- return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- (__v2df) __W,
- __U, __R);
+ return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
+ /* idx */ ,
+ (__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_mask_permutex2var_epi64 (__m512i __A, __mmask8 __U, __m512i __I,
+ __m512i __B)
{
- return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- (__v2df)
- _mm_setzero_pd(),
- __U, __R);
+ return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I
+ /* idx */ ,
+ (__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_ss (__m128 __A, __m128 __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_mask2_permutex2var_epi64 (__m512i __A, __m512i __I,
+ __mmask8 __U, __m512i __B)
{
- return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- __R);
+ return (__m512i) __builtin_ia32_vpermi2varq512_mask ((__v8di) __A,
+ (__v8di) __I
+ /* idx */ ,
+ (__v8di) __B,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_maskz_permutex2var_epi64 (__mmask8 __U, __m512i __A,
+ __m512i __I, __m512i __B)
{
- return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- (__v4sf) __W,
- __U, __R);
+ return (__m512i) __builtin_ia32_vpermt2varq512_maskz ((__v8di) __I
+ /* idx */ ,
+ (__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm512_permutex2var_epi32 (__m512i __A, __m512i __I, __m512i __B)
{
- return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- (__v4sf)
- _mm_setzero_ps(),
- __U, __R);
-}
-
-#else
-#define _mm512_getmant_round_pd(X, B, C, R) \
- ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
- (int)(((C)<<2) | (B)), \
- (__v8df)(__m512d)_mm512_undefined_pd(), \
- (__mmask8)-1,\
- (R)))
-
-#define _mm512_mask_getmant_round_pd(W, U, X, B, C, R) \
- ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
- (int)(((C)<<2) | (B)), \
- (__v8df)(__m512d)(W), \
- (__mmask8)(U),\
- (R)))
-
-#define _mm512_maskz_getmant_round_pd(U, X, B, C, R) \
- ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
- (int)(((C)<<2) | (B)), \
- (__v8df)(__m512d)_mm512_setzero_pd(), \
- (__mmask8)(U),\
- (R)))
-#define _mm512_getmant_round_ps(X, B, C, R) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)(__m512)_mm512_undefined_ps(), \
- (__mmask16)-1,\
- (R)))
-
-#define _mm512_mask_getmant_round_ps(W, U, X, B, C, R) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)(__m512)(W), \
- (__mmask16)(U),\
- (R)))
-
-#define _mm512_maskz_getmant_round_ps(U, X, B, C, R) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)(__m512)_mm512_setzero_ps(), \
- (__mmask16)(U),\
- (R)))
-#define _mm_getmant_round_sd(X, Y, C, D, R) \
- ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- (R)))
-
-#define _mm_mask_getmant_round_sd(W, U, X, Y, C, D, R) \
- ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v2df)(__m128d)(W), \
- (__mmask8)(U),\
- (R)))
-
-#define _mm_maskz_getmant_round_sd(U, X, Y, C, D, R) \
- ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v2df)(__m128d)_mm_setzero_pd(), \
- (__mmask8)(U),\
- (R)))
-
-#define _mm_getmant_round_ss(X, Y, C, D, R) \
- ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- (R)))
-
-#define _mm_mask_getmant_round_ss(W, U, X, Y, C, D, R) \
- ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v4sf)(__m128)(W), \
- (__mmask8)(U),\
- (R)))
-
-#define _mm_maskz_getmant_round_ss(U, X, Y, C, D, R) \
- ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v4sf)(__m128)_mm_setzero_ps(), \
- (__mmask8)(U),\
- (R)))
-
-#define _mm_getexp_round_ss(A, B, R) \
- ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
-
-#define _mm_mask_getexp_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_getexp_round_sd(A, B, R) \
- ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
-
-#define _mm_mask_getexp_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
+ return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
+ /* idx */ ,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) -1);
+}
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_permutex2var_epi32 (__m512i __A, __mmask16 __U,
+ __m512i __I, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I
+ /* idx */ ,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
-#define _mm512_getexp_round_ps(A, R) \
- ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
- (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, R))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask2_permutex2var_epi32 (__m512i __A, __m512i __I,
+ __mmask16 __U, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_vpermi2vard512_mask ((__v16si) __A,
+ (__v16si) __I
+ /* idx */ ,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
-#define _mm512_mask_getexp_round_ps(W, U, A, R) \
- ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
- (__v16sf)(__m512)(W), (__mmask16)(U), R))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_permutex2var_epi32 (__mmask16 __U, __m512i __A,
+ __m512i __I, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_vpermt2vard512_maskz ((__v16si) __I
+ /* idx */ ,
+ (__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) __U);
+}
-#define _mm512_maskz_getexp_round_ps(U, A, R) \
- ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
- (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutex2var_pd (__m512d __A, __m512i __I, __m512d __B)
+{
+ return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
+ /* idx */ ,
+ (__v8df) __A,
+ (__v8df) __B,
+ (__mmask8) -1);
+}
-#define _mm512_getexp_round_pd(A, R) \
- ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
- (__v8df)_mm512_undefined_pd(), (__mmask8)-1, R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_permutex2var_pd (__m512d __A, __mmask8 __U, __m512i __I,
+ __m512d __B)
+{
+ return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I
+ /* idx */ ,
+ (__v8df) __A,
+ (__v8df) __B,
+ (__mmask8) __U);
+}
-#define _mm512_mask_getexp_round_pd(W, U, A, R) \
- ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
- (__v8df)(__m512d)(W), (__mmask8)(U), R))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask2_permutex2var_pd (__m512d __A, __m512i __I, __mmask8 __U,
+ __m512d __B)
+{
+ return (__m512d) __builtin_ia32_vpermi2varpd512_mask ((__v8df) __A,
+ (__v8di) __I
+ /* idx */ ,
+ (__v8df) __B,
+ (__mmask8) __U);
+}
-#define _mm512_maskz_getexp_round_pd(U, A, R) \
- ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
- (__v8df)_mm512_setzero_pd(), (__mmask8)(U), R))
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_permutex2var_pd (__mmask8 __U, __m512d __A, __m512i __I,
+ __m512d __B)
+{
+ return (__m512d) __builtin_ia32_vpermt2varpd512_maskz ((__v8di) __I
+ /* idx */ ,
+ (__v8df) __A,
+ (__v8df) __B,
+ (__mmask8) __U);
+}
-#ifdef __OPTIMIZE__
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_ps (__m512 __A, const int __imm, const int __R)
+_mm512_permutex2var_ps (__m512 __A, __m512i __I, __m512 __B)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm,
- (__v16sf)
- _mm512_undefined_ps (),
- -1, __R);
+ return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
+ /* idx */ ,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_ps (__m512 __A, __mmask16 __B, __m512 __C,
- const int __imm, const int __R)
+_mm512_mask_permutex2var_ps (__m512 __A, __mmask16 __U, __m512i __I, __m512 __B)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm,
- (__v16sf) __A,
- (__mmask16) __B, __R);
+ return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I
+ /* idx */ ,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_ps (__mmask16 __A, __m512 __B,
- const int __imm, const int __R)
+_mm512_mask2_permutex2var_ps (__m512 __A, __m512i __I, __mmask16 __U,
+ __m512 __B)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B,
- __imm,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __A, __R);
+ return (__m512) __builtin_ia32_vpermi2varps512_mask ((__v16sf) __A,
+ (__v16si) __I
+ /* idx */ ,
+ (__v16sf) __B,
+ (__mmask16) __U);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_pd (__m512d __A, const int __imm, const int __R)
+_mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I,
+ __m512 __B)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm,
- (__v8df)
- _mm512_undefined_pd (),
- -1, __R);
+ return (__m512) __builtin_ia32_vpermt2varps512_maskz ((__v16si) __I
+ /* idx */ ,
+ (__v16sf) __A,
+ (__v16sf) __B,
+ (__mmask16) __U);
}
+#ifdef __OPTIMIZE__
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_pd (__m512d __A, __mmask8 __B,
- __m512d __C, const int __imm, const int __R)
+_mm512_permute_pd (__m512d __X, const int __C)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm,
- (__v8df) __A,
- (__mmask8) __B, __R);
+ return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
- const int __imm, const int __R)
+_mm512_mask_permute_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __C)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B,
- __imm,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __A, __R);
+ return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm,
- const int __R)
+_mm512_maskz_permute_pd (__mmask8 __U, __m512d __X, const int __C)
{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
- (__v4sf) __B, __imm,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1,
- __R);
+ return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C,
- __m128 __D, const int __imm, const int __R)
+_mm512_permute_ps (__m512 __X, const int __C)
{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
- (__v4sf) __D, __imm,
- (__v4sf) __A,
- (__mmask8) __B,
- __R);
+ return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C,
- const int __imm, const int __R)
+_mm512_mask_permute_ps (__m512 __W, __mmask16 __U, __m512 __X, const int __C)
{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
- (__v4sf) __C, __imm,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __A,
- __R);
+ return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
- const int __R)
+_mm512_maskz_permute_ps (__mmask16 __U, __m512 __X, const int __C)
{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
- (__v2df) __B, __imm,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1,
- __R);
+ return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
+#else
+#define _mm512_permute_pd(X, C) \
+ ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
+ (__v8df)(__m512d)_mm512_undefined_pd(),\
+ (__mmask8)(-1)))
+
+#define _mm512_mask_permute_pd(W, U, X, C) \
+ ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
+ (__v8df)(__m512d)(W), \
+ (__mmask8)(U)))
+
+#define _mm512_maskz_permute_pd(U, X, C) \
+ ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \
+ (__v8df)(__m512d)_mm512_setzero_pd(), \
+ (__mmask8)(U)))
+
+#define _mm512_permute_ps(X, C) \
+ ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
+ (__v16sf)(__m512)_mm512_undefined_ps(),\
+ (__mmask16)(-1)))
-extern __inline __m128d
+#define _mm512_mask_permute_ps(W, U, X, C) \
+ ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
+ (__v16sf)(__m512)(W), \
+ (__mmask16)(U)))
+
+#define _mm512_maskz_permute_ps(U, X, C) \
+ ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \
+ (__v16sf)(__m512)_mm512_setzero_ps(), \
+ (__mmask16)(U)))
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C,
- __m128d __D, const int __imm, const int __R)
+_mm512_permutex_epi64 (__m512i __X, const int __I)
{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
- (__v2df) __D, __imm,
- (__v2df) __A,
- (__mmask8) __B,
- __R);
+ return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) (-1));
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C,
- const int __imm, const int __R)
+_mm512_mask_permutex_epi64 (__m512i __W, __mmask8 __M,
+ __m512i __X, const int __I)
{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
- (__v2df) __C, __imm,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __A,
- __R);
+ return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+ (__v8di) __W,
+ (__mmask8) __M);
}
-#else
-#define _mm512_roundscale_round_ps(A, B, R) \
- ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
- (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), R))
-#define _mm512_mask_roundscale_round_ps(A, B, C, D, R) \
- ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C), \
- (int)(D), \
- (__v16sf)(__m512)(A), \
- (__mmask16)(B), R))
-#define _mm512_maskz_roundscale_round_ps(A, B, C, R) \
- ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B), \
- (int)(C), \
- (__v16sf)_mm512_setzero_ps(),\
- (__mmask16)(A), R))
-#define _mm512_roundscale_round_pd(A, B, R) \
- ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\
- (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), R))
-#define _mm512_mask_roundscale_round_pd(A, B, C, D, R) \
- ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C), \
- (int)(D), \
- (__v8df)(__m512d)(A), \
- (__mmask8)(B), R))
-#define _mm512_maskz_roundscale_round_pd(A, B, C, R) \
- ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B), \
- (int)(C), \
- (__v8df)_mm512_setzero_pd(),\
- (__mmask8)(A), R))
-#define _mm_roundscale_round_ss(A, B, I, R) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
- (__v4sf) (__m128) (B), \
- (int) (I), \
- (__v4sf) _mm_setzero_ps (), \
- (__mmask8) (-1), \
- (int) (R)))
-#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B), \
- (__v4sf) (__m128) (C), \
- (int) (I), \
- (__v4sf) (__m128) (A), \
- (__mmask8) (U), \
- (int) (R)))
-#define _mm_maskz_roundscale_round_ss(U, A, B, I, R) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
- (__v4sf) (__m128) (B), \
- (int) (I), \
- (__v4sf) _mm_setzero_ps (), \
- (__mmask8) (U), \
- (int) (R)))
-#define _mm_roundscale_round_sd(A, B, I, R) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
- (__v2df) (__m128d) (B), \
- (int) (I), \
- (__v2df) _mm_setzero_pd (), \
- (__mmask8) (-1), \
- (int) (R)))
-#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B), \
- (__v2df) (__m128d) (C), \
- (int) (I), \
- (__v2df) (__m128d) (A), \
- (__mmask8) (U), \
- (int) (R)))
-#define _mm_maskz_roundscale_round_sd(U, A, B, I, R) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
- (__v2df) (__m128d) (B), \
- (int) (I), \
- (__v2df) _mm_setzero_pd (), \
- (__mmask8) (U), \
- (int) (R)))
-#endif
-
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_floor_ps (__m512 __A)
+_mm512_maskz_permutex_epi64 (__mmask8 __M, __m512i __X, const int __I)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
- _MM_FROUND_FLOOR,
- (__v16sf) __A, -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_floor_pd (__m512d __A)
+_mm512_permutex_pd (__m512d __X, const int __M)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
- _MM_FROUND_FLOOR,
- (__v8df) __A, -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ceil_ps (__m512 __A)
+_mm512_mask_permutex_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __M)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
- _MM_FROUND_CEIL,
- (__v16sf) __A, -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+ (__v8df) __W,
+ (__mmask8) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ceil_pd (__m512d __A)
+_mm512_maskz_permutex_pd (__mmask8 __U, __m512d __X, const int __M)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
- _MM_FROUND_CEIL,
- (__v8df) __A, -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
+#else
+#define _mm512_permutex_pd(X, M) \
+ ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
+ (__v8df)(__m512d)_mm512_undefined_pd(),\
+ (__mmask8)-1))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
-{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
- _MM_FROUND_FLOOR,
- (__v16sf) __W, __U,
- _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_mask_permutex_pd(W, U, X, M) \
+ ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
+ (__v8df)(__m512d)(W), (__mmask8)(U)))
-extern __inline __m512d
+#define _mm512_maskz_permutex_pd(U, X, M) \
+ ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \
+ (__v8df)(__m512d)_mm512_setzero_pd(),\
+ (__mmask8)(U)))
+
+#define _mm512_permutex_epi64(X, I) \
+ ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+ (int)(I), \
+ (__v8di)(__m512i) \
+ (_mm512_undefined_epi32 ()),\
+ (__mmask8)(-1)))
+
+#define _mm512_maskz_permutex_epi64(M, X, I) \
+ ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+ (int)(I), \
+ (__v8di)(__m512i) \
+ (_mm512_setzero_si512 ()),\
+ (__mmask8)(M)))
+
+#define _mm512_mask_permutex_epi64(W, M, X, I) \
+ ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \
+ (int)(I), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(M)))
+#endif
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_permutexvar_epi64 (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
- _MM_FROUND_FLOOR,
- (__v8df) __W, __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+ (__v8di) __X,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_permutexvar_epi64 (__m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
- _MM_FROUND_CEIL,
- (__v16sf) __W, __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+ (__v8di) __X,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_permutexvar_epi64 (__m512i __W, __mmask8 __M, __m512i __X,
+ __m512i __Y)
{
- return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
- _MM_FROUND_CEIL,
- (__v8df) __W, __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y,
+ (__v8di) __X,
+ (__v8di) __W,
+ __M);
}
-#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_alignr_epi32 (__m512i __A, __m512i __B, const int __imm)
+_mm512_maskz_permutexvar_epi32 (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
- (__v16si) __B, __imm,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+ (__v16si) __X,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_alignr_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
- __m512i __B, const int __imm)
+_mm512_permutexvar_epi32 (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
- (__v16si) __B, __imm,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+ (__v16si) __X,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_alignr_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
- const int __imm)
+_mm512_mask_permutexvar_epi32 (__m512i __W, __mmask16 __M, __m512i __X,
+ __m512i __Y)
{
- return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
- (__v16si) __B, __imm,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
+ (__v16si) __X,
+ (__v16si) __W,
+ __M);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_alignr_epi64 (__m512i __A, __m512i __B, const int __imm)
+_mm512_permutexvar_pd (__m512i __X, __m512d __Y)
{
- return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+ (__v8di) __X,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_alignr_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
- __m512i __B, const int __imm)
+_mm512_mask_permutexvar_pd (__m512d __W, __mmask8 __U, __m512i __X, __m512d __Y)
{
- return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+ (__v8di) __X,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_alignr_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
- const int __imm)
+_mm512_maskz_permutexvar_pd (__mmask8 __U, __m512i __X, __m512d __Y)
{
- return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
- (__v8di) __B, __imm,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y,
+ (__v8di) __X,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-#else
-#define _mm512_alignr_epi32(X, Y, C) \
- ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_undefined_epi32 (),\
- (__mmask16)-1))
-
-#define _mm512_mask_alignr_epi32(W, U, X, Y, C) \
- ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C), (__v16si)(__m512i)(W), \
- (__mmask16)(U)))
-
-#define _mm512_maskz_alignr_epi32(U, X, Y, C) \
- ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_setzero_si512 (),\
- (__mmask16)(U)))
-
-#define _mm512_alignr_epi64(X, Y, C) \
- ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_undefined_epi32 (), \
- (__mmask8)-1))
-
-#define _mm512_mask_alignr_epi64(W, U, X, Y, C) \
- ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C), (__v8di)(__m512i)(W), (__mmask8)(U)))
-
-#define _mm512_maskz_alignr_epi64(U, X, Y, C) \
- ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_setzero_si512 (),\
- (__mmask8)(U)))
-#endif
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpeq_epi32_mask (__m512i __A, __m512i __B)
+_mm512_permutexvar_ps (__m512i __X, __m512 __Y)
{
- return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+ (__v16si) __X,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpeq_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_mask_permutexvar_ps (__m512 __W, __mmask16 __U, __m512i __X, __m512 __Y)
{
- return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
- (__v16si) __B, __U);
+ return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+ (__v16si) __X,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpeq_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_maskz_permutexvar_ps (__mmask16 __U, __m512i __X, __m512 __Y)
{
- return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
- (__v8di) __B, __U);
+ return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y,
+ (__v16si) __X,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __mmask8
+#ifdef __OPTIMIZE__
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpeq_epi64_mask (__m512i __A, __m512i __B)
+_mm512_shuffle_ps (__m512 __M, __m512 __V, const int __imm)
{
- return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+ (__v16sf) __V, __imm,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpgt_epi32_mask (__m512i __A, __m512i __B)
+_mm512_mask_shuffle_ps (__m512 __W, __mmask16 __U, __m512 __M,
+ __m512 __V, const int __imm)
{
- return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+ (__v16sf) __V, __imm,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpgt_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+_mm512_maskz_shuffle_ps (__mmask16 __U, __m512 __M, __m512 __V, const int __imm)
{
- return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
- (__v16si) __B, __U);
+ return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M,
+ (__v16sf) __V, __imm,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpgt_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+_mm512_shuffle_pd (__m512d __M, __m512d __V, const int __imm)
{
- return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
- (__v8di) __B, __U);
+ return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+ (__v8df) __V, __imm,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline __mmask8
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpgt_epi64_mask (__m512i __A, __m512i __B)
+_mm512_mask_shuffle_pd (__m512d __W, __mmask8 __U, __m512d __M,
+ __m512d __V, const int __imm)
{
- return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+ (__v8df) __V, __imm,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_shuffle_pd (__mmask8 __U, __m512d __M, __m512d __V,
+ const int __imm)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 5,
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M,
+ (__v8df) __V, __imm,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_fixupimm_round_pd (__m512d __A, __m512d __B, __m512i __C,
+ const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 5,
- (__mmask16) __M);
+ return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) -1, __R);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512i __C, const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 5,
- (__mmask16) __M);
+ return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_fixupimm_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512i __C, const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 5,
- (__mmask16) -1);
+ return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) __U, __R);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_fixupimm_round_ps (__m512 __A, __m512 __B, __m512i __C,
+ const int __imm, const int __R)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 5,
- (__mmask8) __M);
+ return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) -1, __R);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512i __C, const int __imm, const int __R)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 5,
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) __U, __R);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpge_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_fixupimm_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512i __C, const int __imm, const int __R)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 5,
- (__mmask8) __M);
+ return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) __U, __R);
}
-extern __inline __mmask8
+#else
+#define _mm512_shuffle_pd(X, Y, C) \
+ ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)_mm512_undefined_pd(),\
+ (__mmask8)-1))
+
+#define _mm512_mask_shuffle_pd(W, U, X, Y, C) \
+ ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)(W),\
+ (__mmask8)(U)))
+
+#define _mm512_maskz_shuffle_pd(U, X, Y, C) \
+ ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(C),\
+ (__v8df)(__m512d)_mm512_setzero_pd(),\
+ (__mmask8)(U)))
+
+#define _mm512_shuffle_ps(X, Y, C) \
+ ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)_mm512_undefined_ps(),\
+ (__mmask16)-1))
+
+#define _mm512_mask_shuffle_ps(W, U, X, Y, C) \
+ ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)(W),\
+ (__mmask16)(U)))
+
+#define _mm512_maskz_shuffle_ps(U, X, Y, C) \
+ ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(C),\
+ (__v16sf)(__m512)_mm512_setzero_ps(),\
+ (__mmask16)(U)))
+
+#define _mm512_fixupimm_round_pd(X, Y, Z, C, R) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(-1), (R)))
+
+#define _mm512_mask_fixupimm_round_pd(X, U, Y, Z, C, R) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#define _mm512_maskz_fixupimm_round_pd(U, X, Y, Z, C, R) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(U), (R)))
+
+#define _mm512_fixupimm_round_ps(X, Y, Z, C, R) \
+ ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(-1), (R)))
+
+#define _mm512_mask_fixupimm_round_ps(X, U, Y, Z, C, R) \
+ ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(U), (R)))
+
+#define _mm512_maskz_fixupimm_round_ps(U, X, Y, Z, C, R) \
+ ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(U), (R)))
+
+#endif
+
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpge_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_movehdup_ps (__m512 __A)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 5,
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_movehdup_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 2,
- (__mmask16) __M);
+ return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_movehdup_ps (__mmask16 __U, __m512 __A)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 2,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_moveldup_ps (__m512 __A)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 2,
- (__mmask16) __M);
+ return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_mask_moveldup_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 2,
- (__mmask16) -1);
+ return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_moveldup_ps (__mmask16 __U, __m512 __A)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 2,
- (__mmask8) __M);
+ return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_or_si512 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 2,
- (__mmask8) -1);
+ return (__m512i) ((__v16su) __A | (__v16su) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmple_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_or_epi32 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 2,
- (__mmask8) __M);
+ return (__m512i) ((__v16su) __A | (__v16su) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmple_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_or_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 2,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_or_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 1,
- (__mmask16) __M);
+ return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_or_epi64 (__m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 1,
- (__mmask16) -1);
+ return (__m512i) ((__v8du) __A | (__v8du) __B);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_or_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 1,
- (__mmask16) __M);
+ return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_or_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 1,
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_xor_si512 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 1,
- (__mmask8) __M);
+ return (__m512i) ((__v16su) __A ^ (__v16su) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_xor_epi32 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 1,
- (__mmask8) -1);
+ return (__m512i) ((__v16su) __A ^ (__v16su) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmplt_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_xor_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 1,
- (__mmask8) __M);
+ return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmplt_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_xor_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 1,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epi32_mask (__m512i __X, __m512i __Y)
+_mm512_xor_epi64 (__m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 4,
- (__mmask16) -1);
+ return (__m512i) ((__v8du) __A ^ (__v8du) __B);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_mask_xor_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 4,
- (__mmask16) __M);
+ return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
+_mm512_maskz_xor_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 4,
- (__mmask16) __M);
+ return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __mmask16
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epu32_mask (__m512i __X, __m512i __Y)
+_mm512_rol_epi32 (__m512i __A, const int __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, 4,
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_mask_rol_epi32 (__m512i __W, __mmask16 __U, __m512i __A, const int __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 4,
- (__mmask8) __M);
+ return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epi64_mask (__m512i __X, __m512i __Y)
+_mm512_maskz_rol_epi32 (__mmask16 __U, __m512i __A, const int __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 4,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmpneq_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
+_mm512_ror_epi32 (__m512i __A, int __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 4,
- (__mmask8) __M);
+ return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
+_mm512_mask_ror_epi32 (__m512i __W, __mmask16 __U, __m512i __A, int __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, 4,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-#define _MM_CMPINT_EQ 0x0
-#define _MM_CMPINT_LT 0x1
-#define _MM_CMPINT_LE 0x2
-#define _MM_CMPINT_UNUSED 0x3
-#define _MM_CMPINT_NE 0x4
-#define _MM_CMPINT_NLT 0x5
-#define _MM_CMPINT_GE 0x5
-#define _MM_CMPINT_NLE 0x6
-#define _MM_CMPINT_GT 0x6
-
-#ifdef __OPTIMIZE__
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
+_mm512_maskz_ror_epi32 (__mmask16 __U, __m512i __A, int __B)
{
- return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
- (__mmask8) __B);
+ return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
+_mm512_rol_epi64 (__m512i __A, const int __B)
{
- return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
- (__mmask8) __B);
+ return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_mask_rol_epi64 (__m512i __W, __mmask8 __U, __m512i __A, const int __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, __P,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epi32_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_maskz_rol_epi64 (__mmask8 __U, __m512i __A, const int __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, __P,
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epu64_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_ror_epi64 (__m512i __A, int __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, __P,
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_epu32_mask (__m512i __X, __m512i __Y, const int __P)
+_mm512_mask_ror_epi64 (__m512i __W, __mmask8 __U, __m512i __A, int __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, __P,
- (__mmask16) -1);
+ return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_pd_mask (__m512d __X, __m512d __Y, const int __P,
- const int __R)
+_mm512_maskz_ror_epi64 (__mmask8 __U, __m512i __A, int __B)
{
- return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
- (__v8df) __Y, __P,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __mmask16
+#else
+#define _mm512_rol_epi32(A, B) \
+ ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)_mm512_undefined_epi32 (), \
+ (__mmask16)(-1)))
+#define _mm512_mask_rol_epi32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
+#define _mm512_maskz_rol_epi32(U, A, B) \
+ ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)_mm512_setzero_si512 (), \
+ (__mmask16)(U)))
+#define _mm512_ror_epi32(A, B) \
+ ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)_mm512_undefined_epi32 (), \
+ (__mmask16)(-1)))
+#define _mm512_mask_ror_epi32(W, U, A, B) \
+ ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
+#define _mm512_maskz_ror_epi32(U, A, B) \
+ ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \
+ (int)(B), \
+ (__v16si)_mm512_setzero_si512 (), \
+ (__mmask16)(U)))
+#define _mm512_rol_epi64(A, B) \
+ ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)_mm512_undefined_epi32 (), \
+ (__mmask8)(-1)))
+#define _mm512_mask_rol_epi64(W, U, A, B) \
+ ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(U)))
+#define _mm512_maskz_rol_epi64(U, A, B) \
+ ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
+
+#define _mm512_ror_epi64(A, B) \
+ ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)_mm512_undefined_epi32 (), \
+ (__mmask8)(-1)))
+#define _mm512_mask_ror_epi64(W, U, A, B) \
+ ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)(__m512i)(W), \
+ (__mmask8)(U)))
+#define _mm512_maskz_ror_epi64(U, A, B) \
+ ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \
+ (int)(B), \
+ (__v8di)_mm512_setzero_si512 (), \
+ (__mmask8)(U)))
+#endif
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_ps_mask (__m512 __X, __m512 __Y, const int __P, const int __R)
+_mm512_and_si512 (__m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
- (__v16sf) __Y, __P,
- (__mmask16) -1, __R);
+ return (__m512i) ((__v16su) __A & (__v16su) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
- const int __P)
+_mm512_and_epi32 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
- (__v8di) __Y, __P,
- (__mmask8) __U);
+ return (__m512i) ((__v16su) __A & (__v16su) __B);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epi32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
- const int __P)
+_mm512_mask_and_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
- (__v16si) __Y, __P,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
- const int __P)
+_mm512_maskz_and_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
- (__v8di) __Y, __P,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_epu32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
- const int __P)
+_mm512_and_epi64 (__m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
- (__v16si) __Y, __P,
- (__mmask16) __U);
+ return (__m512i) ((__v8du) __A & (__v8du) __B);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y,
- const int __P, const int __R)
+_mm512_mask_and_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
- (__v8df) __Y, __P,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __U);
}
-extern __inline __mmask16
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y,
- const int __P, const int __R)
+_mm512_maskz_and_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
- (__v16sf) __Y, __P,
- (__mmask16) __U, __R);
+ return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_pd (),
+ __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_sd_mask (__m128d __X, __m128d __Y, const int __P, const int __R)
+_mm512_andnot_si512 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
- (__v2df) __Y, __P,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y,
- const int __P, const int __R)
+_mm512_andnot_epi32 (__m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
- (__v2df) __Y, __P,
- (__mmask8) __M, __R);
+ return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_ss_mask (__m128 __X, __m128 __Y, const int __P, const int __R)
+_mm512_mask_andnot_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
- (__v4sf) __Y, __P,
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
- const int __P, const int __R)
+_mm512_maskz_andnot_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
- (__v4sf) __Y, __P,
- (__mmask8) __M, __R);
+ return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-#else
-#define _kshiftli_mask16(X, Y) \
- ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
-
-#define _kshiftri_mask16(X, Y) \
- ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
-
-#define _mm512_cmp_epi64_mask(X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(P),\
- (__mmask8)-1))
-
-#define _mm512_cmp_epi32_mask(X, Y, P) \
- ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(P), \
- (__mmask16)-1))
-
-#define _mm512_cmp_epu64_mask(X, Y, P) \
- ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(P),\
- (__mmask8)-1))
-
-#define _mm512_cmp_epu32_mask(X, Y, P) \
- ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(P), \
- (__mmask16)-1))
-
-#define _mm512_cmp_round_pd_mask(X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(P),\
- (__mmask8)-1, R))
-
-#define _mm512_cmp_round_ps_mask(X, Y, P, R) \
- ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(P),\
- (__mmask16)-1, R))
-
-#define _mm512_mask_cmp_epi64_mask(M, X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(P),\
- (__mmask8)(M)))
-
-#define _mm512_mask_cmp_epi32_mask(M, X, Y, P) \
- ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(P), \
- (__mmask16)(M)))
-
-#define _mm512_mask_cmp_epu64_mask(M, X, Y, P) \
- ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
- (__v8di)(__m512i)(Y), (int)(P),\
- (__mmask8)(M)))
-
-#define _mm512_mask_cmp_epu32_mask(M, X, Y, P) \
- ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
- (__v16si)(__m512i)(Y), (int)(P), \
- (__mmask16)(M)))
-
-#define _mm512_mask_cmp_round_pd_mask(M, X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (int)(P),\
- (__mmask8)(M), R))
-
-#define _mm512_mask_cmp_round_ps_mask(M, X, Y, P, R) \
- ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (int)(P),\
- (__mmask16)(M), R))
-
-#define _mm_cmp_round_sd_mask(X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (int)(P),\
- (__mmask8)-1, R))
-
-#define _mm_mask_cmp_round_sd_mask(M, X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (int)(P),\
- (M), R))
-
-#define _mm_cmp_round_ss_mask(X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (int)(P), \
- (__mmask8)-1, R))
-
-#define _mm_mask_cmp_round_ss_mask(M, X, Y, P, R) \
- ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (int)(P), \
- (M), R))
-#endif
-
-#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_ps (__m512i __index, void const *__addr, int __scale)
+_mm512_andnot_epi64 (__m512i __A, __m512i __B)
{
- __m512 __v1_old = _mm512_undefined_ps ();
- __mmask16 __mask = 0xFFFF;
-
- return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
- __addr,
- (__v16si) __index,
- __mask, __scale);
+ return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_ps (__m512 __v1_old, __mmask16 __mask,
- __m512i __index, void const *__addr, int __scale)
+_mm512_mask_andnot_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
- __addr,
- (__v16si) __index,
- __mask, __scale);
+ return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_pd (__m256i __index, void const *__addr, int __scale)
+_mm512_maskz_andnot_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- __m512d __v1_old = _mm512_undefined_pd ();
- __mmask8 __mask = 0xFF;
-
- return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
- __addr,
- (__v8si) __index, __mask,
- __scale);
+ return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_pd (),
+ __U);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_pd (__m512d __v1_old, __mmask8 __mask,
- __m256i __index, void const *__addr, int __scale)
+_mm512_test_epi32_mask (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
- __addr,
- (__v8si) __index,
- __mask, __scale);
+ return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) -1);
}
-extern __inline __m256
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_ps (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
- __m256 __v1_old = _mm256_undefined_ps ();
- __mmask8 __mask = 0xFF;
-
- return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
- __addr,
- (__v8di) __index, __mask,
- __scale);
+ return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+ (__v16si) __B, __U);
}
-extern __inline __m256
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_ps (__m256 __v1_old, __mmask8 __mask,
- __m512i __index, void const *__addr, int __scale)
+_mm512_test_epi64_mask (__m512i __A, __m512i __B)
{
- return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
- __addr,
- (__v8di) __index,
- __mask, __scale);
+ return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_pd (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
- __m512d __v1_old = _mm512_undefined_pd ();
- __mmask8 __mask = 0xFF;
-
- return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
- __addr,
- (__v8di) __index, __mask,
- __scale);
+ return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_pd (__m512d __v1_old, __mmask8 __mask,
- __m512i __index, void const *__addr, int __scale)
+_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
- __addr,
- (__v8di) __index,
- __mask, __scale);
+ return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_epi32 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
- __m512i __v1_old = _mm512_undefined_epi32 ();
- __mmask16 __mask = 0xFFFF;
-
- return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
- __addr,
- (__v16si) __index,
- __mask, __scale);
+ return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+ (__v16si) __B, __U);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask,
- __m512i __index, void const *__addr, int __scale)
+_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
- __addr,
- (__v16si) __index,
- __mask, __scale);
+ return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32gather_epi64 (__m256i __index, void const *__addr, int __scale)
+_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
- __m512i __v1_old = _mm512_undefined_epi32 ();
- __mmask8 __mask = 0xFF;
-
- return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
- __addr,
- (__v8si) __index, __mask,
- __scale);
+ return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+ (__v8di) __B, __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32gather_epi64 (__m512i __v1_old, __mmask8 __mask,
- __m256i __index, void const *__addr,
- int __scale)
+_mm512_abs_ps (__m512 __A)
{
- return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
- __addr,
- (__v8si) __index,
- __mask, __scale);
+ return (__m512) _mm512_and_epi32 ((__m512i) __A,
+ _mm512_set1_epi32 (0x7fffffff));
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_epi32 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- __m256i __v1_old = _mm256_undefined_si256 ();
- __mmask8 __mask = 0xFF;
-
- return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
- __addr,
- (__v8di) __index,
- __mask, __scale);
+ return (__m512) _mm512_mask_and_epi32 ((__m512i) __W, __U, (__m512i) __A,
+ _mm512_set1_epi32 (0x7fffffff));
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_epi32 (__m256i __v1_old, __mmask8 __mask,
- __m512i __index, void const *__addr, int __scale)
+_mm512_abs_pd (__m512d __A)
{
- return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
- __addr,
- (__v8di) __index,
- __mask, __scale);
+ return (__m512d) _mm512_and_epi64 ((__m512i) __A,
+ _mm512_set1_epi64 (0x7fffffffffffffffLL));
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64gather_epi64 (__m512i __index, void const *__addr, int __scale)
+_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- __m512i __v1_old = _mm512_undefined_epi32 ();
- __mmask8 __mask = 0xFF;
-
- return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
- __addr,
- (__v8di) __index, __mask,
- __scale);
+ return (__m512d)
+ _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
+ _mm512_set1_epi64 (0x7fffffffffffffffLL));
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64gather_epi64 (__m512i __v1_old, __mmask8 __mask,
- __m512i __index, void const *__addr,
- int __scale)
+_mm512_unpackhi_epi32 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
- __addr,
- (__v8di) __index,
- __mask, __scale);
+ return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_ps (void *__addr, __m512i __index, __m512 __v1, int __scale)
+_mm512_mask_unpackhi_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ __m512i __B)
{
- __builtin_ia32_scattersiv16sf (__addr, (__mmask16) 0xFFFF,
- (__v16si) __index, (__v16sf) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_ps (void *__addr, __mmask16 __mask,
- __m512i __index, __m512 __v1, int __scale)
+_mm512_maskz_unpackhi_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv16sf (__addr, __mask, (__v16si) __index,
- (__v16sf) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_pd (void *__addr, __m256i __index, __m512d __v1,
- int __scale)
+_mm512_unpackhi_epi64 (__m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv8df (__addr, (__mmask8) 0xFF,
- (__v8si) __index, (__v8df) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_pd (void *__addr, __mmask8 __mask,
- __m256i __index, __m512d __v1, int __scale)
+_mm512_mask_unpackhi_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv8df (__addr, __mask, (__v8si) __index,
- (__v8df) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_ps (void *__addr, __m512i __index, __m256 __v1, int __scale)
+_mm512_maskz_unpackhi_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scatterdiv16sf (__addr, (__mmask8) 0xFF,
- (__v8di) __index, (__v8sf) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_ps (void *__addr, __mmask8 __mask,
- __m512i __index, __m256 __v1, int __scale)
+_mm512_unpacklo_epi32 (__m512i __A, __m512i __B)
{
- __builtin_ia32_scatterdiv16sf (__addr, __mask, (__v8di) __index,
- (__v8sf) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_pd (void *__addr, __m512i __index, __m512d __v1,
- int __scale)
+_mm512_mask_unpacklo_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ __m512i __B)
{
- __builtin_ia32_scatterdiv8df (__addr, (__mmask8) 0xFF,
- (__v8di) __index, (__v8df) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_pd (void *__addr, __mmask8 __mask,
- __m512i __index, __m512d __v1, int __scale)
+_mm512_maskz_unpacklo_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scatterdiv8df (__addr, __mask, (__v8di) __index,
- (__v8df) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_epi32 (void *__addr, __m512i __index,
- __m512i __v1, int __scale)
+_mm512_unpacklo_epi64 (__m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv16si (__addr, (__mmask16) 0xFFFF,
- (__v16si) __index, (__v16si) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_epi32 (void *__addr, __mmask16 __mask,
- __m512i __index, __m512i __v1, int __scale)
+_mm512_mask_unpacklo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index,
- (__v16si) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i32scatter_epi64 (void *__addr, __m256i __index,
- __m512i __v1, int __scale)
+_mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- __builtin_ia32_scattersiv8di (__addr, (__mmask8) 0xFF,
- (__v8si) __index, (__v8di) __v1, __scale);
+ return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i32scatter_epi64 (void *__addr, __mmask8 __mask,
- __m256i __index, __m512i __v1, int __scale)
+_mm512_movedup_pd (__m512d __A)
{
- __builtin_ia32_scattersiv8di (__addr, __mask, (__v8si) __index,
- (__v8di) __v1, __scale);
+ return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_epi32 (void *__addr, __m512i __index,
- __m256i __v1, int __scale)
+_mm512_mask_movedup_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- __builtin_ia32_scatterdiv16si (__addr, (__mmask8) 0xFF,
- (__v8di) __index, (__v8si) __v1, __scale);
+ return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_epi32 (void *__addr, __mmask8 __mask,
- __m512i __index, __m256i __v1, int __scale)
+_mm512_maskz_movedup_pd (__mmask8 __U, __m512d __A)
{
- __builtin_ia32_scatterdiv16si (__addr, __mask, (__v8di) __index,
- (__v8si) __v1, __scale);
+ return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_i64scatter_epi64 (void *__addr, __m512i __index,
- __m512i __v1, int __scale)
+_mm512_unpacklo_pd (__m512d __A, __m512d __B)
{
- __builtin_ia32_scatterdiv8di (__addr, (__mmask8) 0xFF,
- (__v8di) __index, (__v8di) __v1, __scale);
+ return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
}
-extern __inline void
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_i64scatter_epi64 (void *__addr, __mmask8 __mask,
- __m512i __index, __m512i __v1, int __scale)
+_mm512_mask_unpacklo_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- __builtin_ia32_scatterdiv8di (__addr, __mask, (__v8di) __index,
- (__v8di) __v1, __scale);
-}
-#else
-#define _mm512_i32gather_ps(INDEX, ADDR, SCALE) \
- (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)_mm512_undefined_ps(),\
- (void const *) (ADDR), \
- (__v16si)(__m512i) (INDEX), \
- (__mmask16)0xFFFF, \
- (int) (SCALE))
-
-#define _mm512_mask_i32gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)(__m512) (V1OLD), \
- (void const *) (ADDR), \
- (__v16si)(__m512i) (INDEX), \
- (__mmask16) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i32gather_pd(INDEX, ADDR, SCALE) \
- (__m512d) __builtin_ia32_gathersiv8df ((__v8df)_mm512_undefined_pd(), \
- (void const *) (ADDR), \
- (__v8si)(__m256i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i32gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512d) __builtin_ia32_gathersiv8df ((__v8df)(__m512d) (V1OLD), \
- (void const *) (ADDR), \
- (__v8si)(__m256i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i64gather_ps(INDEX, ADDR, SCALE) \
- (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)_mm256_undefined_ps(), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)(__m256) (V1OLD), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i64gather_pd(INDEX, ADDR, SCALE) \
- (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)_mm512_undefined_pd(), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)(__m512d) (V1OLD), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i32gather_epi32(INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gathersiv16si ((__v16si)_mm512_undefined_epi32 (),\
- (void const *) (ADDR), \
- (__v16si)(__m512i) (INDEX), \
- (__mmask16)0xFFFF, \
- (int) (SCALE))
-
-#define _mm512_mask_i32gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gathersiv16si ((__v16si)(__m512i) (V1OLD), \
- (void const *) (ADDR), \
- (__v16si)(__m512i) (INDEX), \
- (__mmask16) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i32gather_epi64(INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gathersiv8di ((__v8di)_mm512_undefined_epi32 (),\
- (void const *) (ADDR), \
- (__v8si)(__m256i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i32gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gathersiv8di ((__v8di)(__m512i) (V1OLD), \
- (void const *) (ADDR), \
- (__v8si)(__m256i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i64gather_epi32(INDEX, ADDR, SCALE) \
- (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)_mm256_undefined_si256(),\
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)(__m256i) (V1OLD), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i64gather_epi64(INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)_mm512_undefined_epi32 (),\
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8)0xFF, (int) (SCALE))
-
-#define _mm512_mask_i64gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \
- (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)(__m512i) (V1OLD), \
- (void const *) (ADDR), \
- (__v8di)(__m512i) (INDEX), \
- (__mmask8) (MASK), \
- (int) (SCALE))
-
-#define _mm512_i32scatter_ps(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16)0xFFFF, \
- (__v16si)(__m512i) (INDEX), \
- (__v16sf)(__m512) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16) (MASK), \
- (__v16si)(__m512i) (INDEX), \
- (__v16sf)(__m512) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_pd(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8si)(__m256i) (INDEX), \
- (__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8) (MASK), \
- (__v8si)(__m256i) (INDEX), \
- (__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_ps(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8di)(__m512i) (INDEX), \
- (__v8sf)(__m256) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask16) (MASK), \
- (__v8di)(__m512i) (INDEX), \
- (__v8sf)(__m256) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_pd(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8di)(__m512i) (INDEX), \
- (__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8) (MASK), \
- (__v8di)(__m512i) (INDEX), \
- (__v8df)(__m512d) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_epi32(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16)0xFFFF, \
- (__v16si)(__m512i) (INDEX), \
- (__v16si)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16) (MASK), \
- (__v16si)(__m512i) (INDEX), \
- (__v16si)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_i32scatter_epi64(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8si)(__m256i) (INDEX), \
- (__v8di)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i32scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8) (MASK), \
- (__v8si)(__m256i) (INDEX), \
- (__v8di)(__m512i) (V1), (int) (SCALE))
-
-#define _mm512_i64scatter_epi32(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8di)(__m512i) (INDEX), \
- (__v8si)(__m256i) (V1), (int) (SCALE))
-
-#define _mm512_mask_i64scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8) (MASK), \
- (__v8di)(__m512i) (INDEX), \
- (__v8si)(__m256i) (V1), (int) (SCALE))
+ return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
+}
-#define _mm512_i64scatter_epi64(ADDR, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8)0xFF, \
- (__v8di)(__m512i) (INDEX), \
- (__v8di)(__m512i) (V1), (int) (SCALE))
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_unpacklo_pd (__mmask8 __U, __m512d __A, __m512d __B)
+{
+ return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
+}
-#define _mm512_mask_i64scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \
- __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8) (MASK), \
- (__v8di)(__m512i) (INDEX), \
- (__v8di)(__m512i) (V1), (int) (SCALE))
-#endif
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_unpackhi_pd (__m512d __A, __m512d __B)
+{
+ return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
+}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_mask_unpackhi_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_pd (__mmask8 __U, __m512d __A)
+_mm512_maskz_unpackhi_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_pd (void *__P, __mmask8 __U, __m512d __A)
+_mm512_unpackhi_ps (__m512 __A, __m512 __B)
{
- __builtin_ia32_compressstoredf512_mask ((__v8df *) __P, (__v8df) __A,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_mask_unpackhi_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_ps (__mmask16 __U, __m512 __A)
+_mm512_maskz_unpackhi_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_ps (void *__P, __mmask16 __U, __m512 __A)
+_mm512_cvt_roundps_pd (__m256 __A, const int __R)
{
- __builtin_ia32_compressstoresf512_mask ((__v16sf *) __P, (__v16sf) __A,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_mask_cvt_roundps_pd (__m512d __W, __mmask8 __U, __m256 __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_epi64 (__mmask8 __U, __m512i __A)
+_mm512_maskz_cvt_roundps_pd (__mmask8 __U, __m256 __A, const int __R)
{
- return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline void
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_epi64 (void *__P, __mmask8 __U, __m512i __A)
+_mm512_cvt_roundph_ps (__m256i __A, const int __R)
{
- __builtin_ia32_compressstoredi512_mask ((__v8di *) __P, (__v8di) __A,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compress_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_mask_cvt_roundph_ps (__m512 __W, __mmask16 __U, __m256i __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A)
+_mm512_maskz_cvt_roundph_ps (__mmask16 __U, __m256i __A, const int __R)
{
- return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline void
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_compressstoreu_epi32 (void *__P, __mmask16 __U, __m512i __A)
+_mm512_cvt_roundps_ph (__m512 __A, const int __I)
{
- __builtin_ia32_compressstoresi512_mask ((__v16si *) __P, (__v16si) __A,
- (__mmask16) __U);
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ -1);
}
-extern __inline __m512d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_cvtps_ph (__m512 __A, const int __I)
{
- return (__m512d) __builtin_ia32_expanddf512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi)
+ _mm256_undefined_si256 (),
+ -1);
}
-extern __inline __m512d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_pd (__mmask8 __U, __m512d __A)
+_mm512_mask_cvt_roundps_ph (__m256i __U, __mmask16 __W, __m512 __A,
+ const int __I)
{
- return (__m512d) __builtin_ia32_expanddf512_maskz ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi) __U,
+ (__mmask16) __W);
}
-extern __inline __m512d
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtps_ph (__m256i __U, __mmask16 __W, __m512 __A, const int __I)
+{
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi) __U,
+ (__mmask16) __W);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundps_ph (__mmask16 __W, __m512 __A, const int __I)
+{
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ (__mmask16) __W);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtps_ph (__mmask16 __W, __m512 __A, const int __I)
+{
+ return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
+ __I,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ (__mmask16) __W);
+}
+#else
+#define _mm512_cvt_roundps_pd(A, B) \
+ (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, B)
+
+#define _mm512_mask_cvt_roundps_pd(W, U, A, B) \
+ (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)(W), U, B)
+
+#define _mm512_maskz_cvt_roundps_pd(U, A, B) \
+ (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_setzero_pd(), U, B)
+
+#define _mm512_cvt_roundph_ps(A, B) \
+ (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundph_ps(W, U, A, B) \
+ (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundph_ps(U, A, B) \
+ (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_setzero_ps(), U, B)
+
+#define _mm512_cvt_roundps_ph(A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)_mm256_undefined_si256 (), -1))
+#define _mm512_cvtps_ph(A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)_mm256_undefined_si256 (), -1))
+#define _mm512_mask_cvt_roundps_ph(U, W, A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)(__m256i)(U), (__mmask16) (W)))
+#define _mm512_mask_cvtps_ph(U, W, A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)(__m256i)(U), (__mmask16) (W)))
+#define _mm512_maskz_cvt_roundps_ph(W, A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#define _mm512_maskz_cvtps_ph(W, A, I) \
+ ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) (A), (int) (I),\
+ (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W)))
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_pd (__m512d __W, __mmask8 __U, void const *__P)
+_mm512_cvt_roundpd_ps (__m512d __A, const int __R)
{
- return (__m512d) __builtin_ia32_expandloaddf512_mask ((const __v8df *) __P,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+ (__v8sf)
+ _mm256_undefined_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512d
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_pd (__mmask8 __U, void const *__P)
+_mm512_mask_cvt_roundpd_ps (__m256 __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512d) __builtin_ia32_expandloaddf512_maskz ((const __v8df *) __P,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+ (__v8sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
{
- return (__m512) __builtin_ia32_expandsf512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A,
+ (__v8sf)
+ _mm256_setzero_ps (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+#else
+#define _mm512_cvt_roundpd_ps(A, B) \
+ (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_undefined_ps(), -1, B)
+
+#define _mm512_mask_cvt_roundpd_ps(W, U, A, B) \
+ (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)(W), U, B)
+
+#define _mm512_maskz_cvt_roundpd_ps(U, A, B) \
+ (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+
+#endif
+
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_ps (__mmask16 __U, __m512 __A)
+_mm512_stream_si512 (__m512i * __P, __m512i __A)
{
- return (__m512) __builtin_ia32_expandsf512_maskz ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ __builtin_ia32_movntdq512 ((__v8di *) __P, (__v8di) __A);
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_ps (__m512 __W, __mmask16 __U, void const *__P)
+_mm512_stream_ps (float *__P, __m512 __A)
{
- return (__m512) __builtin_ia32_expandloadsf512_mask ((const __v16sf *) __P,
- (__v16sf) __W,
- (__mmask16) __U);
+ __builtin_ia32_movntps512 (__P, (__v16sf) __A);
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_ps (__mmask16 __U, void const *__P)
+_mm512_stream_pd (double *__P, __m512d __A)
{
- return (__m512) __builtin_ia32_expandloadsf512_maskz ((const __v16sf *) __P,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ __builtin_ia32_movntpd512 (__P, (__v8df) __A);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
+_mm512_stream_load_si512 (void *__P)
{
- return (__m512i) __builtin_ia32_expanddi512_mask ((__v8di) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return __builtin_ia32_movntdqa512 ((__v8di *)__P);
}
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_epi64 (__mmask8 __U, __m512i __A)
+_mm512_getexp_round_ps (__m512 __A, const int __R)
{
- return (__m512i) __builtin_ia32_expanddi512_maskz ((__v8di) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
+_mm512_mask_getexp_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_expandloaddi512_mask ((const __v8di *) __P,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_epi64 (__mmask8 __U, void const *__P)
+_mm512_maskz_getexp_round_ps (__mmask16 __U, __m512 __A, const int __R)
{
- return (__m512i)
- __builtin_ia32_expandloaddi512_maskz ((const __v8di *) __P,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expand_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
+_mm512_getexp_round_pd (__m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_expandsi512_mask ((__v16si) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expand_epi32 (__mmask16 __U, __m512i __A)
+_mm512_mask_getexp_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ const int __R)
{
- return (__m512i) __builtin_ia32_expandsi512_maskz ((__v16si) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_expandloadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
+_mm512_maskz_getexp_round_pd (__mmask8 __U, __m512d __A, const int __R)
{
- return (__m512i) __builtin_ia32_expandloadsi512_mask ((const __v16si *) __P,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
+_mm512_getmant_round_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (__m512i) __builtin_ia32_expandloadsi512_maskz ((const __v16si *) __P,
- (__v16si)
- _mm512_setzero_si512
- (), (__mmask16) __U);
+ return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+ (__C << 2) | __B,
+ _mm512_undefined_pd (),
+ (__mmask8) -1, __R);
}
-/* Mask arithmetic operations */
-#define _kand_mask16 _mm512_kand
-#define _kandn_mask16 _mm512_kandn
-#define _knot_mask16 _mm512_knot
-#define _kor_mask16 _mm512_kor
-#define _kxnor_mask16 _mm512_kxnor
-#define _kxor_mask16 _mm512_kxor
-
-extern __inline unsigned char
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__CF)
+_mm512_mask_getmant_round_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
- return (unsigned char) __builtin_ia32_kortestzhi (__A, __B);
+ return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+ (__C << 2) | __B,
+ (__v8df) __W, __U,
+ __R);
}
-extern __inline unsigned char
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+_mm512_maskz_getmant_round_pd (__mmask8 __U, __m512d __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
- (__mmask16) __B);
+ return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A,
+ (__C << 2) | __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ __U, __R);
}
-extern __inline unsigned char
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+_mm512_getmant_round_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
- (__mmask16) __B);
+ return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+ (__C << 2) | __B,
+ _mm512_undefined_ps (),
+ (__mmask16) -1, __R);
}
-extern __inline unsigned int
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask16_u32 (__mmask16 __A)
+_mm512_mask_getmant_round_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (unsigned int) __builtin_ia32_kmovw ((__mmask16 ) __A);
+ return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+ (__C << 2) | __B,
+ (__v16sf) __W, __U,
+ __R);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu32_mask16 (unsigned int __A)
+_mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (__mmask16) __builtin_ia32_kmovw ((__mmask16 ) __A);
+ return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A,
+ (__C << 2) | __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ __U, __R);
}
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask16 (__mmask16 *__A)
-{
- return (__mmask16) __builtin_ia32_kmovw (*(__mmask16 *) __A);
-}
+#else
+#define _mm512_getmant_round_pd(X, B, C, R) \
+ ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v8df)(__m512d)_mm512_undefined_pd(), \
+ (__mmask8)-1,\
+ (R)))
+
+#define _mm512_mask_getmant_round_pd(W, U, X, B, C, R) \
+ ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v8df)(__m512d)(W), \
+ (__mmask8)(U),\
+ (R)))
+
+#define _mm512_maskz_getmant_round_pd(U, X, B, C, R) \
+ ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v8df)(__m512d)_mm512_setzero_pd(), \
+ (__mmask8)(U),\
+ (R)))
+#define _mm512_getmant_round_ps(X, B, C, R) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)(__m512)_mm512_undefined_ps(), \
+ (__mmask16)-1,\
+ (R)))
+
+#define _mm512_mask_getmant_round_ps(W, U, X, B, C, R) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)(__m512)(W), \
+ (__mmask16)(U),\
+ (R)))
+
+#define _mm512_maskz_getmant_round_ps(U, X, B, C, R) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)(__m512)_mm512_setzero_ps(), \
+ (__mmask16)(U),\
+ (R)))
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask16 (__mmask16 *__A, __mmask16 __B)
-{
- *(__mmask16 *) __A = __builtin_ia32_kmovw (__B);
-}
+#define _mm512_getexp_round_ps(A, R) \
+ ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
+ (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, R))
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kand (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
-}
+#define _mm512_mask_getexp_round_ps(W, U, A, R) \
+ ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
+ (__v16sf)(__m512)(W), (__mmask16)(U), R))
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kandn (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
- (__mmask16) __B);
-}
+#define _mm512_maskz_getexp_round_ps(U, A, R) \
+ ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
+ (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), R))
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kor (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
-}
+#define _mm512_getexp_round_pd(A, R) \
+ ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
+ (__v8df)_mm512_undefined_pd(), (__mmask8)-1, R))
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kortestz (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask16) __builtin_ia32_kortestzhi ((__mmask16) __A,
- (__mmask16) __B);
-}
+#define _mm512_mask_getexp_round_pd(W, U, A, R) \
+ ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
+ (__v8df)(__m512d)(W), (__mmask8)(U), R))
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kortestc (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask16) __builtin_ia32_kortestchi ((__mmask16) __A,
- (__mmask16) __B);
-}
+#define _mm512_maskz_getexp_round_pd(U, A, R) \
+ ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \
+ (__v8df)_mm512_setzero_pd(), (__mmask8)(U), R))
+#endif
-extern __inline __mmask16
+#ifdef __OPTIMIZE__
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kxnor (__mmask16 __A, __mmask16 __B)
+_mm512_roundscale_round_ps (__m512 __A, const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ -1, __R);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kxor (__mmask16 __A, __mmask16 __B)
+_mm512_mask_roundscale_round_ps (__m512 __A, __mmask16 __B, __m512 __C,
+ const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm,
+ (__v16sf) __A,
+ (__mmask16) __B, __R);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_knot (__mmask16 __A)
+_mm512_maskz_roundscale_round_ps (__mmask16 __A, __m512 __B,
+ const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B,
+ __imm,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __A, __R);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kunpackb (__mmask16 __A, __mmask16 __B)
+_mm512_roundscale_round_pd (__m512d __A, const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm,
+ (__v8df)
+ _mm512_undefined_pd (),
+ -1, __R);
}
-extern __inline __mmask16
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+_mm512_mask_roundscale_round_pd (__m512d __A, __mmask8 __B,
+ __m512d __C, const int __imm, const int __R)
{
- return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm,
+ (__v8df) __A,
+ (__mmask8) __B, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_inserti32x4 (__mmask16 __B, __m512i __C, __m128i __D,
- const int __imm)
+_mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
+ const int __imm, const int __R)
{
- return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
- (__v4si) __D,
- __imm,
- (__v16si)
- _mm512_setzero_si512 (),
- __B);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B,
+ __imm,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __A, __R);
}
+#else
+#define _mm512_roundscale_round_ps(A, B, R) \
+ ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
+ (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), R))
+#define _mm512_mask_roundscale_round_ps(A, B, C, D, R) \
+ ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C), \
+ (int)(D), \
+ (__v16sf)(__m512)(A), \
+ (__mmask16)(B), R))
+#define _mm512_maskz_roundscale_round_ps(A, B, C, R) \
+ ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B), \
+ (int)(C), \
+ (__v16sf)_mm512_setzero_ps(),\
+ (__mmask16)(A), R))
+#define _mm512_roundscale_round_pd(A, B, R) \
+ ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\
+ (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), R))
+#define _mm512_mask_roundscale_round_pd(A, B, C, D, R) \
+ ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C), \
+ (int)(D), \
+ (__v8df)(__m512d)(A), \
+ (__mmask8)(B), R))
+#define _mm512_maskz_roundscale_round_pd(A, B, C, R) \
+ ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B), \
+ (int)(C), \
+ (__v8df)_mm512_setzero_pd(),\
+ (__mmask8)(A), R))
+#endif
+
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_insertf32x4 (__mmask16 __B, __m512 __C, __m128 __D,
- const int __imm)
+_mm512_floor_ps (__m512 __A)
{
- return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
- (__v4sf) __D,
- __imm,
- (__v16sf)
- _mm512_setzero_ps (), __B);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+ _MM_FROUND_FLOOR,
+ (__v16sf) __A, -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_inserti32x4 (__m512i __A, __mmask16 __B, __m512i __C,
- __m128i __D, const int __imm)
+_mm512_floor_pd (__m512d __A)
{
- return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
- (__v4si) __D,
- __imm,
- (__v16si) __A,
- __B);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+ _MM_FROUND_FLOOR,
+ (__v8df) __A, -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_insertf32x4 (__m512 __A, __mmask16 __B, __m512 __C,
- __m128 __D, const int __imm)
-{
- return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
- (__v4sf) __D,
- __imm,
- (__v16sf) __A, __B);
-}
-#else
-#define _mm512_maskz_insertf32x4(A, X, Y, C) \
- ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
- (__v4sf)(__m128) (Y), (int) (C), (__v16sf)_mm512_setzero_ps(), \
- (__mmask16)(A)))
-
-#define _mm512_maskz_inserti32x4(A, X, Y, C) \
- ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
- (__v4si)(__m128i) (Y), (int) (C), (__v16si)_mm512_setzero_si512 (), \
- (__mmask16)(A)))
-
-#define _mm512_mask_insertf32x4(A, B, X, Y, C) \
- ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
- (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (A), \
- (__mmask16)(B)))
-
-#define _mm512_mask_inserti32x4(A, B, X, Y, C) \
- ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
- (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (A), \
- (__mmask16)(B)))
-#endif
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epi64 (__m512i __A, __m512i __B)
+_mm512_ceil_ps (__m512 __A)
{
- return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+ _MM_FROUND_CEIL,
+ (__v16sf) __A, -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_ceil_pd (__m512d __A)
{
- return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+ _MM_FROUND_CEIL,
+ (__v8df) __A, -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __M);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+ _MM_FROUND_FLOOR,
+ (__v16sf) __W, __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epi64 (__m512i __A, __m512i __B)
+_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_undefined_epi32 (),
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+ _MM_FROUND_FLOOR,
+ (__v8df) __W, __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __M);
+ return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,
+ _MM_FROUND_CEIL,
+ (__v16sf) __W, __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,
+ _MM_FROUND_CEIL,
+ (__v8df) __W, __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epu64 (__m512i __A, __m512i __B)
+_mm512_alignr_epi32 (__m512i __A, __m512i __B, const int __imm)
{
- return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
+ return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+ (__v16si) __B, __imm,
+ (__v16si)
_mm512_undefined_epi32 (),
- (__mmask8) -1);
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_alignr_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+ __m512i __B, const int __imm)
{
- return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+ (__v16si) __B, __imm,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_maskz_alignr_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+ const int __imm)
{
- return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __M);
+ return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A,
+ (__v16si) __B, __imm,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epu64 (__m512i __A, __m512i __B)
+_mm512_alignr_epi64 (__m512i __A, __m512i __B, const int __imm)
{
- return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
- (__v8di) __B,
+ return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
(__v8di)
_mm512_undefined_epi32 (),
(__mmask8) -1);
@@ -11419,3194 +11757,3331 @@ _mm512_min_epu64 (__m512i __A, __m512i __B)
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+_mm512_mask_alignr_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+ __m512i __B, const int __imm)
{
- return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W, __M);
+ return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
+_mm512_maskz_alignr_epi64 (__mmask8 __U, __m512i __A, __m512i __B,
+ const int __imm)
{
- return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
- (__v8di) __B,
+ return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A,
+ (__v8di) __B, __imm,
(__v8di)
_mm512_setzero_si512 (),
- __M);
+ (__mmask8) __U);
}
+#else
+#define _mm512_alignr_epi32(X, Y, C) \
+ ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_undefined_epi32 (),\
+ (__mmask16)-1))
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epi32 (__m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
-}
+#define _mm512_mask_alignr_epi32(W, U, X, Y, C) \
+ ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C), (__v16si)(__m512i)(W), \
+ (__mmask16)(U)))
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
-}
+#define _mm512_maskz_alignr_epi32(U, X, Y, C) \
+ ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_setzero_si512 (),\
+ (__mmask16)(U)))
-extern __inline __m512i
+#define _mm512_alignr_epi64(X, Y, C) \
+ ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_undefined_epi32 (), \
+ (__mmask8)-1))
+
+#define _mm512_mask_alignr_epi64(W, U, X, Y, C) \
+ ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C), (__v8di)(__m512i)(W), (__mmask8)(U)))
+
+#define _mm512_maskz_alignr_epi64(U, X, Y, C) \
+ ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_setzero_si512 (),\
+ (__mmask8)(U)))
+#endif
+
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpeq_epi32_mask (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W, __M);
+ return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epi32 (__m512i __A, __m512i __B)
+_mm512_mask_cmpeq_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A,
+ (__v16si) __B, __U);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpeq_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
+ (__v8di) __B, __U);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpeq_epi64_mask (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W, __M);
+ return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_epu32 (__m512i __A, __m512i __B)
+_mm512_cmpgt_epi32_mask (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpgt_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A,
+ (__v16si) __B, __U);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpgt_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W, __M);
+ return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
+ (__v8di) __B, __U);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_epu32 (__m512i __A, __m512i __B)
+_mm512_cmpgt_epi64_mask (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
+_mm512_cmpge_epi32_mask (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 5,
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+_mm512_mask_cmpge_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
- (__v16si) __B,
- (__v16si) __W, __M);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 5,
+ (__mmask16) __M);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_unpacklo_ps (__m512 __A, __m512 __B)
+_mm512_mask_cmpge_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 5,
+ (__mmask16) __M);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_unpacklo_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_cmpge_epu32_mask (__m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 5,
+ (__mmask16) -1);
}
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_cmpge_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 5,
+ (__mmask8) __M);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_cmpge_epi64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 5,
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm512_mask_cmpge_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 5,
+ (__mmask8) __M);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_cmpge_epu64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 5,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmple_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 2,
+ (__mmask16) __M);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm512_cmple_epi32_mask (__m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 2,
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_cmple_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 2,
+ (__mmask16) __M);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
+_mm512_cmple_epu32_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 2,
+ (__mmask16) -1);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, const int __R)
+_mm512_mask_cmple_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 2,
+ (__mmask8) __M);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_cmple_epi64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 2,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmple_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
- (__v4sf) __B,
- __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 2,
+ (__mmask8) __M);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, const int __R)
+_mm512_cmple_epu64_mask (__m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 2,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_cmplt_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 1,
+ (__mmask16) __M);
}
-#else
-#define _mm_max_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_maxsd_round(A, B, C)
-
-#define _mm_mask_max_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_maxsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_max_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_maxsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_max_round_ss(A, B, C) \
- (__m128)__builtin_ia32_maxss_round(A, B, C)
-
-#define _mm_mask_max_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_maxss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_max_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_maxss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#define _mm_min_round_sd(A, B, C) \
- (__m128d)__builtin_ia32_minsd_round(A, B, C)
-
-#define _mm_mask_min_round_sd(W, U, A, B, C) \
- (__m128d)__builtin_ia32_minsd_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_min_round_sd(U, A, B, C) \
- (__m128d)__builtin_ia32_minsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U, C)
-
-#define _mm_min_round_ss(A, B, C) \
- (__m128)__builtin_ia32_minss_round(A, B, C)
-
-#define _mm_mask_min_round_ss(W, U, A, B, C) \
- (__m128)__builtin_ia32_minss_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_min_round_ss(U, A, B, C) \
- (__m128)__builtin_ia32_minss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U, C)
-
-#endif
-
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
+_mm512_cmplt_epi32_mask (__m512i __X, __m512i __Y)
{
- return (__m512d) __builtin_ia32_blendmpd_512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 1,
+ (__mmask16) -1);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_ps (__mmask16 __U, __m512 __A, __m512 __W)
+_mm512_mask_cmplt_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m512) __builtin_ia32_blendmps_512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 1,
+ (__mmask16) __M);
}
-extern __inline __m512i
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_epi64 (__mmask8 __U, __m512i __A, __m512i __W)
+_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_blendmq_512_mask ((__v8di) __A,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 1,
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
+_mm512_mask_cmplt_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m512i) __builtin_ia32_blendmd_512_mask ((__v16si) __A,
- (__v16si) __W,
- (__mmask16) __U);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 1,
+ (__mmask8) __M);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmplt_epi64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 1,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmplt_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 1,
+ (__mmask8) __M);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmplt_epu64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
- (__v2df) __A,
- -(__v2df) __B,
- __R);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 1,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_cmpneq_epi32_mask (__m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
- (__v4sf) __A,
- -(__v4sf) __B,
- __R);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 4,
+ (__mmask16) -1);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_mask_cmpneq_epi32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- __R);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 4,
+ (__mmask16) __M);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmpneq_epu32_mask (__mmask16 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- __R);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 4,
+ (__mmask16) __M);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+_mm512_cmpneq_epu32_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
- -(__v2df) __A,
- -(__v2df) __B,
- __R);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, 4,
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+_mm512_mask_cmpneq_epi64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
- -(__v4sf) __A,
- -(__v4sf) __B,
- __R);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 4,
+ (__mmask8) __M);
}
-#else
-#define _mm_fmadd_round_sd(A, B, C, R) \
- (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
-
-#define _mm_fmadd_round_ss(A, B, C, R) \
- (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
-
-#define _mm_fmsub_round_sd(A, B, C, R) \
- (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
-#define _mm_fmsub_round_ss(A, B, C, R) \
- (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
-
-#define _mm_fnmadd_round_sd(A, B, C, R) \
- (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
-
-#define _mm_fnmadd_round_ss(A, B, C, R) \
- (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
-
-#define _mm_fnmsub_round_sd(A, B, C, R) \
- (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
-
-#define _mm_fnmsub_round_ss(A, B, C, R) \
- (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
-#endif
-
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_cmpneq_epi64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 4,
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_cmpneq_epu64_mask (__mmask8 __M, __m512i __X, __m512i __Y)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 4,
+ (__mmask8) __M);
}
-extern __inline __m128d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, 4,
+ (__mmask8) -1);
}
-extern __inline __m128
+#define _MM_CMPINT_EQ 0x0
+#define _MM_CMPINT_LT 0x1
+#define _MM_CMPINT_LE 0x2
+#define _MM_CMPINT_UNUSED 0x3
+#define _MM_CMPINT_NE 0x4
+#define _MM_CMPINT_NLT 0x5
+#define _MM_CMPINT_GE 0x5
+#define _MM_CMPINT_NLE 0x6
+#define _MM_CMPINT_GT 0x6
+
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, __P,
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_cmp_epi32_mask (__m512i __X, __m512i __Y, const int __P)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, __P,
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_cmp_epu64_mask (__m512i __X, __m512i __Y, const int __P)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, __P,
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_cmp_epu32_mask (__m512i __X, __m512i __Y, const int __P)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- (__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, __P,
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_cmp_round_pd_mask (__m512d __X, __m512d __Y, const int __P,
+ const int __R)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- (__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
+ (__v8df) __Y, __P,
+ (__mmask8) -1, __R);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_cmp_round_ps_mask (__m512 __X, __m512 __Y, const int __P, const int __R)
{
- return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
+ (__v16sf) __Y, __P,
+ (__mmask16) -1, __R);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_mask_cmp_epi64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
+ const int __P)
{
- return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, __P,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
-{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- (__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+_mm512_mask_cmp_epi32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
+ const int __P)
+{
+ return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, __P,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_mask_cmp_epu64_mask (__mmask8 __U, __m512i __X, __m512i __Y,
+ const int __P)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- (__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X,
+ (__v8di) __Y, __P,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_cmp_epu32_mask (__mmask16 __U, __m512i __X, __m512i __Y,
+ const int __P)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X,
+ (__v16si) __Y, __P,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_cmp_round_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y,
+ const int __P, const int __R)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X,
+ (__v8df) __Y, __P,
+ (__mmask8) __U, __R);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_mask_cmp_round_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y,
+ const int __P, const int __R)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
+ (__v16sf) __Y, __P,
+ (__mmask16) __U, __R);
}
-extern __inline __m128
+#else
+#define _mm512_cmp_epi64_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)-1))
+
+#define _mm512_cmp_epi32_mask(X, Y, P) \
+ ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P), \
+ (__mmask16)-1))
+
+#define _mm512_cmp_epu64_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)-1))
+
+#define _mm512_cmp_epu32_mask(X, Y, P) \
+ ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P), \
+ (__mmask16)-1))
+
+#define _mm512_cmp_round_pd_mask(X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(P),\
+ (__mmask8)-1, R))
+
+#define _mm512_cmp_round_ps_mask(X, Y, P, R) \
+ ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(P),\
+ (__mmask16)-1, R))
+
+#define _mm512_mask_cmp_epi64_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)(M)))
+
+#define _mm512_mask_cmp_epi32_mask(M, X, Y, P) \
+ ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P), \
+ (__mmask16)(M)))
+
+#define _mm512_mask_cmp_epu64_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)(M)))
+
+#define _mm512_mask_cmp_epu32_mask(M, X, Y, P) \
+ ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P), \
+ (__mmask16)(M)))
+
+#define _mm512_mask_cmp_round_pd_mask(M, X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(P),\
+ (__mmask8)(M), R))
+
+#define _mm512_mask_cmp_round_ps_mask(M, X, Y, P, R) \
+ ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(P),\
+ (__mmask16)(M), R))
+
+#endif
+
+#ifdef __OPTIMIZE__
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_i32gather_ps (__m512i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __m512 __v1_old = _mm512_undefined_ps ();
+ __mmask16 __mask = 0xFFFF;
+
+ return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
+ __addr,
+ (__v16si) __index,
+ __mask, __scale);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_mask_i32gather_ps (__m512 __v1_old, __mmask16 __mask,
+ __m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) __v1_old,
+ __addr,
+ (__v16si) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_i32gather_pd (__m256i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __m512d __v1_old = _mm512_undefined_pd ();
+ __mmask8 __mask = 0xFF;
+
+ return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
+ __addr,
+ (__v8si) __index, __mask,
+ __scale);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_i32gather_pd (__m512d __v1_old, __mmask8 __mask,
+ __m256i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- -(__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old,
+ __addr,
+ (__v8si) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_i64gather_ps (__m512i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- -(__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __m256 __v1_old = _mm256_undefined_ps ();
+ __mmask8 __mask = 0xFF;
+
+ return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
+ __addr,
+ (__v8di) __index, __mask,
+ __scale);
}
-extern __inline __m128d
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U)
+_mm512_mask_i64gather_ps (__m256 __v1_old, __mmask8 __mask,
+ __m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old,
+ __addr,
+ (__v8di) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U)
+_mm512_i64gather_pd (__m512i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __m512d __v1_old = _mm512_undefined_pd ();
+ __mmask8 __mask = 0xFF;
+
+ return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
+ __addr,
+ (__v8di) __index, __mask,
+ __scale);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B)
+_mm512_mask_i64gather_pd (__m512d __v1_old, __mmask8 __mask,
+ __m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- -(__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old,
+ __addr,
+ (__v8di) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
+_mm512_i32gather_epi32 (__m512i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- -(__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __m512i __v1_old = _mm512_undefined_epi32 ();
+ __mmask16 __mask = 0xFFFF;
+
+ return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
+ __addr,
+ (__v16si) __index,
+ __mask, __scale);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask,
+ __m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old,
+ __addr,
+ (__v16si) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_i32gather_epi64 (__m256i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
-}
+ __m512i __v1_old = _mm512_undefined_epi32 ();
+ __mmask8 __mask = 0xFF;
+
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
- const int __R)
-{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
+ __addr,
+ (__v8si) __index, __mask,
+ __scale);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
- const int __R)
+_mm512_mask_i32gather_epi64 (__m512i __v1_old, __mmask8 __mask,
+ __m256i __index, void const *__addr,
+ int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old,
+ __addr,
+ (__v8si) __index,
+ __mask, __scale);
}
-extern __inline __m128d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i64gather_epi32 (__m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __m256i __v1_old = _mm256_undefined_si256 ();
+ __mmask8 __mask = 0xFF;
+
+ return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
+ __addr,
+ (__v8di) __index,
+ __mask, __scale);
}
-extern __inline __m128
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i64gather_epi32 (__m256i __v1_old, __mmask8 __mask,
+ __m512i __index, void const *__addr, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old,
+ __addr,
+ (__v8di) __index,
+ __mask, __scale);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i64gather_epi64 (__m512i __index, void const *__addr, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- (__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U, __R);
+ __m512i __v1_old = _mm512_undefined_epi32 ();
+ __mmask8 __mask = 0xFF;
+
+ return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
+ __addr,
+ (__v8di) __index, __mask,
+ __scale);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i64gather_epi64 (__m512i __v1_old, __mmask8 __mask,
+ __m512i __index, void const *__addr,
+ int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- (__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U, __R);
+ return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old,
+ __addr,
+ (__v8di) __index,
+ __mask, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
- const int __R)
+_mm512_i32scatter_ps (void *__addr, __m512i __index, __m512 __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
- (__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv16sf (__addr, (__mmask16) 0xFFFF,
+ (__v16si) __index, (__v16sf) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
- const int __R)
+_mm512_mask_i32scatter_ps (void *__addr, __mmask16 __mask,
+ __m512i __index, __m512 __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
- (__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv16sf (__addr, __mask, (__v16si) __index,
+ (__v16sf) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i32scatter_pd (void *__addr, __m256i __index, __m512d __v1,
+ int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- (__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv8df (__addr, (__mmask8) 0xFF,
+ (__v8si) __index, (__v8df) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i32scatter_pd (void *__addr, __mmask8 __mask,
+ __m256i __index, __m512d __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- (__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv8df (__addr, __mask, (__v8si) __index,
+ (__v8df) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i64scatter_ps (void *__addr, __m512i __index, __m256 __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv16sf (__addr, (__mmask8) 0xFF,
+ (__v8di) __index, (__v8sf) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i64scatter_ps (void *__addr, __mmask8 __mask,
+ __m512i __index, __m256 __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv16sf (__addr, __mask, (__v8di) __index,
+ (__v8sf) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
- const int __R)
+_mm512_i64scatter_pd (void *__addr, __m512i __index, __m512d __v1,
+ int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask3 ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv8df (__addr, (__mmask8) 0xFF,
+ (__v8di) __index, (__v8df) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
- const int __R)
+_mm512_mask_i64scatter_pd (void *__addr, __mmask8 __mask,
+ __m512i __index, __m512d __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_mask3 ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv8df (__addr, __mask, (__v8di) __index,
+ (__v8df) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i32scatter_epi32 (void *__addr, __m512i __index,
+ __m512i __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv16si (__addr, (__mmask16) 0xFFFF,
+ (__v16si) __index, (__v16si) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i32scatter_epi32 (void *__addr, __mmask16 __mask,
+ __m512i __index, __m512i __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index,
+ (__v16si) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i32scatter_epi64 (void *__addr, __m256i __index,
+ __m512i __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_mask ((__v2df) __W,
- -(__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scattersiv8di (__addr, (__mmask8) 0xFF,
+ (__v8si) __index, (__v8di) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- const int __R)
-{
- return (__m128) __builtin_ia32_vfmaddss3_mask ((__v4sf) __W,
- -(__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U, __R);
+_mm512_mask_i32scatter_epi64 (void *__addr, __mmask8 __mask,
+ __m256i __index, __m512i __v1, int __scale)
+{
+ __builtin_ia32_scattersiv8di (__addr, __mask, (__v8si) __index,
+ (__v8di) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, __mmask8 __U,
- const int __R)
+_mm512_i64scatter_epi32 (void *__addr, __m512i __index,
+ __m256i __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmsubsd3_mask3 ((__v2df) __W,
- -(__v2df) __A,
- (__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv16si (__addr, (__mmask8) 0xFF,
+ (__v8di) __index, (__v8si) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, __mmask8 __U,
- const int __R)
+_mm512_mask_i64scatter_epi32 (void *__addr, __mmask8 __mask,
+ __m512i __index, __m256i __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmsubss3_mask3 ((__v4sf) __W,
- -(__v4sf) __A,
- (__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv16si (__addr, __mask, (__v8di) __index,
+ (__v8si) __v1, __scale);
}
-extern __inline __m128d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_sd (__mmask8 __U, __m128d __W, __m128d __A, __m128d __B,
- const int __R)
+_mm512_i64scatter_epi64 (void *__addr, __m512i __index,
+ __m512i __v1, int __scale)
{
- return (__m128d) __builtin_ia32_vfmaddsd3_maskz ((__v2df) __W,
- -(__v2df) __A,
- -(__v2df) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv8di (__addr, (__mmask8) 0xFF,
+ (__v8di) __index, (__v8di) __v1, __scale);
}
-extern __inline __m128
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B,
- const int __R)
+_mm512_mask_i64scatter_epi64 (void *__addr, __mmask8 __mask,
+ __m512i __index, __m512i __v1, int __scale)
{
- return (__m128) __builtin_ia32_vfmaddss3_maskz ((__v4sf) __W,
- -(__v4sf) __A,
- -(__v4sf) __B,
- (__mmask8) __U, __R);
+ __builtin_ia32_scatterdiv8di (__addr, __mask, (__v8di) __index,
+ (__v8di) __v1, __scale);
}
#else
-#define _mm_mask_fmadd_round_sd(A, U, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, C, U, R)
+#define _mm512_i32gather_ps(INDEX, ADDR, SCALE) \
+ (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)_mm512_undefined_ps(),\
+ (void const *) (ADDR), \
+ (__v16si)(__m512i) (INDEX), \
+ (__mmask16)0xFFFF, \
+ (int) (SCALE))
-#define _mm_mask_fmadd_round_ss(A, U, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask (A, B, C, U, R)
+#define _mm512_mask_i32gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)(__m512) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v16si)(__m512i) (INDEX), \
+ (__mmask16) (MASK), \
+ (int) (SCALE))
-#define _mm_mask3_fmadd_round_sd(A, B, C, U, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, B, C, U, R)
+#define _mm512_i32gather_pd(INDEX, ADDR, SCALE) \
+ (__m512d) __builtin_ia32_gathersiv8df ((__v8df)_mm512_undefined_pd(), \
+ (void const *) (ADDR), \
+ (__v8si)(__m256i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_mask3_fmadd_round_ss(A, B, C, U, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask3 (A, B, C, U, R)
+#define _mm512_mask_i32gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512d) __builtin_ia32_gathersiv8df ((__v8df)(__m512d) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8si)(__m256i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_maskz_fmadd_round_sd(U, A, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, C, U, R)
+#define _mm512_i64gather_ps(INDEX, ADDR, SCALE) \
+ (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)_mm256_undefined_ps(), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_maskz_fmadd_round_ss(U, A, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, C, U, R)
+#define _mm512_mask_i64gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)(__m256) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_mask_fmsub_round_sd(A, U, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask (A, B, -(C), U, R)
+#define _mm512_i64gather_pd(INDEX, ADDR, SCALE) \
+ (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)_mm512_undefined_pd(), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_mask_fmsub_round_ss(A, U, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask (A, B, -(C), U, R)
+#define _mm512_mask_i64gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)(__m512d) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_mask3_fmsub_round_sd(A, B, C, U, R) \
- (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, B, C, U, R)
+#define _mm512_i32gather_epi32(INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gathersiv16si ((__v16si)_mm512_undefined_epi32 (),\
+ (void const *) (ADDR), \
+ (__v16si)(__m512i) (INDEX), \
+ (__mmask16)0xFFFF, \
+ (int) (SCALE))
-#define _mm_mask3_fmsub_round_ss(A, B, C, U, R) \
- (__m128) __builtin_ia32_vfmsubss3_mask3 (A, B, C, U, R)
+#define _mm512_mask_i32gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gathersiv16si ((__v16si)(__m512i) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v16si)(__m512i) (INDEX), \
+ (__mmask16) (MASK), \
+ (int) (SCALE))
-#define _mm_maskz_fmsub_round_sd(U, A, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, B, -(C), U, R)
+#define _mm512_i32gather_epi64(INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gathersiv8di ((__v8di)_mm512_undefined_epi32 (),\
+ (void const *) (ADDR), \
+ (__v8si)(__m256i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_maskz_fmsub_round_ss(U, A, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_maskz (A, B, -(C), U, R)
+#define _mm512_mask_i32gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gathersiv8di ((__v8di)(__m512i) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8si)(__m256i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_mask_fnmadd_round_sd(A, U, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), C, U, R)
+#define _mm512_i64gather_epi32(INDEX, ADDR, SCALE) \
+ (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)_mm256_undefined_si256(),\
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_mask_fnmadd_round_ss(A, U, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), C, U, R)
+#define _mm512_mask_i64gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)(__m256i) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_mask3_fnmadd_round_sd(A, B, C, U, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask3 (A, -(B), C, U, R)
+#define _mm512_i64gather_epi64(INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)_mm512_undefined_epi32 (),\
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8)0xFF, (int) (SCALE))
-#define _mm_mask3_fnmadd_round_ss(A, B, C, U, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask3 (A, -(B), C, U, R)
+#define _mm512_mask_i64gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \
+ (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)(__m512i) (V1OLD), \
+ (void const *) (ADDR), \
+ (__v8di)(__m512i) (INDEX), \
+ (__mmask8) (MASK), \
+ (int) (SCALE))
-#define _mm_maskz_fnmadd_round_sd(U, A, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), C, U, R)
+#define _mm512_i32scatter_ps(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16)0xFFFF, \
+ (__v16si)(__m512i) (INDEX), \
+ (__v16sf)(__m512) (V1), (int) (SCALE))
-#define _mm_maskz_fnmadd_round_ss(U, A, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), C, U, R)
+#define _mm512_mask_i32scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv16sf ((void *) (ADDR), (__mmask16) (MASK), \
+ (__v16si)(__m512i) (INDEX), \
+ (__v16sf)(__m512) (V1), (int) (SCALE))
-#define _mm_mask_fnmsub_round_sd(A, U, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_mask (A, -(B), -(C), U, R)
+#define _mm512_i32scatter_pd(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8si)(__m256i) (INDEX), \
+ (__v8df)(__m512d) (V1), (int) (SCALE))
-#define _mm_mask_fnmsub_round_ss(A, U, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_mask (A, -(B), -(C), U, R)
+#define _mm512_mask_i32scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv8df ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8si)(__m256i) (INDEX), \
+ (__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_ps(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8sf)(__m256) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv16sf ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8sf)(__m256) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_pd(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv8df ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8df)(__m512d) (V1), (int) (SCALE))
+
+#define _mm512_i32scatter_epi32(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16)0xFFFF, \
+ (__v16si)(__m512i) (INDEX), \
+ (__v16si)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv16si ((void *) (ADDR), (__mmask16) (MASK), \
+ (__v16si)(__m512i) (INDEX), \
+ (__v16si)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_i32scatter_epi64(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8si)(__m256i) (INDEX), \
+ (__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i32scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scattersiv8di ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8si)(__m256i) (INDEX), \
+ (__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_epi32(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8si)(__m256i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv16si ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8si)(__m256i) (V1), (int) (SCALE))
+
+#define _mm512_i64scatter_epi64(ADDR, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8)0xFF, \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8di)(__m512i) (V1), (int) (SCALE))
+
+#define _mm512_mask_i64scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \
+ __builtin_ia32_scatterdiv8di ((void *) (ADDR), (__mmask8) (MASK), \
+ (__v8di)(__m512i) (INDEX), \
+ (__v8di)(__m512i) (V1), (int) (SCALE))
+#endif
-#define _mm_mask3_fnmsub_round_sd(A, B, C, U, R) \
- (__m128d) __builtin_ia32_vfmsubsd3_mask3 (A, -(B), C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compress_pd (__m512d __W, __mmask8 __U, __m512d __A)
+{
+ return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
+}
-#define _mm_mask3_fnmsub_round_ss(A, B, C, U, R) \
- (__m128) __builtin_ia32_vfmsubss3_mask3 (A, -(B), C, U, R)
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_compress_pd (__mmask8 __U, __m512d __A)
+{
+ return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
+}
-#define _mm_maskz_fnmsub_round_sd(U, A, B, C, R) \
- (__m128d) __builtin_ia32_vfmaddsd3_maskz (A, -(B), -(C), U, R)
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compressstoreu_pd (void *__P, __mmask8 __U, __m512d __A)
+{
+ __builtin_ia32_compressstoredf512_mask ((__v8df *) __P, (__v8df) __A,
+ (__mmask8) __U);
+}
-#define _mm_maskz_fnmsub_round_ss(U, A, B, C, R) \
- (__m128) __builtin_ia32_vfmaddss3_maskz (A, -(B), -(C), U, R)
-#endif
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_compress_ps (__m512 __W, __mmask16 __U, __m512 __A)
+{
+ return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
+}
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
+_mm512_maskz_compress_ps (__mmask16 __U, __m512 __A)
{
- return __builtin_ia32_vcomiss ((__v4sf) __A, (__v4sf) __B, __P, __R);
+ return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_sd (__m128d __A, __m128d __B, const int __P, const int __R)
+_mm512_mask_compressstoreu_ps (void *__P, __mmask16 __U, __m512 __A)
{
- return __builtin_ia32_vcomisd ((__v2df) __A, (__v2df) __B, __P, __R);
+ __builtin_ia32_compressstoresf512_mask ((__v16sf *) __P, (__v16sf) __A,
+ (__mmask16) __U);
}
-#else
-#define _mm_comi_round_ss(A, B, C, D)\
-__builtin_ia32_vcomiss(A, B, C, D)
-#define _mm_comi_round_sd(A, B, C, D)\
-__builtin_ia32_vcomisd(A, B, C, D)
-#endif
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_pd (__m512d __A)
+_mm512_mask_compress_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_compress_epi64 (__mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A)
+_mm512_mask_compressstoreu_epi64 (void *__P, __mmask8 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_compressstoredi512_mask ((__v8di *) __P, (__v8di) __A,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_ps (__m512 __A)
+_mm512_mask_compress_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_ps (__m512 __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_ps (__mmask16 __U, __m512 __A)
+_mm512_mask_compressstoreu_epi32 (void *__P, __mmask16 __U, __m512i __A)
{
- return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_compressstoresi512_mask ((__v16si *) __P, (__v16si) __A,
+ (__mmask16) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_pd (__m512d __A, __m512d __B)
+_mm512_mask_expand_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m512d) ((__v8df)__A + (__v8df)__B);
+ return (__m512d) __builtin_ia32_expanddf512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_expand_pd (__mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_expanddf512_maskz ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_expandloadu_pd (__m512d __W, __mmask8 __U, void const *__P)
{
- return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_expandloaddf512_mask ((const __v8df *) __P,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_ps (__m512 __A, __m512 __B)
+_mm512_maskz_expandloadu_pd (__mmask8 __U, void const *__P)
{
- return (__m512) ((__v16sf)__A + (__v16sf)__B);
+ return (__m512d) __builtin_ia32_expandloaddf512_maskz ((const __v8df *) __P,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_expand_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_expandsf512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_expand_ps (__mmask16 __U, __m512 __A)
{
- return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_expandsf512_maskz ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_expandloadu_ps (__m512 __W, __mmask16 __U, void const *__P)
{
- return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_expandloadsf512_mask ((const __v16sf *) __P,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_maskz_expandloadu_ps (__mmask16 __U, void const *__P)
{
- return (__m128d) __builtin_ia32_addsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_expandloadsf512_maskz ((const __v16sf *) __P,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_expand_epi64 (__m512i __W, __mmask8 __U, __m512i __A)
{
- return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_expanddi512_mask ((__v8di) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_maskz_expand_epi64 (__mmask8 __U, __m512i __A)
{
- return (__m128) __builtin_ia32_addss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_expanddi512_maskz ((__v8di) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_pd (__m512d __A, __m512d __B)
+_mm512_mask_expandloadu_epi64 (__m512i __W, __mmask8 __U, void const *__P)
{
- return (__m512d) ((__v8df)__A - (__v8df)__B);
+ return (__m512i) __builtin_ia32_expandloaddi512_mask ((const __v8di *) __P,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_expandloadu_epi64 (__mmask8 __U, void const *__P)
{
- return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_expandloaddi512_maskz ((const __v8di *) __P,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_expand_epi32 (__m512i __W, __mmask16 __U, __m512i __A)
{
- return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_expandsi512_mask ((__v16si) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_ps (__m512 __A, __m512 __B)
+_mm512_maskz_expand_epi32 (__mmask16 __U, __m512i __A)
{
- return (__m512) ((__v16sf)__A - (__v16sf)__B);
+ return (__m512i) __builtin_ia32_expandsi512_maskz ((__v16si) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_expandloadu_epi32 (__m512i __W, __mmask16 __U, void const *__P)
{
- return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_expandloadsi512_mask ((const __v16si *) __P,
+ (__v16si) __W,
+ (__mmask16) __U);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
{
- return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_expandloadsi512_maskz ((const __v16si *) __P,
+ (__v16si)
+ _mm512_setzero_si512
+ (), (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_kand (__mmask16 __A, __mmask16 __B)
{
- return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m128d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_kandn (__mmask16 __A, __mmask16 __B)
{
- return (__m128d) __builtin_ia32_subsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m128
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_kor (__mmask16 __A, __mmask16 __B)
{
- return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m128
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_kortestz (__mmask16 __A, __mmask16 __B)
{
- return (__m128) __builtin_ia32_subss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kortestzhi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_pd (__m512d __A, __m512d __B)
+_mm512_kortestc (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) ((__v8df)__A * (__v8df)__B);
+ return (__mmask16) __builtin_ia32_kortestchi ((__mmask16) __A,
+ (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_kxnor (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512d
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_kxor (__mmask16 __A, __mmask16 __B)
{
- return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_ps (__m512 __A, __m512 __B)
+_mm512_knot (__mmask16 __A)
{
- return (__m512) ((__v16sf)__A * (__v16sf)__B);
+ return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A);
}
-extern __inline __m512
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_kunpackb (__mmask16 __A, __mmask16 __B)
{
- return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
}
-extern __inline __m512
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_inserti32x4 (__mmask16 __B, __m512i __C, __m128i __D,
+ const int __imm)
{
- return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
+ (__v4si) __D,
+ __imm,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __B);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B)
+_mm512_maskz_insertf32x4 (__mmask16 __B, __m512 __C, __m128 __D,
+ const int __imm)
{
- return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
+ (__v4sf) __D,
+ __imm,
+ (__v16sf)
+ _mm512_setzero_ps (), __B);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_inserti32x4 (__m512i __A, __mmask16 __B, __m512i __C,
+ __m128i __D, const int __imm)
{
- return (__m128d) __builtin_ia32_mulsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
+ (__v4si) __D,
+ __imm,
+ (__v16si) __A,
+ __B);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B)
+_mm512_mask_insertf32x4 (__m512 __A, __mmask16 __B, __m512 __C,
+ __m128 __D, const int __imm)
{
- return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
+ (__v4sf) __D,
+ __imm,
+ (__v16sf) __A, __B);
}
+#else
+#define _mm512_maskz_insertf32x4(A, X, Y, C) \
+ ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
+ (__v4sf)(__m128) (Y), (int) (C), (__v16sf)_mm512_setzero_ps(), \
+ (__mmask16)(A)))
-extern __inline __m128
+#define _mm512_maskz_inserti32x4(A, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
+ (__v4si)(__m128i) (Y), (int) (C), (__v16si)_mm512_setzero_si512 (), \
+ (__mmask16)(A)))
+
+#define _mm512_mask_insertf32x4(A, B, X, Y, C) \
+ ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
+ (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (A), \
+ (__mmask16)(B)))
+
+#define _mm512_mask_inserti32x4(A, B, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
+ (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (A), \
+ (__mmask16)(B)))
+#endif
+
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_max_epi64 (__m512i __A, __m512i __B)
{
- return (__m128) __builtin_ia32_mulss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_pd (__m512d __M, __m512d __V)
+_mm512_maskz_max_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512d) ((__v8df)__M / (__v8df)__V);
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_pd (__m512d __W, __mmask8 __U, __m512d __M, __m512d __V)
+_mm512_mask_max_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
- (__v8df) __V,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_pd (__mmask8 __U, __m512d __M, __m512d __V)
+_mm512_min_epi64 (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
- (__v8df) __V,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_ps (__m512 __A, __m512 __B)
+_mm512_mask_min_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512) ((__v16sf)__A / (__v16sf)__B);
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_min_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_max_epu64 (__m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B)
+_mm512_maskz_max_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_max_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m128d) __builtin_ia32_divsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B)
+_mm512_min_epu64 (__m512i __A, __m512i __B)
{
- return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_min_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m128) __builtin_ia32_divss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_pd (__m512d __A, __m512d __B)
+_mm512_maskz_min_epu64 (__mmask8 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_max_epi32 (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_max_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_ps (__m512 __A, __m512 __B)
+_mm512_mask_max_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_min_epi32 (__m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_min_epi32 (__mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_min_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_max_epu32 (__m512i __A, __m512i __B)
{
- return (__m128d) __builtin_ia32_maxsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_maskz_max_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_max_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m128) __builtin_ia32_maxss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_pd (__m512d __A, __m512d __B)
+_mm512_min_epu32 (__m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_min_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_min_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
{
- return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_ps (__m512 __A, __m512 __B)
+_mm512_unpacklo_ps (__m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_unpacklo_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
{
- return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_blendmpd_512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_sd (__mmask8 __U, __m128d __A, __m128d __B)
+_mm512_mask_blend_ps (__mmask16 __U, __m512 __A, __m512 __W)
{
- return (__m128d) __builtin_ia32_minsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_blendmps_512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_blend_epi64 (__mmask8 __U, __m512i __A, __m512i __W)
{
- return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_blendmq_512_mask ((__v8di) __A,
+ (__v8di) __W,
+ (__mmask8) __U);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_ss (__mmask8 __U, __m128 __A, __m128 __B)
+_mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
{
- return (__m128) __builtin_ia32_minss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_blendmd_512_mask ((__v16si) __A,
+ (__v16si) __W,
+ (__mmask16) __U);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_pd (__m512d __A, __m512d __B)
+_mm512_sqrt_pd (__m512d __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_undefined_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_ps (__m512 __A, __m512 __B)
+_mm512_sqrt_ps (__m512 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_sqrt_ps (__m512 __W, __mmask16 __U, __m512 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_sqrt_ps (__mmask16 __U, __m512 __A)
{
- return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_sd (__m128d __A, __m128d __B)
+_mm512_add_pd (__m512d __A, __m512d __B)
{
- return (__m128d) __builtin_ia32_scalefsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) ((__v8df)__A + (__v8df)__B);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_ss (__m128 __A, __m128 __B)
+_mm512_mask_add_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m128) __builtin_ia32_scalefss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_add_pd (__mmask8 __U, __m512d __A, __m512d __B)
+{
+ return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_ps (__m512 __A, __m512 __B)
+{
+ return (__m512) ((__v16sf)__A + (__v16sf)__B);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_add_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_sub_pd (__m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) ((__v8df)__A - (__v8df)__B);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_sub_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_sub_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_sub_ps (__m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) ((__v16sf)__A - (__v16sf)__B);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_mask_sub_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_sub_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mul_pd (__m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) ((__v8df)__A * (__v8df)__B);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_mask_mul_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_maskz_mul_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mul_ps (__m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) ((__v16sf)__A * (__v16sf)__B);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_mul_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_mul_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_div_pd (__m512d __M, __m512d __V)
{
- return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) ((__v8df)__M / (__v8df)__V);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_div_pd (__m512d __W, __mmask8 __U, __m512d __M, __m512d __V)
{
- return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+ (__v8df) __V,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_div_pd (__mmask8 __U, __m512d __M, __m512d __V)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M,
+ (__v8df) __V,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_div_ps (__m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) ((__v16sf)__A / (__v16sf)__B);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_mask_div_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_maskz_div_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_max_pd (__m512d __A, __m512d __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_mask_max_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_maskz_max_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_max_ps (__m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_max_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_max_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_min_pd (__m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_min_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- -(__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_min_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_min_ps (__m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_mask_min_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_maskz_min_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- -(__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_scalef_pd (__m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_mask_scalef_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_maskz_scalef_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_scalef_ps (__m512 __A, __m512 __B)
{
- return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_scalef_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_fmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C)
+_mm512_mask3_fmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A,
(__v8df) __B,
(__v8df) __C,
- (__mmask8) -1,
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
+_mm512_maskz_fmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
(__v8df) __B,
(__v8df) __C,
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
+_mm512_fmadd_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
+_mm512_mask_fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __C,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C)
+_mm512_mask3_fmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A,
(__v16sf) __B,
(__v16sf) __C,
- (__mmask16) -1,
+ (__mmask16) __U,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
+_mm512_maskz_fmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
(__v16sf) __B,
(__v16sf) __C,
(__mmask16) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+_mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+_mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __C,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epi32 (__m512d __A)
+_mm512_mask3_fmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1,
+ return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si) __W,
+ return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epi32 (__mmask8 __U, __m512d __A)
+_mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epu32 (__m512d __A)
+_mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_mask3_fmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epu32 (__mmask8 __U, __m512d __A)
+_mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
{
- return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epi32 (__m512d __A)
+_mm512_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_mask_fmaddsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epi32 (__mmask8 __U, __m512d __A)
+_mm512_mask3_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epu32 (__m512d __A)
+_mm512_maskz_fmaddsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_undefined_si256 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
+_mm512_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epu32 (__mmask8 __U, __m512d __A)
+_mm512_mask_fmaddsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
- (__v8si)
- _mm256_setzero_si256 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
+{
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmaddsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
+{
+ return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epi32 (__m512 __A)
+_mm512_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_mask_fmsubadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epi32 (__mmask16 __U, __m512 __A)
+_mm512_mask3_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epu32 (__m512 __A)
+_mm512_maskz_fmsubadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ -(__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U,
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
+ (__mmask16) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epu32 (__mmask16 __U, __m512 __A)
+_mm512_mask_fmsubadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
+ return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
(__mmask16) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epi32 (__m512 __A)
+_mm512_mask3_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_maskz_fmsubadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ -(__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epi32 (__mmask16 __U, __m512 __A)
+_mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epu32 (__m512 __A)
+_mm512_mask_fnmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1,
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+_mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epu32 (__mmask16 __U, __m512 __A)
+_mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline double
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsd_f64 (__m512d __A)
+_mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return __A[0];
+ return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline float
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtss_f32 (__m512 __A)
+_mm512_mask_fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return __A[0];
+ return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __x86_64__
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_ss (__m128 __A, unsigned long long __B)
+_mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
{
- return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_sd (__m128d __A, unsigned long long __B)
+_mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
{
- return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#endif
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_ss (__m128 __A, unsigned __B)
+_mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C)
{
- return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_ps (__m512i __A)
+_mm512_mask_fnmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_undefined_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_ps (__m512 __W, __mmask16 __U, __m512i __A)
+_mm512_mask3_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_ps (__mmask16 __U, __m512i __A)
+_mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
{
- return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __C,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_ps (__m512i __A)
+_mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_undefined_ps (),
+ return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
(__mmask16) -1,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_ps (__m512 __W, __mmask16 __U, __m512i __A)
+_mm512_mask_fnmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf) __W,
+ return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
(__mmask16) __U,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_ps (__mmask16 __U, __m512i __A)
+_mm512_mask3_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
{
- return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_pd (__m512d __A, __m512d __B, __m512i __C, const int __imm)
+_mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __C,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B,
- __m512i __C, const int __imm)
+_mm512_cvttpd_epi32 (__m512d __A)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_pd (__mmask8 __U, __m512d __A, __m512d __B,
- __m512i __C, const int __imm)
+_mm512_mask_cvttpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
- (__v8df) __B,
- (__v8di) __C,
- __imm,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fixupimm_ps (__m512 __A, __m512 __B, __m512i __C, const int __imm)
+_mm512_maskz_cvttpd_epi32 (__mmask8 __U, __m512d __A)
{
- return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) -1,
+ return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B,
- __m512i __C, const int __imm)
+_mm512_cvttpd_epu32 (__m512d __A)
{
- return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fixupimm_ps (__mmask16 __U, __m512 __A, __m512 __B,
- __m512i __C, const int __imm)
+_mm512_mask_cvttpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
{
- return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
- (__v16sf) __B,
- (__v16si) __C,
- __imm,
- (__mmask16) __U,
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_sd (__m128d __A, __m128d __B, __m128i __C, const int __imm)
+_mm512_maskz_cvttpd_epu32 (__mmask8 __U, __m512d __A)
{
- return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C, __imm,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B,
- __m128i __C, const int __imm)
+_mm512_cvtpd_epi32 (__m512d __A)
{
- return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C, __imm,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_sd (__mmask8 __U, __m128d __A, __m128d __B,
- __m128i __C, const int __imm)
+_mm512_mask_cvtpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A)
{
- return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A,
- (__v2df) __B,
- (__v2di) __C,
- __imm,
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si) __W,
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fixupimm_ss (__m128 __A, __m128 __B, __m128i __C, const int __imm)
+_mm512_maskz_cvtpd_epi32 (__mmask8 __U, __m512d __A)
{
- return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B,
- __m128i __C, const int __imm)
+_mm512_cvtpd_epu32 (__m512d __A)
{
- return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_undefined_si256 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m256i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fixupimm_ss (__mmask8 __U, __m128 __A, __m128 __B,
- __m128i __C, const int __imm)
+_mm512_mask_cvtpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A)
{
- return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A,
- (__v4sf) __B,
- (__v4si) __C, __imm,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_fixupimm_pd(X, Y, Z, C) \
- ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_fixupimm_pd(X, U, Y, Z, C) \
- ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_fixupimm_pd(U, X, Y, Z, C) \
- ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \
- (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-#define _mm512_fixupimm_ps(X, Y, Z, C) \
- ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_fixupimm_ps(X, U, Y, Z, C) \
- ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_fixupimm_ps(U, X, Y, Z, C) \
- ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \
- (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
- (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtpd_epu32 (__mmask8 __U, __m512d __A)
+{
+ return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A,
+ (__v8si)
+ _mm256_setzero_si256 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_fixupimm_sd(X, Y, Z, C) \
- ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttps_epi32 (__m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_mask_fixupimm_sd(X, U, Y, Z, C) \
- ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_maskz_fixupimm_sd(U, X, Y, Z, C) \
- ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttps_epi32 (__mmask16 __U, __m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_fixupimm_ss(X, Y, Z, C) \
- ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttps_epu32 (__m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_mask_fixupimm_ss(X, U, Y, Z, C) \
- ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#define _mm_maskz_fixupimm_ss(U, X, Y, Z, C) \
- ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-#endif
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttps_epu32 (__mmask16 __U, __m512 __A)
+{
+ return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#ifdef __x86_64__
-extern __inline unsigned long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_u64 (__m128 __A)
+_mm512_cvtps_epi32 (__m512 __A)
{
- return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf)
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_u64 (__m128 __A)
+_mm512_mask_cvtps_epi32 (__m512i __W, __mmask16 __U, __m512 __A)
{
- return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf)
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_i64 (__m128 __A)
+_mm512_maskz_cvtps_epi32 (__mmask16 __U, __m512 __A)
{
- return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#endif /* __x86_64__ */
-extern __inline int
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsi512_si32 (__m512i __A)
+_mm512_cvtps_epu32 (__m512 __A)
{
- __v16si __B = (__v16si) __A;
- return __B[0];
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_u32 (__m128 __A)
+_mm512_mask_cvtps_epu32 (__m512i __W, __mmask16 __U, __m512 __A)
{
- return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_u32 (__m128 __A)
+_mm512_maskz_cvtps_epu32 (__mmask16 __U, __m512 __A)
{
- return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline int
+extern __inline double
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttss_i32 (__m128 __A)
+_mm512_cvtsd_f64 (__m512d __A)
{
- return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __A[0];
}
-extern __inline int
+extern __inline float
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_i32 (__m128d __A)
+_mm512_cvtss_f32 (__m512 __A)
{
- return (int) __builtin_ia32_cvtsd2si ((__v2df) __A);
+ return __A[0];
}
-extern __inline int
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_i32 (__m128 __A)
+_mm512_cvtepi32_ps (__m512i __A)
{
- return (int) __builtin_ia32_cvtss2si ((__v4sf) __A);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_sd (__m128d __A, int __B)
+_mm512_mask_cvtepi32_ps (__m512 __W, __mmask16 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_cvtsi2sd ((__v2df) __A, __B);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_ss (__m128 __A, int __B)
+_mm512_maskz_cvtepi32_ps (__mmask16 __U, __m512i __A)
{
- return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B);
+ return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __x86_64__
-extern __inline unsigned long long
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_u64 (__m128d __A)
+_mm512_cvtepu32_ps (__m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df)
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned long long
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_u64 (__m128d __A)
+_mm512_mask_cvtepu32_ps (__m512 __W, __mmask16 __U, __m512i __A)
{
- return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df)
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline long long
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_i64 (__m128d __A)
+_mm512_maskz_cvtepu32_ps (__mmask16 __U, __m512i __A)
{
- return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline long long
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_i64 (__m128d __A)
+_mm512_fixupimm_pd (__m512d __A, __m512d __B, __m512i __C, const int __imm)
{
- return (long long) __builtin_ia32_cvtsd2si64 ((__v2df) __A);
+ return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline long long
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_i64 (__m128 __A)
+_mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B,
+ __m512i __C, const int __imm)
{
- return (long long) __builtin_ia32_cvtss2si64 ((__v4sf) __A);
+ return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_sd (__m128d __A, long long __B)
+_mm512_maskz_fixupimm_pd (__mmask8 __U, __m512d __A, __m512d __B,
+ __m512i __C, const int __imm)
{
- return (__m128d) __builtin_ia32_cvtsi642sd ((__v2df) __A, __B);
+ return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A,
+ (__v8df) __B,
+ (__v8di) __C,
+ __imm,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_ss (__m128 __A, long long __B)
+_mm512_fixupimm_ps (__m512 __A, __m512 __B, __m512i __C, const int __imm)
{
- return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B);
+ return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-#endif /* __x86_64__ */
-extern __inline unsigned
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_u32 (__m128d __A)
+_mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512i __C, const int __imm)
{
- return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_u32 (__m128d __A)
+_mm512_maskz_fixupimm_ps (__mmask16 __U, __m512 __A, __m512 __B,
+ __m512i __C, const int __imm)
{
- return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16si) __C,
+ __imm,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+#else
+#define _mm512_fixupimm_pd(X, Y, Z, C) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_fixupimm_pd(X, U, Y, Z, C) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_fixupimm_pd(U, X, Y, Z, C) \
+ ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_fixupimm_ps(X, Y, Z, C) \
+ ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_fixupimm_ps(X, U, Y, Z, C) \
+ ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_fixupimm_ps(U, X, Y, Z, C) \
+ ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \
+ (__mmask16)(U), _MM_FROUND_CUR_DIRECTION))
+
+#endif
+
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsd_i32 (__m128d __A)
+_mm512_cvtsi512_si32 (__m512i __A)
{
- return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A,
- _MM_FROUND_CUR_DIRECTION);
+ __v16si __B = (__v16si) __A;
+ return __B[0];
}
extern __inline __m512d
@@ -14770,70 +15245,6 @@ _mm512_maskz_getexp_pd (__mmask8 __U, __m512d __A)
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_ss (__m128 __A, __m128 __B)
-{
- return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
- (__v4sf) __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B)
-{
- return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_ss (__mmask8 __U, __m128 __A, __m128 __B)
-{
- return (__m128) __builtin_ia32_getexpss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_sd (__m128d __A, __m128d __B)
-{
- return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
- (__v2df) __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B)
-{
- return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_sd (__mmask8 __U, __m128d __A, __m128d __B)
-{
- return (__m128d) __builtin_ia32_getexpsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_getmant_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
@@ -14906,82 +15317,6 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- (__v2df) __W,
- __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_sd (__mmask8 __U, __m128d __A, __m128d __B,
- _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128d) __builtin_ia32_getmantsd_mask_round ((__v2df) __A,
- (__v2df) __B,
- (__D << 2) | __C,
- (__v2df)
- _mm_setzero_pd(),
- __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- (__v4sf) __W,
- __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
- _MM_MANTISSA_NORM_ENUM __C, _MM_MANTISSA_SIGN_ENUM __D)
-{
- return (__m128) __builtin_ia32_getmantss_mask_round ((__v4sf) __A,
- (__v4sf) __B,
- (__D << 2) | __C,
- (__v4sf)
- _mm_setzero_ps(),
- __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
#else
#define _mm512_getmant_pd(X, B, C) \
((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
@@ -14990,107 +15325,39 @@ _mm_maskz_getmant_ss (__mmask8 __U, __m128 __A, __m128 __B,
(__mmask8)-1,\
_MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_getmant_pd(W, U, X, B, C) \
- ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
- (int)(((C)<<2) | (B)), \
- (__v8df)(__m512d)(W), \
- (__mmask8)(U),\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_getmant_pd(U, X, B, C) \
- ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
- (int)(((C)<<2) | (B)), \
- (__v8df)_mm512_setzero_pd(), \
- (__mmask8)(U),\
- _MM_FROUND_CUR_DIRECTION))
-#define _mm512_getmant_ps(X, B, C) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)_mm512_undefined_ps(), \
- (__mmask16)-1,\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_getmant_ps(W, U, X, B, C) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)(__m512)(W), \
- (__mmask16)(U),\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_getmant_ps(U, X, B, C) \
- ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
- (int)(((C)<<2) | (B)), \
- (__v16sf)_mm512_setzero_ps(), \
- (__mmask16)(U),\
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_getmant_sd(X, Y, C, D) \
- ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_sd(W, U, X, Y, C, D) \
- ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v2df)(__m128d)(W), \
- (__mmask8)(U),\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_getmant_sd(U, X, Y, C, D) \
- ((__m128d)__builtin_ia32_getmantsd_mask_round ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v2df)_mm_setzero_pd(), \
- (__mmask8)(U),\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_getmant_ss(X, Y, C, D) \
- ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_ss(W, U, X, Y, C, D) \
- ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v4sf)(__m128)(W), \
+#define _mm512_mask_getmant_pd(W, U, X, B, C) \
+ ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v8df)(__m512d)(W), \
(__mmask8)(U),\
_MM_FROUND_CUR_DIRECTION))
-#define _mm_maskz_getmant_ss(U, X, Y, C, D) \
- ((__m128)__builtin_ia32_getmantss_mask_round ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v4sf)_mm_setzero_ps(), \
+#define _mm512_maskz_getmant_pd(U, X, B, C) \
+ ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v8df)_mm512_setzero_pd(), \
(__mmask8)(U),\
_MM_FROUND_CUR_DIRECTION))
+#define _mm512_getmant_ps(X, B, C) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)_mm512_undefined_ps(), \
+ (__mmask16)-1,\
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm_getexp_ss(A, B) \
- ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getexp_ss(W, U, A, B) \
- (__m128)__builtin_ia32_getexpss_mask_round(A, B, W, U,\
- _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_getexp_ss(U, A, B) \
- (__m128)__builtin_ia32_getexpss_mask_round(A, B, (__v4sf)_mm_setzero_ps(), U,\
- _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_getexp_sd(A, B) \
- ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getexp_sd(W, U, A, B) \
- (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, W, U,\
- _MM_FROUND_CUR_DIRECTION)
-
-#define _mm_maskz_getexp_sd(U, A, B) \
- (__m128d)__builtin_ia32_getexpsd_mask_round(A, B, (__v2df)_mm_setzero_pd(), U,\
- _MM_FROUND_CUR_DIRECTION)
+#define _mm512_mask_getmant_ps(W, U, X, B, C) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)(__m512)(W), \
+ (__mmask16)(U),\
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm512_maskz_getmant_ps(U, X, B, C) \
+ ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v16sf)_mm512_setzero_ps(), \
+ (__mmask16)(U),\
+ _MM_FROUND_CUR_DIRECTION))
#define _mm512_getexp_ps(A) \
((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \
(__v16sf)_mm512_undefined_ps(), (__mmask16)-1, _MM_FROUND_CUR_DIRECTION))
@@ -15185,87 +15452,6 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
-{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __A,
- (__v4sf) __B, __imm,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D,
- const int __imm)
-{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __C,
- (__v4sf) __D, __imm,
- (__v4sf) __A,
- (__mmask8) __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C,
- const int __imm)
-{
- return (__m128)
- __builtin_ia32_rndscaless_mask_round ((__v4sf) __B,
- (__v4sf) __C, __imm,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __A,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
-{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __A,
- (__v2df) __B, __imm,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D,
- const int __imm)
-{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __C,
- (__v2df) __D, __imm,
- (__v2df) __A,
- (__mmask8) __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
- const int __imm)
-{
- return (__m128d)
- __builtin_ia32_rndscalesd_mask_round ((__v2df) __B,
- (__v2df) __C, __imm,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __A,
- _MM_FROUND_CUR_DIRECTION);
-}
-
#else
#define _mm512_roundscale_ps(A, B) \
((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -15293,54 +15479,6 @@ _mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C,
(int)(C), \
(__v8df)_mm512_setzero_pd(),\
(__mmask8)(A), _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_ss(A, B, I) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
- (__v4sf) (__m128) (B), \
- (int) (I), \
- (__v4sf) _mm_setzero_ps (), \
- (__mmask8) (-1), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_mask_roundscale_ss(A, U, B, C, I) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (B), \
- (__v4sf) (__m128) (C), \
- (int) (I), \
- (__v4sf) (__m128) (A), \
- (__mmask8) (U), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_maskz_roundscale_ss(U, A, B, I) \
- ((__m128) \
- __builtin_ia32_rndscaless_mask_round ((__v4sf) (__m128) (A), \
- (__v4sf) (__m128) (B), \
- (int) (I), \
- (__v4sf) _mm_setzero_ps (), \
- (__mmask8) (U), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_roundscale_sd(A, B, I) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
- (__v2df) (__m128d) (B), \
- (int) (I), \
- (__v2df) _mm_setzero_pd (), \
- (__mmask8) (-1), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_mask_roundscale_sd(A, U, B, C, I) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (B), \
- (__v2df) (__m128d) (C), \
- (int) (I), \
- (__v2df) (__m128d) (A), \
- (__mmask8) (U), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm_maskz_roundscale_sd(U, A, B, I) \
- ((__m128d) \
- __builtin_ia32_rndscalesd_mask_round ((__v2df) (__m128d) (A), \
- (__v2df) (__m128d) (B), \
- (int) (I), \
- (__v2df) _mm_setzero_pd (), \
- (__mmask8) (U), \
- _MM_FROUND_CUR_DIRECTION))
#endif
#ifdef __OPTIMIZE__
@@ -15384,46 +15522,6 @@ _mm512_mask_cmp_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y, const int __P)
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_sd_mask (__m128d __X, __m128d __Y, const int __P)
-{
- return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
- (__v2df) __Y, __P,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y, const int __P)
-{
- return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
- (__v2df) __Y, __P,
- (__mmask8) __M,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_ss_mask (__m128 __X, __m128 __Y, const int __P)
-{
- return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
- (__v4sf) __Y, __P,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __mmask8
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
-{
- return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
- (__v4sf) __Y, __P,
- (__mmask8) __M,
- _MM_FROUND_CUR_DIRECTION);
-}
-
#else
#define _mm512_cmp_pd_mask(X, Y, P) \
((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
@@ -15445,25 +15543,6 @@ _mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P)
(__v16sf)(__m512)(Y), (int)(P),\
(__mmask16)(M),_MM_FROUND_CUR_DIRECTION))
-#define _mm_cmp_sd_mask(X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (int)(P),\
- (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_cmp_sd_mask(M, X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \
- (__v2df)(__m128d)(Y), (int)(P),\
- M,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_cmp_ss_mask(X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (int)(P), \
- (__mmask8)-1,_MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_cmp_ss_mask(M, X, Y, P) \
- ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \
- (__v4sf)(__m128)(Y), (int)(P), \
- M,_MM_FROUND_CUR_DIRECTION))
#endif
extern __inline __mmask8
@@ -16493,9 +16572,9 @@ _mm512_mask_reduce_max_pd (__mmask8 __U, __m512d __A)
#undef __MM512_REDUCE_OP
-#ifdef __DISABLE_AVX512F__
-#undef __DISABLE_AVX512F__
+#ifdef __DISABLE_AVX512F_512__
+#undef __DISABLE_AVX512F_512__
#pragma GCC pop_options
-#endif /* __DISABLE_AVX512F__ */
+#endif /* __DISABLE_AVX512F_512__ */
#endif /* _AVX512FINTRIN_H_INCLUDED */
--
2.31.1
* [PATCH 03/18] [PATCH 2/5] Push evex512 target for 512 bit intrins
From: Hu, Lin1 @ 2023-09-21 7:19 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/avx512dqintrin.h: Add evex512 target for 512 bit
intrins.
---
gcc/config/i386/avx512dqintrin.h | 1840 +++++++++++++++---------------
1 file changed, 926 insertions(+), 914 deletions(-)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 93900a0b5c7..b6a1d499e25 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -184,1275 +184,1426 @@ _kandn_mask8 (__mmask8 __A, __mmask8 __B)
return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
}
-extern __inline __m512d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f64x2 (__m128d __A)
-{
- return (__m512d)
- __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A,
- _mm512_undefined_pd (),
- (__mmask8) -1);
-}
-
-extern __inline __m512d
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A)
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
{
- return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
- __A,
- (__v8df)
- __O, __M);
+ return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
{
- return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
- __A,
- (__v8df)
- _mm512_setzero_ps (),
- __M);
+ return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i64x2 (__m128i __A)
+_mm_reduce_sd (__m128d __A, __m128d __B, int __C)
{
- return (__m512i)
- __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A,
- _mm512_undefined_epi32 (),
+ return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) _mm_setzero_pd (),
(__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A)
+_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
{
- return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
- __A,
- (__v8di)
- __O, __M);
+ return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
+_mm_mask_reduce_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, int __C)
{
- return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
- __A,
- (__v8di)
- _mm512_setzero_si512 (),
- __M);
+ return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x2 (__m128 __A)
+_mm_mask_reduce_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
+ __m128d __B, int __C, const int __R)
{
- return (__m512)
- __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
- (__v16sf)_mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) __W,
+ __U, __R);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A)
+_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
{
- return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
- (__v16sf)
- __O, __M);
+ return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) _mm_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A)
+_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
+ int __C, const int __R)
{
- return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- __M);
+ return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ __U, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x2 (__m128i __A)
+_mm_reduce_ss (__m128 __A, __m128 __B, int __C)
{
- return (__m512i)
- __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A)
+_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
{
- return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
- __A,
- (__v16si)
- __O, __M);
+ return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A)
+_mm_mask_reduce_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, int __C)
{
- return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
- __A,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) __W,
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_f32x8 (__m256 __A)
+_mm_mask_reduce_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
+ __m128 __B, int __C, const int __R)
{
- return (__m512)
- __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
- _mm512_undefined_ps (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) __W,
+ __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A)
+_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
{
- return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
- (__v16sf)__O,
- __M);
+ return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) __U);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A)
+_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
+ int __C, const int __R)
{
- return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
- (__v16sf)
- _mm512_setzero_ps (),
- __M);
+ return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ __U, __R);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_broadcast_i32x8 (__m256i __A)
+_mm_range_sd (__m128d __A, __m128d __B, int __C)
{
- return (__m512i)
- __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A,
- (__v16si)
- _mm512_undefined_epi32 (),
- (__mmask16) -1);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A)
+_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C)
{
- return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
- __A,
- (__v16si)__O,
- __M);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A)
+_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
{
- return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
- __A,
- (__v16si)
- _mm512_setzero_si512 (),
- __M);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mullo_epi64 (__m512i __A, __m512i __B)
+_mm_range_ss (__m128 __A, __m128 __B, int __C)
{
- return (__m512i) ((__v8du) __A * (__v8du) __B);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
- __m512i __B)
+_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C)
{
- return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di) __W,
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
{
- return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
- (__v8di) __B,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_pd (__m512d __A, __m512d __B)
+_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
{
- return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) -1);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B)
+_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+ int __C, const int __R)
{
- return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C,
+ const int __R)
{
- return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+ (__v2df) __B, __C,
+ (__v2df)
+ _mm_setzero_pd (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_xor_ps (__m512 __A, __m512 __B)
+_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
{
- return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) -1);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+ int __C, const int __R)
{
- return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf) __W,
+ (__mmask8) __U, __R);
}
-extern __inline __m512
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C,
+ const int __R)
{
- return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+ (__v4sf) __B, __C,
+ (__v4sf)
+ _mm_setzero_ps (),
+ (__mmask8) __U, __R);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_pd (__m512d __A, __m512d __B)
+_mm_fpclass_ss_mask (__m128 __A, const int __imm)
{
- return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) -1);
+ return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm,
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
+_mm_fpclass_sd_mask (__m128d __A, const int __imm)
{
- return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm,
+ (__mmask8) -1);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
{
- return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
}
-extern __inline __m512
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_or_ps (__m512 __A, __m512 __B)
+_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
{
- return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) -1);
+ return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
}
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
-{
- return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
-}
+#else
+#define _kshiftli_mask8(X, Y) \
+ ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
-extern __inline __m512
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B)
-{
- return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
-}
+#define _kshiftri_mask8(X, Y) \
+ ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
+
+#define _mm_range_sd(A, B, C) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_range_sd(W, U, A, B, C) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_sd(U, A, B, C) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_range_ss(A, B, C) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_range_ss(W, U, A, B, C) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_ss(U, A, B, C) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_range_round_sd(A, B, C, R) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_sd(W, U, A, B, C, R) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
+ (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_sd(U, A, B, C, R) \
+ ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)(U), (R)))
+
+#define _mm_range_round_ss(A, B, C, R) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_ss(W, U, A, B, C, R) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
+ (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_ss(U, A, B, C, R) \
+ ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)(U), (R)))
+
+#define _mm_fpclass_ss_mask(X, C) \
+ ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \
+ (int) (C), (__mmask8) (-1)))
+
+#define _mm_fpclass_sd_mask(X, C) \
+ ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \
+ (int) (C), (__mmask8) (-1)))
+
+#define _mm_mask_fpclass_ss_mask(X, C, U) \
+ ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \
+ (int) (C), (__mmask8) (U)))
+
+#define _mm_mask_fpclass_sd_mask(X, C, U) \
+ ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \
+ (int) (C), (__mmask8) (U)))
+
+#define _mm_reduce_sd(A, B, C) \
+ ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)-1))
+
+#define _mm_mask_reduce_sd(W, U, A, B, C) \
+ ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U)))
+
+#define _mm_maskz_reduce_sd(U, A, B, C) \
+ ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)(U)))
+
+#define _mm_reduce_round_sd(A, B, C, R) \
+ ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)-1, (int)(R)))
+
+#define _mm_mask_reduce_round_sd(W, U, A, B, C, R) \
+ ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
+ (__mmask8)(U), (int)(R)))
+
+#define _mm_maskz_reduce_round_sd(U, A, B, C, R) \
+ ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
+ (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
+ (__mmask8)(U), (int)(R)))
+
+#define _mm_reduce_ss(A, B, C) \
+ ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)-1))
+
+#define _mm_mask_reduce_ss(W, U, A, B, C) \
+ ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U)))
+
+#define _mm_maskz_reduce_ss(U, A, B, C) \
+ ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)(U)))
+
+#define _mm_reduce_round_ss(A, B, C, R) \
+ ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)-1, (int)(R)))
+
+#define _mm_mask_reduce_round_ss(W, U, A, B, C, R) \
+ ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
+ (__mmask8)(U), (int)(R)))
+
+#define _mm_maskz_reduce_round_ss(U, A, B, C, R) \
+ ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \
+ (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
+ (__mmask8)(U), (int)(R)))
+
+#endif
+
+#ifdef __DISABLE_AVX512DQ__
+#undef __DISABLE_AVX512DQ__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512DQ__ */
+
+#if !defined (__AVX512DQ__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512dq,evex512")
+#define __DISABLE_AVX512DQ_512__
+#endif /* __AVX512DQ_512__ */
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_pd (__m512d __A, __m512d __B)
+_mm512_broadcast_f64x2 (__m128d __A)
{
- return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m512d)
+ __builtin_ia32_broadcastf64x2_512_mask ((__v2df) __A,
+ _mm512_undefined_pd (),
(__mmask8) -1);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B)
+_mm512_mask_broadcast_f64x2 (__m512d __O, __mmask8 __M, __m128d __A)
{
- return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
+ __A,
+ (__v8df)
+ __O, __M);
}
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_broadcast_f64x2 (__mmask8 __M, __m128d __A)
{
- return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512d) __builtin_ia32_broadcastf64x2_512_mask ((__v2df)
+ __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_and_ps (__m512 __A, __m512 __B)
+_mm512_broadcast_i64x2 (__m128i __A)
{
- return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) -1);
+ return (__m512i)
+ __builtin_ia32_broadcasti64x2_512_mask ((__v2di) __A,
+ _mm512_undefined_epi32 (),
+ (__mmask8) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+_mm512_mask_broadcast_i64x2 (__m512i __O, __mmask8 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
+ __A,
+ (__v8di)
+ __O, __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_broadcast_i64x2 (__mmask8 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti64x2_512_mask ((__v2di)
+ __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_pd (__m512d __A, __m512d __B)
+_mm512_broadcast_f32x2 (__m128 __A)
{
- return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) -1);
+ return (__m512)
+ __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+ (__v16sf)_mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A,
- __m512d __B)
+_mm512_mask_broadcast_f32x2 (__m512 __O, __mmask16 __M, __m128 __A)
{
- return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df) __W,
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+ (__v16sf)
+ __O, __M);
}
-extern __inline __m512d
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B)
+_mm512_maskz_broadcast_f32x2 (__mmask16 __M, __m128 __A)
{
- return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
- (__v8df) __B,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U);
+ return (__m512) __builtin_ia32_broadcastf32x2_512_mask ((__v4sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_andnot_ps (__m512 __A, __m512 __B)
+_mm512_broadcast_i32x2 (__m128i __A)
{
- return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
+ return (__m512i)
+ __builtin_ia32_broadcasti32x2_512_mask ((__v4si) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
(__mmask16) -1);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A,
- __m512 __B)
+_mm512_mask_broadcast_i32x2 (__m512i __O, __mmask16 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf) __W,
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
+ __A,
+ (__v16si)
+ __O, __M);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_andnot_ps (__mmask16 __U, __m512 __A, __m512 __B)
+_mm512_maskz_broadcast_i32x2 (__mmask16 __M, __m128i __A)
{
- return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
- (__v16sf) __B,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U);
+ return (__m512i) __builtin_ia32_broadcasti32x2_512_mask ((__v4si)
+ __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
-extern __inline __mmask16
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movepi32_mask (__m512i __A)
+_mm512_broadcast_f32x8 (__m256 __A)
{
- return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A);
+ return (__m512)
+ __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movepi64_mask (__m512i __A)
+_mm512_mask_broadcast_f32x8 (__m512 __O, __mmask16 __M, __m256 __A)
{
- return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A);
+ return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+ (__v16sf)__O,
+ __M);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movm_epi32 (__mmask16 __A)
+_mm512_maskz_broadcast_f32x8 (__mmask16 __M, __m256 __A)
{
- return (__m512i) __builtin_ia32_cvtmask2d512 (__A);
+ return (__m512) __builtin_ia32_broadcastf32x8_512_mask ((__v8sf) __A,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_movm_epi64 (__mmask8 __A)
+_mm512_broadcast_i32x8 (__m256i __A)
{
- return (__m512i) __builtin_ia32_cvtmask2q512 (__A);
+ return (__m512i)
+ __builtin_ia32_broadcasti32x8_512_mask ((__v8si) __A,
+ (__v16si)
+ _mm512_undefined_epi32 (),
+ (__mmask16) -1);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epi64 (__m512d __A)
+_mm512_mask_broadcast_i32x8 (__m512i __O, __mmask16 __M, __m256i __A)
{
- return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
+ __A,
+ (__v16si)__O,
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_broadcast_i32x8 (__mmask16 __M, __m256i __A)
{
- return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_broadcasti32x8_512_mask ((__v8si)
+ __A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A)
+_mm512_mullo_epi64 (__m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) ((__v8du) __A * (__v8du) __B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttpd_epu64 (__m512d __A)
+_mm512_mask_mullo_epi64 (__m512i __W, __mmask8 __U, __m512i __A,
+ __m512i __B)
{
- return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W,
+ (__mmask8) __U);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_mullo_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
{
- return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_pmullq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A)
+_mm512_xor_pd (__m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epi64 (__m256 __A)
+_mm512_mask_xor_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B)
{
- return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_xor_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_xorpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A)
+_mm512_xor_ps (__m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttps_epu64 (__m256 __A)
+_mm512_mask_xor_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_xor_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_xorps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A)
+_mm512_or_pd (__m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epi64 (__m512d __A)
+_mm512_mask_or_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_or_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_orpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A)
+_mm512_or_ps (__m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_epu64 (__m512d __A)
+_mm512_mask_or_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
+_mm512_maskz_or_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_orps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A)
+_mm512_and_pd (__m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epi64 (__m256 __A)
+_mm512_mask_and_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_and_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A)
+_mm512_and_ps (__m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) -1);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtps_epu64 (__m256 __A)
+_mm512_mask_and_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
+_mm512_maskz_and_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
- (__v8di) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A)
+_mm512_andnot_pd (__m512d __A, __m512d __B)
{
- return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
- (__v8di)
- _mm512_setzero_si512 (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1);
}
-extern __inline __m256
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_ps (__m512i __A)
+_mm512_mask_andnot_pd (__m512d __W, __mmask8 __U, __m512d __A,
+ __m512d __B)
{
- return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
- (__v8sf)
- _mm256_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df) __W,
+ (__mmask8) __U);
}
-extern __inline __m256
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_andnot_pd (__mmask8 __U, __m512d __A, __m512d __B)
{
- return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
- (__v8sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_andnpd512_mask ((__v8df) __A,
+ (__v8df) __B,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U);
}
-extern __inline __m256
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A)
+_mm512_andnot_ps (__m512 __A, __m512 __B)
{
- return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
- (__v8sf)
- _mm256_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) -1);
}
-extern __inline __m256
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_ps (__m512i __A)
+_mm512_mask_andnot_ps (__m512 __W, __mmask16 __U, __m512 __A,
+ __m512 __B)
{
- return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
- (__v8sf)
- _mm256_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
}
-extern __inline __m256
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A)
+_mm512_maskz_andnot_ps (__mmask16 __U, __m512 __A, __m512 __B)
{
- return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
- (__v8sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __builtin_ia32_andnps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
}
-extern __inline __m256
+extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A)
+_mm512_movepi32_mask (__m512i __A)
{
- return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
- (__v8sf)
- _mm256_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask16) __builtin_ia32_cvtd2mask512 ((__v16si) __A);
}
-extern __inline __m512d
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_pd (__m512i __A)
+_mm512_movepi64_mask (__m512i __A)
{
- return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask8) __builtin_ia32_cvtq2mask512 ((__v8di) __A);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A)
+_mm512_movm_epi32 (__mmask16 __A)
{
- return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtmask2d512 (__A);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A)
+_mm512_movm_epi64 (__mmask8 __A)
{
- return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvtmask2q512 (__A);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_pd (__m512i __A)
+_mm512_cvttpd_epi64 (__m512d __A)
{
- return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
(__mmask8) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A)
+_mm512_mask_cvttpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
- (__v8df) __W,
+ return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+ (__v8di) __W,
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
+_mm512_maskz_cvttpd_epi64 (__mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
- (__v8df)
- _mm512_setzero_pd (),
+ return (__m512i) __builtin_ia32_cvttpd2qq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
+_mm512_cvttpd_epu64 (__m512d __A)
{
- return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
+ return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
+_mm512_mask_cvttpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
{
- return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
+ return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_range_pd (__m512d __A, __m512d __B, int __C)
+_mm512_maskz_cvttpd_epu64 (__mmask8 __U, __m512d __A)
{
- return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
- (__v8df) __B, __C,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttpd2uqq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_range_pd (__m512d __W, __mmask8 __U,
- __m512d __A, __m512d __B, int __C)
+_mm512_cvttps_epi64 (__m256 __A)
{
- return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
- (__v8df) __B, __C,
- (__v8df) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C)
+_mm512_mask_cvttps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
{
- return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
- (__v8df) __B, __C,
- (__v8df)
- _mm512_setzero_pd (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_range_ps (__m512 __A, __m512 __B, int __C)
+_mm512_maskz_cvttps_epi64 (__mmask8 __U, __m256 __A)
{
- return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
- (__v16sf) __B, __C,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttps2qq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_range_ps (__m512 __W, __mmask16 __U,
- __m512 __A, __m512 __B, int __C)
+_mm512_cvttps_epu64 (__m256 __A)
{
- return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
- (__v16sf) __B, __C,
- (__v16sf) __W,
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C)
+_mm512_mask_cvttps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
{
- return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
- (__v16sf) __B, __C,
- (__v16sf)
- _mm512_setzero_ps (),
- (__mmask16) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_sd (__m128d __A, __m128d __B, int __C)
+_mm512_maskz_cvttps_epu64 (__mmask8 __U, __m256 __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) _mm_setzero_pd (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_cvttps2uqq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
+_mm512_cvtpd_epi64 (__m512d __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtpd_epi64 (__m512i __W, __mmask8 __U, __m512d __A)
+{
+ return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, int __C)
+_mm512_maskz_cvtpd_epi64 (__mmask8 __U, __m512d __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvtpd2qq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_sd (__m128d __W, __mmask8 __U, __m128d __A,
- __m128d __B, int __C, const int __R)
+_mm512_cvtpd_epu64 (__m512d __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) __W,
- __U, __R);
+ return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_mask_cvtpd_epu64 (__m512i __W, __mmask8 __U, __m512d __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) _mm_setzero_pd (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_sd (__mmask8 __U, __m128d __A, __m128d __B,
- int __C, const int __R)
+_mm512_maskz_cvtpd_epu64 (__mmask8 __U, __m512d __A)
{
- return (__m128d) __builtin_ia32_reducesd_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
- __U, __R);
+ return (__m512i) __builtin_ia32_cvtpd2uqq512_mask ((__v8df) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_ss (__m128 __A, __m128 __B, int __C)
+_mm512_cvtps_epi64 (__m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) _mm_setzero_ps (),
- (__mmask8) -1);
+ return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
+_mm512_mask_cvtps_epi64 (__m512i __W, __mmask8 __U, __m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1, __R);
+ return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, int __C)
+_mm512_maskz_cvtps_epi64 (__mmask8 __U, __m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) __W,
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvtps2qq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_ss (__m128 __W, __mmask8 __U, __m128 __A,
- __m128 __B, int __C, const int __R)
+_mm512_cvtps_epu64 (__m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) __W,
- __U, __R);
+ return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_mask_cvtps_epu64 (__m512i __W, __mmask8 __U, __m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) _mm_setzero_ps (),
- (__mmask8) __U);
+ return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+ (__v8di) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_ss (__mmask8 __U, __m128 __A, __m128 __B,
- int __C, const int __R)
+_mm512_maskz_cvtps_epu64 (__mmask8 __U, __m256 __A)
{
- return (__m128) __builtin_ia32_reducess_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- __U, __R);
+ return (__m512i) __builtin_ia32_cvtps2uqq512_mask ((__v8sf) __A,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_sd (__m128d __A, __m128d __B, int __C)
+_mm512_cvtepi64_ps (__m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
+ return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+ (__v8sf)
+ _mm256_setzero_ps (),
(__mmask8) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_mask_cvtepi64_ps (__m256 __W, __mmask8 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) __W,
+ return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+ (__v8sf) __W,
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
+_mm512_maskz_cvtepi64_ps (__mmask8 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
+ return (__m256) __builtin_ia32_cvtqq2ps512_mask ((__v8di) __A,
+ (__v8sf)
+ _mm256_setzero_ps (),
(__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_ss (__m128 __A, __m128 __B, int __C)
+_mm512_cvtepu64_ps (__m512i __A)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+ (__v8sf)
+ _mm256_setzero_ps (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m256
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_mask_cvtepu64_ps (__m256 __W, __mmask8 __U, __m512i __A)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) __W,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+ (__v8sf) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepu64_ps (__mmask8 __U, __m512i __A)
+{
+ return (__m256) __builtin_ia32_cvtuqq2ps512_mask ((__v8di) __A,
+ (__v8sf)
+ _mm256_setzero_ps (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
+_mm512_cvtepi64_pd (__m512i __A)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
+_mm512_mask_cvtepi64_pd (__m512d __W, __mmask8 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) -1, __R);
+ return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
- int __C, const int __R)
+_mm512_maskz_cvtepi64_pd (__mmask8 __U, __m512i __A)
+{
+ return (__m512d) __builtin_ia32_cvtqq2pd512_mask ((__v8di) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu64_pd (__m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df) __W,
- (__mmask8) __U, __R);
+ return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C,
- const int __R)
+_mm512_mask_cvtepu64_pd (__m512d __W, __mmask8 __U, __m512i __A)
{
- return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
- (__v2df) __B, __C,
- (__v2df)
- _mm_setzero_pd (),
- (__mmask8) __U, __R);
+ return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
+_mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) -1, __R);
+ return (__m512d) __builtin_ia32_cvtuqq2pd512_mask ((__v8di) __A,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
- int __C, const int __R)
+_mm512_range_pd (__m512d __A, __m512d __B, int __C)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf) __W,
- (__mmask8) __U, __R);
+ return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+ (__v8df) __B, __C,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C,
- const int __R)
+_mm512_mask_range_pd (__m512d __W, __mmask8 __U,
+ __m512d __A, __m512d __B, int __C)
{
- return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
- (__v4sf) __B, __C,
- (__v4sf)
- _mm_setzero_ps (),
- (__mmask8) __U, __R);
+ return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+ (__v8df) __B, __C,
+ (__v8df) __W,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_ss_mask (__m128 __A, const int __imm)
+_mm512_maskz_range_pd (__mmask8 __U, __m512d __A, __m512d __B, int __C)
{
- return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm,
- (__mmask8) -1);
+ return (__m512d) __builtin_ia32_rangepd512_mask ((__v8df) __A,
+ (__v8df) __B, __C,
+ (__v8df)
+ _mm512_setzero_pd (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_sd_mask (__m128d __A, const int __imm)
+_mm512_range_ps (__m512 __A, __m512 __B, int __C)
{
- return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm,
- (__mmask8) -1);
+ return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+ (__v16sf) __B, __C,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
+_mm512_mask_range_ps (__m512 __W, __mmask16 __U,
+ __m512 __A, __m512 __B, int __C)
{
- return (__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) __A, __imm, __U);
+ return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+ (__v16sf) __B, __C,
+ (__v16sf) __W,
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask8
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
+_mm512_maskz_range_ps (__mmask16 __U, __m512 __A, __m512 __B, int __C)
{
- return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, __U);
+ return (__m512) __builtin_ia32_rangeps512_mask ((__v16sf) __A,
+ (__v16sf) __B, __C,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512i
@@ -2395,72 +2546,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
}
#else
-#define _kshiftli_mask8(X, Y) \
- ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
-
-#define _kshiftri_mask8(X, Y) \
- ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
-
-#define _mm_range_sd(A, B, C) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_range_sd(W, U, A, B, C) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_range_sd(U, A, B, C) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_range_ss(A, B, C) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_range_ss(W, U, A, B, C) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_range_ss(U, A, B, C) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_range_round_sd(A, B, C, R) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8) -1, (R)))
-
-#define _mm_mask_range_round_sd(W, U, A, B, C, R) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
- (__mmask8)(U), (R)))
-
-#define _mm_maskz_range_round_sd(U, A, B, C, R) \
- ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8)(U), (R)))
-
-#define _mm_range_round_ss(A, B, C, R) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8) -1, (R)))
-
-#define _mm_mask_range_round_ss(W, U, A, B, C, R) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
- (__mmask8)(U), (R)))
-
-#define _mm_maskz_range_round_ss(U, A, B, C, R) \
- ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8)(U), (R)))
-
#define _mm512_cvtt_roundpd_epi64(A, B) \
((__m512i)__builtin_ia32_cvttpd2qq512_mask ((A), (__v8di) \
_mm512_setzero_si512 (), \
@@ -2792,22 +2877,6 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
(__v16si)(__m512i)_mm512_setzero_si512 (),\
(__mmask16)(U)))
-#define _mm_fpclass_ss_mask(X, C) \
- ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \
- (int) (C), (__mmask8) (-1))) \
-
-#define _mm_fpclass_sd_mask(X, C) \
- ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \
- (int) (C), (__mmask8) (-1))) \
-
-#define _mm_mask_fpclass_ss_mask(X, C, U) \
- ((__mmask8) __builtin_ia32_fpclassss_mask ((__v4sf) (__m128) (X), \
- (int) (C), (__mmask8) (U)))
-
-#define _mm_mask_fpclass_sd_mask(X, C, U) \
- ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), \
- (int) (C), (__mmask8) (U)))
-
#define _mm512_mask_fpclass_pd_mask(u, X, C) \
((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
(int) (C), (__mmask8)(u)))
@@ -2824,68 +2893,11 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
((__mmask16) __builtin_ia32_fpclassps512_mask ((__v16sf) (__m512) (x),\
(int) (c),(__mmask16)-1))
-#define _mm_reduce_sd(A, B, C) \
- ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8)-1))
-
-#define _mm_mask_reduce_sd(W, U, A, B, C) \
- ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), (__mmask8)(U)))
-
-#define _mm_maskz_reduce_sd(U, A, B, C) \
- ((__m128d) __builtin_ia32_reducesd_mask ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8)(U)))
-
-#define _mm_reduce_round_sd(A, B, C, R) \
- ((__m128d) __builtin_ia32_reducesd_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__mmask8)(U), (int)(R)))
-
-#define _mm_mask_reduce_round_sd(W, U, A, B, C, R) \
- ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), \
- (__mmask8)(U), (int)(R)))
-
-#define _mm_maskz_reduce_round_sd(U, A, B, C, R) \
- ((__m128d) __builtin_ia32_reducesd_mask_round ((__v2df)(__m128d)(A), \
- (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), \
- (__mmask8)(U), (int)(R)))
-
-#define _mm_reduce_ss(A, B, C) \
- ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8)-1))
-
-#define _mm_mask_reduce_ss(W, U, A, B, C) \
- ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), (__mmask8)(U)))
-
-#define _mm_maskz_reduce_ss(U, A, B, C) \
- ((__m128) __builtin_ia32_reducess_mask ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8)(U)))
-
-#define _mm_reduce_round_ss(A, B, C, R) \
- ((__m128) __builtin_ia32_reducess_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__mmask8)(U), (int)(R)))
-
-#define _mm_mask_reduce_round_ss(W, U, A, B, C, R) \
- ((__m128) __builtin_ia32_reducess_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W), \
- (__mmask8)(U), (int)(R)))
-
-#define _mm_maskz_reduce_round_ss(U, A, B, C, R) \
- ((__m128) __builtin_ia32_reducesd_mask_round ((__v4sf)(__m128)(A), \
- (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (), \
- (__mmask8)(U), (int)(R)))
-
-
#endif
-#ifdef __DISABLE_AVX512DQ__
-#undef __DISABLE_AVX512DQ__
+#ifdef __DISABLE_AVX512DQ_512__
+#undef __DISABLE_AVX512DQ_512__
#pragma GCC pop_options
-#endif /* __DISABLE_AVX512DQ__ */
+#endif /* __DISABLE_AVX512DQ_512__ */
#endif /* _AVX512DQINTRIN_H_INCLUDED */
--
2.31.1
* [PATCH 04/18] [PATCH 3/5] Push evex512 target for 512 bit intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (2 preceding siblings ...)
2023-09-21 7:19 ` [PATCH 03/18] [PATCH 2/5] " Hu, Lin1
@ 2023-09-21 7:19 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 05/18] [PATCH 4/5] " Hu, Lin1
` (15 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:19 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/avx512bwintrin.h: Add evex512 target for 512 bit
intrins.
---
gcc/config/i386/avx512bwintrin.h | 291 ++++++++++++++++---------------
1 file changed, 153 insertions(+), 138 deletions(-)
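The 32-bit mask operations that this patch keeps outside the new evex512 guard have simple bitwise semantics, sketched below in portable scalar C. These `*_model` helpers are invented names for illustration, not the header's implementation (which lowers to the `kshiftl`/`kshiftr`/`kunpck` mask instructions).

```c
#include <assert.h>
#include <stdint.h>

/* Model of _kshiftli_mask32 / _kshiftri_mask32: shift the whole 32-bit
   mask register; counts past the register width yield an all-zero mask.  */
static uint32_t kshiftli_mask32_model (uint32_t a, unsigned n)
{
  return n > 31 ? 0 : a << n;
}

static uint32_t kshiftri_mask32_model (uint32_t a, unsigned n)
{
  return n > 31 ? 0 : a >> n;
}

/* Model of _mm512_kunpackw: the low 16 bits of the first operand become
   the high half of the result, concatenated with the low 16 bits of the
   second operand.  */
static uint32_t kunpackw_model (uint32_t a, uint32_t b)
{
  return ((a & 0xFFFFu) << 16) | (b & 0xFFFFu);
}
```

Because these operate on `__mmask32` (which exists without 512-bit vectors), they stay under the plain `avx512bw` target, while the `__mmask64` variants move below the `avx512bw,evex512` push_options block in the diff that follows.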
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index d1cd549ce18..925bae1457c 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -34,16 +34,6 @@
#define __DISABLE_AVX512BW__
#endif /* __AVX512BW__ */
-/* Internal data types for implementing the intrinsics. */
-typedef short __v32hi __attribute__ ((__vector_size__ (64)));
-typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \
- __may_alias__, __aligned__ (1)));
-typedef char __v64qi __attribute__ ((__vector_size__ (64)));
-typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \
- __may_alias__, __aligned__ (1)));
-
-typedef unsigned long long __mmask64;
-
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF)
@@ -54,229 +44,292 @@ _ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF)
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF)
+_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
{
- *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
- return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+ return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
}
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
{
- return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
+ return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
}
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF)
{
- return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+ *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+ return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
}
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
{
- return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
+ return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
}
extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
{
- return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+ return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
}
-extern __inline unsigned char
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF)
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
{
- *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
- return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+ return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
}
-extern __inline unsigned char
+extern __inline unsigned int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF)
+_cvtmask32_u32 (__mmask32 __A)
{
- *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
- return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+ return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A);
}
-extern __inline unsigned char
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_cvtu32_mask32 (unsigned int __A)
{
- return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+ return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A);
}
-extern __inline unsigned char
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_load_mask32 (__mmask32 *__A)
{
- return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+ return (__mmask32) __builtin_ia32_kmovd (*__A);
}
-extern __inline unsigned char
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
{
- return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+ *(__mmask32 *) __A = __builtin_ia32_kmovd (__B);
}
-extern __inline unsigned char
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+_knot_mask32 (__mmask32 __A)
{
- return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+ return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
}
extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
{
- return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+ return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
}
-extern __inline __mmask64
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
{
- return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+ return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
}
-extern __inline unsigned int
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask32_u32 (__mmask32 __A)
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
{
- return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A);
+ return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
}
-extern __inline unsigned long long
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtmask64_u64 (__mmask64 __A)
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
{
- return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A);
+ return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
}
extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu32_mask32 (unsigned int __A)
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
{
- return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A);
+ return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
}
-extern __inline __mmask64
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_cvtu64_mask64 (unsigned long long __A)
+_mm512_kunpackw (__mmask32 __A, __mmask32 __B)
{
- return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A);
+ return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+ (__mmask32) __B);
}
extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask32 (__mmask32 *__A)
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
{
- return (__mmask32) __builtin_ia32_kmovd (*__A);
+ return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+ (__mmask32) __B);
}
-extern __inline __mmask64
+#if __OPTIMIZE__
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_load_mask64 (__mmask64 *__A)
+_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
{
- return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A);
+ return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
+ (__mmask8) __B);
}
-extern __inline void
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask32 (__mmask32 *__A, __mmask32 __B)
+_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
{
- *(__mmask32 *) __A = __builtin_ia32_kmovd (__B);
+ return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
+ (__mmask8) __B);
}
-extern __inline void
+#else
+#define _kshiftli_mask32(X, Y) \
+ ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask32(X, Y) \
+ ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#endif
+
+#ifdef __DISABLE_AVX512BW__
+#undef __DISABLE_AVX512BW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BW__ */
+
+#if !defined (__AVX512BW__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512bw,evex512")
+#define __DISABLE_AVX512BW_512__
+#endif /* __AVX512BW_512__ */
+
+/* Internal data types for implementing the intrinsics. */
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \
+ __may_alias__, __aligned__ (1)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \
+ __may_alias__, __aligned__ (1)));
+
+typedef unsigned long long __mmask64;
+
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_store_mask64 (__mmask64 *__A, __mmask64 __B)
+_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF)
{
- *(__mmask64 *) __A = __builtin_ia32_kmovq (__B);
+ *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+ return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
}
-extern __inline __mmask32
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_knot_mask32 (__mmask32 __A)
+_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+ return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
}
-extern __inline __mmask64
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_knot_mask64 (__mmask64 __A)
+_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+ return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
}
-extern __inline __mmask32
+extern __inline unsigned char
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kor_mask32 (__mmask32 __A, __mmask32 __B)
+_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF)
{
- return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+ *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+ return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+ return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+ return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
}
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kor_mask64 (__mmask64 __A, __mmask64 __B)
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+ return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
}
-extern __inline __mmask32
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+_cvtmask64_u64 (__mmask64 __A)
{
- return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+ return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A);
}
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+_cvtu64_mask64 (unsigned long long __A)
{
- return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+ return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A);
}
-extern __inline __mmask32
+extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+_load_mask64 (__mmask64 *__A)
{
- return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+ return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+ *(__mmask64 *) __A = __builtin_ia32_kmovq (__B);
}
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+_knot_mask64 (__mmask64 __A)
{
- return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+ return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
}
-extern __inline __mmask32
+extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kand_mask32 (__mmask32 __A, __mmask32 __B)
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+ return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
}
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kand_mask64 (__mmask64 __A, __mmask64 __B)
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+ return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
}
-extern __inline __mmask32
+extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
{
- return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+ return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+ return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
}
extern __inline __mmask64
@@ -366,22 +419,6 @@ _mm512_maskz_mov_epi8 (__mmask64 __U, __m512i __A)
(__mmask64) __U);
}
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_kunpackw (__mmask32 __A, __mmask32 __B)
-{
- return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
- (__mmask32) __B);
-}
-
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
-{
- return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
- (__mmask32) __B);
-}
-
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -2776,14 +2813,6 @@ _mm512_mask_packus_epi32 (__m512i __W, __mmask32 __M, __m512i __A,
}
#ifdef __OPTIMIZE__
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
-{
- return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
- (__mmask8) __B);
-}
-
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_kshiftli_mask64 (__mmask64 __A, unsigned int __B)
@@ -2792,14 +2821,6 @@ _kshiftli_mask64 (__mmask64 __A, unsigned int __B)
(__mmask8) __B);
}
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
-{
- return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
- (__mmask8) __B);
-}
-
extern __inline __mmask64
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_kshiftri_mask64 (__mmask64 __A, unsigned int __B)
@@ -3145,15 +3166,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
}
#else
-#define _kshiftli_mask32(X, Y) \
- ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
-
#define _kshiftli_mask64(X, Y) \
((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y)))
-#define _kshiftri_mask32(X, Y) \
- ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
-
#define _kshiftri_mask64(X, Y) \
((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y)))
@@ -3328,9 +3343,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
#endif
-#ifdef __DISABLE_AVX512BW__
-#undef __DISABLE_AVX512BW__
+#ifdef __DISABLE_AVX512BW_512__
+#undef __DISABLE_AVX512BW_512__
#pragma GCC pop_options
-#endif /* __DISABLE_AVX512BW__ */
+#endif /* __DISABLE_AVX512BW_512__ */
#endif /* _AVX512BWINTRIN_H_INCLUDED */
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 05/18] [PATCH 4/5] Push evex512 target for 512 bit intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (3 preceding siblings ...)
2023-09-21 7:19 ` [PATCH 04/18] [PATCH 3/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 06/18] [PATCH 5/5] " Hu, Lin1
` (14 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config.gcc: Add avx512bitalgvlintrin.h.
* config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/avx512bf16intrin.h: Ditto.
* config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit
intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h.
* config/i386/avx512erintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx512ifmaintrin.h: Ditto.
* config/i386/avx512pfintrin.h: Ditto.
* config/i386/avx512vbmi2intrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vnniintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vpopcntdqintrin.h: Ditto.
* config/i386/gfniintrin.h: Ditto.
* config/i386/immintrin.h: Add avx512bitalgvlintrin.h.
* config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins.
* config/i386/vpclmulqdqintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: New.
---
gcc/config.gcc | 19 +--
gcc/config/i386/avx5124fmapsintrin.h | 2 +-
gcc/config/i386/avx5124vnniwintrin.h | 2 +-
gcc/config/i386/avx512bf16intrin.h | 31 ++--
gcc/config/i386/avx512bitalgintrin.h | 155 +-----------------
gcc/config/i386/avx512bitalgvlintrin.h | 180 +++++++++++++++++++++
gcc/config/i386/avx512erintrin.h | 2 +-
gcc/config/i386/avx512ifmaintrin.h | 4 +-
gcc/config/i386/avx512pfintrin.h | 2 +-
gcc/config/i386/avx512vbmi2intrin.h | 4 +-
gcc/config/i386/avx512vbmiintrin.h | 4 +-
gcc/config/i386/avx512vnniintrin.h | 4 +-
gcc/config/i386/avx512vp2intersectintrin.h | 4 +-
gcc/config/i386/avx512vpopcntdqintrin.h | 4 +-
gcc/config/i386/gfniintrin.h | 76 +++++----
gcc/config/i386/immintrin.h | 2 +
gcc/config/i386/vaesintrin.h | 4 +-
gcc/config/i386/vpclmulqdqintrin.h | 4 +-
18 files changed, 282 insertions(+), 221 deletions(-)
create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h
diff --git a/gcc/config.gcc b/gcc/config.gcc
index ce5def08e2e..e47e6893e1d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -425,15 +425,16 @@ i[34567]86-*-* | x86_64-*-*)
avx512vbmi2vlintrin.h avx512vnniintrin.h
avx512vnnivlintrin.h vaesintrin.h vpclmulqdqintrin.h
avx512vpopcntdqvlintrin.h avx512bitalgintrin.h
- pconfigintrin.h wbnoinvdintrin.h movdirintrin.h
- waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
- avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
- avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
- tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
- amxbf16intrin.h x86gprintrin.h uintrintrin.h
- hresetintrin.h keylockerintrin.h avxvnniintrin.h
- mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h
- avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h
+ avx512bitalgvlintrin.h pconfigintrin.h wbnoinvdintrin.h
+ movdirintrin.h waitpkgintrin.h cldemoteintrin.h
+ avx512bf16vlintrin.h avx512bf16intrin.h enqcmdintrin.h
+ serializeintrin.h avx512vp2intersectintrin.h
+ avx512vp2intersectvlintrin.h tsxldtrkintrin.h
+ amxtileintrin.h amxint8intrin.h amxbf16intrin.h
+ x86gprintrin.h uintrintrin.h hresetintrin.h
+ keylockerintrin.h avxvnniintrin.h mwaitintrin.h
+ avx512fp16intrin.h avx512fp16vlintrin.h avxifmaintrin.h
+ avxvnniint8intrin.h avxneconvertintrin.h
cmpccxaddintrin.h amxfp16intrin.h prfchiintrin.h
raointintrin.h amxcomplexintrin.h avxvnniint16intrin.h
sm3intrin.h sha512intrin.h sm4intrin.h"
diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
index 97dd77c9235..4c884a5c203 100644
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -30,7 +30,7 @@
#ifndef __AVX5124FMAPS__
#pragma GCC push_options
-#pragma GCC target("avx5124fmaps")
+#pragma GCC target("avx5124fmaps,evex512")
#define __DISABLE_AVX5124FMAPS__
#endif /* __AVX5124FMAPS__ */
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
index fd129589798..795e4814f28 100644
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -30,7 +30,7 @@
#ifndef __AVX5124VNNIW__
#pragma GCC push_options
-#pragma GCC target("avx5124vnniw")
+#pragma GCC target("avx5124vnniw,evex512")
#define __DISABLE_AVX5124VNNIW__
#endif /* __AVX5124VNNIW__ */
diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h
index 107f4a448f6..94ccbf6389f 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -34,13 +34,6 @@
#define __DISABLE_AVX512BF16__
#endif /* __AVX512BF16__ */
-/* Internal data types for implementing the intrinsics. */
-typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
-
-/* The Intel API is flexible enough that we must allow aliasing with other
- vector types, and their scalar components. */
-typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
-
/* Convert One BF16 Data to One Single Float Data. */
extern __inline float
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -49,6 +42,24 @@ _mm_cvtsbh_ss (__bf16 __A)
return __builtin_ia32_cvtbf2sf (__A);
}
+#ifdef __DISABLE_AVX512BF16__
+#undef __DISABLE_AVX512BF16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BF16__ */
+
+#if !defined (__AVX512BF16__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512bf16,evex512")
+#define __DISABLE_AVX512BF16_512__
+#endif /* __AVX512BF16_512__ */
+
+/* Internal data types for implementing the intrinsics. */
+typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+ vector types, and their scalar components. */
+typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
+
/* vcvtne2ps2bf16 */
extern __inline __m512bh
@@ -144,9 +155,9 @@ _mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A)
(__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16)));
}
-#ifdef __DISABLE_AVX512BF16__
-#undef __DISABLE_AVX512BF16__
+#ifdef __DISABLE_AVX512BF16_512__
+#undef __DISABLE_AVX512BF16_512__
#pragma GCC pop_options
-#endif /* __DISABLE_AVX512BF16__ */
+#endif /* __DISABLE_AVX512BF16_512__ */
#endif /* _AVX512BF16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512bitalgintrin.h b/gcc/config/i386/avx512bitalgintrin.h
index a1c7be109a9..af8514f5838 100644
--- a/gcc/config/i386/avx512bitalgintrin.h
+++ b/gcc/config/i386/avx512bitalgintrin.h
@@ -22,15 +22,15 @@
<http://www.gnu.org/licenses/>. */
#if !defined _IMMINTRIN_H_INCLUDED
-# error "Never use <avx512bitalgintrin.h> directly; include <x86intrin.h> instead."
+# error "Never use <avx512bitalgintrin.h> directly; include <immintrin.h> instead."
#endif
#ifndef _AVX512BITALGINTRIN_H_INCLUDED
#define _AVX512BITALGINTRIN_H_INCLUDED
-#ifndef __AVX512BITALG__
+#if !defined (__AVX512BITALG__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512bitalg")
+#pragma GCC target("avx512bitalg,evex512")
#define __DISABLE_AVX512BITALG__
#endif /* __AVX512BITALG__ */
@@ -108,153 +108,4 @@ _mm512_mask_bitshuffle_epi64_mask (__mmask64 __M, __m512i __A, __m512i __B)
#pragma GCC pop_options
#endif /* __DISABLE_AVX512BITALG__ */
-#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__)
-#pragma GCC push_options
-#pragma GCC target("avx512bitalg,avx512vl")
-#define __DISABLE_AVX512BITALGVL__
-#endif /* __AVX512BITALGVL__ */
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
- (__v32qi) __W,
- (__mmask32) __U);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
- (__v32qi)
- _mm256_setzero_si256 (),
- (__mmask32) __U);
-}
-
-extern __inline __mmask32
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B)
-{
- return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
- (__v32qi) __B,
- (__mmask32) -1);
-}
-
-extern __inline __mmask32
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B)
-{
- return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
- (__v32qi) __B,
- (__mmask32) __M);
-}
-
-extern __inline __mmask16
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B)
-{
- return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
- (__v16qi) __B,
- (__mmask16) -1);
-}
-
-extern __inline __mmask16
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B)
-{
- return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
- (__v16qi) __B,
- (__mmask16) __M);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_popcnt_epi8 (__m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_popcnt_epi16 (__m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_popcnt_epi8 (__m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_popcnt_epi16 (__m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
- (__v16hi) __W,
- (__mmask16) __U);
-}
-
-extern __inline __m256i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A)
-{
- return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
- (__v16hi)
- _mm256_setzero_si256 (),
- (__mmask16) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
- (__v16qi) __W,
- (__mmask16) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
- (__v16qi)
- _mm_setzero_si128 (),
- (__mmask16) __U);
-}
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
- (__v8hi) __W,
- (__mmask8) __U);
-}
-
-extern __inline __m128i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A)
-{
- return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
- (__v8hi)
- _mm_setzero_si128 (),
- (__mmask8) __U);
-}
-#ifdef __DISABLE_AVX512BITALGVL__
-#undef __DISABLE_AVX512BITALGVL__
-#pragma GCC pop_options
-#endif /* __DISABLE_AVX512BITALGVL__ */
-
#endif /* _AVX512BITALGINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512bitalgvlintrin.h b/gcc/config/i386/avx512bitalgvlintrin.h
new file mode 100644
index 00000000000..36d697dea8a
--- /dev/null
+++ b/gcc/config/i386/avx512bitalgvlintrin.h
@@ -0,0 +1,180 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx512bitalgvlintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _AVX512BITALGVLINTRIN_H_INCLUDED
+#define _AVX512BITALGVLINTRIN_H_INCLUDED
+
+#if !defined(__AVX512BITALG__) || !defined(__AVX512VL__)
+#pragma GCC push_options
+#pragma GCC target("avx512bitalg,avx512vl")
+#define __DISABLE_AVX512BITALGVL__
+#endif /* __AVX512BITALGVL__ */
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
+ (__v32qi) __W,
+ (__mmask32) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_popcnt_epi8 (__mmask32 __U, __m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A,
+ (__v32qi)
+ _mm256_setzero_si256 (),
+ (__mmask32) __U);
+}
+
+extern __inline __mmask32
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_bitshuffle_epi64_mask (__m256i __A, __m256i __B)
+{
+ return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
+ (__v32qi) __B,
+ (__mmask32) -1);
+}
+
+extern __inline __mmask32
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_bitshuffle_epi64_mask (__mmask32 __M, __m256i __A, __m256i __B)
+{
+ return (__mmask32) __builtin_ia32_vpshufbitqmb256_mask ((__v32qi) __A,
+ (__v32qi) __B,
+ (__mmask32) __M);
+}
+
+extern __inline __mmask16
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_bitshuffle_epi64_mask (__m128i __A, __m128i __B)
+{
+ return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
+ (__v16qi) __B,
+ (__mmask16) -1);
+}
+
+extern __inline __mmask16
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_bitshuffle_epi64_mask (__mmask16 __M, __m128i __A, __m128i __B)
+{
+ return (__mmask16) __builtin_ia32_vpshufbitqmb128_mask ((__v16qi) __A,
+ (__v16qi) __B,
+ (__mmask16) __M);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_popcnt_epi8 (__m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountb_v32qi ((__v32qi) __A);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_popcnt_epi16 (__m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountw_v16hi ((__v16hi) __A);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_epi8 (__m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountb_v16qi ((__v16qi) __A);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_epi16 (__m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountw_v8hi ((__v8hi) __A);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
+ (__v16hi) __W,
+ (__mmask16) __U);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A)
+{
+ return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A,
+ (__v16hi)
+ _mm256_setzero_si256 (),
+ (__mmask16) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
+ (__v16qi) __W,
+ (__mmask16) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A,
+ (__v16qi)
+ _mm_setzero_si128 (),
+ (__mmask16) __U);
+}
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
+ (__v8hi) __W,
+ (__mmask8) __U);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_popcnt_epi16 (__mmask8 __U, __m128i __A)
+{
+ return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A,
+ (__v8hi)
+ _mm_setzero_si128 (),
+ (__mmask8) __U);
+}
+#ifdef __DISABLE_AVX512BITALGVL__
+#undef __DISABLE_AVX512BITALGVL__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512BITALGVL__ */
+
+#endif /* _AVX512BITALGVLINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h
index bd83b7fbbc6..5c7be9c47ac 100644
--- a/gcc/config/i386/avx512erintrin.h
+++ b/gcc/config/i386/avx512erintrin.h
@@ -30,7 +30,7 @@
#ifndef __AVX512ER__
#pragma GCC push_options
-#pragma GCC target("avx512er")
+#pragma GCC target("avx512er,evex512")
#define __DISABLE_AVX512ER__
#endif /* __AVX512ER__ */
diff --git a/gcc/config/i386/avx512ifmaintrin.h b/gcc/config/i386/avx512ifmaintrin.h
index fc97f1defe8..e08078b2725 100644
--- a/gcc/config/i386/avx512ifmaintrin.h
+++ b/gcc/config/i386/avx512ifmaintrin.h
@@ -28,9 +28,9 @@
#ifndef _AVX512IFMAINTRIN_H_INCLUDED
#define _AVX512IFMAINTRIN_H_INCLUDED
-#ifndef __AVX512IFMA__
+#if !defined (__AVX512IFMA__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512ifma")
+#pragma GCC target("avx512ifma,evex512")
#define __DISABLE_AVX512IFMA__
#endif /* __AVX512IFMA__ */
diff --git a/gcc/config/i386/avx512pfintrin.h b/gcc/config/i386/avx512pfintrin.h
index a547610660a..58af26ff02e 100644
--- a/gcc/config/i386/avx512pfintrin.h
+++ b/gcc/config/i386/avx512pfintrin.h
@@ -30,7 +30,7 @@
#ifndef __AVX512PF__
#pragma GCC push_options
-#pragma GCC target("avx512pf")
+#pragma GCC target("avx512pf,evex512")
#define __DISABLE_AVX512PF__
#endif /* __AVX512PF__ */
diff --git a/gcc/config/i386/avx512vbmi2intrin.h b/gcc/config/i386/avx512vbmi2intrin.h
index ca00f8a5f14..b7ff07b2d11 100644
--- a/gcc/config/i386/avx512vbmi2intrin.h
+++ b/gcc/config/i386/avx512vbmi2intrin.h
@@ -28,9 +28,9 @@
#ifndef __AVX512VBMI2INTRIN_H_INCLUDED
#define __AVX512VBMI2INTRIN_H_INCLUDED
-#if !defined(__AVX512VBMI2__)
+#if !defined(__AVX512VBMI2__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512vbmi2")
+#pragma GCC target("avx512vbmi2,evex512")
#define __DISABLE_AVX512VBMI2__
#endif /* __AVX512VBMI2__ */
diff --git a/gcc/config/i386/avx512vbmiintrin.h b/gcc/config/i386/avx512vbmiintrin.h
index 502586090ae..1a7ab4edca3 100644
--- a/gcc/config/i386/avx512vbmiintrin.h
+++ b/gcc/config/i386/avx512vbmiintrin.h
@@ -28,9 +28,9 @@
#ifndef _AVX512VBMIINTRIN_H_INCLUDED
#define _AVX512VBMIINTRIN_H_INCLUDED
-#ifndef __AVX512VBMI__
+#if !defined (__AVX512VBMI__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512vbmi")
+#pragma GCC target("avx512vbmi,evex512")
#define __DISABLE_AVX512VBMI__
#endif /* __AVX512VBMI__ */
diff --git a/gcc/config/i386/avx512vnniintrin.h b/gcc/config/i386/avx512vnniintrin.h
index e36e2e57f21..1090703ec48 100644
--- a/gcc/config/i386/avx512vnniintrin.h
+++ b/gcc/config/i386/avx512vnniintrin.h
@@ -28,9 +28,9 @@
#ifndef __AVX512VNNIINTRIN_H_INCLUDED
#define __AVX512VNNIINTRIN_H_INCLUDED
-#if !defined(__AVX512VNNI__)
+#if !defined(__AVX512VNNI__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512vnni")
+#pragma GCC target("avx512vnni,evex512")
#define __DISABLE_AVX512VNNI__
#endif /* __AVX512VNNI__ */
diff --git a/gcc/config/i386/avx512vp2intersectintrin.h b/gcc/config/i386/avx512vp2intersectintrin.h
index 65e2fb1abf5..bf68245155d 100644
--- a/gcc/config/i386/avx512vp2intersectintrin.h
+++ b/gcc/config/i386/avx512vp2intersectintrin.h
@@ -28,9 +28,9 @@
#ifndef _AVX512VP2INTERSECTINTRIN_H_INCLUDED
#define _AVX512VP2INTERSECTINTRIN_H_INCLUDED
-#if !defined(__AVX512VP2INTERSECT__)
+#if !defined(__AVX512VP2INTERSECT__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512vp2intersect")
+#pragma GCC target("avx512vp2intersect,evex512")
#define __DISABLE_AVX512VP2INTERSECT__
#endif /* __AVX512VP2INTERSECT__ */
diff --git a/gcc/config/i386/avx512vpopcntdqintrin.h b/gcc/config/i386/avx512vpopcntdqintrin.h
index 47897fbd8d7..9470a403f8e 100644
--- a/gcc/config/i386/avx512vpopcntdqintrin.h
+++ b/gcc/config/i386/avx512vpopcntdqintrin.h
@@ -28,9 +28,9 @@
#ifndef _AVX512VPOPCNTDQINTRIN_H_INCLUDED
#define _AVX512VPOPCNTDQINTRIN_H_INCLUDED
-#ifndef __AVX512VPOPCNTDQ__
+#if !defined (__AVX512VPOPCNTDQ__) || !defined (__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("avx512vpopcntdq")
+#pragma GCC target("avx512vpopcntdq,evex512")
#define __DISABLE_AVX512VPOPCNTDQ__
#endif /* __AVX512VPOPCNTDQ__ */
diff --git a/gcc/config/i386/gfniintrin.h b/gcc/config/i386/gfniintrin.h
index ef3dc225b40..907e7a0cf7a 100644
--- a/gcc/config/i386/gfniintrin.h
+++ b/gcc/config/i386/gfniintrin.h
@@ -297,9 +297,53 @@ _mm256_maskz_gf2p8affine_epi64_epi8 (__mmask32 __A, __m256i __B,
#pragma GCC pop_options
#endif /* __GFNIAVX512VLBW__ */
-#if !defined(__GFNI__) || !defined(__AVX512F__) || !defined(__AVX512BW__)
+#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512F__)
#pragma GCC push_options
-#pragma GCC target("gfni,avx512f,avx512bw")
+#pragma GCC target("gfni,avx512f,evex512")
+#define __DISABLE_GFNIAVX512F__
+#endif /* __GFNIAVX512F__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A,
+ (__v64qi) __B);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+ return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
+ (__v64qi) __B, __C);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+ return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A,
+ (__v64qi) __B, __C);
+}
+#else
+#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C) \
+ ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ( \
+ (__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
+#define _mm512_gf2p8affine_epi64_epi8(A, B, C) \
+ ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A), \
+ (__v64qi)(__m512i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512F__
+#undef __DISABLE_GFNIAVX512F__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512F__ */
+
+#if !defined(__GFNI__) || !defined(__EVEX512__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512bw,evex512")
#define __DISABLE_GFNIAVX512FBW__
#endif /* __GFNIAVX512FBW__ */
@@ -319,13 +363,6 @@ _mm512_maskz_gf2p8mul_epi8 (__mmask64 __A, __m512i __B, __m512i __C)
return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi_mask ((__v64qi) __B,
(__v64qi) __C, (__v64qi) _mm512_setzero_si512 (), __A);
}
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8mul_epi8 (__m512i __A, __m512i __B)
-{
- return (__m512i) __builtin_ia32_vgf2p8mulb_v64qi ((__v64qi) __A,
- (__v64qi) __B);
-}
#ifdef __OPTIMIZE__
extern __inline __m512i
@@ -350,14 +387,6 @@ _mm512_maskz_gf2p8affineinv_epi64_epi8 (__mmask64 __A, __m512i __B,
(__v64qi) _mm512_setzero_si512 (), __A);
}
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
-{
- return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
- (__v64qi) __B, __C);
-}
-
extern __inline __m512i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm512_mask_gf2p8affine_epi64_epi8 (__m512i __A, __mmask64 __B, __m512i __C,
@@ -375,13 +404,6 @@ _mm512_maskz_gf2p8affine_epi64_epi8 (__mmask64 __A, __m512i __B, __m512i __C,
return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask ((__v64qi) __B,
(__v64qi) __C, __D, (__v64qi) _mm512_setzero_si512 (), __A);
}
-extern __inline __m512i
-__attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
-{
- return (__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi) __A,
- (__v64qi) __B, __C);
-}
#else
#define _mm512_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) \
((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask( \
@@ -391,9 +413,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask( \
(__v64qi)(__m512i)(B), (__v64qi)(__m512i)(C), (int)(D), \
(__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
-#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C) \
- ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ( \
- (__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
#define _mm512_mask_gf2p8affine_epi64_epi8(A, B, C, D, E) \
((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(C),\
(__v64qi)(__m512i)(D), (int)(E), (__v64qi)(__m512i)(A), (__mmask64)(B)))
@@ -401,9 +420,6 @@ _mm512_gf2p8affine_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi_mask((__v64qi)(__m512i)(B),\
(__v64qi)(__m512i)(C), (int)(D), \
(__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
-#define _mm512_gf2p8affine_epi64_epi8(A, B, C) \
- ((__m512i) __builtin_ia32_vgf2p8affineqb_v64qi ((__v64qi)(__m512i)(A), \
- (__v64qi)(__m512i)(B), (int)(C)))
#endif
#ifdef __DISABLE_GFNIAVX512FBW__
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 29b4dbbda24..4e17901db15 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -96,6 +96,8 @@
#include <avx512bitalgintrin.h>
+#include <avx512bitalgvlintrin.h>
+
#include <avx512vp2intersectintrin.h>
#include <avx512vp2intersectvlintrin.h>
diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h
index 58fc19c9eb3..b2bcdbe5bd1 100644
--- a/gcc/config/i386/vaesintrin.h
+++ b/gcc/config/i386/vaesintrin.h
@@ -66,9 +66,9 @@ _mm256_aesenclast_epi128 (__m256i __A, __m256i __B)
#endif /* __DISABLE_VAES__ */
-#if !defined(__VAES__) || !defined(__AVX512F__)
+#if !defined(__VAES__) || !defined(__AVX512F__) || !defined(__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("vaes,avx512f")
+#pragma GCC target("vaes,avx512f,evex512")
#define __DISABLE_VAESF__
#endif /* __VAES__ */
diff --git a/gcc/config/i386/vpclmulqdqintrin.h b/gcc/config/i386/vpclmulqdqintrin.h
index 2c83b6037a0..c8c2c19d33f 100644
--- a/gcc/config/i386/vpclmulqdqintrin.h
+++ b/gcc/config/i386/vpclmulqdqintrin.h
@@ -28,9 +28,9 @@
#ifndef _VPCLMULQDQINTRIN_H_INCLUDED
#define _VPCLMULQDQINTRIN_H_INCLUDED
-#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__)
+#if !defined(__VPCLMULQDQ__) || !defined(__AVX512F__) || !defined(__EVEX512__)
#pragma GCC push_options
-#pragma GCC target("vpclmulqdq,avx512f")
+#pragma GCC target("vpclmulqdq,avx512f,evex512")
#define __DISABLE_VPCLMULQDQF__
#endif /* __VPCLMULQDQF__ */
--
2.31.1
* [PATCH 06/18] [PATCH 5/5] Push evex512 target for 512 bit intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (4 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 05/18] [PATCH 4/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Hu, Lin1
` (13 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit
intrins.
Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

---
gcc/config/i386/avx512fp16intrin.h | 8925 ++++++++++++++--------------
1 file changed, 4476 insertions(+), 4449 deletions(-)
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index dd083e5ed67..92c0c24e9bd 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -25,8 +25,8 @@
#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
#endif
-#ifndef __AVX512FP16INTRIN_H_INCLUDED
-#define __AVX512FP16INTRIN_H_INCLUDED
+#ifndef _AVX512FP16INTRIN_H_INCLUDED
+#define _AVX512FP16INTRIN_H_INCLUDED
#ifndef __AVX512FP16__
#pragma GCC push_options
@@ -37,21 +37,17 @@
/* Internal data types for implementing the intrinsics. */
typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
-typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
-typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
/* Unaligned version of the same type. */
typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16), \
__may_alias__, __aligned__ (1)));
typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), \
__may_alias__, __aligned__ (1)));
-typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), \
- __may_alias__, __aligned__ (1)));
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -78,33 +74,8 @@ _mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
__A12, __A13, __A14, __A15 };
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
- _Float16 __A28, _Float16 __A27, _Float16 __A26,
- _Float16 __A25, _Float16 __A24, _Float16 __A23,
- _Float16 __A22, _Float16 __A21, _Float16 __A20,
- _Float16 __A19, _Float16 __A18, _Float16 __A17,
- _Float16 __A16, _Float16 __A15, _Float16 __A14,
- _Float16 __A13, _Float16 __A12, _Float16 __A11,
- _Float16 __A10, _Float16 __A9, _Float16 __A8,
- _Float16 __A7, _Float16 __A6, _Float16 __A5,
- _Float16 __A4, _Float16 __A3, _Float16 __A2,
- _Float16 __A1, _Float16 __A0)
-{
- return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
- __A4, __A5, __A6, __A7,
- __A8, __A9, __A10, __A11,
- __A12, __A13, __A14, __A15,
- __A16, __A17, __A18, __A19,
- __A20, __A21, __A22, __A23,
- __A24, __A25, __A26, __A27,
- __A28, __A29, __A30, __A31 };
-}
-
-/* Create vectors of elements in the reversed order from _mm_set_ph,
- _mm256_set_ph and _mm512_set_ph functions. */
-
+/* Create vectors of elements in the reversed order from _mm_set_ph
+ and _mm256_set_ph functions. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
@@ -128,30 +99,7 @@ _mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
__A0);
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
- _Float16 __A3, _Float16 __A4, _Float16 __A5,
- _Float16 __A6, _Float16 __A7, _Float16 __A8,
- _Float16 __A9, _Float16 __A10, _Float16 __A11,
- _Float16 __A12, _Float16 __A13, _Float16 __A14,
- _Float16 __A15, _Float16 __A16, _Float16 __A17,
- _Float16 __A18, _Float16 __A19, _Float16 __A20,
- _Float16 __A21, _Float16 __A22, _Float16 __A23,
- _Float16 __A24, _Float16 __A25, _Float16 __A26,
- _Float16 __A27, _Float16 __A28, _Float16 __A29,
- _Float16 __A30, _Float16 __A31)
-
-{
- return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
- __A24, __A23, __A22, __A21, __A20, __A19, __A18,
- __A17, __A16, __A15, __A14, __A13, __A12, __A11,
- __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
- __A2, __A1, __A0);
-}
-
/* Broadcast _Float16 to vector. */
-
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_set1_ph (_Float16 __A)
@@ -167,18 +115,7 @@ _mm256_set1_ph (_Float16 __A)
__A, __A, __A, __A, __A, __A, __A, __A);
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_set1_ph (_Float16 __A)
-{
- return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A,
- __A, __A, __A, __A, __A, __A, __A, __A);
-}
-
/* Create a vector with all zeros. */
-
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_setzero_ph (void)
@@ -193,13 +130,6 @@ _mm256_setzero_ph (void)
return _mm256_set1_ph (0.0f16);
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_ph (void)
-{
- return _mm512_set1_ph (0.0f16);
-}
-
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_undefined_ph (void)
@@ -222,3815 +152,4056 @@ _mm256_undefined_ph (void)
return __Y;
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_undefined_ph (void)
-{
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Winit-self"
- __m512h __Y = __Y;
-#pragma GCC diagnostic pop
- return __Y;
-}
-
extern __inline _Float16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_h (__m128h __A)
+_mm256_cvtsh_h (__m256h __A)
{
return __A[0];
}
-extern __inline _Float16
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_cvtsh_h (__m256h __A)
+_mm256_load_ph (void const *__P)
{
- return __A[0];
+ return *(const __m256h *) __P;
}
-extern __inline _Float16
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtsh_h (__m512h __A)
+_mm_load_ph (void const *__P)
{
- return __A[0];
+ return *(const __m128h *) __P;
}
-extern __inline __m512
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_ps (__m512h __a)
+_mm256_loadu_ph (void const *__P)
{
- return (__m512) __a;
+ return *(const __m256h_u *) __P;
}
-extern __inline __m512d
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_pd (__m512h __a)
+_mm_loadu_ph (void const *__P)
{
- return (__m512d) __a;
+ return *(const __m128h_u *) __P;
}
-extern __inline __m512i
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph_si512 (__m512h __a)
+_mm256_store_ph (void *__P, __m256h __A)
{
- return (__m512i) __a;
+ *(__m256h *) __P = __A;
}
-extern __inline __m128h
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph512_ph128 (__m512h __A)
+_mm_store_ph (void *__P, __m128h __A)
{
- union
- {
- __m128h __a[4];
- __m512h __v;
- } __u = { .__v = __A };
- return __u.__a[0];
+ *(__m128h *) __P = __A;
}
-extern __inline __m256h
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph512_ph256 (__m512h __A)
+_mm256_storeu_ph (void *__P, __m256h __A)
{
- union
- {
- __m256h __a[2];
- __m512h __v;
- } __u = { .__v = __A };
- return __u.__a[0];
+ *(__m256h_u *) __P = __A;
}
-extern __inline __m512h
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph128_ph512 (__m128h __A)
+_mm_storeu_ph (void *__P, __m128h __A)
{
- union
- {
- __m128h __a[4];
- __m512h __v;
- } __u;
- __u.__a[0] = __A;
- return __u.__v;
+ *(__m128h_u *) __P = __A;
}
-extern __inline __m512h
+/* Create a vector with element 0 as F and the rest zero. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castph256_ph512 (__m256h __A)
+_mm_set_sh (_Float16 __F)
{
- union
- {
- __m256h __a[2];
- __m512h __v;
- } __u;
- __u.__a[0] = __A;
- return __u.__v;
+ return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
+ __F);
}
-extern __inline __m512h
+/* Create a vector with element 0 as *P and the rest zero. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_zextph128_ph512 (__m128h __A)
+_mm_load_sh (void const *__P)
{
- return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
- (__m128) __A, 0);
+ return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
+ *(_Float16 const *) __P);
}
-extern __inline __m512h
+/* Stores the lower _Float16 value. */
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_zextph256_ph512 (__m256h __A)
+_mm_store_sh (void *__P, __m128h __A)
{
- return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
- (__m256d) __A, 0);
+ *(_Float16 *) __P = ((__v8hf)__A)[0];
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castps_ph (__m512 __a)
+/* Intrinsics of v[add,sub,mul,div]sh. */
+extern __inline __m128h
+ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_sh (__m128h __A, __m128h __B)
{
- return (__m512h) __a;
+ __A[0] += __B[0];
+ return __A;
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castpd_ph (__m512d __a)
+_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return (__m512h) __a;
+ return __builtin_ia32_addsh_mask (__C, __D, __A, __B);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_castsi512_ph (__m512i __a)
+_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return (__m512h) __a;
+ return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (),
+ __A);
}
-/* Create a vector with element 0 as F and the rest zero. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_set_sh (_Float16 __F)
+_mm_sub_sh (__m128h __A, __m128h __B)
{
- return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
- __F);
+ __A[0] -= __B[0];
+ return __A;
}
-/* Create a vector with element 0 as *P and the rest zero. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_load_sh (void const *__P)
+_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return _mm_set_ph (0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16, 0.0f16,
- *(_Float16 const *) __P);
+ return __builtin_ia32_subsh_mask (__C, __D, __A, __B);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_load_ph (void const *__P)
+_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return *(const __m512h *) __P;
+ return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (),
+ __A);
}
-extern __inline __m256h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_load_ph (void const *__P)
+_mm_mul_sh (__m128h __A, __m128h __B)
{
- return *(const __m256h *) __P;
+ __A[0] *= __B[0];
+ return __A;
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_load_ph (void const *__P)
+_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return *(const __m128h *) __P;
+ return __builtin_ia32_mulsh_mask (__C, __D, __A, __B);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_loadu_ph (void const *__P)
+_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return *(const __m512h_u *) __P;
+ return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A);
}
-extern __inline __m256h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_loadu_ph (void const *__P)
+_mm_div_sh (__m128h __A, __m128h __B)
{
- return *(const __m256h_u *) __P;
+ __A[0] /= __B[0];
+ return __A;
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadu_ph (void const *__P)
+_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return *(const __m128h_u *) __P;
+ return __builtin_ia32_divsh_mask (__C, __D, __A, __B);
}
-/* Stores the lower _Float16 value. */
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_store_sh (void *__P, __m128h __A)
+_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- *(_Float16 *) __P = ((__v8hf)__A)[0];
-}
-
-extern __inline void
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_store_ph (void *__P, __m512h __A)
-{
- *(__m512h *) __P = __A;
+ return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (),
+ __A);
}
-extern __inline void
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_store_ph (void *__P, __m256h __A)
+_mm_add_round_sh (__m128h __A, __m128h __B, const int __C)
{
- *(__m256h *) __P = __A;
+ return __builtin_ia32_addsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_store_ph (void *__P, __m128h __A)
+_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- *(__m128h *) __P = __A;
+ return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_storeu_ph (void *__P, __m512h __A)
+_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- *(__m512h_u *) __P = __A;
+ return __builtin_ia32_addsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_storeu_ph (void *__P, __m256h __A)
+_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C)
{
- *(__m256h_u *) __P = __A;
+ return __builtin_ia32_subsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_storeu_ph (void *__P, __m128h __A)
+_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- *(__m128h_u *) __P = __A;
+ return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_ph (__m512h __A)
+_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32 (0x7FFF7FFF),
- (__m512i) __A);
+ return __builtin_ia32_subsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-/* Intrinsics v[add,sub,mul,div]ph. */
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_ph (__m512h __A, __m512h __B)
+_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return (__m512h) ((__v32hf) __A + (__v32hf) __B);
+ return __builtin_ia32_mulsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_addph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_addph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
+ return __builtin_ia32_mulsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_ph (__m512h __A, __m512h __B)
+_mm_div_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return (__m512h) ((__v32hf) __A - (__v32hf) __B);
+ return __builtin_ia32_divsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_subph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_subph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
+ return __builtin_ia32_divsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
+#else
+#define _mm_add_round_sh(A, B, C) \
+ ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
-extern __inline __m512h
+#define _mm_mask_add_round_sh(A, B, C, D, E) \
+ ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_add_round_sh(A, B, C, D) \
+ ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#define _mm_sub_round_sh(A, B, C) \
+ ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+
+#define _mm_mask_sub_round_sh(A, B, C, D, E) \
+ ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_sub_round_sh(A, B, C, D) \
+ ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#define _mm_mul_round_sh(A, B, C) \
+ ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+
+#define _mm_mask_mul_round_sh(A, B, C, D, E) \
+ ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_mul_round_sh(A, B, C, D) \
+ ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#define _mm_div_round_sh(A, B, C) \
+ ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+
+#define _mm_mask_div_round_sh(A, B, C, D, E) \
+ ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_div_round_sh(A, B, C, D) \
+ ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsic vmaxsh vminsh. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_ph (__m512h __A, __m512h __B)
+_mm_max_sh (__m128h __A, __m128h __B)
{
- return (__m512h) ((__v32hf) __A * (__v32hf) __B);
+ __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
+ return __A;
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_mulph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_maxsh_mask (__C, __D, __A, __B);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_mulph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
+ return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (),
+ __A);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_ph (__m512h __A, __m512h __B)
+_mm_min_sh (__m128h __A, __m128h __B)
{
- return (__m512h) ((__v32hf) __A / (__v32hf) __B);
+ __A[0] = __A[0] < __B[0] ? __A[0] : __B[0];
+ return __A;
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_divph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_minsh_mask (__C, __D, __A, __B);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_divph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
+ return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (),
+ __A);
}
#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_max_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return __builtin_ia32_addph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_maxsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_addph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_maxsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_min_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return __builtin_ia32_subph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_minsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_subph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_minsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
-{
- return __builtin_ia32_mulph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
-}
+#else
+#define _mm_max_round_sh(A, B, C) \
+ (__builtin_ia32_maxsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
-{
- return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E);
-}
+#define _mm_mask_max_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E)))
-extern __inline __m512h
+#define _mm_maskz_max_round_sh(A, B, C, D) \
+ (__builtin_ia32_maxsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#define _mm_min_round_sh(A, B, C) \
+ (__builtin_ia32_minsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+
+#define _mm_mask_min_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_min_round_sh(A, B, C, D) \
+ (__builtin_ia32_minsh_mask_round ((B), (C), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcmpsh. */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C)
{
- return __builtin_ia32_mulph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return (__mmask8)
+ __builtin_ia32_cmpsh_mask_round (__A, __B,
+ __C, (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_divph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return (__mmask8)
+ __builtin_ia32_cmpsh_mask_round (__B, __C,
+ __D, __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C,
+ const int __D)
{
- return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E);
+ return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B,
+ __C, (__mmask8) -1,
+ __D);
}
-extern __inline __m512h
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D, const int __E)
{
- return __builtin_ia32_divph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C,
+ __D, __A,
+ __E);
}
-#else
-#define _mm512_add_round_ph(A, B, C) \
- ((__m512h)__builtin_ia32_addph512_mask_round((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
-
-#define _mm512_mask_add_round_ph(A, B, C, D, E) \
- ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), (B), (E)))
-
-#define _mm512_maskz_add_round_ph(A, B, C, D) \
- ((__m512h)__builtin_ia32_addph512_mask_round((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
-
-#define _mm512_sub_round_ph(A, B, C) \
- ((__m512h)__builtin_ia32_subph512_mask_round((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
-
-#define _mm512_mask_sub_round_ph(A, B, C, D, E) \
- ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E)))
-
-#define _mm512_maskz_sub_round_ph(A, B, C, D) \
- ((__m512h)__builtin_ia32_subph512_mask_round((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
-
-#define _mm512_mul_round_ph(A, B, C) \
- ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
-#define _mm512_mask_mul_round_ph(A, B, C, D, E) \
- ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E)))
+#else
+#define _mm_cmp_sh_mask(A, B, C) \
+ (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), \
+ (_MM_FROUND_CUR_DIRECTION)))
-#define _mm512_maskz_mul_round_ph(A, B, C, D) \
- ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
+#define _mm_mask_cmp_sh_mask(A, B, C, D) \
+ (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), \
+ (_MM_FROUND_CUR_DIRECTION)))
-#define _mm512_div_round_ph(A, B, C) \
- ((__m512h)__builtin_ia32_divph512_mask_round((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+#define _mm_cmp_round_sh_mask(A, B, C, D) \
+ (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D)))
-#define _mm512_mask_div_round_ph(A, B, C, D, E) \
- ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E)))
+#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E) \
+ (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E)))
-#define _mm512_maskz_div_round_ph(A, B, C, D) \
- ((__m512h)__builtin_ia32_divph512_mask_round((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
-#endif /* __OPTIMIZE__ */
+#endif /* __OPTIMIZE__ */
-extern __inline __m512h
+/* Intrinsics vcomish. */
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_conj_pch (__m512h __A)
+_mm_comieq_sh (__m128h __A, __m128h __B)
{
- return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31));
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+_mm_comilt_sh (__m128h __A, __m128h __B)
{
- return (__m512h)
- __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
- (__v16sf) __W,
- (__mmask16) __U);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+_mm_comile_sh (__m128h __A, __m128h __B)
{
- return (__m512h)
- __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
- (__v16sf) _mm512_setzero_ps (),
- (__mmask16) __U);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-/* Intrinsics of v[add,sub,mul,div]sh. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_sh (__m128h __A, __m128h __B)
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comigt_sh (__m128h __A, __m128h __B)
{
- __A[0] += __B[0];
- return __A;
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_comige_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_addsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_comineq_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_addsh_mask (__B, __C, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_sh (__m128h __A, __m128h __B)
+_mm_ucomieq_sh (__m128h __A, __m128h __B)
{
- __A[0] -= __B[0];
- return __A;
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_ucomilt_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_subsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_ucomile_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_subsh_mask (__B, __C, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_sh (__m128h __A, __m128h __B)
+_mm_ucomigt_sh (__m128h __A, __m128h __B)
{
- __A[0] *= __B[0];
- return __A;
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_ucomige_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_mulsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_ucomineq_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_mulsh_mask (__B, __C, _mm_setzero_ph (), __A);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_sh (__m128h __A, __m128h __B)
+_mm_comi_sh (__m128h __A, __m128h __B, const int __P)
{
- __A[0] /= __B[0];
- return __A;
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
{
- return __builtin_ia32_divsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
+ (__mmask8) -1, __R);
}
+#else
+#define _mm_comi_round_sh(A, B, P, R) \
+ (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R)))
+#define _mm_comi_sh(A, B, P) \
+ (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), \
+ _MM_FROUND_CUR_DIRECTION))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vsqrtsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_sqrt_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_divsh_mask (__B, __C, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_sqrtsh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_add_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_addsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_addsh_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_sqrtsh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return __builtin_ia32_addsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_sqrtsh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_subsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
+ __E);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_subsh_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_sqrtsh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, __D);
}
+#else
+#define _mm_sqrt_round_sh(A, B, C) \
+ (__builtin_ia32_sqrtsh_mask_round ((B), (A), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+
+#define _mm_mask_sqrt_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E)))
+
+#define _mm_maskz_sqrt_round_sh(A, B, C, D) \
+ (__builtin_ia32_sqrtsh_mask_round ((C), (B), \
+ _mm_setzero_ph (), \
+ (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrsqrtsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_rsqrt_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_subsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (),
+ (__mmask8) -1);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_mulsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_mulsh_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (),
+ __A);
}
+/* Intrinsics vrcpsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_rcp_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_mulsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (),
+ (__mmask8) -1);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_div_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_rcp_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_divsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_maskz_rcp_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_divsh_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (),
+ __A);
}
+/* Intrinsics vscalefsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_scalef_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_divsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_scalefsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm_add_round_sh(A, B, C) \
- ((__m128h)__builtin_ia32_addsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_add_round_sh(A, B, C, D, E) \
- ((__m128h)__builtin_ia32_addsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_add_round_sh(A, B, C, D) \
- ((__m128h)__builtin_ia32_addsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
-
-#define _mm_sub_round_sh(A, B, C) \
- ((__m128h)__builtin_ia32_subsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_sub_round_sh(A, B, C, D, E) \
- ((__m128h)__builtin_ia32_subsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_sub_round_sh(A, B, C, D) \
- ((__m128h)__builtin_ia32_subsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
-
-#define _mm_mul_round_sh(A, B, C) \
- ((__m128h)__builtin_ia32_mulsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_mul_round_sh(A, B, C, D, E) \
- ((__m128h)__builtin_ia32_mulsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_mul_round_sh(A, B, C, D) \
- ((__m128h)__builtin_ia32_mulsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
-
-#define _mm_div_round_sh(A, B, C) \
- ((__m128h)__builtin_ia32_divsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_div_round_sh(A, B, C, D, E) \
- ((__m128h)__builtin_ia32_divsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_div_round_sh(A, B, C, D) \
- ((__m128h)__builtin_ia32_divsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
-#endif /* __OPTIMIZE__ */
-/* Intrinsic vmaxph vminph. */
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_ph (__m512h __A, __m512h __B)
+_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_maxph512_mask (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1);
+ return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_maxph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_scalefsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C)
{
- return __builtin_ia32_maxph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
+ return __builtin_ia32_scalefsh_mask_round (__A, __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __C);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_ph (__m512h __A, __m512h __B)
+_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return __builtin_ia32_minph512_mask (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1);
+ return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
+ __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __D)
{
- return __builtin_ia32_minph512_mask (__C, __D, __A, __B);
+ return __builtin_ia32_scalefsh_mask_round (__B, __C,
+ _mm_setzero_ph (),
+ __A, __D);
}
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C)
-{
- return __builtin_ia32_minph512_mask (__B, __C,
- _mm512_setzero_ph (), __A);
-}
+#else
+#define _mm_scalef_round_sh(A, B, C) \
+ (__builtin_ia32_scalefsh_mask_round ((A), (B), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (C)))
+#define _mm_mask_scalef_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_scalef_round_sh(A, B, C, D) \
+ (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (), \
+ (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vreducesh. */
#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
{
- return __builtin_ia32_maxph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, int __E)
{
- return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
{
- return __builtin_ia32_maxph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
+ _mm_setzero_ph (), __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
{
- return __builtin_ia32_minph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __D);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, int __E, const int __F)
{
- return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E);
+ return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A,
+ __B, __F);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ int __D, const int __E)
{
- return __builtin_ia32_minph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
+ _mm_setzero_ph (),
+ __A, __E);
}
#else
-#define _mm512_max_round_ph(A, B, C) \
- (__builtin_ia32_maxph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+#define _mm_reduce_sh(A, B, C) \
+ (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_max_round_ph(A, B, C, D, E) \
- (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_reduce_sh(A, B, C, D, E) \
+ (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_maskz_max_round_ph(A, B, C, D) \
- (__builtin_ia32_maxph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
+#define _mm_maskz_reduce_sh(A, B, C, D) \
+ (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \
+ _mm_setzero_ph (), \
+ (A), _MM_FROUND_CUR_DIRECTION))
-#define _mm512_min_round_ph(A, B, C) \
- (__builtin_ia32_minph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+#define _mm_reduce_round_sh(A, B, C, D) \
+ (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (D)))
-#define _mm512_mask_min_round_ph(A, B, C, D, E) \
- (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_reduce_round_sh(A, B, C, D, E, F) \
+ (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_reduce_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \
+ _mm_setzero_ph (), \
+ (A), (E)))
-#define _mm512_maskz_min_round_ph(A, B, C, D) \
- (__builtin_ia32_minph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
#endif /* __OPTIMIZE__ */
-/* Intrinsic vmaxsh vminsh. */
+/* Intrinsics vrndscalesh. */
+#ifdef __OPTIMIZE__
extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_sh (__m128h __A, __m128h __B)
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
{
- __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
- return __A;
+ return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, int __E)
{
- return __builtin_ia32_maxsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
{
- return __builtin_ia32_maxsh_mask (__B, __C, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
+ _mm_setzero_ph (), __A,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_sh (__m128h __A, __m128h __B)
+_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
{
- __A[0] = __A[0] < __B[0] ? __A[0] : __B[0];
- return __A;
+ return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __D);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, int __E, const int __F)
{
- return __builtin_ia32_minsh_mask (__C, __D, __A, __B);
+ return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E,
+ __A, __B, __F);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+ int __D, const int __E)
{
- return __builtin_ia32_minsh_mask (__B, __C, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
+ _mm_setzero_ph (),
+ __A, __E);
}
+#else
+#define _mm_roundscale_sh(A, B, C) \
+ (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_roundscale_sh(A, B, C, D, E) \
+ (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), \
+ _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_roundscale_sh(A, B, C, D) \
+ (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \
+ _mm_setzero_ph (), \
+ (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_roundscale_round_sh(A, B, C, D) \
+ (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (D)))
+
+#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F) \
+ (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_roundscale_round_sh(A, B, C, D, E) \
+ (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \
+ _mm_setzero_ph (), \
+ (A), (E)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfpclasssh. */
#ifdef __OPTIMIZE__
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_max_round_sh (__m128h __A, __m128h __B, const int __C)
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fpclass_sh_mask (__m128h __A, const int __imm)
{
- return __builtin_ia32_maxsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
+ (__mmask8) -1);
}
-extern __inline __m128h
+extern __inline __mmask8
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
{
- return __builtin_ia32_maxsh_mask_round (__C, __D, __A, __B, __E);
+ return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
}
+#else
+#define _mm_fpclass_sh_mask(X, C) \
+ ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \
+ (int) (C), (__mmask8) (-1)))
+
+#define _mm_mask_fpclass_sh_mask(U, X, C) \
+ ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \
+ (int) (C), (__mmask8) (U)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetexpsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_getexp_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_maxsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return (__m128h)
+ __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__v8hf) _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_min_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
{
- return __builtin_ia32_minsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return (__m128h)
+ __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__v8hf) __W, (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
{
- return __builtin_ia32_minsh_mask_round (__C, __D, __A, __B, __E);
+ return (__m128h)
+ __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__v8hf) _mm_setzero_ph (),
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
{
- return __builtin_ia32_minsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __R);
}
-#else
-#define _mm_max_round_sh(A, B, C) \
- (__builtin_ia32_maxsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_max_round_sh(A, B, C, D, E) \
- (__builtin_ia32_maxsh_mask_round ((C), (D), (A), (B), (E)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+ __m128h __B, const int __R)
+{
+ return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __W,
+ (__mmask8) __U, __R);
+}
-#define _mm_maskz_max_round_sh(A, B, C, D) \
- (__builtin_ia32_maxsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+ const int __R)
+{
+ return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf)
+ _mm_setzero_ph (),
+ (__mmask8) __U, __R);
+}
-#define _mm_min_round_sh(A, B, C) \
- (__builtin_ia32_minsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
+#else
+#define _mm_getexp_round_sh(A, B, R) \
+ ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A), \
+ (__v8hf)(__m128h)(B), \
+ (__v8hf)_mm_setzero_ph(), \
+ (__mmask8)-1, R))
-#define _mm_mask_min_round_sh(A, B, C, D, E) \
- (__builtin_ia32_minsh_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_getexp_round_sh(W, U, A, B, C) \
+ (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C)
-#define _mm_maskz_min_round_sh(A, B, C, D) \
- (__builtin_ia32_minsh_mask_round ((B), (C), \
- _mm_setzero_ph (), \
- (A), (D)))
+#define _mm_maskz_getexp_round_sh(U, A, B, C) \
+ (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, \
+ (__v8hf)_mm_setzero_ph(), \
+ U, C)
#endif /* __OPTIMIZE__ */
-/* vcmpph */
-#ifdef __OPTIMIZE
-extern __inline __mmask32
+/* Intrinsics vgetmantsh. */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C)
+_mm_getmant_sh (__m128h __A, __m128h __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C,
- (__mmask32) -1);
+ return (__m128h)
+ __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__D << 2) | __C, _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask32
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
+ __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D,
- __A);
+ return (__m128h)
+ __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__D << 2) | __C, (__v8hf) __W,
+ __U, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask32
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C,
- const int __D)
+_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D)
{
- return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B,
- __C, (__mmask32) -1,
- __D);
+ return (__m128h)
+ __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+ (__D << 2) | __C,
+ (__v8hf) _mm_setzero_ph (),
+ __U, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask32
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D, const int __E)
+_mm_getmant_round_sh (__m128h __A, __m128h __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
{
- return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C,
- __D, __A,
- __E);
+ return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__D << 2) | __C,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+ __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+ return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__D << 2) | __C,
+ (__v8hf) __W,
+ __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+ _MM_MANTISSA_NORM_ENUM __C,
+ _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+ return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__D << 2) | __C,
+ (__v8hf)
+ _mm_setzero_ph (),
+ __U, __R);
}
#else
-#define _mm512_cmp_ph_mask(A, B, C) \
- (__builtin_ia32_cmpph512_mask ((A), (B), (C), (-1)))
+#define _mm_getmant_sh(X, Y, C, D) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h) \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_cmp_ph_mask(A, B, C, D) \
- (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A)))
+#define _mm_mask_getmant_sh(W, U, X, Y, C, D) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h)(W), \
+ (__mmask8)(U), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_cmp_round_ph_mask(A, B, C, D) \
- (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (-1), (D)))
+#define _mm_maskz_getmant_sh(U, X, Y, C, D) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h) \
+ _mm_setzero_ph (), \
+ (__mmask8)(U), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E) \
- (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E)))
+#define _mm_getmant_round_sh(X, Y, C, D, R) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h) \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ (R)))
+
+#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h)(W), \
+ (__mmask8)(U), \
+ (R)))
+
+#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R) \
+ ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
+ (__v8hf)(__m128h)(Y), \
+ (int)(((D)<<2) | (C)), \
+ (__v8hf)(__m128h) \
+ _mm_setzero_ph (), \
+ (__mmask8)(U), \
+ (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcmpsh. */
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
+/* Intrinsics vmovw. */
+extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C)
+_mm_cvtsi16_si128 (short __A)
{
- return (__mmask8)
- __builtin_ia32_cmpsh_mask_round (__A, __B,
- __C, (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
}
-extern __inline __mmask8
+extern __inline short
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_cvtsi128_si16 (__m128i __A)
{
- return (__mmask8)
- __builtin_ia32_cmpsh_mask_round (__B, __C,
- __D, __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
}
-extern __inline __mmask8
+/* Intrinsics vmovsh. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C,
- const int __D)
+_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
{
- return (__mmask8) __builtin_ia32_cmpsh_mask_round (__A, __B,
- __C, (__mmask8) -1,
- __D);
+ return __builtin_ia32_loadsh_mask (__C, __A, __B);
}
-extern __inline __mmask8
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D, const int __E)
+_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
{
- return (__mmask8) __builtin_ia32_cmpsh_mask_round (__B, __C,
- __D, __A,
- __E);
+ return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
}
-#else
-#define _mm_cmp_sh_mask(A, B, C) \
- (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), \
- (_MM_FROUND_CUR_DIRECTION)))
-
-#define _mm_mask_cmp_sh_mask(A, B, C, D) \
- (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), \
- (_MM_FROUND_CUR_DIRECTION)))
-
-#define _mm_cmp_round_sh_mask(A, B, C, D) \
- (__builtin_ia32_cmpsh_mask_round ((A), (B), (C), (-1), (D)))
-
-#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E) \
- (__builtin_ia32_cmpsh_mask_round ((B), (C), (D), (A), (E)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcomish. */
-extern __inline int
+extern __inline void
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comieq_sh (__m128h __A, __m128h __B)
+_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OS,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_storesh_mask (__A, __C, __B);
}
-extern __inline int
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comilt_sh (__m128h __A, __m128h __B)
+_mm_move_sh (__m128h __A, __m128h __B)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OS,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ __A[0] = __B[0];
+ return __A;
}
-extern __inline int
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comile_sh (__m128h __A, __m128h __B)
+_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OS,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
}
-extern __inline int
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comigt_sh (__m128h __A, __m128h __B)
+_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OS,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
}
+/* Intrinsics vcvtsh2si, vcvtsh2us. */
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comige_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_i32 (__m128h __A)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OS,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline int
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comineq_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_u32 (__m128h __A)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_US,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (unsigned) __builtin_ia32_vcvtsh2usi32_round (__A,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomieq_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_EQ_OQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
}
-extern __inline int
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomilt_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LT_OQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (unsigned) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
}
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomile_sh (__m128h __A, __m128h __B)
-{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_LE_OQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
+#else
+#define _mm_cvt_roundsh_i32(A, B) \
+ ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
+#define _mm_cvt_roundsh_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
-extern __inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomigt_sh (__m128h __A, __m128h __B)
-{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GT_OQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
+#endif /* __OPTIMIZE__ */
-extern __inline int
+#ifdef __x86_64__
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomige_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_i64 (__m128h __A)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_GE_OQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (long long)
+ __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline int
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ucomineq_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_u64 (__m128h __A)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, _CMP_NEQ_UQ,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (unsigned long long)
+ __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_sh (__m128h __A, __m128h __B, const int __P)
+_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
}
-extern __inline int
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
+_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
{
- return __builtin_ia32_cmpsh_mask_round (__A, __B, __P,
- (__mmask8) -1,__R);
+ return (unsigned long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
}
#else
-#define _mm_comi_round_sh(A, B, P, R) \
- (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), (R)))
-#define _mm_comi_sh(A, B, P) \
- (__builtin_ia32_cmpsh_mask_round ((A), (B), (P), (__mmask8) (-1), \
- _MM_FROUND_CUR_DIRECTION))
-
-#endif /* __OPTIMIZE__ */
+#define _mm_cvt_roundsh_i64(A, B) \
+ ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
+#define _mm_cvt_roundsh_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
-/* Intrinsics vsqrtph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_ph (__m512h __A)
-{
- return __builtin_ia32_sqrtph512_mask_round (__A,
- _mm512_setzero_ph(),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
-extern __inline __m512h
+/* Intrinsics vcvtsi2sh, vcvtusi2sh. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_cvti32_sh (__m128h __A, int __B)
{
- return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
+_mm_cvtu32_sh (__m128h __A, unsigned int __B)
{
- return __builtin_ia32_sqrtph512_mask_round (__B,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_sqrt_round_ph (__m512h __A, const int __B)
-{
- return __builtin_ia32_sqrtph512_mask_round (__A,
- _mm512_setzero_ph(),
- (__mmask32) -1, __B);
-}
-
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- const int __D)
+_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
{
- return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
+_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
{
- return __builtin_ia32_sqrtph512_mask_round (__B,
- _mm512_setzero_ph (),
- __A, __C);
+ return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
}
#else
-#define _mm512_sqrt_round_ph(A, B) \
- (__builtin_ia32_sqrtph512_mask_round ((A), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (B)))
-
-#define _mm512_mask_sqrt_round_ph(A, B, C, D) \
- (__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D)))
-
-#define _mm512_maskz_sqrt_round_ph(A, B, C) \
- (__builtin_ia32_sqrtph512_mask_round ((B), \
- _mm512_setzero_ph (), \
- (A), (C)))
+#define _mm_cvt_roundi32_sh(A, B, C) \
+ (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
+#define _mm_cvt_roundu32_sh(A, B, C) \
+ (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vrsqrtph. */
-extern __inline __m512h
+#ifdef __x86_64__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rsqrt_ph (__m512h __A)
+_mm_cvti64_sh (__m128h __A, long long __B)
{
- return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (),
- (__mmask32) -1);
+ return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
{
- return __builtin_ia32_rsqrtph512_mask (__C, __A, __B);
+ return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
+_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
{
- return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (),
- __A);
+ return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
}
-/* Intrinsics vrsqrtsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rsqrt_sh (__m128h __A, __m128h __B)
+_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
{
- return __builtin_ia32_rsqrtsh_mask (__B, __A, _mm_setzero_ph (),
- (__mmask8) -1);
+ return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
}
-extern __inline __m128h
+#else
+#define _mm_cvt_roundi64_sh(A, B, C) \
+ (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
+#define _mm_cvt_roundu64_sh(A, B, C) \
+ (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
+
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
+
+/* Intrinsics vcvttsh2si, vcvttsh2us. */
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_cvttsh_i32 (__m128h __A)
{
- return __builtin_ia32_rsqrtsh_mask (__D, __C, __A, __B);
+ return (int)
+ __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_cvttsh_u32 (__m128h __A)
{
- return __builtin_ia32_rsqrtsh_mask (__C, __B, _mm_setzero_ph (),
- __A);
+ return (unsigned)
+ __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-/* Intrinsics vsqrtsh. */
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_sh (__m128h __A, __m128h __B)
+_mm_cvtt_roundsh_i32 (__m128h __A, const int __R)
{
- return __builtin_ia32_sqrtsh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R);
}
-extern __inline __m128h
+extern __inline unsigned
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_cvtt_roundsh_u32 (__m128h __A, const int __R)
{
- return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (unsigned) __builtin_ia32_vcvttsh2usi32_round (__A, __R);
}
-extern __inline __m128h
+#else
+#define _mm_cvtt_roundsh_i32(A, B) \
+ ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B)))
+#define _mm_cvtt_roundsh_u32(A, B) \
+ ((unsigned)__builtin_ia32_vcvttsh2usi32_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
+
+#ifdef __x86_64__
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_cvttsh_i64 (__m128h __A)
{
- return __builtin_ia32_sqrtsh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (long long)
+ __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_cvttsh_u64 (__m128h __A)
{
- return __builtin_ia32_sqrtsh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return (unsigned long long)
+ __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_cvtt_roundsh_i64 (__m128h __A, const int __R)
{
- return __builtin_ia32_sqrtsh_mask_round (__D, __C, __A, __B,
- __E);
+ return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R);
}
-extern __inline __m128h
+extern __inline unsigned long long
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_cvtt_roundsh_u64 (__m128h __A, const int __R)
{
- return __builtin_ia32_sqrtsh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, __D);
+ return (unsigned long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R);
}
#else
-#define _mm_sqrt_round_sh(A, B, C) \
- (__builtin_ia32_sqrtsh_mask_round ((B), (A), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_sqrt_round_sh(A, B, C, D, E) \
- (__builtin_ia32_sqrtsh_mask_round ((D), (C), (A), (B), (E)))
-
-#define _mm_maskz_sqrt_round_sh(A, B, C, D) \
- (__builtin_ia32_sqrtsh_mask_round ((C), (B), \
- _mm_setzero_ph (), \
- (A), (D)))
+#define _mm_cvtt_roundsh_i64(A, B) \
+ ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B)))
+#define _mm_cvtt_roundsh_u64(A, B) \
+ ((unsigned long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B)))
#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
-/* Intrinsics vrcpph. */
-extern __inline __m512h
+/* Intrinsics vcvtsh2ss, vcvtsh2sd. */
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_rcp_ph (__m512h __A)
+_mm_cvtsh_ss (__m128 __A, __m128h __B)
{
- return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (),
- (__mmask32) -1);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+ _mm_setzero_ps (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C)
+_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
+ __m128h __D)
{
- return __builtin_ia32_rcpph512_mask (__C, __A, __B);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B)
+_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B,
+ __m128h __C)
{
- return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (),
- __A);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+ _mm_setzero_ps (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-/* Intrinsics vrcpsh. */
-extern __inline __m128h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_rcp_sh (__m128h __A, __m128h __B)
+_mm_cvtsh_sd (__m128d __A, __m128h __B)
{
- return __builtin_ia32_rcpsh_mask (__B, __A, _mm_setzero_ph (),
- (__mmask8) -1);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+ _mm_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_rcp_sh (__m128h __A, __mmask32 __B, __m128h __C, __m128h __D)
+_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
+ __m128h __D)
{
- return __builtin_ia32_rcpsh_mask (__D, __C, __A, __B);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_rcp_sh (__mmask32 __A, __m128h __B, __m128h __C)
+_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C)
{
- return __builtin_ia32_rcpsh_mask (__C, __B, _mm_setzero_ph (),
- __A);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+ _mm_setzero_pd (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-/* Intrinsics vscalefph. */
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_ph (__m512h __A, __m512h __B)
+_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+ _mm_setzero_ps (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512h
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
+ __m128h __D, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R);
}
-extern __inline __m512h
+extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C)
+_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B,
+ __m128h __C, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+ _mm_setzero_ps (),
+ __A, __R);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C)
+_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+ _mm_setzero_pd (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
+ __m128h __D, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
- __E);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R);
}
-extern __inline __m512h
+extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
- const int __D)
+_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R)
{
- return __builtin_ia32_scalefph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+ _mm_setzero_pd (),
+ __A, __R);
}
#else
-#define _mm512_scalef_round_ph(A, B, C) \
- (__builtin_ia32_scalefph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+#define _mm_cvt_roundsh_ss(A, B, R) \
+ (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A), \
+ _mm_setzero_ps (), \
+ (__mmask8) -1, (R)))
-#define _mm512_mask_scalef_round_ph(A, B, C, D, E) \
- (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R) \
+ (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R)))
-#define _mm512_maskz_scalef_round_ph(A, B, C, D) \
- (__builtin_ia32_scalefph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
+#define _mm_maskz_cvt_roundsh_ss(A, B, C, R) \
+ (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B), \
+ _mm_setzero_ps (), \
+ (A), (R)))
-#endif /* __OPTIMIZE__ */
+#define _mm_cvt_roundsh_sd(A, B, R) \
+ (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A), \
+ _mm_setzero_pd (), \
+ (__mmask8) -1, (R)))
-/* Intrinsics vscalefsh. */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_sh (__m128h __A, __m128h __B)
+#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R) \
+ (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundsh_sd(A, B, C, R) \
+ (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B), \
+ _mm_setzero_pd (), \
+ (A), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtss2sh, vcvtsd2sh. */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtss_sh (__m128h __A, __m128 __B)
{
- return __builtin_ia32_scalefsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D)
{
- return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C)
{
- return __builtin_ia32_scalefsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C)
+_mm_cvtsd_sh (__m128h __A, __m128d __B)
{
- return __builtin_ia32_scalefsh_mask_round (__A, __B,
- _mm_setzero_ph (),
- (__mmask8) -1, __C);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D)
{
- return __builtin_ia32_scalefsh_mask_round (__C, __D, __A, __B,
- __E);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- const int __D)
+_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C)
{
- return __builtin_ia32_scalefsh_mask_round (__B, __C,
- _mm_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm_scalef_round_sh(A, B, C) \
- (__builtin_ia32_scalefsh_mask_round ((A), (B), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (C)))
-
-#define _mm_mask_scalef_round_sh(A, B, C, D, E) \
- (__builtin_ia32_scalefsh_mask_round ((C), (D), (A), (B), (E)))
-
-#define _mm_maskz_scalef_round_sh(A, B, C, D) \
- (__builtin_ia32_scalefsh_mask_round ((B), (C), _mm_setzero_ph (), \
- (A), (D)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vreduceph. */
#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_ph (__m512h __A, int __B)
+_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
+_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
+ const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
+ const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
+_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __C);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
- int __D, const int __E)
+_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
+ const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
- __E);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
- const int __D)
+_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
+ const int __R)
{
- return __builtin_ia32_reduceph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+ _mm_setzero_ph (),
+ __A, __R);
}
#else
-#define _mm512_reduce_ph(A, B) \
- (__builtin_ia32_reduceph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, \
- _MM_FROUND_CUR_DIRECTION))
+#define _mm_cvt_roundss_sh(A, B, R) \
+ (__builtin_ia32_vcvtss2sh_mask_round ((B), (A), \
+ _mm_setzero_ph (), \
+ (__mmask8) -1, (R)))
-#define _mm512_mask_reduce_ph(A, B, C, D) \
- (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), \
- _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask_cvt_roundss_sh(A, B, C, D, R) \
+ (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))
-#define _mm512_maskz_reduce_ph(A, B, C) \
- (__builtin_ia32_reduceph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_cvt_roundss_sh(A, B, C, R) \
+ (__builtin_ia32_vcvtss2sh_mask_round ((C), (B), \
+ _mm_setzero_ph (), \
+ (A), (R)))
-#define _mm512_reduce_round_ph(A, B, C) \
- (__builtin_ia32_reduceph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+#define _mm_cvt_roundsd_sh(A, B, R) \
+ (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A), \
+ _mm_setzero_ph (), \
+ (__mmask8) -1, (R)))
-#define _mm512_mask_reduce_round_ph(A, B, C, D, E) \
- (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E)))
+#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R) \
+ (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))
-#define _mm512_maskz_reduce_round_ph(A, B, C, D) \
- (__builtin_ia32_reduceph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
+#define _mm_maskz_cvt_roundsd_sh(A, B, C, R) \
+ (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B), \
+ _mm_setzero_ph (), \
+ (A), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vreducesh. */
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline _Float16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
+_mm_cvtsh_h (__m128h __A)
{
- return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __A[0];
}
+/* Intrinsics vfmadd[132,213,231]sh. */
extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, int __E)
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
{
- return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
- _mm_setzero_ph (), __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
{
- return __builtin_ia32_reducesh_mask_round (__A, __B, __C,
- _mm_setzero_ph (),
- (__mmask8) -1, __D);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, int __E, const int __F)
+_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_reducesh_mask_round (__C, __D, __E, __A,
- __B, __F);
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- int __D, const int __E)
+_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
{
- return __builtin_ia32_reducesh_mask_round (__B, __C, __D,
- _mm_setzero_ph (),
- __A, __E);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) -1,
+ __R);
}
-#else
-#define _mm_reduce_sh(A, B, C) \
- (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_reduce_sh(A, B, C, D, E) \
- (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_reduce_sh(A, B, C, D) \
- (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \
- _mm_setzero_ph (), \
- (A), _MM_FROUND_CUR_DIRECTION))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+ const int __R)
+{
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
+}
-#define _mm_reduce_round_sh(A, B, C, D) \
- (__builtin_ia32_reducesh_mask_round ((A), (B), (C), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (D)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+ const int __R)
+{
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
+}
-#define _mm_mask_reduce_round_sh(A, B, C, D, E, F) \
- (__builtin_ia32_reducesh_mask_round ((C), (D), (E), (A), (B), (F)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+ __m128h __B, const int __R)
+{
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
+}
-#define _mm_maskz_reduce_round_sh(A, B, C, D, E) \
- (__builtin_ia32_reducesh_mask_round ((B), (C), (D), \
- _mm_setzero_ph (), \
- (A), (E)))
+#else
+#define _mm_fmadd_round_sh(A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fmadd_round_sh(A, U, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fmadd_round_sh(A, B, C, U, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmadd_round_sh(U, A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vrndscaleph. */
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+/* Intrinsics vfnmadd[132,213,231]sh. */
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_ph (__m512h __A, int __B)
+_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
- __m512h __C, int __D)
+_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B,
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
+_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
{
- return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
+_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- __C);
+ return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
- __m512h __C, int __D, const int __E)
+_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
{
- return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A,
- __B, __E);
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) -1,
+ __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
- const int __D)
+_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+ const int __R)
{
- return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
- _mm512_setzero_ph (),
- __A, __D);
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
}
-#else
-#define _mm512_roundscale_ph(A, B) \
- (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_mask_roundscale_ph(A, B, C, D) \
- (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_maskz_roundscale_ph(A, B, C) \
- (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), \
- _MM_FROUND_CUR_DIRECTION))
-#define _mm512_roundscale_round_ph(A, B, C) \
- (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, (C)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+ const int __R)
+{
+ return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
+}
-#define _mm512_mask_roundscale_round_ph(A, B, C, D, E) \
- (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E)))
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+ __m128h __B, const int __R)
+{
+ return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
+}
-#define _mm512_maskz_roundscale_round_ph(A, B, C, D) \
- (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \
- _mm512_setzero_ph (), \
- (A), (D)))
+#else
+#define _mm_fnmadd_round_sh(A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fnmadd_round_sh(A, U, B, C, R) \
+ ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R) \
+ ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vrndscalesh. */
-#ifdef __OPTIMIZE__
+/* Intrinsics vfmsub[132,213,231]sh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
+_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, int __E)
+_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
{
- return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
- _mm_setzero_ph (), __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
{
- return __builtin_ia32_rndscalesh_mask_round (__A, __B, __C,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __D);
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, int __E, const int __F)
+_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
{
- return __builtin_ia32_rndscalesh_mask_round (__C, __D, __E,
- __A, __B, __F);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) -1,
+ __R);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
- int __D, const int __E)
+_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+ const int __R)
{
- return __builtin_ia32_rndscalesh_mask_round (__B, __C, __D,
- _mm_setzero_ph (),
- __A, __E);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U, __R);
}
-#else
-#define _mm_roundscale_sh(A, B, C) \
- (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_roundscale_sh(A, B, C, D, E) \
- (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_roundscale_sh(A, B, C, D) \
- (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \
- _mm_setzero_ph (), \
- (A), _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_roundscale_round_sh(A, B, C, D) \
- (__builtin_ia32_rndscalesh_mask_round ((A), (B), (C), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (D)))
-
-#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F) \
- (__builtin_ia32_rndscalesh_mask_round ((C), (D), (E), (A), (B), (F)))
-
-#define _mm_maskz_roundscale_round_sh(A, B, C, D, E) \
- (__builtin_ia32_rndscalesh_mask_round ((B), (C), (D), \
- _mm_setzero_ph (), \
- (A), (E)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vfpclasssh. */
-#ifdef __OPTIMIZE__
-extern __inline __mmask8
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fpclass_sh_mask (__m128h __A, const int __imm)
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+ const int __R)
{
- return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
- (__mmask8) -1);
+ return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+ (__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __mmask8
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
+_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+ __m128h __B, const int __R)
{
- return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ (__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U, __R);
}
#else
-#define _mm_fpclass_sh_mask(X, C) \
- ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \
- (int) (C), (__mmask8) (-1))) \
+#define _mm_fmsub_round_sh(A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
+#define _mm_mask_fmsub_round_sh(A, U, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
+#define _mm_mask3_fmsub_round_sh(A, B, C, U, R) \
+ ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmsub_round_sh(U, A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))
-#define _mm_mask_fpclass_sh_mask(U, X, C) \
- ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X), \
- (int) (C), (__mmask8) (U)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfpclassph. */
-#ifdef __OPTIMIZE__
-extern __inline __mmask32
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
- const int __imm)
+/* Intrinsics vfnmsub[132,213,231]sh. */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
{
- return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
- __imm, __U);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __mmask32
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
{
- return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
- __imm,
- (__mmask32) -1);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_mask_fpclass_ph_mask(u, x, c) \
- ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
- (int) (c),(__mmask8)(u)))
-
-#define _mm512_fpclass_ph_mask(x, c) \
- ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
- (int) (c),(__mmask8)-1))
-#endif /* __OPIMTIZE__ */
-
-/* Intrinsics vgetexpph, vgetexpsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_sh (__m128h __A, __m128h __B)
+_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
{
- return (__m128h)
- __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__v8hf) _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+ -(__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
{
- return (__m128h)
- __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__v8hf) __W, (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
+_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
{
- return (__m128h)
- __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__v8hf) _mm_setzero_ph (),
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) -1,
+ __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_ph (__m512h __A)
+_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+ const int __R)
{
- return (__m512h)
- __builtin_ia32_getexpph512_mask ((__v32hf) __A,
- (__v32hf) _mm512_setzero_ph (),
- (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
+_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+ const int __R)
{
- return (__m512h)
- __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
- (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+ -(__v8hf) __A,
+ (__v8hf) __B,
+ (__mmask8) __U, __R);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
+_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+ __m128h __B, const int __R)
{
- return (__m512h)
- __builtin_ia32_getexpph512_mask ((__v32hf) __A,
- (__v32hf) _mm512_setzero_ph (),
- (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+ return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+ -(__v8hf) __A,
+ -(__v8hf) __B,
+ (__mmask8) __U, __R);
}
-#ifdef __OPTIMIZE__
+#else
+#define _mm_fnmsub_round_sh(A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
+#define _mm_mask_fnmsub_round_sh(A, U, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
+#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R) \
+ ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
+#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R) \
+ ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vf[,c]maddcsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
+_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __R);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+ (__v8hf) __C,
+ (__v8hf) __D, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
- __m128h __B, const int __R)
+_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
{
- return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __W,
- (__mmask8) __U, __R);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C, __D,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
- const int __R)
+_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
{
- return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf)
- _mm_setzero_ph (),
- (__mmask8) __U, __R);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getexp_round_ph (__m512h __A, const int __R)
+_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
{
- return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
- (__v32hf)
- _mm512_setzero_ph (),
- (__mmask32) -1, __R);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
- const int __R)
-{
- return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
- (__v32hf) __W,
- (__mmask32) __U, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
+_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
- (__v32hf)
- _mm512_setzero_ph (),
- (__mmask32) __U, __R);
+ return (__m128h)
+ __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+ (__v8hf) __C,
+ (__v8hf) __D, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm_getexp_round_sh(A, B, R) \
- ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A), \
- (__v8hf)(__m128h)(B), \
- (__v8hf)_mm_setzero_ph(), \
- (__mmask8)-1, R))
-
-#define _mm_mask_getexp_round_sh(W, U, A, B, C) \
- (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C)
-
-#define _mm_maskz_getexp_round_sh(U, A, B, C) \
- (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, \
- (__v8hf)_mm_setzero_ph(), \
- U, C)
-
-#define _mm512_getexp_round_ph(A, R) \
- ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
- (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R))
-
-#define _mm512_mask_getexp_round_ph(W, U, A, R) \
- ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
- (__v32hf)(__m512h)(W), (__mmask32)(U), R))
-
-#define _mm512_maskz_getexp_round_ph(U, A, R) \
- ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
- (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vgetmantph, vgetmantsh. */
-#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_sh (__m128h __A, __m128h __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D)
+_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
{
return (__m128h)
- __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__D << 2) | __C, _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C, __D,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
- __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D)
+_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
{
return (__m128h)
- __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__D << 2) | __C, (__v8hf) __W,
- __U, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __A, _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D)
+_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
{
return (__m128h)
- __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
- (__D << 2) | __C,
- (__v8hf) _mm_setzero_ph(),
- __U, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __B, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C)
+_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+ __mmask8 __D, const int __E)
{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- (__v32hf) __W, __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ __D, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C)
+_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- (__v32hf)
- _mm512_setzero_ph (),
- __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __A, __E);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_getmant_round_sh (__m128h __A, __m128h __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
{
- return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- (__D << 2) | __C,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __R);
+ return (__m128h)
+ __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ __D);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
- __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- (__D << 2) | __C,
- (__v8hf) __W,
- __U, __R);
+ return (__m128h)
+ __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __B, __E);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
- _MM_MANTISSA_NORM_ENUM __C,
- _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+ __mmask8 __D, const int __E)
{
- return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
- (__v8hf) __B,
- (__D << 2) | __C,
- (__v8hf)
- _mm_setzero_ph(),
- __U, __R);
+ return (__m128h)
+ __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ __D, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- _mm512_setzero_ph (),
- (__mmask32) -1, __R);
+ return (__m128h)
+ __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
+ (__v8hf) __C,
+ (__v8hf) __D,
+ __A, __E);
}
-extern __inline __m512h
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- (__v32hf) __W, __U,
- __R);
+ return (__m128h)
+ __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ (__v8hf) __C,
+ __D);
}
+#else
+#define _mm_mask_fcmadd_round_sch(A, B, C, D, E) \
+ ((__m128h) \
+ __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A), \
+ (__v8hf) (C), \
+ (__v8hf) (D), \
+ (B), (E)))
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
- _MM_MANTISSA_NORM_ENUM __B,
- _MM_MANTISSA_SIGN_ENUM __C, const int __R)
-{
- return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
- (__C << 2) | __B,
- (__v32hf)
- _mm512_setzero_ph (),
- __U, __R);
-}
-#else
-#define _mm512_getmant_ph(X, B, C) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h) \
- _mm512_setzero_ph(), \
- (__mmask32)-1, \
- _MM_FROUND_CUR_DIRECTION))
+#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E) \
+ ((__m128h) \
+ __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A), \
+ (__v8hf) (B), \
+ (__v8hf) (C), \
+ (D), (E)))
-#define _mm512_mask_getmant_ph(W, U, X, B, C) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h)(W), \
- (__mmask32)(U), \
- _MM_FROUND_CUR_DIRECTION))
+#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E) \
+ __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm_fcmadd_round_sch(A, B, C, D) \
+ __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D))
-#define _mm512_maskz_getmant_ph(U, X, B, C) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h) \
- _mm512_setzero_ph(), \
- (__mmask32)(U), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_getmant_sh(X, Y, C, D) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h) \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_mask_getmant_sh(W, U, X, Y, C, D) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h)(W), \
- (__mmask8)(U), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm_maskz_getmant_sh(U, X, Y, C, D) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h) \
- _mm_setzero_ph(), \
- (__mmask8)(U), \
- _MM_FROUND_CUR_DIRECTION))
-
-#define _mm512_getmant_round_ph(X, B, C, R) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h) \
- _mm512_setzero_ph(), \
- (__mmask32)-1, \
- (R)))
-
-#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h)(W), \
- (__mmask32)(U), \
- (R)))
-
-
-#define _mm512_maskz_getmant_round_ph(U, X, B, C, R) \
- ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
- (int)(((C)<<2) | (B)), \
- (__v32hf)(__m512h) \
- _mm512_setzero_ph(), \
- (__mmask32)(U), \
- (R)))
+#define _mm_mask_fmadd_round_sch(A, B, C, D, E) \
+ ((__m128h) \
+ __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A), \
+ (__v8hf) (C), \
+ (__v8hf) (D), \
+ (B), (E)))
-#define _mm_getmant_round_sh(X, Y, C, D, R) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h) \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- (R)))
+#define _mm_mask3_fmadd_round_sch(A, B, C, D, E) \
+ ((__m128h) \
+ __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A), \
+ (__v8hf) (B), \
+ (__v8hf) (C), \
+ (D), (E)))
-#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h)(W), \
- (__mmask8)(U), \
- (R)))
+#define _mm_maskz_fmadd_round_sch(A, B, C, D, E) \
+ __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E))
-#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R) \
- ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X), \
- (__v8hf)(__m128h)(Y), \
- (int)(((D)<<2) | (C)), \
- (__v8hf)(__m128h) \
- _mm_setzero_ph(), \
- (__mmask8)(U), \
- (R)))
+#define _mm_fmadd_round_sch(A, B, C, D) \
+ __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vmovw. */
-extern __inline __m128i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi16_si128 (short __A)
-{
- return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
-}
-
-extern __inline short
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsi128_si16 (__m128i __A)
-{
- return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
-}
-
-/* Intrinsics vmovsh. */
+/* Intrinsics vf[,c]mulcsh. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
+_mm_fcmul_sch (__m128h __A, __m128h __B)
{
- return __builtin_ia32_loadsh_mask (__C, __A, __B);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
+_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+ (__v8hf) __D,
+ (__v8hf) __A,
+ __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline void
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
+_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
{
- __builtin_ia32_storesh_mask (__A, __C, __B);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+ (__v8hf) __C,
+ _mm_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_move_sh (__m128h __A, __m128h __B)
+_mm_fmul_sch (__m128h __A, __m128h __B)
{
- __A[0] = __B[0];
- return __A;
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
{
- return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+ (__v8hf) __D,
+ (__v8hf) __A,
+ __B, _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C)
+_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
{
- return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+ (__v8hf) __C,
+ _mm_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-/* Intrinsics vcvtph2dq. */
-extern __inline __m512i
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi32 (__m256h __A)
+_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
+ (__v8hf) __B,
+ __D);
}
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__C,
- (__v16si) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
+ (__v8hf) __D,
+ (__v8hf) __A,
+ __B, __E);
}
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
+_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+ const int __E)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m128h)
+ __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
+ (__v8hf) __C,
+ _mm_setzero_ph (),
+ __A, __E);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
+_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- __B);
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
+ (__v8hf) __B, __D);
}
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+ __m128h __D, const int __E)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__C,
- (__v16si) __A,
- __B,
- __D);
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
+ (__v8hf) __D,
+ (__v8hf) __A,
+ __B, __E);
}
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
{
- return (__m512i)
- __builtin_ia32_vcvtph2dq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return (__m128h)
+ __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
+ (__v8hf) __C,
+ _mm_setzero_ph (),
+ __A, __E);
}
#else
-#define _mm512_cvt_roundph_epi32(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvtph2dq512_mask_round ((A), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (__mmask16)-1, \
- (B)))
+#define _mm_fcmul_round_sch(__A, __B, __D) \
+ (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A, \
+ (__v8hf) __B, __D)
-#define _mm512_mask_cvt_roundph_epi32(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D)))
+#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E) \
+ (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C, \
+ (__v8hf) __D, \
+ (__v8hf) __A, \
+ __B, __E)
-#define _mm512_maskz_cvt_roundph_epi32(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvtph2dq512_mask_round ((B), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E) \
+ (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B, \
+ (__v8hf) __C, \
+ _mm_setzero_ph (), \
+ __A, __E)
+
+#define _mm_fmul_round_sch(__A, __B, __D) \
+ (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A, \
+ (__v8hf) __B, __D)
+
+#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E) \
+ (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C, \
+ (__v8hf) __D, \
+ (__v8hf) __A, \
+ __B, __E)
+
+#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E) \
+ (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B, \
+ (__v8hf) __C, \
+ _mm_setzero_ph (), \
+ __A, __E)
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtph2udq. */
-extern __inline __m512i
+#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
+#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
+#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
+#define _mm_mask_mul_round_sch(W, U, A, B, R) \
+ _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_mul_round_sch(U, A, B, R) \
+ _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
+
+#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
+#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
+#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
+#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
+#define _mm_mask_cmul_round_sch(W, U, A, B, R) \
+ _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_cmul_round_sch(U, A, B, R) \
+ _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
+
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#if !defined (__AVX512FP16__) || !defined (__EVEX512__)
+#pragma GCC push_options
+#pragma GCC target("avx512fp16,evex512")
+#define __DISABLE_AVX512FP16_512__
+#endif /* __AVX512FP16_512__ */
+
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), \
+ __may_alias__, __aligned__ (1)));
+
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu32 (__m256h __A)
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+ _Float16 __A28, _Float16 __A27, _Float16 __A26,
+ _Float16 __A25, _Float16 __A24, _Float16 __A23,
+ _Float16 __A22, _Float16 __A21, _Float16 __A20,
+ _Float16 __A19, _Float16 __A18, _Float16 __A17,
+ _Float16 __A16, _Float16 __A15, _Float16 __A14,
+ _Float16 __A13, _Float16 __A12, _Float16 __A11,
+ _Float16 __A10, _Float16 __A9, _Float16 __A8,
+ _Float16 __A7, _Float16 __A6, _Float16 __A5,
+ _Float16 __A4, _Float16 __A3, _Float16 __A2,
+ _Float16 __A1, _Float16 __A0)
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+ __A4, __A5, __A6, __A7,
+ __A8, __A9, __A10, __A11,
+ __A12, __A13, __A14, __A15,
+ __A16, __A17, __A18, __A19,
+ __A20, __A21, __A22, __A23,
+ __A24, __A25, __A26, __A27,
+ __A28, __A29, __A30, __A31 };
}
-extern __inline __m512i
+/* Create a vector with the elements in the reversed order of the
+   _mm512_set_ph function.  */
+
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+ _Float16 __A3, _Float16 __A4, _Float16 __A5,
+ _Float16 __A6, _Float16 __A7, _Float16 __A8,
+ _Float16 __A9, _Float16 __A10, _Float16 __A11,
+ _Float16 __A12, _Float16 __A13, _Float16 __A14,
+ _Float16 __A15, _Float16 __A16, _Float16 __A17,
+ _Float16 __A18, _Float16 __A19, _Float16 __A20,
+ _Float16 __A21, _Float16 __A22, _Float16 __A23,
+ _Float16 __A24, _Float16 __A25, _Float16 __A26,
+ _Float16 __A27, _Float16 __A28, _Float16 __A29,
+ _Float16 __A30, _Float16 __A31)
+
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__C,
- (__v16si) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+ __A24, __A23, __A22, __A21, __A20, __A19, __A18,
+ __A17, __A16, __A15, __A14, __A13, __A12, __A11,
+ __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+ __A2, __A1, __A0);
}
-extern __inline __m512i
+/* Broadcast _Float16 to vector. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_set1_ph (_Float16 __A)
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A,
+ __A, __A, __A, __A, __A, __A, __A, __A);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+/* Create a vector with all zeros. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
+_mm512_setzero_ph (void)
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- __B);
+ return _mm512_set1_ph (0.0f16);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+_mm512_undefined_ph (void)
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__C,
- (__v16si) __A,
- __B,
- __D);
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+ __m512h __Y = __Y;
+#pragma GCC diagnostic pop
+ return __Y;
}
-extern __inline __m512i
+extern __inline _Float16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_cvtsh_h (__m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvtph2udq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return __A[0];
}
-#else
-#define _mm512_cvt_roundph_epu32(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvtph2udq512_mask_round ((A), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (__mmask16)-1, \
- (B)))
-
-#define _mm512_mask_cvt_roundph_epu32(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D)))
-
-#define _mm512_maskz_cvt_roundph_epu32(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvtph2udq512_mask_round ((B), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2dq. */
-extern __inline __m512i
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi32 (__m256h __A)
+_mm512_castph_ps (__m512h __a)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512) __a;
}
-extern __inline __m512i
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castph_pd (__m512h __a)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__C,
- (__v16si) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512d) __a;
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
+_mm512_castph_si512 (__m512h __a)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i) __a;
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
+_mm512_castph512_ph128 (__m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- __B);
+ union
+ {
+ __m128h __a[4];
+ __m512h __v;
+ } __u = { .__v = __A };
+ return __u.__a[0];
}
-extern __inline __m512i
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B,
- __m256h __C, int __D)
+_mm512_castph512_ph256 (__m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__C,
- (__v16si) __A,
- __B,
- __D);
+ union
+ {
+ __m256h __a[2];
+ __m512h __v;
+ } __u = { .__v = __A };
+ return __u.__a[0];
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_castph128_ph512 (__m128h __A)
{
- return (__m512i)
- __builtin_ia32_vcvttph2dq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ union
+ {
+ __m128h __a[4];
+ __m512h __v;
+ } __u;
+ __u.__a[0] = __A;
+ return __u.__v;
}
-#else
-#define _mm512_cvtt_roundph_epi32(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvttph2dq512_mask_round ((A), \
- (__v16si) \
- (_mm512_setzero_si512 ()), \
- (__mmask16)(-1), (B)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph256_ph512 (__m256h __A)
+{
+ union
+ {
+ __m256h __a[2];
+ __m512h __v;
+ } __u;
+ __u.__a[0] = __A;
+ return __u.__v;
+}
-#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvttph2dq512_mask_round ((C), \
- (__v16si)(A), \
- (B), \
- (D)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph128_ph512 (__m128h __A)
+{
+ return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
+ (__m128) __A, 0);
+}
-#define _mm512_maskz_cvtt_roundph_epi32(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvttph2dq512_mask_round ((B), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph256_ph512 (__m256h __A)
+{
+ return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
+ (__m256d) __A, 0);
+}
-#endif /* __OPTIMIZE__ */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castps_ph (__m512 __a)
+{
+ return (__m512h) __a;
+}
-/* Intrinsics vcvttph2udq. */
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu32 (__m256h __A)
+_mm512_castpd_ph (__m512d __a)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __a;
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+_mm512_castsi512_ph (__m512i __a)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__C,
- (__v16si) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __a;
}
-extern __inline __m512i
+/* Load 32 contiguous _Float16 values from an aligned address into a
+   512-bit vector.  */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
+_mm512_load_ph (void const *__P)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return *(const __m512h *) __P;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_ph (void const *__P)
+{
+ return *(const __m512h_u *) __P;
+}
+
+/* Store a 512-bit vector of _Float16 values to an aligned address.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_store_ph (void *__P, __m512h __A)
+{
+ *(__m512h *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_storeu_ph (void *__P, __m512h __A)
+{
+ *(__m512h_u *) __P = __A;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_abs_ph (__m512h __A)
+{
+ return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32 (0x7FFF7FFF),
+ (__m512i) __A);
+}
+
+/* Intrinsics v[add,sub,mul,div]ph. */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_ph (__m512h __A, __m512h __B)
+{
+ return (__m512h) ((__v32hf) __A + (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+ return __builtin_ia32_addph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+ return __builtin_ia32_addph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_ph (__m512h __A, __m512h __B)
+{
+ return (__m512h) ((__v32hf) __A - (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+ return __builtin_ia32_subph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+ return __builtin_ia32_subph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_ph (__m512h __A, __m512h __B)
+{
+ return (__m512h) ((__v32hf) __A * (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+ return __builtin_ia32_mulph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+ return __builtin_ia32_mulph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_ph (__m512h __A, __m512h __B)
+{
+ return (__m512h) ((__v32hf) __A / (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+ return __builtin_ia32_divph512_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+ return __builtin_ia32_divph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
}
#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu32 (__m256h __A, int __B)
+_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__A,
- (__v16si)
- _mm512_setzero_si512 (),
- (__mmask16) -1,
- __B);
+ return __builtin_ia32_addph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B,
- __m256h __C, int __D)
+_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__C,
- (__v16si) __A,
- __B,
- __D);
+ return __builtin_ia32_addph512_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
{
- return (__m512i)
- __builtin_ia32_vcvttph2udq512_mask_round (__B,
- (__v16si)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return __builtin_ia32_addph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+ return __builtin_ia32_subph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
+{
+ return __builtin_ia32_subph512_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
+{
+ return __builtin_ia32_subph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+ return __builtin_ia32_mulph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
+{
+ return __builtin_ia32_mulph512_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
+{
+ return __builtin_ia32_mulph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+ return __builtin_ia32_divph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
+{
+ return __builtin_ia32_divph512_mask_round (__C, __D, __A, __B, __E);
}
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
+{
+ return __builtin_ia32_divph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
+}
#else
-#define _mm512_cvtt_roundph_epu32(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvttph2udq512_mask_round ((A), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (__mmask16)-1, \
- (B)))
+#define _mm512_add_round_ph(A, B, C) \
+ ((__m512h)__builtin_ia32_addph512_mask_round((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
-#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvttph2udq512_mask_round ((C), \
- (__v16si)(A), \
- (B), \
- (D)))
+#define _mm512_mask_add_round_ph(A, B, C, D, E) \
+ ((__m512h)__builtin_ia32_addph512_mask_round((C), (D), (A), (B), (E)))
-#define _mm512_maskz_cvtt_roundph_epu32(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvttph2udq512_mask_round ((B), \
- (__v16si) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm512_maskz_add_round_ph(A, B, C, D) \
+ ((__m512h)__builtin_ia32_addph512_mask_round((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
-#endif /* __OPTIMIZE__ */
+#define _mm512_sub_round_ph(A, B, C) \
+ ((__m512h)__builtin_ia32_subph512_mask_round((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
-/* Intrinsics vcvtdq2ph. */
-extern __inline __m256h
+#define _mm512_mask_sub_round_ph(A, B, C, D, E) \
+ ((__m512h)__builtin_ia32_subph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_sub_round_ph(A, B, C, D) \
+ ((__m512h)__builtin_ia32_subph512_mask_round((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
+
+#define _mm512_mul_round_ph(A, B, C) \
+ ((__m512h)__builtin_ia32_mulph512_mask_round((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
+
+#define _mm512_mask_mul_round_ph(A, B, C, D, E) \
+ ((__m512h)__builtin_ia32_mulph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_mul_round_ph(A, B, C, D) \
+ ((__m512h)__builtin_ia32_mulph512_mask_round((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
+
+#define _mm512_div_round_ph(A, B, C) \
+ ((__m512h)__builtin_ia32_divph512_mask_round((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
+
+#define _mm512_mask_div_round_ph(A, B, C, D, E) \
+ ((__m512h)__builtin_ia32_divph512_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_div_round_ph(A, B, C, D) \
+ ((__m512h)__builtin_ia32_divph512_mask_round((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
+#endif /* __OPTIMIZE__ */
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_conj_pch (__m512h __A)
+{
+ return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1 << 31));
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+{
+ return (__m512h)
+ __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+ (__v16sf) __W,
+ (__mmask16) __U);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+{
+ return (__m512h)
+ __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+ (__v16sf) _mm512_setzero_ps (),
+ (__mmask16) __U);
+}
+
+/* Intrinsics vmaxph vminph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi32_ph (__m512i __A)
+_mm512_max_ph (__m512h __A, __m512h __B)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
- __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask (__C, __D, __A, __B);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B)
+_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
- _mm256_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
}
-#ifdef __OPTIMIZE__
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi32_ph (__m512i __A, int __B)
+_mm512_min_ph (__m512h __A, __m512h __B)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
- __B);
+ return __builtin_ia32_minph512_mask (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
- __A,
- __B,
- __D);
+ return __builtin_ia32_minph512_mask (__C, __D, __A, __B);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C)
+_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C)
{
- return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
- _mm256_setzero_ph (),
- __A,
- __C);
+ return __builtin_ia32_minph512_mask (__B, __C,
+ _mm512_setzero_ph (), __A);
}
-#else
-#define _mm512_cvt_roundepi32_ph(A, B) \
- (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A), \
- _mm256_setzero_ph (), \
- (__mmask16)-1, \
- (B)))
-
-#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D) \
- (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C), \
- (A), \
- (B), \
- (D)))
-
-#define _mm512_maskz_cvt_roundepi32_ph(A, B, C) \
- (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B), \
- _mm256_setzero_ph (), \
- (A), \
- (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtudq2ph. */
-extern __inline __m256h
+#ifdef __OPTIMIZE__
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu32_ph (__m512i __A)
+_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
- __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B)
+_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
- _mm256_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_maxph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
}
-#ifdef __OPTIMIZE__
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu32_ph (__m512i __A, int __B)
+_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
- __B);
+ return __builtin_ia32_minph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
- __A,
- __B,
- __D);
+ return __builtin_ia32_minph512_mask_round (__C, __D, __A, __B, __E);
}
-extern __inline __m256h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C)
+_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
{
- return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
- _mm256_setzero_ph (),
- __A,
- __C);
+ return __builtin_ia32_minph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
}
#else
-#define _mm512_cvt_roundepu32_ph(A, B) \
- (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A), \
- _mm256_setzero_ph (), \
- (__mmask16)-1, \
- B))
+#define _mm512_max_round_ph(A, B, C) \
+ (__builtin_ia32_maxph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
-#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D) \
- (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)C, \
- A, \
- B, \
- D))
+#define _mm512_mask_max_round_ph(A, B, C, D, E) \
+ (__builtin_ia32_maxph512_mask_round ((C), (D), (A), (B), (E)))
-#define _mm512_maskz_cvt_roundepu32_ph(A, B, C) \
- (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)B, \
- _mm256_setzero_ph (), \
- A, \
- C))
+#define _mm512_maskz_max_round_ph(A, B, C, D) \
+ (__builtin_ia32_maxph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
-#endif /* __OPTIMIZE__ */
+#define _mm512_min_round_ph(A, B, C) \
+ (__builtin_ia32_minph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
-/* Intrinsics vcvtph2qq. */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi64 (__m128h __A)
-{
- return __builtin_ia32_vcvtph2qq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_mask_min_round_ph(A, B, C, D, E) \
+ (__builtin_ia32_minph512_mask_round ((C), (D), (A), (B), (E)))
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
-{
- return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_maskz_min_round_ph(A, B, C, D) \
+ (__builtin_ia32_minph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
+#endif /* __OPTIMIZE__ */
-extern __inline __m512i
+/* Intrinsics vcmpph. */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
+_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C)
{
- return __builtin_ia32_vcvtph2qq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__mmask32) __builtin_ia32_cmpph512_mask (__A, __B, __C,
+ (__mmask32) -1);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi64 (__m128h __A, int __B)
+_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
{
- return __builtin_ia32_vcvtph2qq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- __B);
+ return (__mmask32) __builtin_ia32_cmpph512_mask (__B, __C, __D,
+ __A);
}
-extern __inline __m512i
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C,
+ const int __D)
{
- return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D);
+ return (__mmask32) __builtin_ia32_cmpph512_mask_round (__A, __B,
+ __C, (__mmask32) -1,
+ __D);
}
-extern __inline __m512i
+extern __inline __mmask32
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D, const int __E)
{
- return __builtin_ia32_vcvtph2qq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return (__mmask32) __builtin_ia32_cmpph512_mask_round (__B, __C,
+ __D, __A,
+ __E);
}
-#else
-#define _mm512_cvt_roundph_epi64(A, B) \
- (__builtin_ia32_vcvtph2qq512_mask_round ((A), \
- _mm512_setzero_si512 (), \
- (__mmask8)-1, \
- (B)))
-
-#define _mm512_mask_cvt_roundph_epi64(A, B, C, D) \
- (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D)))
+#else
+#define _mm512_cmp_ph_mask(A, B, C) \
+ (__builtin_ia32_cmpph512_mask ((A), (B), (C), (__mmask32)-1))
-#define _mm512_maskz_cvt_roundph_epi64(A, B, C) \
- (__builtin_ia32_vcvtph2qq512_mask_round ((B), \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm512_mask_cmp_ph_mask(A, B, C, D) \
+ (__builtin_ia32_cmpph512_mask ((B), (C), (D), (A)))
+
+#define _mm512_cmp_round_ph_mask(A, B, C, D) \
+ (__builtin_ia32_cmpph512_mask_round ((A), (B), (C), (__mmask32)-1, (D)))
+
+#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E) \
+ (__builtin_ia32_cmpph512_mask_round ((B), (C), (D), (A), (E)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtph2uqq. */
-extern __inline __m512i
+/* Intrinsics vsqrtph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu64 (__m128h __A)
+_mm512_sqrt_ph (__m512h __A)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_sqrtph512_mask_round (__A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_sqrtph512_mask_round (__B,
+ _mm512_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu64 (__m128h __A, int __B)
+_mm512_sqrt_round_ph (__m512h __A, const int __B)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_sqrtph512_mask_round (__A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __B);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ const int __D)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_sqrtph512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
{
- return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return __builtin_ia32_sqrtph512_mask_round (__B,
+ _mm512_setzero_ph (),
+ __A, __C);
}
#else
-#define _mm512_cvt_roundph_epu64(A, B) \
- (__builtin_ia32_vcvtph2uqq512_mask_round ((A), \
- _mm512_setzero_si512 (), \
- (__mmask8)-1, \
- (B)))
+#define _mm512_sqrt_round_ph(A, B) \
+ (__builtin_ia32_sqrtph512_mask_round ((A), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (B)))
-#define _mm512_mask_cvt_roundph_epu64(A, B, C, D) \
- (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_sqrt_round_ph(A, B, C, D) \
+ (__builtin_ia32_sqrtph512_mask_round ((C), (A), (B), (D)))
-#define _mm512_maskz_cvt_roundph_epu64(A, B, C) \
- (__builtin_ia32_vcvtph2uqq512_mask_round ((B), \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm512_maskz_sqrt_round_ph(A, B, C) \
+ (__builtin_ia32_sqrtph512_mask_round ((B), \
+ _mm512_setzero_ph (), \
+ (A), (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvttph2qq. */
-extern __inline __m512i
+/* Intrinsics vrsqrtph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi64 (__m128h __A)
+_mm512_rsqrt_ph (__m512h __A)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_rsqrtph512_mask (__A, _mm512_setzero_ph (),
+ (__mmask32) -1);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_rsqrtph512_mask (__C, __A, __B);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_rsqrtph512_mask (__B, _mm512_setzero_ph (),
+ __A);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+/* Intrinsics vrcpph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
+_mm512_rcp_ph (__m512h __A)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_rcpph512_mask (__A, _mm512_setzero_ph (),
+ (__mmask32) -1);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_rcpph512_mask (__C, __A, __B);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B)
{
- return __builtin_ia32_vcvttph2qq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return __builtin_ia32_rcpph512_mask (__B, _mm512_setzero_ph (),
+ __A);
}
-#else
-#define _mm512_cvtt_roundph_epi64(A, B) \
- (__builtin_ia32_vcvttph2qq512_mask_round ((A), \
- _mm512_setzero_si512 (), \
- (__mmask8)-1, \
- (B)))
-
-#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D) \
- __builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D))
-
-#define _mm512_maskz_cvtt_roundph_epi64(A, B, C) \
- (__builtin_ia32_vcvttph2qq512_mask_round ((B), \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2uqq. */
-extern __inline __m512i
+/* Intrinsics vscalefph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu64 (__m128h __A)
+_mm512_scalef_ph (__m512h __A, __m512h __B)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_scalefph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_scalefph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
+_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
- _mm512_setzero_si512 (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_scalefph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_scalefph512_mask_round (__C, __D, __A, __B,
+ __E);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+ const int __D)
{
- return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return __builtin_ia32_scalefph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
}
#else
-#define _mm512_cvtt_roundph_epu64(A, B) \
- (__builtin_ia32_vcvttph2uqq512_mask_round ((A), \
- _mm512_setzero_si512 (), \
- (__mmask8)-1, \
- (B)))
-
-#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D) \
- __builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D))
+#define _mm512_scalef_round_ph(A, B, C) \
+ (__builtin_ia32_scalefph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
-#define _mm512_maskz_cvtt_roundph_epu64(A, B, C) \
- (__builtin_ia32_vcvttph2uqq512_mask_round ((B), \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm512_mask_scalef_round_ph(A, B, C, D, E) \
+ (__builtin_ia32_scalefph512_mask_round ((C), (D), (A), (B), (E)))
-#endif /* __OPTIMIZE__ */
+#define _mm512_maskz_scalef_round_ph(A, B, C, D) \
+ (__builtin_ia32_scalefph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
-/* Intrinsics vcvtqq2ph. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi64_ph (__m512i __A)
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vreduceph. */
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_ph (__m512h __A, int __B)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_reduceph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
- __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B)
+_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
- _mm_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_reduceph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi64_ph (__m512i __A, int __B)
+_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_reduceph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __C);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+ int __D, const int __E)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
- __A,
- __B,
- __D);
+ return __builtin_ia32_reduceph512_mask_round (__C, __D, __A, __B,
+ __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C)
+_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
+ const int __D)
{
- return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
- _mm_setzero_ph (),
- __A,
- __C);
+ return __builtin_ia32_reduceph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
}
#else
-#define _mm512_cvt_roundepi64_ph(A, B) \
- (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A), \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- (B)))
+#define _mm512_reduce_ph(A, B) \
+ (__builtin_ia32_reduceph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D) \
- (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+#define _mm512_mask_reduce_ph(A, B, C, D) \
+ (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_maskz_cvt_roundepi64_ph(A, B, C) \
- (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B), \
- _mm_setzero_ph (), \
- (A), \
- (C)))
+#define _mm512_maskz_reduce_ph(A, B, C) \
+ (__builtin_ia32_reduceph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_reduce_round_ph(A, B, C) \
+ (__builtin_ia32_reduceph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
+
+#define _mm512_mask_reduce_round_ph(A, B, C, D, E) \
+ (__builtin_ia32_reduceph512_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_reduce_round_ph(A, B, C, D) \
+ (__builtin_ia32_reduceph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtuqq2ph. */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepu64_ph (__m512i __A)
+/* Intrinsics vrndscaleph. */
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_roundscale_ph (__m512h __A, int __B)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
+ return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
+ __m512h __C, int __D)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
- __A,
- __B,
+ return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A, __B,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B)
+_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
- _mm_setzero_ph (),
+ return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
__A,
_MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu64_ph (__m512i __A, int __B)
+_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_rndscaleph512_mask_round (__A, __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ __C);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
+ __m512h __C, int __D, const int __E)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
- __A,
- __B,
- __D);
+ return __builtin_ia32_rndscaleph512_mask_round (__C, __D, __A,
+ __B, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C)
+_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
+ const int __D)
{
- return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
- _mm_setzero_ph (),
- __A,
- __C);
+ return __builtin_ia32_rndscaleph512_mask_round (__B, __C,
+ _mm512_setzero_ph (),
+ __A, __D);
}
#else
-#define _mm512_cvt_roundepu64_ph(A, B) \
- (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A), \
- _mm_setzero_ph (), \
- (__mmask8)-1, \
- (B)))
+#define _mm512_roundscale_ph(A, B) \
+ (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D) \
- (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+#define _mm512_mask_roundscale_ph(A, B, C, D) \
+ (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_maskz_cvt_roundepu64_ph(A, B, C) \
- (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B), \
- _mm_setzero_ph (), \
- (A), \
- (C)))
+#define _mm512_maskz_roundscale_ph(A, B, C) \
+ (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), \
+ _MM_FROUND_CUR_DIRECTION))
+#define _mm512_roundscale_round_ph(A, B, C) \
+ (__builtin_ia32_rndscaleph512_mask_round ((A), (B), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, (C)))
+
+#define _mm512_mask_roundscale_round_ph(A, B, C, D, E) \
+ (__builtin_ia32_rndscaleph512_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_roundscale_round_ph(A, B, C, D) \
+ (__builtin_ia32_rndscaleph512_mask_round ((B), (C), \
+ _mm512_setzero_ph (), \
+ (A), (D)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtph2w. */
-extern __inline __m512i
+/* Intrinsics vfpclassph. */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
+ const int __imm)
+{
+ return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+ __imm, __U);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+{
+ return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+ __imm,
+ (__mmask32) -1);
+}
+
+#else
+#define _mm512_mask_fpclass_ph_mask(u, x, c) \
+ ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
+ (int) (c), (__mmask32) (u)))
+
+#define _mm512_fpclass_ph_mask(x, c) \
+ ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
+ (int) (c), (__mmask32)-1))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetexpph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epi16 (__m512h __A)
+_mm512_getexp_ph (__m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+ (__v32hf) _mm512_setzero_ph (),
+ (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__C,
- (__v32hi) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
+ (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+ (__v32hf) _mm512_setzero_ph (),
+ (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epi16 (__m512h __A, int __B)
+_mm512_getexp_round_ph (__m512h __A, const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- __B);
+ return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+ (__v32hf)
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+ const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__C,
- (__v32hi) __A,
- __B,
- __D);
+ return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+ (__v32hf) __W,
+ (__mmask32) __U, __R);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2w512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+ (__v32hf)
+ _mm512_setzero_ph (),
+ (__mmask32) __U, __R);
}
#else
-#define _mm512_cvt_roundph_epi16(A, B) \
- ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (__mmask32)-1, \
- (B)))
+#define _mm512_getexp_round_ph(A, R) \
+ ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
+ (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R))
-#define _mm512_mask_cvt_roundph_epi16(A, B, C, D) \
- ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C), \
- (__v32hi)(A), \
- (B), \
- (D)))
+#define _mm512_mask_getexp_round_ph(W, U, A, R) \
+ ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
+ (__v32hf)(__m512h)(W), (__mmask32)(U), R))
-#define _mm512_maskz_cvt_roundph_epi16(A, B, C) \
- ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+#define _mm512_maskz_getexp_round_ph(U, A, R) \
+ ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A), \
+ (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtph2uw. */
-extern __inline __m512i
+/* Intrinsics vgetmantph. */
+#ifdef __OPTIMIZE__
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_epu16 (__m512h __A)
+_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ (__v32hf) __W, __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ (__v32hf)
+ _mm512_setzero_ph (),
+ __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_epu16 (__m512h __A, int __B)
+_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- __B);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ _mm512_setzero_ph (),
+ (__mmask32) -1, __R);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ (__v32hf) __W, __U,
+ __R);
}
-extern __inline __m512i
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
+ _MM_MANTISSA_NORM_ENUM __B,
+ _MM_MANTISSA_SIGN_ENUM __C, const int __R)
{
- return (__m512i)
- __builtin_ia32_vcvtph2uw512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- __C);
+ return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+ (__C << 2) | __B,
+ (__v32hf)
+ _mm512_setzero_ph (),
+ __U, __R);
}
#else
-#define _mm512_cvt_roundph_epu16(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvtph2uw512_mask_round ((A), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (__mmask32)-1, (B)))
+#define _mm512_getmant_ph(X, B, C) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h) \
+ _mm512_setzero_ph(), \
+ (__mmask32)-1, \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_mask_cvt_roundph_epu16(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D)))
+#define _mm512_mask_getmant_ph(W, U, X, B, C) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h)(W), \
+ (__mmask32)(U), \
+ _MM_FROUND_CUR_DIRECTION))
-#define _mm512_maskz_cvt_roundph_epu16(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvtph2uw512_mask_round ((B), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
+
+#define _mm512_maskz_getmant_ph(U, X, B, C) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h) \
+ _mm512_setzero_ph(), \
+ (__mmask32)(U), \
+ _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_getmant_round_ph(X, B, C, R) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h) \
+ _mm512_setzero_ph(), \
+ (__mmask32)-1, \
+ (R)))
+
+#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h)(W), \
+ (__mmask32)(U), \
+ (R)))
+
+#define _mm512_maskz_getmant_round_ph(U, X, B, C, R) \
+ ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X), \
+ (int)(((C)<<2) | (B)), \
+ (__v32hf)(__m512h) \
+ _mm512_setzero_ph(), \
+ (__mmask32)(U), \
+ (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvttph2w. */
+/* Intrinsics vcvtph2dq. */
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epi16 (__m512h __A)
+_mm512_cvtph_epi32 (__m256h __A)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__A,
- (__v32hi)
+ __builtin_ia32_vcvtph2dq512_mask_round (__A,
+ (__v16si)
_mm512_setzero_si512 (),
- (__mmask32) -1,
+ (__mmask16) -1,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__C,
- (__v32hi) __A,
+ __builtin_ia32_vcvtph2dq512_mask_round (__C,
+ (__v16si) __A,
__B,
_MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
+_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__B,
- (__v32hi)
+ __builtin_ia32_vcvtph2dq512_mask_round (__B,
+ (__v16si)
_mm512_setzero_si512 (),
__A,
_MM_FROUND_CUR_DIRECTION);
@@ -4039,2994 +4210,2868 @@ _mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
#ifdef __OPTIMIZE__
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
+_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__A,
- (__v32hi)
+ __builtin_ia32_vcvtph2dq512_mask_round (__A,
+ (__v16si)
_mm512_setzero_si512 (),
- (__mmask32) -1,
+ (__mmask16) -1,
__B);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B,
- __m512h __C, int __D)
+_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__C,
- (__v32hi) __A,
+ __builtin_ia32_vcvtph2dq512_mask_round (__C,
+ (__v16si) __A,
__B,
__D);
}
extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
{
return (__m512i)
- __builtin_ia32_vcvttph2w512_mask_round (__B,
- (__v32hi)
+ __builtin_ia32_vcvtph2dq512_mask_round (__B,
+ (__v16si)
_mm512_setzero_si512 (),
__A,
__C);
}
#else
-#define _mm512_cvtt_roundph_epi16(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvttph2w512_mask_round ((A), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (__mmask32)-1, \
+#define _mm512_cvt_roundph_epi32(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2dq512_mask_round ((A), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (__mmask16)-1, \
(B)))
-#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvttph2w512_mask_round ((C), \
- (__v32hi)(A), \
- (B), \
- (D)))
-
-#define _mm512_maskz_cvtt_roundph_epi16(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvttph2w512_mask_round ((B), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvttph2uw. */
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvttph_epu16 (__m512h __A)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__C,
- (__v32hi) __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__A,
- (__v32hi)
- _mm512_setzero_si512 (),
- (__mmask32) -1,
- __B);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B,
- __m512h __C, int __D)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__C,
- (__v32hi) __A,
- __B,
- __D);
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
-{
- return (__m512i)
- __builtin_ia32_vcvttph2uw512_mask_round (__B,
- (__v32hi)
- _mm512_setzero_si512 (),
- __A,
- __C);
-}
-
-#else
-#define _mm512_cvtt_roundph_epu16(A, B) \
- ((__m512i) \
- __builtin_ia32_vcvttph2uw512_mask_round ((A), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (__mmask32)-1, \
- (B)))
-
-#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D) \
- ((__m512i) \
- __builtin_ia32_vcvttph2uw512_mask_round ((C), \
- (__v32hi)(A), \
- (B), \
- (D)))
-
-#define _mm512_maskz_cvtt_roundph_epu16(A, B, C) \
- ((__m512i) \
- __builtin_ia32_vcvttph2uw512_mask_round ((B), \
- (__v32hi) \
- _mm512_setzero_si512 (), \
- (A), \
- (C)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtw2ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtepi16_ph (__m512i __A)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
- __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepi16_ph (__m512i __A, int __B)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- __B);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
- __A,
- __B,
- __D);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C)
-{
- return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
- _mm512_setzero_ph (),
- __A,
- __C);
-}
-
-#else
-#define _mm512_cvt_roundepi16_ph(A, B) \
- (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, \
- (B)))
-
-#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D) \
- (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C), \
- (A), \
- (B), \
- (D)))
+#define _mm512_mask_cvt_roundph_epi32(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2dq512_mask_round ((C), (__v16si)(A), (B), (D)))
-#define _mm512_maskz_cvt_roundepi16_ph(A, B, C) \
- (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B), \
- _mm512_setzero_ph (), \
- (A), \
- (C)))
+#define _mm512_maskz_cvt_roundph_epi32(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2dq512_mask_round ((B), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtuw2ph. */
- extern __inline __m512h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
- _mm512_cvtepu16_ph (__m512i __A)
- {
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
- }
+/* Intrinsics vcvtph2udq. */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epu32 (__m256h __A)
+{
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C)
+_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
{
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
- __A,
- __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B)
+_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
{
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
- _mm512_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundepu16_ph (__m512i __A, int __B)
+_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
{
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
- _mm512_setzero_ph (),
- (__mmask32) -1,
- __B);
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ __B);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
+_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
{
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
- __A,
- __B,
- __D);
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ __D);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
+_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
{
- return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
- _mm512_setzero_ph (),
- __A,
- __C);
+ return (__m512i)
+ __builtin_ia32_vcvtph2udq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm512_cvt_roundepu16_ph(A, B) \
- (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A), \
- _mm512_setzero_ph (), \
- (__mmask32)-1, \
- (B)))
+#define _mm512_cvt_roundph_epu32(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2udq512_mask_round ((A), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (__mmask16)-1, \
+ (B)))
-#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D) \
- (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C), \
- (A), \
- (B), \
- (D)))
+#define _mm512_mask_cvt_roundph_epu32(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2udq512_mask_round ((C), (__v16si)(A), (B), (D)))
-#define _mm512_maskz_cvt_roundepu16_ph(A, B, C) \
- (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B), \
- _mm512_setzero_ph (), \
- (A), \
- (C)))
+#define _mm512_maskz_cvt_roundph_epu32(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2udq512_mask_round ((B), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtsh2si, vcvtsh2us. */
-extern __inline int
+/* Intrinsics vcvttph2dq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_i32 (__m128h __A)
+_mm512_cvttph_epi32 (__m256h __A)
{
- return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_u32 (__m128h __A)
+_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
{
- return (int) __builtin_ia32_vcvtsh2usi32_round (__A,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
+_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
{
- return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
+_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
{
- return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ __B);
}
-#else
-#define _mm_cvt_roundsh_i32(A, B) \
- ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
-#define _mm_cvt_roundsh_u32(A, B) \
- ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-
-#ifdef __x86_64__
-extern __inline long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_i64 (__m128h __A)
+_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B,
+ __m256h __C, int __D)
{
- return (long long)
- __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ __D);
}
-extern __inline unsigned long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_u64 (__m128h __A)
+_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
{
- return (long long)
- __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2dq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
-#ifdef __OPTIMIZE__
-extern __inline long long
+#else
+#define _mm512_cvtt_roundph_epi32(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2dq512_mask_round ((A), \
+ (__v16si) \
+ (_mm512_setzero_si512 ()), \
+ (__mmask16)(-1), (B)))
+
+#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2dq512_mask_round ((C), \
+ (__v16si)(A), \
+ (B), \
+ (D)))
+
+#define _mm512_maskz_cvtt_roundph_epi32(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2dq512_mask_round ((B), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2udq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
+_mm512_cvttph_epu32 (__m256h __A)
{
- return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned long long
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
+_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
{
- return (long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm_cvt_roundsh_i64(A, B) \
- ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
-#define _mm_cvt_roundsh_u64(A, B) \
- ((long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
-
-/* Intrinsics vcvttsh2si, vcvttsh2us. */
-extern __inline int
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_i32 (__m128h __A)
+_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
{
- return (int)
- __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_u32 (__m128h __A)
+_mm512_cvtt_roundph_epu32 (__m256h __A, int __B)
{
- return (int)
- __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__A,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ (__mmask16) -1,
+ __B);
}
-#ifdef __OPTIMIZE__
-extern __inline int
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_i32 (__m128h __A, const int __R)
+_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B,
+ __m256h __C, int __D)
{
- return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__C,
+ (__v16si) __A,
+ __B,
+ __D);
}
-extern __inline unsigned
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_u32 (__m128h __A, const int __R)
+_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
{
- return (int) __builtin_ia32_vcvttsh2usi32_round (__A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2udq512_mask_round (__B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm_cvtt_roundsh_i32(A, B) \
- ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B)))
-#define _mm_cvtt_roundsh_u32(A, B) \
- ((int)__builtin_ia32_vcvttsh2usi32_round ((A), (B)))
+#define _mm512_cvtt_roundph_epu32(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2udq512_mask_round ((A), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (__mmask16)-1, \
+ (B)))
+
+#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2udq512_mask_round ((C), \
+ (__v16si)(A), \
+ (B), \
+ (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu32(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2udq512_mask_round ((B), \
+ (__v16si) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-#ifdef __x86_64__
-extern __inline long long
+/* Intrinsics vcvtdq2ph. */
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_i64 (__m128h __A)
+_mm512_cvtepi32_ph (__m512i __A)
{
- return (long long)
- __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned long long
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvttsh_u64 (__m128h __A)
+_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C)
{
- return (long long)
- __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline long long
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_i64 (__m128h __A, const int __R)
+_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B)
{
- return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
+ _mm256_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline unsigned long long
+#ifdef __OPTIMIZE__
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtt_roundsh_u64 (__m128h __A, const int __R)
+_mm512_cvt_roundepi32_ph (__m512i __A, int __B)
{
- return (long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __B);
}
-#else
-#define _mm_cvtt_roundsh_i64(A, B) \
- ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B)))
-#define _mm_cvtt_roundsh_u64(A, B) \
- ((long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B)))
-
-#endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
-
-/* Intrinsics vcvtsi2sh, vcvtusi2sh. */
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti32_sh (__m128h __A, int __B)
+_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
{
- return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __C,
+ __A,
+ __B,
+ __D);
}
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu32_sh (__m128h __A, unsigned int __B)
+_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C)
{
- return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtdq2ph512_mask_round ((__v16si) __B,
+ _mm256_setzero_ph (),
+ __A,
+ __C);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+#else
+#define _mm512_cvt_roundepi32_ph(A, B) \
+ (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(A), \
+ _mm256_setzero_ph (), \
+ (__mmask16)-1, \
+ (B)))
+
+#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(C), \
+ (A), \
+ (B), \
+ (D)))
+
+#define _mm512_maskz_cvt_roundepi32_ph(A, B, C) \
+ (__builtin_ia32_vcvtdq2ph512_mask_round ((__v16si)(B), \
+ _mm256_setzero_ph (), \
+ (A), \
+ (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtudq2ph. */
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
+_mm512_cvtepu32_ph (__m512i __A)
{
- return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
+_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C)
{
- return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm_cvt_roundi32_sh(A, B, C) \
- (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
-#define _mm_cvt_roundu32_sh(A, B, C) \
- (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
-
-#endif /* __OPTIMIZE__ */
-
-#ifdef __x86_64__
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvti64_sh (__m128h __A, long long __B)
+_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B)
{
- return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
+ _mm256_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
+_mm512_cvt_roundepu32_ph (__m512i __A, int __B)
{
- return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __B);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
+_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
{
- return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __C,
+ __A,
+ __B,
+ __D);
}
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
+_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C)
{
- return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
+ return __builtin_ia32_vcvtudq2ph512_mask_round ((__v16si) __B,
+ _mm256_setzero_ph (),
+ __A,
+ __C);
}
-#else
-#define _mm_cvt_roundi64_sh(A, B, C) \
- (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
-#define _mm_cvt_roundu64_sh(A, B, C) \
- (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
+#else
+#define _mm512_cvt_roundepu32_ph(A, B) \
+ (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(A), \
+ _mm256_setzero_ph (), \
+ (__mmask16)-1, \
+ (B)))
+
+#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(C), \
+ (A), \
+ (B), \
+ (D)))
+
+#define _mm512_maskz_cvt_roundepu32_ph(A, B, C) \
+ (__builtin_ia32_vcvtudq2ph512_mask_round ((__v16si)(B), \
+ _mm256_setzero_ph (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-#endif /* __x86_64__ */
-/* Intrinsics vcvtph2pd. */
-extern __inline __m512d
+/* Intrinsics vcvtph2qq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtph_pd (__m128h __A)
+_mm512_cvtph_epi64 (__m128h __A)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__A,
- _mm512_setzero_pd (),
+ return __builtin_ia32_vcvtph2qq512_mask_round (__A,
+ _mm512_setzero_si512 (),
(__mmask8) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C)
+_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B,
+ return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
+_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__B,
- _mm512_setzero_pd (),
+ return __builtin_ia32_vcvtph2qq512_mask_round (__B,
+ _mm512_setzero_si512 (),
__A,
_MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundph_pd (__m128h __A, int __B)
+_mm512_cvt_roundph_epi64 (__m128h __A, int __B)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__A,
- _mm512_setzero_pd (),
+ return __builtin_ia32_vcvtph2qq512_mask_round (__A,
+ _mm512_setzero_si512 (),
(__mmask8) -1,
__B);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D)
+_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_vcvtph2qq512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m512d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C)
+_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
{
- return __builtin_ia32_vcvtph2pd512_mask_round (__B,
- _mm512_setzero_pd (),
+ return __builtin_ia32_vcvtph2qq512_mask_round (__B,
+ _mm512_setzero_si512 (),
__A,
__C);
}
#else
-#define _mm512_cvt_roundph_pd(A, B) \
- (__builtin_ia32_vcvtph2pd512_mask_round ((A), \
- _mm512_setzero_pd (), \
+#define _mm512_cvt_roundph_epi64(A, B) \
+ (__builtin_ia32_vcvtph2qq512_mask_round ((A), \
+ _mm512_setzero_si512 (), \
(__mmask8)-1, \
(B)))
-#define _mm512_mask_cvt_roundph_pd(A, B, C, D) \
- (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_cvt_roundph_epi64(A, B, C, D) \
+ (__builtin_ia32_vcvtph2qq512_mask_round ((C), (A), (B), (D)))
-#define _mm512_maskz_cvt_roundph_pd(A, B, C) \
- (__builtin_ia32_vcvtph2pd512_mask_round ((B), \
- _mm512_setzero_pd (), \
- (A), \
+#define _mm512_maskz_cvt_roundph_epi64(A, B, C) \
+ (__builtin_ia32_vcvtph2qq512_mask_round ((B), \
+ _mm512_setzero_si512 (), \
+ (A), \
(C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtph2psx. */
-extern __inline __m512
+/* Intrinsics vcvtph2uqq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtxph_ps (__m256h __A)
+_mm512_cvtph_epu64 (__m128h __A)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__A,
- _mm512_setzero_ps (),
- (__mmask16) -1,
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C)
+_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B,
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B)
+_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__B,
- _mm512_setzero_ps (),
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
+ _mm512_setzero_si512 (),
__A,
_MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtx_roundph_ps (__m256h __A, int __B)
+_mm512_cvt_roundph_epu64 (__m128h __A, int __B)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__A,
- _mm512_setzero_ps (),
- (__mmask16) -1,
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
__B);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D)
+_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D);
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m512
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C)
+_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
{
- return __builtin_ia32_vcvtph2psx512_mask_round (__B,
- _mm512_setzero_ps (),
+ return __builtin_ia32_vcvtph2uqq512_mask_round (__B,
+ _mm512_setzero_si512 (),
__A,
__C);
}
#else
-#define _mm512_cvtx_roundph_ps(A, B) \
- (__builtin_ia32_vcvtph2psx512_mask_round ((A), \
- _mm512_setzero_ps (), \
- (__mmask16)-1, \
+#define _mm512_cvt_roundph_epu64(A, B) \
+ (__builtin_ia32_vcvtph2uqq512_mask_round ((A), \
+ _mm512_setzero_si512 (), \
+ (__mmask8)-1, \
(B)))
-#define _mm512_mask_cvtx_roundph_ps(A, B, C, D) \
- (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D)))
+#define _mm512_mask_cvt_roundph_epu64(A, B, C, D) \
+ (__builtin_ia32_vcvtph2uqq512_mask_round ((C), (A), (B), (D)))
-#define _mm512_maskz_cvtx_roundph_ps(A, B, C) \
- (__builtin_ia32_vcvtph2psx512_mask_round ((B), \
- _mm512_setzero_ps (), \
+#define _mm512_maskz_cvt_roundph_epu64(A, B, C) \
+ (__builtin_ia32_vcvtph2uqq512_mask_round ((B), \
+ _mm512_setzero_si512 (), \
(A), \
(C)))
+
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtps2ph. */
-extern __inline __m256h
+/* Intrinsics vcvttph2qq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtxps_ph (__m512 __A)
+_mm512_cvttph_epi64 (__m128h __A)
{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
+ return __builtin_ia32_vcvttph2qq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C)
+_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
- __A, __B,
+ return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m256h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B)
+_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
- _mm256_setzero_ph (),
+ return __builtin_ia32_vcvttph2qq512_mask_round (__B,
+ _mm512_setzero_si512 (),
__A,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-#ifdef __OPTIMIZE__
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtx_roundps_ph (__m512 __A, int __B)
-{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
- _mm256_setzero_ph (),
- (__mmask16) -1,
- __B);
-}
-
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D)
-{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
- __A, __B, __D);
-}
-
-extern __inline __m256h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C)
-{
- return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
- _mm256_setzero_ph (),
- __A, __C);
-}
-
-#else
-#define _mm512_cvtx_roundps_ph(A, B) \
- (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A), \
- _mm256_setzero_ph (),\
- (__mmask16)-1, (B)))
-
-#define _mm512_mask_cvtx_roundps_ph(A, B, C, D) \
- (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(C), \
- (A), (B), (D)))
-
-#define _mm512_maskz_cvtx_roundps_ph(A, B, C) \
- (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B), \
- _mm256_setzero_ph (),\
- (A), (C)))
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vcvtpd2ph. */
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtpd_ph (__m512d __A)
-{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C)
-{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
- __A, __B,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B)
-{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
- _mm_setzero_ph (),
- __A,
- _MM_FROUND_CUR_DIRECTION);
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvt_roundpd_ph (__m512d __A, int __B)
+_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- __B);
+ return __builtin_ia32_vcvttph2qq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ __B);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D)
+_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
- __A, __B, __D);
+ return __builtin_ia32_vcvttph2qq512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
+_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
{
- return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
- _mm_setzero_ph (),
- __A, __C);
+ return __builtin_ia32_vcvttph2qq512_mask_round (__B,
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm512_cvt_roundpd_ph(A, B) \
- (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A), \
- _mm_setzero_ph (), \
- (__mmask8)-1, (B)))
+#define _mm512_cvtt_roundph_epi64(A, B) \
+ (__builtin_ia32_vcvttph2qq512_mask_round ((A), \
+ _mm512_setzero_si512 (), \
+ (__mmask8)-1, \
+ (B)))
-#define _mm512_mask_cvt_roundpd_ph(A, B, C, D) \
- (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C), \
- (A), (B), (D)))
+#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D) \
+ (__builtin_ia32_vcvttph2qq512_mask_round ((C), (A), (B), (D)))
-#define _mm512_maskz_cvt_roundpd_ph(A, B, C) \
- (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B), \
- _mm_setzero_ph (), \
- (A), (C)))
+#define _mm512_maskz_cvtt_roundph_epi64(A, B, C) \
+ (__builtin_ia32_vcvttph2qq512_mask_round ((B), \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtsh2ss, vcvtsh2sd. */
-extern __inline __m128
+/* Intrinsics vcvttph2uqq. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_ss (__m128 __A, __m128h __B)
+_mm512_cvttph_epu64 (__m128h __A)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
- _mm_setzero_ps (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
- __m128h __D)
+_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B,
- __m128h __C)
+_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
- _mm_setzero_ps (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsh_sd (__m128d __A, __m128h __B)
+_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
- _mm_setzero_pd (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__A,
+ _mm512_setzero_si512 (),
+ (__mmask8) -1,
+ __B);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
- __m128h __D)
+_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m128d
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C)
+_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
- _mm_setzero_pd (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvttph2uqq512_mask_round (__B,
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R)
+#else
+#define _mm512_cvtt_roundph_epu64(A, B) \
+ (__builtin_ia32_vcvttph2uqq512_mask_round ((A), \
+ _mm512_setzero_si512 (), \
+ (__mmask8)-1, \
+ (B)))
+
+#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D) \
+ (__builtin_ia32_vcvttph2uqq512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu64(A, B, C) \
+ (__builtin_ia32_vcvttph2uqq512_mask_round ((B), \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtqq2ph. */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepi64_ph (__m512i __A)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
- _mm_setzero_ps (),
- (__mmask8) -1, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
- __m128h __D, const int __R)
+_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B,
- __m128h __C, const int __R)
+_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B)
{
- return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
- _mm_setzero_ps (),
- __A, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
+ _mm_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128d
+#ifdef __OPTIMIZE__
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R)
+_mm512_cvt_roundepi64_ph (__m512i __A, int __B)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
- _mm_setzero_pd (),
- (__mmask8) -1, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __B);
}
-extern __inline __m128d
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
- __m128h __D, const int __R)
+_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __C,
+ __A,
+ __B,
+ __D);
}
-extern __inline __m128d
+extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R)
+_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C)
{
- return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
- _mm_setzero_pd (),
- __A, __R);
+ return __builtin_ia32_vcvtqq2ph512_mask_round ((__v8di) __B,
+ _mm_setzero_ph (),
+ __A,
+ __C);
}
#else
-#define _mm_cvt_roundsh_ss(A, B, R) \
- (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A), \
- _mm_setzero_ps (), \
- (__mmask8) -1, (R)))
-
-#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R) \
- (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R)))
-
-#define _mm_maskz_cvt_roundsh_ss(A, B, C, R) \
- (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B), \
- _mm_setzero_ps (), \
- (A), (R)))
-
-#define _mm_cvt_roundsh_sd(A, B, R) \
- (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A), \
- _mm_setzero_pd (), \
- (__mmask8) -1, (R)))
+#define _mm512_cvt_roundepi64_ph(A, B) \
+ (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(A), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ (B)))
-#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R) \
- (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R)))
+#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
-#define _mm_maskz_cvt_roundsh_sd(A, B, C, R) \
- (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B), \
- _mm_setzero_pd (), \
- (A), (R)))
+#define _mm512_maskz_cvt_roundepi64_ph(A, B, C) \
+ (__builtin_ia32_vcvtqq2ph512_mask_round ((__v8di)(B), \
+ _mm_setzero_ph (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vcvtss2sh, vcvtsd2sh. */
+/* Intrinsics vcvtuqq2ph. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtss_sh (__m128h __A, __m128 __B)
+_mm512_cvtepu64_ph (__m512i __A)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D)
+_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C)
+_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
+ _mm_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsd_sh (__m128h __A, __m128d __B)
+_mm512_cvt_roundepu64_ph (__m512i __A, int __B)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __B);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D)
+_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __C,
+ __A,
+ __B,
+ __D);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C)
+_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di) __B,
+ _mm_setzero_ph (),
+ __A,
+ __C);
}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+#else
+#define _mm512_cvt_roundepu64_ph(A, B) \
+ (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(A), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, \
+ (B)))
+
+#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundepu64_ph(A, B, C) \
+ (__builtin_ia32_vcvtuqq2ph512_mask_round ((__v8di)(B), \
+ _mm_setzero_ph (), \
+ (A), \
+ (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2w. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
+_mm512_cvtph_epi16 (__m512h __A)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
- const int __R)
+_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
- const int __R)
+_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B)
{
- return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
+_mm512_cvt_roundph_epi16 (__m512h __A, int __B)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
- _mm_setzero_ph (),
- (__mmask8) -1, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ __B);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
- const int __R)
+_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ __D);
}
-extern __inline __m128h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
- const int __R)
+_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
{
- return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
- _mm_setzero_ph (),
- __A, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2w512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm_cvt_roundss_sh(A, B, R) \
- (__builtin_ia32_vcvtss2sh_mask_round ((B), (A), \
- _mm_setzero_ph (), \
- (__mmask8) -1, R))
-
-#define _mm_mask_cvt_roundss_sh(A, B, C, D, R) \
- (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))
-
-#define _mm_maskz_cvt_roundss_sh(A, B, C, R) \
- (__builtin_ia32_vcvtss2sh_mask_round ((C), (B), \
- _mm_setzero_ph (), \
- A, R))
-
-#define _mm_cvt_roundsd_sh(A, B, R) \
- (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A), \
- _mm_setzero_ph (), \
- (__mmask8) -1, R))
+#define _mm512_cvt_roundph_epi16(A, B) \
+ ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((A), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (__mmask32)-1, \
+ (B)))
-#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R) \
- (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))
+#define _mm512_mask_cvt_roundph_epi16(A, B, C, D) \
+ ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((C), \
+ (__v32hi)(A), \
+ (B), \
+ (D)))
-#define _mm_maskz_cvt_roundsd_sh(A, B, C, R) \
- (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B), \
- _mm_setzero_ph (), \
- (A), (R)))
+#define _mm512_maskz_cvt_roundph_epi16(A, B, C) \
+ ((__m512i)__builtin_ia32_vcvtph2w512_mask_round ((B), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfmaddsub[132,213,231]ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvtph2uw. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_cvtph_epu16 (__m512h __A)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_cvt_roundph_epu16 (__m512h __A, int __B)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ __B);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__C, (__v32hi) __A, __B, __D);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
{
- return (__m512h)
- __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvtph2uw512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm512_fmaddsub_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundph_epu16(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2uw512_mask_round ((A), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (__mmask32)-1, (B)))
-#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundph_epu16(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2uw512_mask_round ((C), (__v32hi)(A), (B), (D)))
-#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundph_epu16(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvtph2uw512_mask_round ((B), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfmsubadd[132,213,231]ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
- _mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvttph2w. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
- __m512h __B, __m512h __C)
+_mm512_cvttph_epi16 (__m512h __A)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
- __m512h __C, __mmask32 __U)
+_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
- __m512h __B, __m512h __C)
+_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
- __m512h __C, const int __R)
-{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ __B);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B,
+ __m512h __C, int __D)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ __D);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
{
- return (__m512h)
- __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2w512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm512_fmsubadd_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvtt_roundph_epi16(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2w512_mask_round ((A), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (__mmask32)-1, \
+ (B)))
-#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2w512_mask_round ((C), \
+ (__v32hi)(A), \
+ (B), \
+ (D)))
-#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvtt_roundph_epi16(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2w512_mask_round ((B), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfmadd[132,213,231]ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
- _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
+/* Intrinsics vcvttph2uw. */
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_cvttph_epu16 (__m512h __A)
{
- return (__m512h)
- __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
{
- return (__m512h)
- __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
{
- return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
{
- return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__A,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ (__mmask32) -1,
+ __B);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B,
+ __m512h __C, int __D)
{
- return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__C,
+ (__v32hi) __A,
+ __B,
+ __D);
}
-extern __inline __m512h
+extern __inline __m512i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
{
- return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return (__m512i)
+ __builtin_ia32_vcvttph2uw512_mask_round (__B,
+ (__v32hi)
+ _mm512_setzero_si512 (),
+ __A,
+ __C);
}
#else
-#define _mm512_fmadd_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmadd_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvtt_roundph_epu16(A, B) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2uw512_mask_round ((A), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (__mmask32)-1, \
+ (B)))
-#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2uw512_mask_round ((C), \
+ (__v32hi)(A), \
+ (B), \
+ (D)))
-#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvtt_roundph_epu16(A, B, C) \
+ ((__m512i) \
+ __builtin_ia32_vcvttph2uw512_mask_round ((B), \
+ (__v32hi) \
+ _mm512_setzero_si512 (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfnmadd[132,213,231]ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
+/* Intrinsics vcvtw2ph. */
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_cvtepi16_ph (__m512i __A)
{
- return (__m512h)
- __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C)
{
- return (__m512h)
- __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B)
{
- return (__m512h)
- __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
+ _mm512_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
- return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_cvt_roundepi16_ph (__m512i __A, int __B)
{
- return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ __B);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
{
- return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __C,
+ __A,
+ __B,
+ __D);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C)
{
- return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtw2ph512_mask_round ((__v32hi) __B,
+ _mm512_setzero_ph (),
+ __A,
+ __C);
}
#else
-#define _mm512_fnmadd_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundepi16_ph(A, B) \
+ (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(A), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, \
+ (B)))
-#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(C), \
+ (A), \
+ (B), \
+ (D)))
-#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundepi16_ph(A, B, C) \
+ (__builtin_ia32_vcvtw2ph512_mask_round ((__v32hi)(B), \
+ _mm512_setzero_ph (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfmsub[132,213,231]ph. */
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
-{
- return (__m512h)
- __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
-}
+/* Intrinsics vcvtuw2ph. */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu16_ph (__m512i __A)
+{
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
+}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C)
{
- return (__m512h)
- __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
+ __A,
+ __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B)
{
- return (__m512h)
- __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
+ _mm512_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
-{
- return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
-}
-
-extern __inline __m512h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_cvt_roundepu16_ph (__m512i __A, int __B)
{
- return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __A,
+ _mm512_setzero_ph (),
+ (__mmask32) -1,
+ __B);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
{
- return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __C,
+ __A,
+ __B,
+ __D);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
{
- return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi) __B,
+ _mm512_setzero_ph (),
+ __A,
+ __C);
}
#else
-#define _mm512_fmsub_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fmsub_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundepu16_ph(A, B) \
+ (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(A), \
+ _mm512_setzero_ph (), \
+ (__mmask32)-1, \
+ (B)))
-#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R)))
+#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(C), \
+ (A), \
+ (B), \
+ (D)))
-#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_maskz_cvt_roundepu16_ph(A, B, C) \
+ (__builtin_ia32_vcvtuw2ph512_mask_round ((__v32hi)(B), \
+ _mm512_setzero_ph (), \
+ (A), \
+ (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfnmsub[132,213,231]ph. */
-extern __inline __m512h
+/* Intrinsics vcvtph2pd. */
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C)
+_mm512_cvtph_pd (__m128h __A)
{
- return (__m512h)
- __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__A,
+ _mm512_setzero_pd (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C)
{
- return (__m512h)
- __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
{
- return (__m512h)
- __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__B,
+ _mm512_setzero_pd (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+#ifdef __OPTIMIZE__
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+_mm512_cvt_roundph_pd (__m128h __A, int __B)
{
- return (__m512h)
- __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__A,
+ _mm512_setzero_pd (),
+ (__mmask8) -1,
+ __B);
}
-#ifdef __OPTIMIZE__
-extern __inline __m512h
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D)
{
- return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) -1, __R);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m512h
+extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
- __m512h __C, const int __R)
+_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C)
{
- return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtph2pd512_mask_round (__B,
+ _mm512_setzero_pd (),
+ __A,
+ __C);
}
-extern __inline __m512h
+#else
+#define _mm512_cvt_roundph_pd(A, B) \
+ (__builtin_ia32_vcvtph2pd512_mask_round ((A), \
+ _mm512_setzero_pd (), \
+ (__mmask8)-1, \
+ (B)))
+
+#define _mm512_mask_cvt_roundph_pd(A, B, C, D) \
+ (__builtin_ia32_vcvtph2pd512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_pd(A, B, C) \
+ (__builtin_ia32_vcvtph2pd512_mask_round ((B), \
+ _mm512_setzero_pd (), \
+ (A), \
+ (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2psx. */
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
- __mmask32 __U, const int __R)
+_mm512_cvtxph_ps (__m256h __A)
{
- return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtph2psx512_mask_round (__A,
+ _mm512_setzero_ps (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m512h
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
- __m512h __C, const int __R)
+_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C)
{
- return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- (__mmask32) __U, __R);
+ return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_fnmsub_round_ph(A, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R)))
-
-#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R)))
-
-#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R) \
- ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R)))
-
-#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R) \
- ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R)))
-
-#endif /* __OPTIMIZE__ */
-
-/* Intrinsics vfmadd[132,213,231]sh. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) -1,
+ return __builtin_ia32_vcvtph2psx512_mask_round (__B,
+ _mm512_setzero_ps (),
+ __A,
_MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx_roundph_ps (__m256h __A, int __B)
+{
+ return __builtin_ia32_vcvtph2psx512_mask_round (__A,
+ _mm512_setzero_ps (),
+ (__mmask16) -1,
+ __B);
+}
+
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2psx512_mask_round (__C, __A, __B, __D);
}
-extern __inline __m128h
+extern __inline __m512
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtph2psx512_mask_round (__B,
+ _mm512_setzero_ps (),
+ __A,
+ __C);
}
-extern __inline __m128h
+#else
+#define _mm512_cvtx_roundph_ps(A, B) \
+ (__builtin_ia32_vcvtph2psx512_mask_round ((A), \
+ _mm512_setzero_ps (), \
+ (__mmask16)-1, \
+ (B)))
+
+#define _mm512_mask_cvtx_roundph_ps(A, B, C, D) \
+ (__builtin_ia32_vcvtph2psx512_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvtx_roundph_ps(A, B, C) \
+ (__builtin_ia32_vcvtph2psx512_mask_round ((B), \
+ _mm512_setzero_ps (), \
+ (A), \
+ (C)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtps2ph. */
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_cvtxps_ph (__m512 __A)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C)
+{
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
+ __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
+}
-#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) -1,
- __R);
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
+ _mm256_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+#ifdef __OPTIMIZE__
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
- const int __R)
+_mm512_cvtx_roundps_ph (__m512 __A, int __B)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __A,
+ _mm256_setzero_ph (),
+ (__mmask16) -1,
+ __B);
}
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
- const int __R)
+_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __C,
+ __A, __B, __D);
}
-extern __inline __m128h
+extern __inline __m256h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
- __m128h __B, const int __R)
+_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtps2phx512_mask_round ((__v16sf) __B,
+ _mm256_setzero_ph (),
+ __A, __C);
}
#else
-#define _mm_fmadd_round_sh(A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
-#define _mm_mask_fmadd_round_sh(A, U, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
-#define _mm_mask3_fmadd_round_sh(A, B, C, U, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fmadd_round_sh(U, A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_cvtx_roundps_ph(A, B) \
+ (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(A), \
+ _mm256_setzero_ph (),\
+ (__mmask16)-1, (B)))
-#endif /* __OPTIMIZE__ */
+#define _mm512_mask_cvtx_roundps_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(C), \
+ (A), (B), (D)))
-/* Intrinsics vfnmadd[132,213,231]sh. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
-{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
-}
+#define _mm512_maskz_cvtx_roundps_ph(A, B, C) \
+ (__builtin_ia32_vcvtps2phx512_mask_round ((__v16sf)(B), \
+ _mm256_setzero_ph (),\
+ (A), (C)))
+#endif /* __OPTIMIZE__ */
+/* Intrinsics vcvtpd2ph. */
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_cvtpd_ph (__m512d __A)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
+ __A, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
+ _mm_setzero_ph (),
+ __A,
+ _MM_FROUND_CUR_DIRECTION);
}
-
#ifdef __OPTIMIZE__
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
-{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) -1,
- __R);
-}
-
-extern __inline __m128h
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
- const int __R)
+_mm512_cvt_roundpd_ph (__m512d __A, int __B)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __A,
+ _mm_setzero_ph (),
+ (__mmask8) -1,
+ __B);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
- const int __R)
+_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __C,
+ __A, __B, __D);
}
extern __inline __m128h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
- __m128h __B, const int __R)
+_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
{
- return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return __builtin_ia32_vcvtpd2ph512_mask_round ((__v8df) __B,
+ _mm_setzero_ph (),
+ __A, __C);
}
#else
-#define _mm_fnmadd_round_sh(A, B, C, R) \
- ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
-#define _mm_mask_fnmadd_round_sh(A, U, B, C, R) \
- ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
-#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R) \
- ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R) \
- ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
+#define _mm512_cvt_roundpd_ph(A, B) \
+ (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(A), \
+ _mm_setzero_ph (), \
+ (__mmask8)-1, (B)))
+
+#define _mm512_mask_cvt_roundpd_ph(A, B, C, D) \
+ (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(C), \
+ (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundpd_ph(A, B, C) \
+ (__builtin_ia32_vcvtpd2ph512_mask_round ((__v8df)(B), \
+ _mm_setzero_ph (), \
+ (A), (C)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfmsub[132,213,231]sh. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
-{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+/* Intrinsics vfmaddsub[132,213,231]ph. */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
{
- return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-
#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) -1,
- __R);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
- const int __R)
+_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U, __R);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
- const int __R)
+_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
{
- return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
- (__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
- __m128h __B, const int __R)
+_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- (__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U, __R);
+ return (__m512h)
+ __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
#else
-#define _mm_fmsub_round_sh(A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
-#define _mm_mask_fmsub_round_sh(A, U, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
-#define _mm_mask3_fmsub_round_sh(A, B, C, U, R) \
- ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
-#define _mm_maskz_fmsub_round_sh(U, A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))
+#define _mm512_fmaddsub_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vfnmsub[132,213,231]sh. */
-extern __inline __m128h
- __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
+/* Intrinsics vfmsubadd[132,213,231]ph. */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) -1,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
+ __m512h __B, __m512h __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
+ __m512h __C, __mmask32 __U)
{
- return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
- -(__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
+ __m512h __B, __m512h __C)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-
#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) -1,
- __R);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
- const int __R)
+_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U, __R);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
- const int __R)
+_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
{
- return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
- -(__v8hf) __A,
- (__v8hf) __B,
- (__mmask8) __U, __R);
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
- __m128h __B, const int __R)
-{
- return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
- -(__v8hf) __A,
- -(__v8hf) __B,
- (__mmask8) __U, __R);
+_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
+{
+ return (__m512h)
+ __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
#else
-#define _mm_fnmsub_round_sh(A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
-#define _mm_mask_fnmsub_round_sh(A, U, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
-#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R) \
- ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
-#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R) \
- ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+#define _mm512_fmsubadd_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vf[,c]maddcph. */
+/* Intrinsics vfmadd[132,213,231]ph. */
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+_mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
- (__v32hf) __C,
- (__v32hf) __D, __B,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
- (__v32hf) __C,
- (__v32hf) __D,
- __A, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
- (__v32hf) __C,
- (__v32hf) __D, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
- (__v32hf) __C,
- (__v32hf) __D,
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-#ifdef __OPTIMIZE__
+#else
+#define _mm512_fmadd_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmadd_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmadd[132,213,231]ph. */
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D);
+ __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
- (__v32hf) __C,
- (__v32hf) __D, __B,
- __E);
+ __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
- __mmask16 __D, const int __E)
+_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D, __E);
+ __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
- (__v32hf) __C,
- (__v32hf) __D,
- __A, __E);
+ __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D);
+ return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
- (__v32hf) __C,
- (__v32hf) __D, __B,
- __E);
+ return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
- __mmask16 __D, const int __E)
+_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
- (__v32hf) __B,
- (__v32hf) __C,
- __D, __E);
+ return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
- (__v32hf) __C,
- (__v32hf) __D,
- __A, __E);
+ return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
#else
-#define _mm512_fcmadd_round_pch(A, B, C, D) \
- (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D))
-
-#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E) \
- ((__m512h) \
- __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A), \
- (__v32hf) (C), \
- (__v32hf) (D), \
- (B), (E)))
-
-
-#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E) \
- ((__m512h) \
- __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E)))
-
-#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E) \
- (__m512h) \
- __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E))
-
-#define _mm512_fmadd_round_pch(A, B, C, D) \
- (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D))
+#define _mm512_fnmadd_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R)))
-#define _mm512_mask_fmadd_round_pch(A, B, C, D, E) \
- ((__m512h) \
- __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A), \
- (__v32hf) (C), \
- (__v32hf) (D), \
- (B), (E)))
+#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R)))
-#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E) \
- (__m512h) \
- __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E))
+#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R)))
-#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E) \
- (__m512h) \
- __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vf[,c]mulcph. */
+/* Intrinsics vfmsub[132,213,231]ph. */
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmul_pch (__m512h __A, __m512h __B)
+_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
- (__v32hf) __B,
- _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
- (__v32hf) __D,
- (__v32hf) __A,
- __B, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
{
return (__m512h)
- __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
- (__v32hf) __C,
- _mm512_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmul_pch (__m512h __A, __m512h __B)
+_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+ __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
(__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
_MM_FROUND_CUR_DIRECTION);
}
+#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
- (__v32hf) __D,
- (__v32hf) __A,
- __B, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
- (__v32hf) __C,
- _mm512_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
-#ifdef __OPTIMIZE__
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
+_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
- (__v32hf) __B, __D);
+ return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
{
- return (__m512h)
- __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
- (__v32hf) __D,
- (__v32hf) __A,
- __B, __E);
+ return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
}
+#else
+#define _mm512_fmsub_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsub_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmsub[132,213,231]ph. */
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
- __m512h __C, const int __E)
+_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
- (__v32hf) __C,
- _mm512_setzero_ph (),
- __A, __E);
+ __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
+_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+ __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
(__v32hf) __B,
- __D);
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
- __m512h __D, const int __E)
+_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
{
return (__m512h)
- __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
- (__v32hf) __D,
- (__v32hf) __A,
- __B, __E);
+ __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
- __m512h __C, const int __E)
+_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
{
return (__m512h)
- __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
- (__v32hf) __C,
- _mm512_setzero_ph (),
- __A, __E);
+ __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U,
+ _MM_FROUND_CUR_DIRECTION);
}
-#else
-#define _mm512_fcmul_round_pch(A, B, D) \
- (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D))
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+ return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) -1, __R);
+}
-#define _mm512_mask_fcmul_round_pch(A, B, C, D, E) \
- (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((C), (D), (A), (B), (E))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+ __m512h __C, const int __R)
+{
+ return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
+}
-#define _mm512_maskz_fcmul_round_pch(A, B, C, E) \
- (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C), \
- (__v32hf) \
- _mm512_setzero_ph (), \
- (A), (E))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+ __mmask32 __U, const int __R)
+{
+ return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
+}
-#define _mm512_fmul_round_pch(A, B, D) \
- (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D))
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+ __m512h __C, const int __R)
+{
+ return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ (__mmask32) __U, __R);
+}
-#define _mm512_mask_fmul_round_pch(A, B, C, D, E) \
- (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E))
+#else
+#define _mm512_fnmsub_round_ph(A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R)))
-#define _mm512_maskz_fmul_round_pch(A, B, C, E) \
- (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C), \
- (__v32hf) \
- _mm512_setzero_ph (), \
- (A), (E))
+#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R) \
+ ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R) \
+ ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R)))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vf[,c]maddcsh. */
-extern __inline __m128h
+/* Intrinsics vf[,c]maddcph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
- (__v8hf) __C,
- (__v8hf) __D, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C, __D,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
+ (__v32hf) __C,
+ (__v32hf) __D, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
- (__v8hf) __C,
- (__v8hf) __D,
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
+ (__v32hf) __C,
+ (__v32hf) __D,
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
- (__v8hf) __C,
- (__v8hf) __D, __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C, __D,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
+ (__v32hf) __C,
+ (__v32hf) __D, __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
- (__v8hf) __C,
- (__v8hf) __D,
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
+ (__v32hf) __C,
+ (__v32hf) __D,
+ __A, _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) __A,
- (__v8hf) __C,
- (__v8hf) __D,
- __B, __E);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
- __mmask8 __D, const int __E)
+_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- __D, __E);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
+ (__v32hf) __C,
+ (__v32hf) __D, __B,
+ __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+ __mmask16 __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_maskz_round ((__v8hf) __B,
- (__v8hf) __C,
- (__v8hf) __D,
- __A, __E);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_mask3_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfcmaddcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- __D);
+ return (__m512h)
+ __builtin_ia32_vfcmaddcph512_maskz_round ((__v32hf) __B,
+ (__v32hf) __C,
+ (__v32hf) __D,
+ __A, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) __A,
- (__v8hf) __C,
- (__v8hf) __D,
- __B, __E);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
- __mmask8 __D, const int __E)
+_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- __D, __E);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) __A,
+ (__v32hf) __C,
+ (__v32hf) __D, __B,
+ __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+ __mmask16 __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_maskz_round ((__v8hf) __B,
- (__v8hf) __C,
- (__v8hf) __D,
- __A, __E);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_mask3_round ((__v32hf) __A,
+ (__v32hf) __B,
+ (__v32hf) __C,
+ __D, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfmaddcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- (__v8hf) __C,
- __D);
+ return (__m512h)
+ __builtin_ia32_vfmaddcph512_maskz_round ((__v32hf) __B,
+ (__v32hf) __C,
+ (__v32hf) __D,
+ __A, __E);
}
+
#else
-#define _mm_mask_fcmadd_round_sch(A, B, C, D, E) \
- ((__m128h) \
- __builtin_ia32_vfcmaddcsh_mask_round ((__v8hf) (A), \
- (__v8hf) (C), \
- (__v8hf) (D), \
- (B), (E)))
+#define _mm512_fcmadd_round_pch(A, B, C, D) \
+ (__m512h) __builtin_ia32_vfcmaddcph512_round ((A), (B), (C), (D))
+#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E) \
+ ((__m512h) \
+ __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) (A), \
+ (__v32hf) (C), \
+ (__v32hf) (D), \
+ (B), (E)))
-#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E) \
- ((__m128h) \
- __builtin_ia32_vfcmaddcsh_mask3_round ((__v8hf) (A), \
- (__v8hf) (B), \
- (__v8hf) (C), \
- (D), (E)))
-#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E) \
- __builtin_ia32_vfcmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E) \
+ ((__m512h) \
+ __builtin_ia32_vfcmaddcph512_mask3_round ((A), (B), (C), (D), (E)))
-#define _mm_fcmadd_round_sch(A, B, C, D) \
- __builtin_ia32_vfcmaddcsh_round ((A), (B), (C), (D))
+#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E) \
+ (__m512h) \
+ __builtin_ia32_vfcmaddcph512_maskz_round ((B), (C), (D), (A), (E))
-#define _mm_mask_fmadd_round_sch(A, B, C, D, E) \
- ((__m128h) \
- __builtin_ia32_vfmaddcsh_mask_round ((__v8hf) (A), \
- (__v8hf) (C), \
- (__v8hf) (D), \
- (B), (E)))
+#define _mm512_fmadd_round_pch(A, B, C, D) \
+ (__m512h) __builtin_ia32_vfmaddcph512_round ((A), (B), (C), (D))
-#define _mm_mask3_fmadd_round_sch(A, B, C, D, E) \
- ((__m128h) \
- __builtin_ia32_vfmaddcsh_mask3_round ((__v8hf) (A), \
- (__v8hf) (B), \
- (__v8hf) (C), \
- (D), (E)))
+#define _mm512_mask_fmadd_round_pch(A, B, C, D, E) \
+ ((__m512h) \
+ __builtin_ia32_vfmaddcph512_mask_round ((__v32hf) (A), \
+ (__v32hf) (C), \
+ (__v32hf) (D), \
+ (B), (E)))
-#define _mm_maskz_fmadd_round_sch(A, B, C, D, E) \
- __builtin_ia32_vfmaddcsh_maskz_round ((B), (C), (D), (A), (E))
+#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E) \
+ (__m512h) \
+ __builtin_ia32_vfmaddcph512_mask3_round ((A), (B), (C), (D), (E))
-#define _mm_fmadd_round_sch(A, B, C, D) \
- __builtin_ia32_vfmaddcsh_round ((A), (B), (C), (D))
+#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E) \
+ (__m512h) \
+ __builtin_ia32_vfmaddcph512_maskz_round ((B), (C), (D), (A), (E))
#endif /* __OPTIMIZE__ */
-/* Intrinsics vf[,c]mulcsh. */
-extern __inline __m128h
+/* Intrinsics vf[,c]mulcph. */
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmul_sch (__m128h __A, __m128h __B)
+_mm512_fcmul_pch (__m512h __A, __m512h __B)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
- (__v8hf) __D,
- (__v8hf) __A,
- __B, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
+ (__v32hf) __D,
+ (__v32hf) __A,
+ __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
- (__v8hf) __C,
- _mm_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
+ (__v32hf) __C,
+ _mm512_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmul_sch (__m128h __A, __m128h __B)
+_mm512_fmul_pch (__m512h __A, __m512h __B)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
- (__v8hf) __D,
- (__v8hf) __A,
- __B, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
+ (__v32hf) __D,
+ (__v32hf) __A,
+ __B, _MM_FROUND_CUR_DIRECTION);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
- (__v8hf) __C,
- _mm_setzero_ph (),
- __A, _MM_FROUND_CUR_DIRECTION);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
+ (__v32hf) __C,
+ _mm512_setzero_ph (),
+ __A, _MM_FROUND_CUR_DIRECTION);
}
#ifdef __OPTIMIZE__
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
+_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_round ((__v8hf) __A,
- (__v8hf) __B,
- __D);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_round ((__v32hf) __A,
+ (__v32hf) __B, __D);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C,
- (__v8hf) __D,
- (__v8hf) __A,
- __B, __E);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __C,
+ (__v32hf) __D,
+ (__v32hf) __A,
+ __B, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
- const int __E)
+_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
+ __m512h __C, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B,
- (__v8hf) __C,
- _mm_setzero_ph (),
- __A, __E);
+ return (__m512h)
+ __builtin_ia32_vfcmulcph512_mask_round ((__v32hf) __B,
+ (__v32hf) __C,
+ _mm512_setzero_ph (),
+ __A, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
+_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_round ((__v8hf) __A,
- (__v8hf) __B, __D);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_round ((__v32hf) __A,
+ (__v32hf) __B,
+ __D);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
- __m128h __D, const int __E)
+_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+ __m512h __D, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C,
- (__v8hf) __D,
- (__v8hf) __A,
- __B, __E);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __C,
+ (__v32hf) __D,
+ (__v32hf) __A,
+ __B, __E);
}
-extern __inline __m128h
+extern __inline __m512h
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
+_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
+ __m512h __C, const int __E)
{
- return (__m128h)
- __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B,
- (__v8hf) __C,
- _mm_setzero_ph (),
- __A, __E);
+ return (__m512h)
+ __builtin_ia32_vfmulcph512_mask_round ((__v32hf) __B,
+ (__v32hf) __C,
+ _mm512_setzero_ph (),
+ __A, __E);
}
#else
-#define _mm_fcmul_round_sch(__A, __B, __D) \
- (__m128h) __builtin_ia32_vfcmulcsh_round ((__v8hf) __A, \
- (__v8hf) __B, __D)
+#define _mm512_fcmul_round_pch(A, B, D) \
+ (__m512h) __builtin_ia32_vfcmulcph512_round ((A), (B), (D))
-#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E) \
- (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __C, \
- (__v8hf) __D, \
- (__v8hf) __A, \
- __B, __E)
+#define _mm512_mask_fcmul_round_pch(A, B, C, D, E) \
+ (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((C), (D), (A), (B), (E))
-#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E) \
- (__m128h) __builtin_ia32_vfcmulcsh_mask_round ((__v8hf) __B, \
- (__v8hf) __C, \
- _mm_setzero_ph (), \
- __A, __E)
+#define _mm512_maskz_fcmul_round_pch(A, B, C, E) \
+ (__m512h) __builtin_ia32_vfcmulcph512_mask_round ((B), (C), \
+ (__v32hf) \
+ _mm512_setzero_ph (), \
+ (A), (E))
-#define _mm_fmul_round_sch(__A, __B, __D) \
- (__m128h) __builtin_ia32_vfmulcsh_round ((__v8hf) __A, \
- (__v8hf) __B, __D)
+#define _mm512_fmul_round_pch(A, B, D) \
+ (__m512h) __builtin_ia32_vfmulcph512_round ((A), (B), (D))
-#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E) \
- (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __C, \
- (__v8hf) __D, \
- (__v8hf) __A, \
- __B, __E)
+#define _mm512_mask_fmul_round_pch(A, B, C, D, E) \
+ (__m512h) __builtin_ia32_vfmulcph512_mask_round ((C), (D), (A), (B), (E))
-#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E) \
- (__m128h) __builtin_ia32_vfmulcsh_mask_round ((__v8hf) __B, \
- (__v8hf) __C, \
- _mm_setzero_ph (), \
- __A, __E)
+#define _mm512_maskz_fmul_round_pch(A, B, C, E) \
+ (__m512h) __builtin_ia32_vfmulcph512_mask_round ((B), (C), \
+ (__v32hf) \
+ _mm512_setzero_ph (), \
+ (A), (E))
#endif /* __OPTIMIZE__ */
@@ -7193,27 +7238,9 @@ _mm512_set1_pch (_Float16 _Complex __A)
#define _mm512_maskz_cmul_round_pch(U, A, B, R) \
_mm512_maskz_fcmul_round_pch ((U), (A), (B), (R))
-#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B))
-#define _mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B))
-#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
-#define _mm_mask_mul_round_sch(W, U, A, B, R) \
- _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_mul_round_sch(U, A, B, R) \
- _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
-
-#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B))
-#define _mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B))
-#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B))
-#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))
-#define _mm_mask_cmul_round_sch(W, U, A, B, R) \
- _mm_mask_fcmul_round_sch ((W), (U), (A), (B), (R))
-#define _mm_maskz_cmul_round_sch(U, A, B, R) \
- _mm_maskz_fcmul_round_sch ((U), (A), (B), (R))
-
-#ifdef __DISABLE_AVX512FP16__
-#undef __DISABLE_AVX512FP16__
+#ifdef __DISABLE_AVX512FP16_512__
+#undef __DISABLE_AVX512FP16_512__
#pragma GCC pop_options
-#endif /* __DISABLE_AVX512FP16__ */
+#endif /* __DISABLE_AVX512FP16_512__ */
-#endif /* __AVX512FP16INTRIN_H_INCLUDED */
+#endif /* _AVX512FP16INTRIN_H_INCLUDED */
--
2.31.1
* [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (5 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 06/18] [PATCH 5/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 08/18] [PATCH 2/5] " Hu, Lin1
` (12 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
* config/i386/i386-builtins.cc
(ix86_init_mmx_sse_builtins): Ditto.
---
gcc/config/i386/i386-builtin.def | 648 +++++++++++++++----------------
gcc/config/i386/i386-builtins.cc | 72 ++--
2 files changed, 372 insertions(+), 348 deletions(-)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 8738b3b6a8a..0cc526383db 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -200,53 +200,53 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_maskstored256, "__builtin_ia32_mas
BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_maskstoreq256, "__builtin_ia32_maskstoreq256", IX86_BUILTIN_MASKSTOREQ256, UNKNOWN, (int) VOID_FTYPE_PV4DI_V4DI_V4DI)
/* AVX512F */
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16sf_mask, "__builtin_ia32_compressstoresf512_mask", IX86_BUILTIN_COMPRESSPSSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadupd512_mask", IX86_BUILTIN_LOADUPD512, UNKNOWN, (int) V8DF_FTYPE_PCDOUBLE_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16sf_mask, "__builtin_ia32_compressstoresf512_mask", IX86_BUILTIN_COMPRESSPSSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadupd512_mask", IX86_BUILTIN_LOADUPD512, UNKNOWN, (int) V8DF_FTYPE_PCDOUBLE_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadups512_mask", IX86_BUILTIN_LOADUPS512, UNKNOWN, (int) V16SF_FTYPE_PCFLOAT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_loadaps512_mask", IX86_BUILTIN_LOADAPS512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32load512_mask", IX86_BUILTIN_MOVDQA32LOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_loadapd512_mask", IX86_BUILTIN_LOADAPD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64load512_mask", IX86_BUILTIN_MOVDQA64LOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PINT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PINT64_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask_store, "__builtin_ia32_pmovusqd512mem_mask", IX86_BUILTIN_PMOVUSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask_store, "__builtin_ia32_pmovsqd512mem_mask", IX86_BUILTIN_PMOVSQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask_store, "__builtin_ia32_pmovqd512mem_mask", IX86_BUILTIN_PMOVQD512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovusqw512mem_mask", IX86_BUILTIN_PMOVUSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovsqw512mem_mask", IX86_BUILTIN_PMOVSQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask_store, "__builtin_ia32_pmovqw512mem_mask", IX86_BUILTIN_PMOVQW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovusdw512mem_mask", IX86_BUILTIN_PMOVUSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovsdw512mem_mask", IX86_BUILTIN_PMOVSDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask_store, "__builtin_ia32_pmovdw512mem_mask", IX86_BUILTIN_PMOVDW512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovqb512mem_mask", IX86_BUILTIN_PMOVQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovusqb512mem_mask", IX86_BUILTIN_PMOVUSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask_store_2, "__builtin_ia32_pmovsqb512mem_mask", IX86_BUILTIN_PMOVSQB512_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovusdb512mem_mask", IX86_BUILTIN_PMOVUSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovsdb512mem_mask", IX86_BUILTIN_PMOVSDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask_store, "__builtin_ia32_pmovdb512mem_mask", IX86_BUILTIN_PMOVDB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeups512_mask", IX86_BUILTIN_STOREUPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16sf_mask, "__builtin_ia32_storeaps512_mask", IX86_BUILTIN_STOREAPS512, UNKNOWN, (int) VOID_FTYPE_PV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev16si_mask, "__builtin_ia32_movdqa32store512_mask", IX86_BUILTIN_MOVDQA32STORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8df_mask, "__builtin_ia32_storeapd512_mask", IX86_BUILTIN_STOREAPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_storev8di_mask, "__builtin_ia32_movdqa64store512_mask", IX86_BUILTIN_MOVDQA64STORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loaddf_mask, "__builtin_ia32_loadsd_mask", IX86_BUILTIN_LOADSD_MASK, UNKNOWN, (int) V2DF_FTYPE_PCDOUBLE_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadsf_mask, "__builtin_ia32_loadss_mask", IX86_BUILTIN_LOADSS_MASK, UNKNOWN, (int) V4SF_FTYPE_PCFLOAT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_storedf_mask, "__builtin_ia32_storesd_mask", IX86_BUILTIN_STORESD_MASK, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF_UQI)
@@ -1360,231 +1360,231 @@ BDESC (OPTION_MASK_ISA_BMI2, 0, CODE_FOR_bmi2_pext_si3, "__builtin_ia32_pext_si"
BDESC (OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_bmi2_pext_di3, "__builtin_ia32_pext_di", IX86_BUILTIN_PEXT64, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
/* AVX512F */
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ps512_ps, "__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8df, "__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtps2ph512_mask_sae, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_256si, "__builtin_ia32_si512_256si", IX86_BUILTIN_SI512_SI256, UNKNOWN, (int) V16SI_FTYPE_V8SI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_256ps, "__builtin_ia32_ps512_256ps", IX86_BUILTIN_PS512_PS256, UNKNOWN, (int) V16SF_FTYPE_V8SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_256pd, "__builtin_ia32_pd512_256pd", IX86_BUILTIN_PD512_PD256, UNKNOWN, (int) V8DF_FTYPE_V4DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_si512_si, "__builtin_ia32_si512_si", IX86_BUILTIN_SI512_SI, UNKNOWN, (int) V16SI_FTYPE_V4SI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ps512_ps, "__builtin_ia32_ps512_ps", IX86_BUILTIN_PS512_PS, UNKNOWN, (int) V16SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pd512_pd, "__builtin_ia32_pd512_pd", IX86_BUILTIN_PD512_PD, UNKNOWN, (int) V8DF_FTYPE_V2DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv16si_mask, "__builtin_ia32_alignd512_mask", IX86_BUILTIN_ALIGND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_alignv8di_mask, "__builtin_ia32_alignq512_mask", IX86_BUILTIN_ALIGNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16si, "__builtin_ia32_blendmd_512_mask", IX86_BUILTIN_BLENDMD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8df, "__builtin_ia32_blendmpd_512_mask", IX86_BUILTIN_BLENDMPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv16sf, "__builtin_ia32_blendmps_512_mask", IX86_BUILTIN_BLENDMPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_blendmv8di, "__builtin_ia32_blendmq_512_mask", IX86_BUILTIN_BLENDMQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x4_512", IX86_BUILTIN_BROADCASTF32X4_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8df_mask, "__builtin_ia32_broadcastf64x4_512", IX86_BUILTIN_BROADCASTF64X4_512, UNKNOWN, (int) V8DF_FTYPE_V4DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv16si_mask, "__builtin_ia32_broadcasti32x4_512", IX86_BUILTIN_BROADCASTI32X4_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_broadcastv8di_mask, "__builtin_ia32_broadcasti64x4_512", IX86_BUILTIN_BROADCASTI64X4_512, UNKNOWN, (int) V8DI_FTYPE_V4DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8df_mask, "__builtin_ia32_broadcastsd512", IX86_BUILTIN_BROADCASTSD512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16sf_mask, "__builtin_ia32_broadcastss512", IX86_BUILTIN_BROADCASTSS512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16si3_mask, "__builtin_ia32_cmpd512_mask", IX86_BUILTIN_CMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8di3_mask, "__builtin_ia32_cmpq512_mask", IX86_BUILTIN_CMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8df_mask, "__builtin_ia32_compressdf512_mask", IX86_BUILTIN_COMPRESSPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16sf_mask, "__builtin_ia32_compresssf512_mask", IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32_cvtdq2pd512_mask", IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtps2ph512_mask_sae, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2sd32, "__builtin_ia32_cvtusi2sd32", IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_smult_even_v16si_mask, "__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask" , IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_interleave_lowv8di_mask, "__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti32x4_mask, "__builtin_ia32_extracti32x4_mask", IX86_BUILTIN_EXTRACTI32X4, UNKNOWN, (int) V4SI_FTYPE_V16SI_INT_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vextracti64x4_mask, "__builtin_ia32_extracti64x4_mask", IX86_BUILTIN_EXTRACTI64X4, UNKNOWN, (int) V4DI_FTYPE_V8DI_INT_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf32x4_mask, "__builtin_ia32_insertf32x4_mask", IX86_BUILTIN_INSERTF32X4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinsertf64x4_mask, "__builtin_ia32_insertf64x4_mask", IX86_BUILTIN_INSERTF64X4, UNKNOWN, (int) V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti32x4_mask, "__builtin_ia32_inserti32x4_mask", IX86_BUILTIN_INSERTI32X4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vinserti64x4_mask, "__builtin_ia32_inserti64x4_mask", IX86_BUILTIN_INSERTI64X4, UNKNOWN, (int) V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8df_mask, "__builtin_ia32_movapd512_mask", IX86_BUILTIN_MOVAPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16sf_mask, "__builtin_ia32_movaps512_mask", IX86_BUILTIN_MOVAPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movddup512_mask, "__builtin_ia32_movddup512_mask", IX86_BUILTIN_MOVDDUP512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_movdqa32_512_mask", IX86_BUILTIN_MOVDQA32_512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_movdqa64_512_mask", IX86_BUILTIN_MOVDQA64_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movshdup512_mask, "__builtin_ia32_movshdup512_mask", IX86_BUILTIN_MOVSHDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_movsldup512_mask, "__builtin_ia32_movsldup512_mask", IX86_BUILTIN_MOVSLDUP512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv16si2_mask, "__builtin_ia32_pabsd512_mask", IX86_BUILTIN_PABSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv8di2_mask, "__builtin_ia32_pabsq512_mask", IX86_BUILTIN_PABSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16si3_mask, "__builtin_ia32_paddd512_mask", IX86_BUILTIN_PADDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8di3_mask, "__builtin_ia32_paddq512_mask", IX86_BUILTIN_PADDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16si3_mask, "__builtin_ia32_pandd512_mask", IX86_BUILTIN_PANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16si3_mask, "__builtin_ia32_pandnd512_mask", IX86_BUILTIN_PANDND512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8di3_mask, "__builtin_ia32_pandnq512_mask", IX86_BUILTIN_PANDNQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8di3_mask, "__builtin_ia32_pandq512_mask", IX86_BUILTIN_PANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv16si_mask, "__builtin_ia32_pbroadcastd512", IX86_BUILTIN_PBROADCASTD512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv16si_mask, "__builtin_ia32_pbroadcastd512_gpr_mask", IX86_BUILTIN_PBROADCASTD512_GPR, UNKNOWN, (int) V16SI_FTYPE_SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskb_vec_dupv8di, "__builtin_ia32_broadcastmb512", IX86_BUILTIN_PBROADCASTMB512, UNKNOWN, (int) V8DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512cd_maskw_vec_dupv16si, "__builtin_ia32_broadcastmw512", IX86_BUILTIN_PBROADCASTMW512, UNKNOWN, (int) V16SI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dupv8di_mask, "__builtin_ia32_pbroadcastq512", IX86_BUILTIN_PBROADCASTQ512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_dup_gprv8di_mask, "__builtin_ia32_pbroadcastq512_gpr_mask", IX86_BUILTIN_PBROADCASTQ512_GPR, UNKNOWN, (int) V8DI_FTYPE_DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv16si3_mask, "__builtin_ia32_pcmpeqd512_mask", IX86_BUILTIN_PCMPEQD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_eqv8di3_mask, "__builtin_ia32_pcmpeqq512_mask", IX86_BUILTIN_PCMPEQQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia32_pcmpgtd512_mask", IX86_BUILTIN_PCMPGTD512_MASK, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv16si3_mask, "__builtin_ia32_pmaxud512_mask", IX86_BUILTIN_PMAXUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv8di3_mask, "__builtin_ia32_pmaxuq512_mask", IX86_BUILTIN_PMAXUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16si3_mask, "__builtin_ia32_pminsd512_mask", IX86_BUILTIN_PMINSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8di3_mask, "__builtin_ia32_pminsq512_mask", IX86_BUILTIN_PMINSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv16si3_mask, "__builtin_ia32_pminud512_mask", IX86_BUILTIN_PMINUD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv8di3_mask, "__builtin_ia32_pminuq512_mask", IX86_BUILTIN_PMINUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16qi2_mask, "__builtin_ia32_pmovdb512_mask", IX86_BUILTIN_PMOVDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev16siv16hi2_mask, "__builtin_ia32_pmovdw512_mask", IX86_BUILTIN_PMOVDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div16qi2_mask, "__builtin_ia32_pmovqb512_mask", IX86_BUILTIN_PMOVQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8si2_mask, "__builtin_ia32_pmovqd512_mask", IX86_BUILTIN_PMOVQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_truncatev8div8hi2_mask, "__builtin_ia32_pmovqw512_mask", IX86_BUILTIN_PMOVQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16qi2_mask, "__builtin_ia32_pmovsdb512_mask", IX86_BUILTIN_PMOVSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev16siv16hi2_mask, "__builtin_ia32_pmovsdw512_mask", IX86_BUILTIN_PMOVSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div16qi2_mask, "__builtin_ia32_pmovsqb512_mask", IX86_BUILTIN_PMOVSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8si2_mask, "__builtin_ia32_pmovsqd512_mask", IX86_BUILTIN_PMOVSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ss_truncatev8div8hi2_mask, "__builtin_ia32_pmovsqw512_mask", IX86_BUILTIN_PMOVSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16qiv16si2_mask, "__builtin_ia32_pmovsxbd512_mask", IX86_BUILTIN_PMOVSXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8qiv8di2_mask, "__builtin_ia32_pmovsxbq512_mask", IX86_BUILTIN_PMOVSXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8siv8di2_mask, "__builtin_ia32_pmovsxdq512_mask", IX86_BUILTIN_PMOVSXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv16hiv16si2_mask, "__builtin_ia32_pmovsxwd512_mask", IX86_BUILTIN_PMOVSXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sign_extendv8hiv8di2_mask, "__builtin_ia32_pmovsxwq512_mask", IX86_BUILTIN_PMOVSXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16qi2_mask, "__builtin_ia32_pmovusdb512_mask", IX86_BUILTIN_PMOVUSDB512, UNKNOWN, (int) V16QI_FTYPE_V16SI_V16QI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev16siv16hi2_mask, "__builtin_ia32_pmovusdw512_mask", IX86_BUILTIN_PMOVUSDW512, UNKNOWN, (int) V16HI_FTYPE_V16SI_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div16qi2_mask, "__builtin_ia32_pmovusqb512_mask", IX86_BUILTIN_PMOVUSQB512, UNKNOWN, (int) V16QI_FTYPE_V8DI_V16QI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8si2_mask, "__builtin_ia32_pmovusqd512_mask", IX86_BUILTIN_PMOVUSQD512, UNKNOWN, (int) V8SI_FTYPE_V8DI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_us_truncatev8div8hi2_mask, "__builtin_ia32_pmovusqw512_mask", IX86_BUILTIN_PMOVUSQW512, UNKNOWN, (int) V8HI_FTYPE_V8DI_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16qiv16si2_mask, "__builtin_ia32_pmovzxbd512_mask", IX86_BUILTIN_PMOVZXBD512, UNKNOWN, (int) V16SI_FTYPE_V16QI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8qiv8di2_mask, "__builtin_ia32_pmovzxbq512_mask", IX86_BUILTIN_PMOVZXBQ512, UNKNOWN, (int) V8DI_FTYPE_V16QI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8siv8di2_mask, "__builtin_ia32_pmovzxdq512_mask", IX86_BUILTIN_PMOVZXDQ512, UNKNOWN, (int) V8DI_FTYPE_V8SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv16hiv16si2_mask, "__builtin_ia32_pmovzxwd512_mask", IX86_BUILTIN_PMOVZXWD512, UNKNOWN, (int) V16SI_FTYPE_V16HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_zero_extendv8hiv8di2_mask, "__builtin_ia32_pmovzxwq512_mask", IX86_BUILTIN_PMOVZXWQ512, UNKNOWN, (int) V8DI_FTYPE_V8HI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_smult_even_v16si_mask, "__builtin_ia32_pmuldq512_mask", IX86_BUILTIN_PMULDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16si3_mask, "__builtin_ia32_pmulld512_mask" , IX86_BUILTIN_PMULLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vec_widen_umult_even_v16si_mask, "__builtin_ia32_pmuludq512_mask", IX86_BUILTIN_PMULUDQ512, UNKNOWN, (int) V8DI_FTYPE_V16SI_V16SI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16si3_mask, "__builtin_ia32_pord512_mask", IX86_BUILTIN_PORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8di3_mask, "__builtin_ia32_porq512_mask", IX86_BUILTIN_PORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv16si_mask, "__builtin_ia32_prold512_mask", IX86_BUILTIN_PROLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolv8di_mask, "__builtin_ia32_prolq512_mask", IX86_BUILTIN_PROLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv16si_mask, "__builtin_ia32_prolvd512_mask", IX86_BUILTIN_PROLVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rolvv8di_mask, "__builtin_ia32_prolvq512_mask", IX86_BUILTIN_PROLVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorv16si_mask, "__builtin_ia32_prord512_mask", IX86_BUILTIN_PRORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorv8di_mask, "__builtin_ia32_prorq512_mask", IX86_BUILTIN_PRORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv16si_mask, "__builtin_ia32_prorvd512_mask", IX86_BUILTIN_PRORVD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rorvv8di_mask, "__builtin_ia32_prorvq512_mask", IX86_BUILTIN_PRORVQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_pshufdv3_mask, "__builtin_ia32_pshufd512_mask", IX86_BUILTIN_PSHUFD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslld512_mask", IX86_BUILTIN_PSLLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv16si3_mask, "__builtin_ia32_pslldi512_mask", IX86_BUILTIN_PSLLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllq512_mask", IX86_BUILTIN_PSLLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv8di3_mask, "__builtin_ia32_psllqi512_mask", IX86_BUILTIN_PSLLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv16si_mask, "__builtin_ia32_psllv16si_mask", IX86_BUILTIN_PSLLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashlvv8di_mask, "__builtin_ia32_psllv8di_mask", IX86_BUILTIN_PSLLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psrad512_mask", IX86_BUILTIN_PSRAD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv16si3_mask, "__builtin_ia32_psradi512_mask", IX86_BUILTIN_PSRADI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraq512_mask", IX86_BUILTIN_PSRAQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv8di3_mask, "__builtin_ia32_psraqi512_mask", IX86_BUILTIN_PSRAQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv16si_mask, "__builtin_ia32_psrav16si_mask", IX86_BUILTIN_PSRAVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ashrvv8di_mask, "__builtin_ia32_psrav8di_mask", IX86_BUILTIN_PSRAVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrld512_mask", IX86_BUILTIN_PSRLD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv16si3_mask, "__builtin_ia32_psrldi512_mask", IX86_BUILTIN_PSRLDI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlq512_mask", IX86_BUILTIN_PSRLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv8di3_mask, "__builtin_ia32_psrlqi512_mask", IX86_BUILTIN_PSRLQI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv16si_mask, "__builtin_ia32_psrlv16si_mask", IX86_BUILTIN_PSRLVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_lshrvv8di_mask, "__builtin_ia32_psrlv8di_mask", IX86_BUILTIN_PSRLVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16si3_mask, "__builtin_ia32_psubd512_mask", IX86_BUILTIN_PSUBD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8di3_mask, "__builtin_ia32_psubq512_mask", IX86_BUILTIN_PSUBQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv16si3_mask, "__builtin_ia32_ptestmd512", IX86_BUILTIN_PTESTMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testmv8di3_mask, "__builtin_ia32_ptestmq512", IX86_BUILTIN_PTESTMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv16si3_mask, "__builtin_ia32_ptestnmd512", IX86_BUILTIN_PTESTNMD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_testnmv8di3_mask, "__builtin_ia32_ptestnmq512", IX86_BUILTIN_PTESTNMQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv16si_mask, "__builtin_ia32_punpckhdq512_mask", IX86_BUILTIN_PUNPCKHDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_highv8di_mask, "__builtin_ia32_punpckhqdq512_mask", IX86_BUILTIN_PUNPCKHQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv16si_mask, "__builtin_ia32_punpckldq512_mask", IX86_BUILTIN_PUNPCKLDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_interleave_lowv8di_mask, "__builtin_ia32_punpcklqdq512_mask", IX86_BUILTIN_PUNPCKLQDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16si3_mask, "__builtin_ia32_pxord512_mask", IX86_BUILTIN_PXORD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df, "__builtin_ia32_rcp14sd", IX86_BUILTIN_RCP14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v2df_mask, "__builtin_ia32_rcp14sd_mask", IX86_BUILTIN_RCP14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf, "__builtin_ia32_rcp14ss", IX86_BUILTIN_RCP14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_srcp14v4sf_mask, "__builtin_ia32_rcp14ss_mask", IX86_BUILTIN_RCP14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v2df, "__builtin_ia32_rsqrt14sd", IX86_BUILTIN_RSQRT14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v2df_mask, "__builtin_ia32_rsqrt14sd_mask", IX86_BUILTIN_RSQRT14SDMASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14v4sf, "__builtin_ia32_rsqrt14ss", IX86_BUILTIN_RSQRT14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_rsqrt14_v4sf_mask, "__builtin_ia32_rsqrt14ss_mask", IX86_BUILTIN_RSQRT14SSMASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhpd512_mask, "__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_unpcklps512_mask, "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512CD, 0, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_f64x2_mask, "__builtin_ia32_shuf_f64x2_mask", IX86_BUILTIN_SHUF_F64x2, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i32x4_mask, "__builtin_ia32_shuf_i32x4_mask", IX86_BUILTIN_SHUF_I32x4, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_shuf_i64x2_mask, "__builtin_ia32_shuf_i64x2_mask", IX86_BUILTIN_SHUF_I64x2, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv16si3_mask, "__builtin_ia32_ucmpd512_mask", IX86_BUILTIN_UCMPD512, UNKNOWN, (int) UHI_FTYPE_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_ucmpv8di3_mask, "__builtin_ia32_ucmpq512_mask", IX86_BUILTIN_UCMPQ512, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhpd512_mask, "__builtin_ia32_unpckhpd512_mask", IX86_BUILTIN_UNPCKHPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpckhps512_mask, "__builtin_ia32_unpckhps512_mask", IX86_BUILTIN_UNPCKHPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklpd512_mask, "__builtin_ia32_unpcklpd512_mask", IX86_BUILTIN_UNPCKLPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_unpcklps512_mask, "__builtin_ia32_unpcklps512_mask", IX86_BUILTIN_UNPCKLPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv16si2_mask, "__builtin_ia32_vplzcntd_512_mask", IX86_BUILTIN_VPCLZCNTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_clzv8di2_mask, "__builtin_ia32_vplzcntq_512_mask", IX86_BUILTIN_VPCLZCNTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv16si_mask, "__builtin_ia32_vpconflictsi_512_mask", IX86_BUILTIN_VPCONFLICTD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512CD, OPTION_MASK_ISA2_EVEX512, CODE_FOR_conflictv8di_mask, "__builtin_ia32_vpconflictdi_512_mask", IX86_BUILTIN_VPCONFLICTQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8df_mask, "__builtin_ia32_permdf512_mask", IX86_BUILTIN_VPERMDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permv8di_mask, "__builtin_ia32_permdi512_mask", IX86_BUILTIN_VPERMDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16si3_mask, "__builtin_ia32_vpermi2vard512_mask", IX86_BUILTIN_VPERMI2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8df3_mask, "__builtin_ia32_vpermi2varpd512_mask", IX86_BUILTIN_VPERMI2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv16sf3_mask, "__builtin_ia32_vpermi2varps512_mask", IX86_BUILTIN_VPERMI2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermi2varv8di3_mask, "__builtin_ia32_vpermi2varq512_mask", IX86_BUILTIN_VPERMI2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv8df_mask, "__builtin_ia32_vpermilpd512_mask", IX86_BUILTIN_VPERMILPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilv16sf_mask, "__builtin_ia32_vpermilps512_mask", IX86_BUILTIN_VPERMILPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv8df3_mask, "__builtin_ia32_vpermilvarpd512_mask", IX86_BUILTIN_VPERMILVARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermilvarv16sf3_mask, "__builtin_ia32_vpermilvarps512_mask", IX86_BUILTIN_VPERMILVARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_mask, "__builtin_ia32_vpermt2vard512_mask", IX86_BUILTIN_VPERMT2VARD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16si3_maskz, "__builtin_ia32_vpermt2vard512_maskz", IX86_BUILTIN_VPERMT2VARD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_mask, "__builtin_ia32_vpermt2varpd512_mask", IX86_BUILTIN_VPERMT2VARPD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8df3_maskz, "__builtin_ia32_vpermt2varpd512_maskz", IX86_BUILTIN_VPERMT2VARPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_mask, "__builtin_ia32_vpermt2varps512_mask", IX86_BUILTIN_VPERMT2VARPS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv16sf3_maskz, "__builtin_ia32_vpermt2varps512_maskz", IX86_BUILTIN_VPERMT2VARPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_mask, "__builtin_ia32_vpermt2varq512_mask", IX86_BUILTIN_VPERMT2VARQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vpermt2varv8di3_maskz, "__builtin_ia32_vpermt2varq512_maskz", IX86_BUILTIN_VPERMT2VARQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8df_mask, "__builtin_ia32_permvardf512_mask", IX86_BUILTIN_VPERMVARDF512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DI_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv8di_mask, "__builtin_ia32_permvardi512_mask", IX86_BUILTIN_VPERMVARDI512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16sf_mask, "__builtin_ia32_permvarsf512_mask", IX86_BUILTIN_VPERMVARSF512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SI_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_permvarv16si_mask, "__builtin_ia32_permvarsi512_mask", IX86_BUILTIN_VPERMVARSI512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_mask, "__builtin_ia32_pternlogd512_mask", IX86_BUILTIN_VTERNLOGD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv16si_maskz, "__builtin_ia32_pternlogd512_maskz", IX86_BUILTIN_VTERNLOGD512_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_mask, "__builtin_ia32_pternlogq512_mask", IX86_BUILTIN_VTERNLOGQ512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vternlogv8di_maskz, "__builtin_ia32_pternlogq512_maskz", IX86_BUILTIN_VTERNLOGQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movdf_mask, "__builtin_ia32_movesd_mask", IX86_BUILTIN_MOVSD_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_movsf_mask, "__builtin_ia32_movess_mask", IX86_BUILTIN_MOVSS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv16sf3, "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_copysignv8df3, "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv16sf3, "__builtin_ia32_copysignps512", IX86_BUILTIN_CPYSGNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3, "__builtin_ia32_copysignpd512", IX86_BUILTIN_CPYSGNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_floorpd512", IX86_BUILTIN_FLOORPD512, (enum rtx_code) ROUND_FLOOR, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_ceilpd512", IX86_BUILTIN_CEILPD512, (enum rtx_code) ROUND_CEIL, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd512, "__builtin_ia32_truncpd512", IX86_BUILTIN_TRUNCPD512, (enum rtx_code) ROUND_TRUNC, (int) V8DF_FTYPE_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si, "__builtin_ia32_cvtps2dq512", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vec_pack_sfix_v8df, "__builtin_ia32_vec_pack_sfix512", IX86_BUILTIN_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv16sf2_sfix, "__builtin_ia32_roundps_az_sfix512", IX86_BUILTIN_ROUNDPS_AZ_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V16SF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_floorps_sfix512", IX86_BUILTIN_FLOORPS_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512_sfix, "__builtin_ia32_ceilps_sfix512", IX86_BUILTIN_CEILPS_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V16SF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_roundv8df2_vec_pack_sfix, "__builtin_ia32_roundpd_az_vec_pack_sfix512", IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512, UNKNOWN, (int) V16SI_FTYPE_V8DF_V8DF)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_floorpd_vec_pack_sfix512", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_FLOOR, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
/* Mask arithmetic operations */
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
@@ -3034,26 +3034,26 @@ BDESC_END (ARGS, ROUND_ARGS)
/* AVX512F. */
BDESC_FIRST (round_args, ROUND_ARGS,
- OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+ OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_round, "__builtin_ia32_addsd_round", IX86_BUILTIN_ADDSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmaddv2df3_mask_round, "__builtin_ia32_addsd_mask_round", IX86_BUILTIN_ADDSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_round, "__builtin_ia32_addss_round", IX86_BUILTIN_ADDSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmaddv4sf3_mask_round, "__builtin_ia32_addss_mask_round", IX86_BUILTIN_ADDSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) UQI_FTYPE_V8DF_V8DF_INT_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) UHI_FTYPE_V16SF_V16SF_INT_UHI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv2df3_mask_round, "__builtin_ia32_cmpsd_mask", IX86_BUILTIN_CMPSD_MASK, UNKNOWN, (int) UQI_FTYPE_V2DF_V2DF_INT_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmcmpv4sf3_mask_round, "__builtin_ia32_cmpss_mask", IX86_BUILTIN_CMPSS_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_comi_round, "__builtin_ia32_vcomisd", IX86_BUILTIN_COMIDF, UNKNOWN, (int) INT_FTYPE_V2DF_V2DF_INT_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_comi_round, "__builtin_ia32_vcomiss", IX86_BUILTIN_COMISF, UNKNOWN, (int) INT_FTYPE_V4SF_V4SF_INT_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtpd2ps512_mask_round, "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtph2ps512_mask_round, "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv16siv16sf2_mask_round, "__builtin_ia32_cvtdq2ps512_mask", IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2dq512_mask_round, "__builtin_ia32_cvtpd2dq512_mask", IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtpd2ps512_mask_round, "__builtin_ia32_cvtpd2ps512_mask", IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8si2_mask_round, "__builtin_ia32_cvtpd2udq512_mask", IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_vcvtph2ps512_mask_round, "__builtin_ia32_vcvtph2ps512_mask", IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixuns_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_round, "__builtin_ia32_cvtsd2ss_round", IX86_BUILTIN_CVTSD2SS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtsd2ss_mask_round, "__builtin_ia32_cvtsd2ss_mask_round", IX86_BUILTIN_CVTSD2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse2_cvtsi2sdq_round, "__builtin_ia32_cvtsi2sd64", IX86_BUILTIN_CVTSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT64_INT)
@@ -3069,64 +3069,64 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv16siv16sf2_mask_round, "__b
BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2sd64_round, "__builtin_ia32_cvtusi2sd64", IX86_BUILTIN_CVTUSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT64_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2ss32_round, "__builtin_ia32_cvtusi2ss32", IX86_BUILTIN_CVTUSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT_INT)
BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_round, "__builtin_ia32_divsd_round", IX86_BUILTIN_DIVSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmdivv2df3_mask_round, "__builtin_ia32_divsd_mask_round", IX86_BUILTIN_DIVSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_round, "__builtin_ia32_divss_round", IX86_BUILTIN_DIVSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmdivv4sf3_mask_round, "__builtin_ia32_divss_mask_round", IX86_BUILTIN_DIVSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fixupimmv16sf_maskz_round, "__builtin_ia32_fixupimmps512_maskz", IX86_BUILTIN_FIXUPIMMPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_mask_round, "__builtin_ia32_fixupimmsd_mask", IX86_BUILTIN_FIXUPIMMSD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv2df_maskz_round, "__builtin_ia32_fixupimmsd_maskz", IX86_BUILTIN_FIXUPIMMSD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI_INT_QI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_mask_round, "__builtin_ia32_fixupimmss_mask", IX86_BUILTIN_FIXUPIMMSS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sfixupimmv4sf_maskz_round, "__builtin_ia32_fixupimmss_maskz", IX86_BUILTIN_FIXUPIMMSS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_round, "__builtin_ia32_getexpsd128_round", IX86_BUILTIN_GETEXPSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv2df_mask_round, "__builtin_ia32_getexpsd_mask_round", IX86_BUILTIN_GETEXPSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_round, "__builtin_ia32_getexpss128_round", IX86_BUILTIN_GETEXPSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sgetexpv4sf_mask_round, "__builtin_ia32_getexpss_mask_round", IX86_BUILTIN_GETEXPSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_round, "__builtin_ia32_getmantsd_round", IX86_BUILTIN_GETMANTSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv2df_mask_round, "__builtin_ia32_getmantsd_mask_round", IX86_BUILTIN_GETMANTSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_round, "__builtin_ia32_getmantss_round", IX86_BUILTIN_GETMANTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vgetmantv4sf_mask_round, "__builtin_ia32_getmantss_mask_round", IX86_BUILTIN_GETMANTSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_round, "__builtin_ia32_maxsd_round", IX86_BUILTIN_MAXSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsmaxv2df3_mask_round, "__builtin_ia32_maxsd_mask_round", IX86_BUILTIN_MAXSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_round, "__builtin_ia32_maxss_round", IX86_BUILTIN_MAXSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsmaxv4sf3_mask_round, "__builtin_ia32_maxss_mask_round", IX86_BUILTIN_MAXSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_round, "__builtin_ia32_minsd_round", IX86_BUILTIN_MINSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsminv2df3_mask_round, "__builtin_ia32_minsd_mask_round", IX86_BUILTIN_MINSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_round, "__builtin_ia32_minss_round", IX86_BUILTIN_MINSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsminv4sf3_mask_round, "__builtin_ia32_minss_mask_round", IX86_BUILTIN_MINSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_round, "__builtin_ia32_mulsd_round", IX86_BUILTIN_MULSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmmulv2df3_mask_round, "__builtin_ia32_mulsd_mask_round", IX86_BUILTIN_MULSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia32_mulss_round", IX86_BUILTIN_MULSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_mask_round, "__builtin_ia32_mulss_mask_round", IX86_BUILTIN_MULSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_mask_round, "__builtin_ia32_rndscalesd_mask_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_mask_round, "__builtin_ia32_rndscaless_mask_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv2df_mask_round, "__builtin_ia32_scalefsd_mask_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv4sf_mask_round, "__builtin_ia32_scalefss_mask_round", IX86_BUILTIN_SCALEFSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsqrtv2df2_mask_round, "__builtin_ia32_sqrtsd_mask_round", IX86_BUILTIN_SQRTSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsqrtv4sf2_mask_round, "__builtin_ia32_sqrtss_mask_round", IX86_BUILTIN_SQRTSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_round, "__builtin_ia32_subsd_round", IX86_BUILTIN_SUBSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_vmsubv2df3_mask_round, "__builtin_ia32_subsd_mask_round", IX86_BUILTIN_SUBSD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmsubv4sf3_round, "__builtin_ia32_subss_round", IX86_BUILTIN_SUBSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
@@ -3147,12 +3147,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_cvttss2si_round, "__builtin_ia32
BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse_cvttss2siq_round, "__builtin_ia32_vcvttss2si64", IX86_BUILTIN_VCVTTSS2SI64, UNKNOWN, (int) INT64_FTYPE_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvttss2usi_round, "__builtin_ia32_vcvttss2usi32", IX86_BUILTIN_VCVTTSS2USI32, UNKNOWN, (int) UINT_FTYPE_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_avx512f_vcvttss2usiq_round, "__builtin_ia32_vcvttss2usi64", IX86_BUILTIN_VCVTTSS2USI64, UNKNOWN, (int) UINT64_FTYPE_V4SF_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask_round, "__builtin_ia32_vfmaddpd512_mask", IX86_BUILTIN_VFMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_mask3_round, "__builtin_ia32_vfmaddpd512_mask3", IX86_BUILTIN_VFMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v8df_maskz_round, "__builtin_ia32_vfmaddpd512_maskz", IX86_BUILTIN_VFMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v2df_round, "__builtin_ia32_vfmaddsd3_round", IX86_BUILTIN_VFMADDSD3_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fmai_vmfmadd_v4sf_round, "__builtin_ia32_vfmaddss3_round", IX86_BUILTIN_VFMADDSS3_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v2df_mask_round, "__builtin_ia32_vfmaddsd3_mask", IX86_BUILTIN_VFMADDSD3_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT)
@@ -3163,32 +3163,32 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask_round, "__
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_mask3_round, "__builtin_ia32_vfmaddss3_mask3", IX86_BUILTIN_VFMADDSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmadd_v4sf_maskz_round, "__builtin_ia32_vfmaddss3_maskz", IX86_BUILTIN_VFMADDSS3_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmfmsub_v4sf_mask3_round, "__builtin_ia32_vfmsubss3_mask3", IX86_BUILTIN_VFMSUBSS3_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask_round, "__builtin_ia32_vfmaddsubps512_mask", IX86_BUILTIN_VFMADDSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__builtin_ia32_vfmaddsubps512_mask3", IX86_BUILTIN_VFMADDSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
/* AVX512ER */
BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v8df_mask_round, "__builtin_ia32_exp2pd_mask", IX86_BUILTIN_EXP2PD_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 8a0b8dfe073..e1d1dac2ba2 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -784,83 +784,103 @@ ix86_init_mmx_sse_builtins (void)
IX86_BUILTIN_GATHERALTDIV8SI);
/* AVX512F */
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16sf",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gathersiv16sf",
V16SF_FTYPE_V16SF_PCVOID_V16SI_HI_INT,
IX86_BUILTIN_GATHER3SIV16SF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8df",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gathersiv8df",
V8DF_FTYPE_V8DF_PCVOID_V8SI_QI_INT,
IX86_BUILTIN_GATHER3SIV8DF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16sf",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gatherdiv16sf",
V8SF_FTYPE_V8SF_PCVOID_V8DI_QI_INT,
IX86_BUILTIN_GATHER3DIV16SF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8df",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gatherdiv8df",
V8DF_FTYPE_V8DF_PCVOID_V8DI_QI_INT,
IX86_BUILTIN_GATHER3DIV8DF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv16si",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gathersiv16si",
V16SI_FTYPE_V16SI_PCVOID_V16SI_HI_INT,
IX86_BUILTIN_GATHER3SIV16SI);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gathersiv8di",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gathersiv8di",
V8DI_FTYPE_V8DI_PCVOID_V8SI_QI_INT,
IX86_BUILTIN_GATHER3SIV8DI);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv16si",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gatherdiv16si",
V8SI_FTYPE_V8SI_PCVOID_V8DI_QI_INT,
IX86_BUILTIN_GATHER3DIV16SI);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gatherdiv8di",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gatherdiv8di",
V8DI_FTYPE_V8DI_PCVOID_V8DI_QI_INT,
IX86_BUILTIN_GATHER3DIV8DI);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8df ",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gather3altsiv8df ",
V8DF_FTYPE_V8DF_PCDOUBLE_V16SI_QI_INT,
IX86_BUILTIN_GATHER3ALTSIV8DF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16sf ",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gather3altdiv16sf ",
V16SF_FTYPE_V16SF_PCFLOAT_V8DI_HI_INT,
IX86_BUILTIN_GATHER3ALTDIV16SF);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altsiv8di ",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gather3altsiv8di ",
V8DI_FTYPE_V8DI_PCINT64_V16SI_QI_INT,
IX86_BUILTIN_GATHER3ALTSIV8DI);
- def_builtin_pure (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_gather3altdiv16si ",
+ def_builtin_pure (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_gather3altdiv16si ",
V16SI_FTYPE_V16SI_PCINT_V8DI_HI_INT,
IX86_BUILTIN_GATHER3ALTDIV16SI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16sf",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scattersiv16sf",
VOID_FTYPE_PVOID_HI_V16SI_V16SF_INT,
IX86_BUILTIN_SCATTERSIV16SF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8df",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scattersiv8df",
VOID_FTYPE_PVOID_QI_V8SI_V8DF_INT,
IX86_BUILTIN_SCATTERSIV8DF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16sf",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatterdiv16sf",
VOID_FTYPE_PVOID_QI_V8DI_V8SF_INT,
IX86_BUILTIN_SCATTERDIV16SF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8df",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatterdiv8df",
VOID_FTYPE_PVOID_QI_V8DI_V8DF_INT,
IX86_BUILTIN_SCATTERDIV8DF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv16si",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scattersiv16si",
VOID_FTYPE_PVOID_HI_V16SI_V16SI_INT,
IX86_BUILTIN_SCATTERSIV16SI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scattersiv8di",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scattersiv8di",
VOID_FTYPE_PVOID_QI_V8SI_V8DI_INT,
IX86_BUILTIN_SCATTERSIV8DI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv16si",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatterdiv16si",
VOID_FTYPE_PVOID_QI_V8DI_V8SI_INT,
IX86_BUILTIN_SCATTERDIV16SI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatterdiv8di",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatterdiv8di",
VOID_FTYPE_PVOID_QI_V8DI_V8DI_INT,
IX86_BUILTIN_SCATTERDIV8DI);
@@ -1009,19 +1029,23 @@ ix86_init_mmx_sse_builtins (void)
VOID_FTYPE_PVOID_QI_V2DI_V2DI_INT,
IX86_BUILTIN_SCATTERDIV2DI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8df ",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatteraltsiv8df ",
VOID_FTYPE_PDOUBLE_QI_V16SI_V8DF_INT,
IX86_BUILTIN_SCATTERALTSIV8DF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16sf ",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatteraltdiv16sf ",
VOID_FTYPE_PFLOAT_HI_V8DI_V16SF_INT,
IX86_BUILTIN_SCATTERALTDIV16SF);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltsiv8di ",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatteraltsiv8di ",
VOID_FTYPE_PLONGLONG_QI_V16SI_V8DI_INT,
IX86_BUILTIN_SCATTERALTSIV8DI);
- def_builtin (OPTION_MASK_ISA_AVX512F, 0, "__builtin_ia32_scatteraltdiv16si ",
+ def_builtin (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512,
+ "__builtin_ia32_scatteraltdiv16si ",
VOID_FTYPE_PINT_HI_V8DI_V16SI_INT,
IX86_BUILTIN_SCATTERALTDIV16SI);
--
2.31.1
* [PATCH 08/18] [PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (6 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 07/18] [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 09/18] [PATCH 3/5] " Hu, Lin1
` (11 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
---
gcc/config/i386/i386-builtin.def | 94 ++++++++++++++++----------------
1 file changed, 47 insertions(+), 47 deletions(-)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 0cc526383db..7a0dec9bc8b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2408,37 +2408,37 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv2df3_mask, "__builtin_
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cmpv4sf3_mask, "__builtin_ia32_cmpps128_mask", IX86_BUILTIN_CMPPS128_MASK, UNKNOWN, (int) UQI_FTYPE_V4SF_V4SF_INT_UQI)
/* AVX512DQ. */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask, "__builtin_ia32_broadcastf32x2_512_mask", IX86_BUILTIN_BROADCASTF32x2_512, UNKNOWN, (int) V16SF_FTYPE_V4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask, "__builtin_ia32_broadcasti32x2_512_mask", IX86_BUILTIN_BROADCASTI32x2_512, UNKNOWN, (int) V16SI_FTYPE_V4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8df_mask_1, "__builtin_ia32_broadcastf64x2_512_mask", IX86_BUILTIN_BROADCASTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V2DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv8di_mask_1, "__builtin_ia32_broadcasti64x2_512_mask", IX86_BUILTIN_BROADCASTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V2DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16sf_mask_1, "__builtin_ia32_broadcastf32x8_512_mask", IX86_BUILTIN_BROADCASTF32X8_512, UNKNOWN, (int) V16SF_FTYPE_V8SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_broadcastv16si_mask_1, "__builtin_ia32_broadcasti32x8_512_mask", IX86_BUILTIN_BROADCASTI32X8_512, UNKNOWN, (int) V16SI_FTYPE_V8SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf64x2_mask, "__builtin_ia32_extractf64x2_512_mask", IX86_BUILTIN_EXTRACTF64X2_512, UNKNOWN, (int) V2DF_FTYPE_V8DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextractf32x8_mask, "__builtin_ia32_extractf32x8_mask", IX86_BUILTIN_EXTRACTF32X8, UNKNOWN, (int) V8SF_FTYPE_V16SF_INT_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti64x2_mask, "__builtin_ia32_extracti64x2_512_mask", IX86_BUILTIN_EXTRACTI64X2_512, UNKNOWN, (int) V2DI_FTYPE_V8DI_INT_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vextracti32x8_mask, "__builtin_ia32_extracti32x8_mask", IX86_BUILTIN_EXTRACTI32X8, UNKNOWN, (int) V8SI_FTYPE_V16SI_INT_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask, "__builtin_ia32_reducepd512_mask", IX86_BUILTIN_REDUCEPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask, "__builtin_ia32_reduceps512_mask", IX86_BUILTIN_REDUCEPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_mulv8di3_mask, "__builtin_ia32_pmullq512_mask", IX86_BUILTIN_PMULLQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv8df3_mask, "__builtin_ia32_xorpd512_mask", IX86_BUILTIN_XORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_xorv16sf3_mask, "__builtin_ia32_xorps512_mask", IX86_BUILTIN_XORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv8df3_mask, "__builtin_ia32_orpd512_mask", IX86_BUILTIN_ORPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_iorv16sf3_mask, "__builtin_ia32_orps512_mask", IX86_BUILTIN_ORPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv8df3_mask, "__builtin_ia32_andpd512_mask", IX86_BUILTIN_ANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_andv16sf3_mask, "__builtin_ia32_andps512_mask", IX86_BUILTIN_ANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv8df3_mask, "__builtin_ia32_andnpd512_mask", IX86_BUILTIN_ANDNPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_andnotv16sf3_mask, "__builtin_ia32_andnps512_mask", IX86_BUILTIN_ANDNPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf32x8_mask, "__builtin_ia32_insertf32x8_mask", IX86_BUILTIN_INSERTF32X8, UNKNOWN, (int) V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti32x8_mask, "__builtin_ia32_inserti32x8_mask", IX86_BUILTIN_INSERTI32X8, UNKNOWN, (int) V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinsertf64x2_mask, "__builtin_ia32_insertf64x2_512_mask", IX86_BUILTIN_INSERTF64X2_512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_vinserti64x2_mask, "__builtin_ia32_inserti64x2_512_mask", IX86_BUILTIN_INSERTI64X2_512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv8df_mask, "__builtin_ia32_fpclasspd512_mask", IX86_BUILTIN_FPCLASSPD512, UNKNOWN, (int) QI_FTYPE_V8DF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv16sf_mask, "__builtin_ia32_fpclassps512_mask", IX86_BUILTIN_FPCLASSPS512, UNKNOWN, (int) HI_FTYPE_V16SF_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtd2maskv16si, "__builtin_ia32_cvtd2mask512", IX86_BUILTIN_CVTD2MASK512, UNKNOWN, (int) UHI_FTYPE_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtq2maskv8di, "__builtin_ia32_cvtq2mask512", IX86_BUILTIN_CVTQ2MASK512, UNKNOWN, (int) UQI_FTYPE_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2dv16si, "__builtin_ia32_cvtmask2d512", IX86_BUILTIN_CVTMASK2D512, UNKNOWN, (int) V16SI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtmask2qv8di, "__builtin_ia32_cvtmask2q512", IX86_BUILTIN_CVTMASK2Q512, UNKNOWN, (int) V8DI_FTYPE_UQI)
/* AVX512BW. */
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI)
@@ -3207,26 +3207,26 @@ BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__bu
BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_vmrsqrt28v4sf_mask_round, "__builtin_ia32_rsqrt28ss_mask_round", IX86_BUILTIN_RSQRT28SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT)
/* AVX512DQ. */
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv8df_mask_round, "__builtin_ia32_reducepd512_mask_round", IX86_BUILTIN_REDUCEPD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv16sf_mask_round, "__builtin_ia32_reduceps512_mask_round", IX86_BUILTIN_REDUCEPS512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv2df_mask_round, "__builtin_ia32_reducesd_mask_round", IX86_BUILTIN_REDUCESD128_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv4sf_mask_round, "__builtin_ia32_reducess_mask_round", IX86_BUILTIN_REDUCESS128_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv2df_mask_round, "__builtin_ia32_rangesd128_mask_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangesv4sf_mask_round, "__builtin_ia32_rangess128_mask_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
/* AVX512FP16. */
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
--
2.31.1
* [PATCH 09/18] [PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
2023-09-21 7:20 ` [PATCH 08/18] [PATCH 2/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 10/18] [PATCH 4/5] " Hu, Lin1
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
---
gcc/config/i386/i386-builtin.def | 226 +++++++++++++++----------------
1 file changed, 113 insertions(+), 113 deletions(-)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 7a0dec9bc8b..167d530a537 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -293,10 +293,10 @@ BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_si,
BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_di, "__builtin_ia32_cmpccxadd64", IX86_BUILTIN_CMPCCXADD64, UNKNOWN, (int) LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT)
/* AVX512BW */
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev32hi_mask, "__builtin_ia32_storedquhi512_mask", IX86_BUILTIN_STOREDQUHI512_MASK, UNKNOWN, (int) VOID_FTYPE_PSHORT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
/* AVX512VP2INTERSECT */
BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
@@ -407,9 +407,9 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl
BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ss_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovswb256mem_mask", IX86_BUILTIN_PMOVSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev8hiv8qi2_mask_store_2, "__builtin_ia32_pmovuswb128mem_mask", IX86_BUILTIN_PMOVUSWB128_MEM, UNKNOWN, (int) VOID_FTYPE_PUDI_V8HI_UQI)
BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev16hiv16qi2_mask_store, "__builtin_ia32_pmovuswb256mem_mask", IX86_BUILTIN_PMOVUSWB256_MEM, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16HI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovuswb512mem_mask", IX86_BUILTIN_PMOVUSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
/* AVX512FP16 */
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI)
@@ -1590,61 +1590,61 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_round
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kashifthi, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftsi, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_klshiftrtqi, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_klshiftrthi, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtsi, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_knotqi, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_knothi, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotsi, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestcqi", IX86_BUILTIN_KORTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kortestqi, "__builtin_ia32_kortestzqi", IX86_BUILTIN_KORTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kortesthi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestcsi", IX86_BUILTIN_KORTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestsi, "__builtin_ia32_kortestzsi", IX86_BUILTIN_KORTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kmovb, "__builtin_ia32_kmovb", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_kmovw, "__builtin_ia32_kmovw", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
/* SHA */
BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
@@ -2442,96 +2442,96 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtm
/* AVX512BW. */
BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpcksi, "__builtin_ia32_kunpcksi", IX86_BUILTIN_KUNPCKWD, UNKNOWN, (int) USI_FTYPE_USI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask", IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask", IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask" , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask", IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask", IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
-BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_kunpckdi, "__builtin_ia32_kunpckdi", IX86_BUILTIN_KUNPCKDQ, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packusdw_mask, "__builtin_ia32_packusdw512_mask", IX86_BUILTIN_PACKUSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlv4ti3, "__builtin_ia32_pslldq512", IX86_BUILTIN_PSLLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrv4ti3, "__builtin_ia32_psrldq512", IX86_BUILTIN_PSRLDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packssdw_mask, "__builtin_ia32_packssdw512_mask", IX86_BUILTIN_PACKSSDW512, UNKNOWN, (int) V32HI_FTYPE_V16SI_V16SI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv4ti, "__builtin_ia32_palignr512", IX86_BUILTIN_PALIGNR512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_palignrv64qi_mask, "__builtin_ia32_palignr512_mask", IX86_BUILTIN_PALIGNR512_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_movdquhi512_mask", IX86_BUILTIN_MOVDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_movdquqi512_mask", IX86_BUILTIN_MOVDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_psadbw, "__builtin_ia32_psadbw512", IX86_BUILTIN_PSADBW512, UNKNOWN, (int) V8DI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_dbpsadbwv32hi_mask, "__builtin_ia32_dbpsadbw512_mask", IX86_BUILTIN_DBPSADBW512, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv64qi_mask, "__builtin_ia32_pbroadcastb512_mask", IX86_BUILTIN_PBROADCASTB512, UNKNOWN, (int) V64QI_FTYPE_V16QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv64qi_mask, "__builtin_ia32_pbroadcastb512_gpr_mask", IX86_BUILTIN_PBROADCASTB512_GPR, UNKNOWN, (int) V64QI_FTYPE_QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dupv32hi_mask, "__builtin_ia32_pbroadcastw512_mask", IX86_BUILTIN_PBROADCASTW512, UNKNOWN, (int) V32HI_FTYPE_V8HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vec_dup_gprv32hi_mask, "__builtin_ia32_pbroadcastw512_gpr_mask", IX86_BUILTIN_PBROADCASTW512_GPR, UNKNOWN, (int) V32HI_FTYPE_HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sign_extendv32qiv32hi2_mask, "__builtin_ia32_pmovsxbw512_mask", IX86_BUILTIN_PMOVSXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_zero_extendv32qiv32hi2_mask, "__builtin_ia32_pmovzxbw512_mask", IX86_BUILTIN_PMOVZXBW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv32hi_mask, "__builtin_ia32_permvarhi512_mask", IX86_BUILTIN_VPERMVARHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_mask, "__builtin_ia32_vpermt2varhi512_mask", IX86_BUILTIN_VPERMT2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv32hi3_maskz, "__builtin_ia32_vpermt2varhi512_maskz", IX86_BUILTIN_VPERMT2VARHI512_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv32hi3_mask, "__builtin_ia32_vpermi2varhi512_mask", IX86_BUILTIN_VPERMI2VARHI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv64qi3_mask, "__builtin_ia32_pavgb512_mask", IX86_BUILTIN_PAVGB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_uavgv32hi3_mask, "__builtin_ia32_pavgw512_mask", IX86_BUILTIN_PAVGW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv64qi3_mask, "__builtin_ia32_paddb512_mask", IX86_BUILTIN_PADDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv64qi3_mask, "__builtin_ia32_psubb512_mask", IX86_BUILTIN_PSUBB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv64qi3_mask, "__builtin_ia32_psubsb512_mask", IX86_BUILTIN_PSUBSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv64qi3_mask, "__builtin_ia32_paddsb512_mask", IX86_BUILTIN_PADDSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv64qi3_mask, "__builtin_ia32_psubusb512_mask", IX86_BUILTIN_PSUBUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv64qi3_mask, "__builtin_ia32_paddusb512_mask", IX86_BUILTIN_PADDUSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hi3_mask, "__builtin_ia32_psubw512_mask", IX86_BUILTIN_PSUBW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hi3_mask, "__builtin_ia32_paddw512_mask", IX86_BUILTIN_PADDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sssubv32hi3_mask, "__builtin_ia32_psubsw512_mask", IX86_BUILTIN_PSUBSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ssaddv32hi3_mask, "__builtin_ia32_paddsw512_mask", IX86_BUILTIN_PADDSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ussubv32hi3_mask, "__builtin_ia32_psubusw512_mask", IX86_BUILTIN_PSUBUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_usaddv32hi3_mask, "__builtin_ia32_paddusw512_mask", IX86_BUILTIN_PADDUSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv32hi3_mask, "__builtin_ia32_pmaxuw512_mask", IX86_BUILTIN_PMAXUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hi3_mask, "__builtin_ia32_pmaxsw512_mask", IX86_BUILTIN_PMAXSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv32hi3_mask, "__builtin_ia32_pminuw512_mask", IX86_BUILTIN_PMINUW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hi3_mask, "__builtin_ia32_pminsw512_mask", IX86_BUILTIN_PMINSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umaxv64qi3_mask, "__builtin_ia32_pmaxub512_mask", IX86_BUILTIN_PMAXUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv64qi3_mask, "__builtin_ia32_pmaxsb512_mask", IX86_BUILTIN_PMAXSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_uminv64qi3_mask, "__builtin_ia32_pminub512_mask", IX86_BUILTIN_PMINUB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv64qi3_mask, "__builtin_ia32_pminsb512_mask", IX86_BUILTIN_PMINSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovwb512_mask", IX86_BUILTIN_PMOVWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovswb512_mask", IX86_BUILTIN_PMOVSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mask, "__builtin_ia32_pmovuswb512_mask", IX86_BUILTIN_PMOVUSWB512, UNKNOWN, (int) V32QI_FTYPE_V32HI_V32QI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_umulhrswv32hi3_mask, "__builtin_ia32_pmulhrsw512_mask", IX86_BUILTIN_PMULHRSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_umulv32hi3_highpart_mask, "__builtin_ia32_pmulhuw512_mask" , IX86_BUILTIN_PMULHUW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_smulv32hi3_highpart_mask, "__builtin_ia32_pmulhw512_mask" , IX86_BUILTIN_PMULHW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hi3_mask, "__builtin_ia32_pmullw512_mask", IX86_BUILTIN_PMULLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllwi512_mask", IX86_BUILTIN_PSLLWI512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashlv32hi3_mask, "__builtin_ia32_psllw512_mask", IX86_BUILTIN_PSLLW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packsswb_mask, "__builtin_ia32_packsswb512_mask", IX86_BUILTIN_PACKSSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_packuswb_mask, "__builtin_ia32_packuswb512_mask", IX86_BUILTIN_PACKUSWB512, UNKNOWN, (int) V64QI_FTYPE_V32HI_V32HI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashrvv32hi_mask, "__builtin_ia32_psrav32hi_mask", IX86_BUILTIN_PSRAVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddubsw512v32hi_mask, "__builtin_ia32_pmaddubsw512_mask", IX86_BUILTIN_PMADDUBSW512_MASK, UNKNOWN, (int) V32HI_FTYPE_V64QI_V64QI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pmaddwd512v32hi_mask, "__builtin_ia32_pmaddwd512_mask", IX86_BUILTIN_PMADDWD512_MASK, UNKNOWN, (int) V16SI_FTYPE_V32HI_V32HI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_lshrvv32hi_mask, "__builtin_ia32_psrlv32hi_mask", IX86_BUILTIN_PSRLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv64qi_mask, "__builtin_ia32_punpckhbw512_mask", IX86_BUILTIN_PUNPCKHBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_highv32hi_mask, "__builtin_ia32_punpckhwd512_mask", IX86_BUILTIN_PUNPCKHWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv64qi_mask, "__builtin_ia32_punpcklbw512_mask", IX86_BUILTIN_PUNPCKLBW512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_interleave_lowv32hi_mask, "__builtin_ia32_punpcklwd512_mask", IX86_BUILTIN_PUNPCKLWD512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufbv64qi3_mask, "__builtin_ia32_pshufb512_mask", IX86_BUILTIN_PSHUFB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshufhwv32hi_mask, "__builtin_ia32_pshufhw512_mask", IX86_BUILTIN_PSHUFHW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_pshuflwv32hi_mask, "__builtin_ia32_pshuflw512_mask", IX86_BUILTIN_PSHUFLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psrawi512_mask", IX86_BUILTIN_PSRAWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_ashrv32hi3_mask, "__builtin_ia32_psraw512_mask", IX86_BUILTIN_PSRAW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlwi512_mask", IX86_BUILTIN_PSRLWI512, UNKNOWN, (int) V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_lshrv32hi3_mask, "__builtin_ia32_psrlw512_mask", IX86_BUILTIN_PSRLW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtb2maskv64qi, "__builtin_ia32_cvtb2mask512", IX86_BUILTIN_CVTB2MASK512, UNKNOWN, (int) UDI_FTYPE_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtw2maskv32hi, "__builtin_ia32_cvtw2mask512", IX86_BUILTIN_CVTW2MASK512, UNKNOWN, (int) USI_FTYPE_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2bv64qi, "__builtin_ia32_cvtmask2b512", IX86_BUILTIN_CVTMASK2B512, UNKNOWN, (int) V64QI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cvtmask2wv32hi, "__builtin_ia32_cvtmask2w512", IX86_BUILTIN_CVTMASK2W512, UNKNOWN, (int) V32HI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv64qi3_mask, "__builtin_ia32_pcmpeqb512_mask", IX86_BUILTIN_PCMPEQB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_eqv32hi3_mask, "__builtin_ia32_pcmpeqw512_mask", IX86_BUILTIN_PCMPEQW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv64qi3_mask, "__builtin_ia32_pcmpgtb512_mask", IX86_BUILTIN_PCMPGTB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_gtv32hi3_mask, "__builtin_ia32_pcmpgtw512_mask", IX86_BUILTIN_PCMPGTW512_MASK, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv64qi3_mask, "__builtin_ia32_ptestmb512", IX86_BUILTIN_PTESTMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testmv32hi3_mask, "__builtin_ia32_ptestmw512", IX86_BUILTIN_PTESTMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv64qi3_mask, "__builtin_ia32_ptestnmb512", IX86_BUILTIN_PTESTNMB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_testnmv32hi3_mask, "__builtin_ia32_ptestnmw512", IX86_BUILTIN_PTESTNMW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ashlvv32hi_mask, "__builtin_ia32_psllv32hi_mask", IX86_BUILTIN_PSLLVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv64qi2_mask, "__builtin_ia32_pabsb512_mask", IX86_BUILTIN_PABSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_absv32hi2_mask, "__builtin_ia32_pabsw512_mask", IX86_BUILTIN_PABSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv32hi, "__builtin_ia32_blendmw_512_mask", IX86_BUILTIN_BLENDMW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_blendmv64qi, "__builtin_ia32_blendmb_512_mask", IX86_BUILTIN_BLENDMB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv64qi3_mask, "__builtin_ia32_cmpb512_mask", IX86_BUILTIN_CMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hi3_mask, "__builtin_ia32_cmpw512_mask", IX86_BUILTIN_CMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv64qi3_mask, "__builtin_ia32_ucmpb512_mask", IX86_BUILTIN_UCMPB512, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_INT_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
/* AVX512IFMA */
BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
--
2.31.1
* [PATCH 10/18] [PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
2023-09-21 7:20 ` [PATCH 09/18] [PATCH 3/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 11/18] [PATCH 5/5] " Hu, Lin1
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
---
gcc/config/i386/i386-builtin.def | 188 +++++++++++++++----------------
1 file changed, 94 insertions(+), 94 deletions(-)
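The mechanics of the change below — each 512-bit builtin is now guarded by both an ISA bit (e.g. AVX512BW) and the new EVEX512 bit in the second ISA word — can be sketched as a two-word mask check. This is a minimal illustrative model: the mask values and the helper function are hypothetical, not GCC's actual constants or internal API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mask bits; GCC's real OPTION_MASK_* values differ.  */
#define OPTION_MASK_ISA_AVX512BW  (1u << 0)   /* first ISA word */
#define OPTION_MASK_ISA2_EVEX512  (1u << 1)   /* second ISA word */

/* A builtin is available only when every required bit in BOTH ISA
   words is enabled.  With -mno-evex512 the EVEX512 bit is cleared,
   so 512-bit builtins gated on it drop out even if AVX512BW is on.  */
static int
builtin_enabled (uint32_t isa_flags, uint32_t isa2_flags,
		 uint32_t need_isa, uint32_t need_isa2)
{
  return (isa_flags & need_isa) == need_isa
	 && (isa2_flags & need_isa2) == need_isa2;
}
```

Under this model, `-mavx512bw` alone no longer suffices for the 512-bit builtins once they also require `OPTION_MASK_ISA2_EVEX512`, which is exactly the effect of the BDESC changes in this patch.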
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 167d530a537..8250e2998cd 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -299,8 +299,8 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_sto
BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_storev64qi_mask, "__builtin_ia32_storedquqi512_mask", IX86_BUILTIN_STOREDQUQI512_MASK, UNKNOWN, (int) VOID_FTYPE_PCHAR_V64QI_UDI)
/* AVX512VP2INTERSECT */
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
-BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectd512", IX86_BUILTIN_2INTERSECTD512, UNKNOWN, (int) VOID_FTYPE_PUHI_PUHI_V16SI_V16SI)
+BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT | OPTION_MASK_ISA2_EVEX512, CODE_FOR_nothing, "__builtin_ia32_2intersectq512", IX86_BUILTIN_2INTERSECTQ512, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8DI_V8DI)
BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd256", IX86_BUILTIN_2INTERSECTD256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V8SI_V8SI)
BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectq256", IX86_BUILTIN_2INTERSECTQ256, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4DI_V4DI)
BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing, "__builtin_ia32_2intersectd128", IX86_BUILTIN_2INTERSECTD128, UNKNOWN, (int) VOID_FTYPE_PUQI_PUQI_V4SI_V4SI)
@@ -430,17 +430,17 @@ BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru, "__builtin_ia32_rdpkru", IX86_B
BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru, "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
/* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev64qi_mask, "__builtin_ia32_compressstoreuqi512_mask", IX86_BUILTIN_PCOMPRESSBSTORE512, UNKNOWN, (int) VOID_FTYPE_PV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressstorev32hi_mask, "__builtin_ia32_compressstoreuhi512_mask", IX86_BUILTIN_PCOMPRESSWSTORE512, UNKNOWN, (int) VOID_FTYPE_PV32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev32qi_mask, "__builtin_ia32_compressstoreuqi256_mask", IX86_BUILTIN_PCOMPRESSBSTORE256, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16qi_mask, "__builtin_ia32_compressstoreuqi128_mask", IX86_BUILTIN_PCOMPRESSBSTORE128, UNKNOWN, (int) VOID_FTYPE_PV16QI_V16QI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev16hi_mask, "__builtin_ia32_compressstoreuhi256_mask", IX86_BUILTIN_PCOMPRESSWSTORE256, UNKNOWN, (int) VOID_FTYPE_PV16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressstorev8hi_mask, "__builtin_ia32_compressstoreuhi128_mask", IX86_BUILTIN_PCOMPRESSWSTORE128, UNKNOWN, (int) VOID_FTYPE_PV8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandloadqi512_mask", IX86_BUILTIN_PEXPANDBLOAD512, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandloadqi512_maskz", IX86_BUILTIN_PEXPANDBLOAD512Z, UNKNOWN, (int) V64QI_FTYPE_PCV64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandloadhi512_mask", IX86_BUILTIN_PEXPANDWLOAD512, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandloadhi512_maskz", IX86_BUILTIN_PEXPANDWLOAD512Z, UNKNOWN, (int) V32HI_FTYPE_PCV32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandloadqi256_mask", IX86_BUILTIN_PEXPANDBLOAD256, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandloadqi256_maskz", IX86_BUILTIN_PEXPANDBLOAD256Z, UNKNOWN, (int) V32QI_FTYPE_PCV32QI_V32QI_USI)
@@ -2534,10 +2534,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucm
BDESC (OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_ucmpv32hi3_mask, "__builtin_ia32_ucmpw512_mask", IX86_BUILTIN_UCMPW512, UNKNOWN, (int) USI_FTYPE_V32HI_V32HI_INT_USI)
/* AVX512IFMA */
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512IFMA, 0, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_mask, "__builtin_ia32_vpmadd52luq512_mask", IX86_BUILTIN_VPMADD52LUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52luqv8di_maskz, "__builtin_ia32_vpmadd52luq512_maskz", IX86_BUILTIN_VPMADD52LUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_mask, "__builtin_ia32_vpmadd52huq512_mask", IX86_BUILTIN_VPMADD52HUQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512IFMA, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmadd52huqv8di_maskz, "__builtin_ia32_vpmadd52huq512_maskz", IX86_BUILTIN_VPMADD52HUQ512_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_mask, "__builtin_ia32_vpmadd52luq256_mask", IX86_BUILTIN_VPMADD52LUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52luqv4di_maskz, "__builtin_ia32_vpmadd52luq256_maskz", IX86_BUILTIN_VPMADD52LUQ256_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmadd52huqv4di_mask, "__builtin_ia32_vpmadd52huq256_mask", IX86_BUILTIN_VPMADD52HUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2552,13 +2552,13 @@ BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_vpmadd52huqv2di, "__builtin_ia32_vpmadd52huq128", IX86_BUILTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI)
/* AVX512VBMI */
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv16qi_mask, "__builtin_ia32_vpmultishiftqb128_mask", IX86_BUILTIN_VPMULTISHIFTQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_permvarv64qi_mask, "__builtin_ia32_permvarqi512_mask", IX86_BUILTIN_VPERMVARQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_mask, "__builtin_ia32_vpermt2varqi512_mask", IX86_BUILTIN_VPERMT2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermt2varv64qi3_maskz, "__builtin_ia32_vpermt2varqi512_maskz", IX86_BUILTIN_VPERMT2VARQI512_MASKZ, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_vpermi2varv64qi3_mask, "__builtin_ia32_vpermi2varqi512_mask", IX86_BUILTIN_VPERMI2VARQI512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv32qi_mask, "__builtin_ia32_permvarqi256_mask", IX86_BUILTIN_VPERMVARQI256_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv16qi_mask, "__builtin_ia32_permvarqi128_mask", IX86_BUILTIN_VPERMVARQI128_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv32qi3_mask, "__builtin_ia32_vpermt2varqi256_mask", IX86_BUILTIN_VPERMT2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
@@ -2569,16 +2569,16 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512
BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
/* VBMI2 */
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv64qi_mask, "__builtin_ia32_compressqi512_mask", IX86_BUILTIN_PCOMPRESSB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_compressv32hi_mask, "__builtin_ia32_compresshi512_mask", IX86_BUILTIN_PCOMPRESSW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv32qi_mask, "__builtin_ia32_compressqi256_mask", IX86_BUILTIN_PCOMPRESSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16qi_mask, "__builtin_ia32_compressqi128_mask", IX86_BUILTIN_PCOMPRESSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv16hi_mask, "__builtin_ia32_compresshi256_mask", IX86_BUILTIN_PCOMPRESSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_compressv8hi_mask, "__builtin_ia32_compresshi128_mask", IX86_BUILTIN_PCOMPRESSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_mask, "__builtin_ia32_expandqi512_mask", IX86_BUILTIN_PEXPANDB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv64qi_maskz, "__builtin_ia32_expandqi512_maskz", IX86_BUILTIN_PEXPANDB512Z, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_mask, "__builtin_ia32_expandhi512_mask", IX86_BUILTIN_PEXPANDW512, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_expandv32hi_maskz, "__builtin_ia32_expandhi512_maskz", IX86_BUILTIN_PEXPANDW512Z, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_mask, "__builtin_ia32_expandqi256_mask", IX86_BUILTIN_PEXPANDB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv32qi_maskz, "__builtin_ia32_expandqi256_maskz", IX86_BUILTIN_PEXPANDB256Z, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16qi_mask, "__builtin_ia32_expandqi128_mask", IX86_BUILTIN_PEXPANDB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
@@ -2587,64 +2587,64 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expan
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv16hi_maskz, "__builtin_ia32_expandhi256_maskz", IX86_BUILTIN_PEXPANDW256Z, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_mask, "__builtin_ia32_expandhi128_mask", IX86_BUILTIN_PEXPANDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8hi_maskz, "__builtin_ia32_expandhi128_maskz", IX86_BUILTIN_PEXPANDW128Z, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi, "__builtin_ia32_vpshrd_v32hi", IX86_BUILTIN_VPSHRDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v32hi_mask, "__builtin_ia32_vpshrd_v32hi_mask", IX86_BUILTIN_VPSHRDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi, "__builtin_ia32_vpshrd_v16hi", IX86_BUILTIN_VPSHRDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v16hi_mask, "__builtin_ia32_vpshrd_v16hi_mask", IX86_BUILTIN_VPSHRDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi, "__builtin_ia32_vpshrd_v8hi", IX86_BUILTIN_VPSHRDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8hi_mask, "__builtin_ia32_vpshrd_v8hi_mask", IX86_BUILTIN_VPSHRDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si, "__builtin_ia32_vpshrd_v16si", IX86_BUILTIN_VPSHRDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v16si_mask, "__builtin_ia32_vpshrd_v16si_mask", IX86_BUILTIN_VPSHRDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si, "__builtin_ia32_vpshrd_v8si", IX86_BUILTIN_VPSHRDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v8si_mask, "__builtin_ia32_vpshrd_v8si_mask", IX86_BUILTIN_VPSHRDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si, "__builtin_ia32_vpshrd_v4si", IX86_BUILTIN_VPSHRDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4si_mask, "__builtin_ia32_vpshrd_v4si_mask", IX86_BUILTIN_VPSHRDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di, "__builtin_ia32_vpshrd_v8di", IX86_BUILTIN_VPSHRDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrd_v8di_mask, "__builtin_ia32_vpshrd_v8di_mask", IX86_BUILTIN_VPSHRDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di, "__builtin_ia32_vpshrd_v4di", IX86_BUILTIN_VPSHRDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v4di_mask, "__builtin_ia32_vpshrd_v4di_mask", IX86_BUILTIN_VPSHRDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di, "__builtin_ia32_vpshrd_v2di", IX86_BUILTIN_VPSHRDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrd_v2di_mask, "__builtin_ia32_vpshrd_v2di_mask", IX86_BUILTIN_VPSHRDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi, "__builtin_ia32_vpshld_v32hi", IX86_BUILTIN_VPSHLDV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v32hi_mask, "__builtin_ia32_vpshld_v32hi_mask", IX86_BUILTIN_VPSHLDV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_INT_V32HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi, "__builtin_ia32_vpshld_v16hi", IX86_BUILTIN_VPSHLDV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v16hi_mask, "__builtin_ia32_vpshld_v16hi_mask", IX86_BUILTIN_VPSHLDV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT_V16HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi, "__builtin_ia32_vpshld_v8hi", IX86_BUILTIN_VPSHLDV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8hi_mask, "__builtin_ia32_vpshld_v8hi_mask", IX86_BUILTIN_VPSHLDV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT_V8HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si, "__builtin_ia32_vpshld_v16si", IX86_BUILTIN_VPSHLDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v16si_mask, "__builtin_ia32_vpshld_v16si_mask", IX86_BUILTIN_VPSHLDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_INT_V16SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si, "__builtin_ia32_vpshld_v8si", IX86_BUILTIN_VPSHLDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v8si_mask, "__builtin_ia32_vpshld_v8si_mask", IX86_BUILTIN_VPSHLDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT_V8SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si, "__builtin_ia32_vpshld_v4si", IX86_BUILTIN_VPSHLDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4si_mask, "__builtin_ia32_vpshld_v4si_mask", IX86_BUILTIN_VPSHLDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT_V4SI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di, "__builtin_ia32_vpshld_v8di", IX86_BUILTIN_VPSHLDV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshld_v8di_mask, "__builtin_ia32_vpshld_v8di_mask", IX86_BUILTIN_VPSHLDV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT_V8DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di, "__builtin_ia32_vpshld_v4di", IX86_BUILTIN_VPSHLDV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v4di_mask, "__builtin_ia32_vpshld_v4di_mask", IX86_BUILTIN_VPSHLDV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_V4DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di, "__builtin_ia32_vpshld_v2di", IX86_BUILTIN_VPSHLDV2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshld_v2di_mask, "__builtin_ia32_vpshld_v2di_mask", IX86_BUILTIN_VPSHLDV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_V2DI_INT)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi, "__builtin_ia32_vpshrdv_v32hi", IX86_BUILTIN_VPSHRDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_mask, "__builtin_ia32_vpshrdv_v32hi_mask", IX86_BUILTIN_VPSHRDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v32hi_maskz, "__builtin_ia32_vpshrdv_v32hi_maskz", IX86_BUILTIN_VPSHRDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi, "__builtin_ia32_vpshrdv_v16hi", IX86_BUILTIN_VPSHRDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_mask, "__builtin_ia32_vpshrdv_v16hi_mask", IX86_BUILTIN_VPSHRDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v16hi_maskz, "__builtin_ia32_vpshrdv_v16hi_maskz", IX86_BUILTIN_VPSHRDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi, "__builtin_ia32_vpshrdv_v8hi", IX86_BUILTIN_VPSHRDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_mask, "__builtin_ia32_vpshrdv_v8hi_mask", IX86_BUILTIN_VPSHRDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8hi_maskz, "__builtin_ia32_vpshrdv_v8hi_maskz", IX86_BUILTIN_VPSHRDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si, "__builtin_ia32_vpshrdv_v16si", IX86_BUILTIN_VPSHRDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_mask, "__builtin_ia32_vpshrdv_v16si_mask", IX86_BUILTIN_VPSHRDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v16si_maskz, "__builtin_ia32_vpshrdv_v16si_maskz", IX86_BUILTIN_VPSHRDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si, "__builtin_ia32_vpshrdv_v8si", IX86_BUILTIN_VPSHRDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_mask, "__builtin_ia32_vpshrdv_v8si_mask", IX86_BUILTIN_VPSHRDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v8si_maskz, "__builtin_ia32_vpshrdv_v8si_maskz", IX86_BUILTIN_VPSHRDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si, "__builtin_ia32_vpshrdv_v4si", IX86_BUILTIN_VPSHRDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_mask, "__builtin_ia32_vpshrdv_v4si_mask", IX86_BUILTIN_VPSHRDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4si_maskz, "__builtin_ia32_vpshrdv_v4si_maskz", IX86_BUILTIN_VPSHRDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di, "__builtin_ia32_vpshrdv_v8di", IX86_BUILTIN_VPSHRDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_mask, "__builtin_ia32_vpshrdv_v8di_mask", IX86_BUILTIN_VPSHRDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshrdv_v8di_maskz, "__builtin_ia32_vpshrdv_v8di_maskz", IX86_BUILTIN_VPSHRDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di, "__builtin_ia32_vpshrdv_v4di", IX86_BUILTIN_VPSHRDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_mask, "__builtin_ia32_vpshrdv_v4di_mask", IX86_BUILTIN_VPSHRDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v4di_maskz, "__builtin_ia32_vpshrdv_v4di_maskz", IX86_BUILTIN_VPSHRDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2652,27 +2652,27 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshr
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_mask, "__builtin_ia32_vpshrdv_v2di_mask", IX86_BUILTIN_VPSHRDVV2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshrdv_v2di_maskz, "__builtin_ia32_vpshrdv_v2di_maskz", IX86_BUILTIN_VPSHRDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi, "__builtin_ia32_vpshldv_v32hi", IX86_BUILTIN_VPSHLDVV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_mask, "__builtin_ia32_vpshldv_v32hi_mask", IX86_BUILTIN_VPSHLDVV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v32hi_maskz, "__builtin_ia32_vpshldv_v32hi_maskz", IX86_BUILTIN_VPSHLDVV32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi, "__builtin_ia32_vpshldv_v16hi", IX86_BUILTIN_VPSHLDVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_mask, "__builtin_ia32_vpshldv_v16hi_mask", IX86_BUILTIN_VPSHLDVV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v16hi_maskz, "__builtin_ia32_vpshldv_v16hi_maskz", IX86_BUILTIN_VPSHLDVV16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi, "__builtin_ia32_vpshldv_v8hi", IX86_BUILTIN_VPSHLDVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_mask, "__builtin_ia32_vpshldv_v8hi_mask", IX86_BUILTIN_VPSHLDVV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8hi_maskz, "__builtin_ia32_vpshldv_v8hi_maskz", IX86_BUILTIN_VPSHLDVV8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si, "__builtin_ia32_vpshldv_v16si", IX86_BUILTIN_VPSHLDVV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_mask, "__builtin_ia32_vpshldv_v16si_mask", IX86_BUILTIN_VPSHLDVV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v16si_maskz, "__builtin_ia32_vpshldv_v16si_maskz", IX86_BUILTIN_VPSHLDVV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si, "__builtin_ia32_vpshldv_v8si", IX86_BUILTIN_VPSHLDVV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_mask, "__builtin_ia32_vpshldv_v8si_mask", IX86_BUILTIN_VPSHLDVV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v8si_maskz, "__builtin_ia32_vpshldv_v8si_maskz", IX86_BUILTIN_VPSHLDVV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si, "__builtin_ia32_vpshldv_v4si", IX86_BUILTIN_VPSHLDVV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_mask, "__builtin_ia32_vpshldv_v4si_mask", IX86_BUILTIN_VPSHLDVV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4si_maskz, "__builtin_ia32_vpshldv_v4si_maskz", IX86_BUILTIN_VPSHLDVV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VBMI2, 0, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di, "__builtin_ia32_vpshldv_v8di", IX86_BUILTIN_VPSHLDVV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_mask, "__builtin_ia32_vpshldv_v8di_mask", IX86_BUILTIN_VPSHLDVV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VBMI2, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpshldv_v8di_maskz, "__builtin_ia32_vpshldv_v8di_maskz", IX86_BUILTIN_VPSHLDVV8DI_MASKZ, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di, "__builtin_ia32_vpshldv_v4di", IX86_BUILTIN_VPSHLDVV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_mask, "__builtin_ia32_vpshldv_v4di_mask", IX86_BUILTIN_VPSHLDVV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v4di_maskz, "__builtin_ia32_vpshldv_v4di_maskz", IX86_BUILTIN_VPSHLDVV4DI_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
@@ -2681,20 +2681,20 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshl
BDESC (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpshldv_v2di_maskz, "__builtin_ia32_vpshldv_v2di_maskz", IX86_BUILTIN_VPSHLDVV2DI_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
/* GFNI */
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi, "__builtin_ia32_vgf2p8affineqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8affineqb_v64qi_mask, "__builtin_ia32_vgf2p8affineqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8affineqb_v32qi, "__builtin_ia32_vgf2p8affineqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8affineqb_v32qi_mask, "__builtin_ia32_vgf2p8affineqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8affineqb_v16qi, "__builtin_ia32_vgf2p8affineqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8affineqb_v16qi_mask, "__builtin_ia32_vgf2p8affineqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi, "__builtin_ia32_vgf2p8mulb_v64qi", IX86_BUILTIN_VGF2P8MULB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vgf2p8mulb_v64qi_mask, "__builtin_ia32_vgf2p8mulb_v64qi_mask", IX86_BUILTIN_VGF2P8MULB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vgf2p8mulb_v32qi, "__builtin_ia32_vgf2p8mulb_v32qi", IX86_BUILTIN_VGF2P8MULB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_vgf2p8mulb_v32qi_mask, "__builtin_ia32_vgf2p8mulb_v32qi_mask", IX86_BUILTIN_VGF2P8MULB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_vgf2p8mulb_v16qi, "__builtin_ia32_vgf2p8mulb_v16qi", IX86_BUILTIN_VGF2P8MULB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
@@ -2702,9 +2702,9 @@ BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vgf2p8mulb_v
/* AVX512_VNNI */
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si, "__builtin_ia32_vpdpbusd_v16si", IX86_BUILTIN_VPDPBUSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_mask, "__builtin_ia32_vpdpbusd_v16si_mask", IX86_BUILTIN_VPDPBUSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusd_v16si_maskz, "__builtin_ia32_vpdpbusd_v16si_maskz", IX86_BUILTIN_VPDPBUSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusd_v8si, "__builtin_ia32_vpdpbusd_v8si", IX86_BUILTIN_VPDPBUSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_mask, "__builtin_ia32_vpdpbusd_v8si_mask", IX86_BUILTIN_VPDPBUSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v8si_maskz, "__builtin_ia32_vpdpbusd_v8si_maskz", IX86_BUILTIN_VPDPBUSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2712,9 +2712,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_mask, "__builtin_ia32_vpdpbusd_v4si_mask", IX86_BUILTIN_VPDPBUSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusd_v4si_maskz, "__builtin_ia32_vpdpbusd_v4si_maskz", IX86_BUILTIN_VPDPBUSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si, "__builtin_ia32_vpdpbusds_v16si", IX86_BUILTIN_VPDPBUSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_mask, "__builtin_ia32_vpdpbusds_v16si_mask", IX86_BUILTIN_VPDPBUSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpbusds_v16si_maskz, "__builtin_ia32_vpdpbusds_v16si_maskz", IX86_BUILTIN_VPDPBUSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpbusds_v8si, "__builtin_ia32_vpdpbusds_v8si", IX86_BUILTIN_VPDPBUSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_mask, "__builtin_ia32_vpdpbusds_v8si_mask", IX86_BUILTIN_VPDPBUSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v8si_maskz, "__builtin_ia32_vpdpbusds_v8si_maskz", IX86_BUILTIN_VPDPBUSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2722,9 +2722,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_mask, "__builtin_ia32_vpdpbusds_v4si_mask", IX86_BUILTIN_VPDPBUSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpbusds_v4si_maskz, "__builtin_ia32_vpdpbusds_v4si_maskz", IX86_BUILTIN_VPDPBUSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si, "__builtin_ia32_vpdpwssd_v16si", IX86_BUILTIN_VPDPWSSDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_mask, "__builtin_ia32_vpdpwssd_v16si_mask", IX86_BUILTIN_VPDPWSSDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssd_v16si_maskz, "__builtin_ia32_vpdpwssd_v16si_maskz", IX86_BUILTIN_VPDPWSSDV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssd_v8si, "__builtin_ia32_vpdpwssd_v8si", IX86_BUILTIN_VPDPWSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_mask, "__builtin_ia32_vpdpwssd_v8si_mask", IX86_BUILTIN_VPDPWSSDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v8si_maskz, "__builtin_ia32_vpdpwssd_v8si_maskz", IX86_BUILTIN_VPDPWSSDV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2732,9 +2732,9 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_mask, "__builtin_ia32_vpdpwssd_v4si_mask", IX86_BUILTIN_VPDPWSSDV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssd_v4si_maskz, "__builtin_ia32_vpdpwssd_v4si_maskz", IX86_BUILTIN_VPDPWSSDV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VNNI, 0, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si, "__builtin_ia32_vpdpwssds_v16si", IX86_BUILTIN_VPDPWSSDSV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_mask, "__builtin_ia32_vpdpwssds_v16si_mask", IX86_BUILTIN_VPDPWSSDSV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VNNI, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpdpwssds_v16si_maskz, "__builtin_ia32_vpdpwssds_v16si_maskz", IX86_BUILTIN_VPDPWSSDSV16SI_MASKZ, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXVNNI, CODE_FOR_vpdpwssds_v8si, "__builtin_ia32_vpdpwssds_v8si", IX86_BUILTIN_VPDPWSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_mask, "__builtin_ia32_vpdpwssds_v8si_mask", IX86_BUILTIN_VPDPWSSDSV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v8si_maskz, "__builtin_ia32_vpdpwssds_v8si_maskz", IX86_BUILTIN_VPDPWSSDSV8SI_MASKZ, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2773,13 +2773,13 @@ BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT16, CODE_FOR_vpdpwuuds_v4si, "__builtin_ia3
/* VPCLMULQDQ */
BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT)
BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
-BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
+BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpclmulqdq_v8di, "__builtin_ia32_vpclmulqdq_v8di", IX86_BUILTIN_VPCLMULQDQ8, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_INT)
/* VPOPCNTDQ */
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI)
-BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, 0, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI)
+BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di, "__builtin_ia32_vpopcountq_v4di", IX86_BUILTIN_VPOPCOUNTQV4DI, UNKNOWN, (int) V4DI_FTYPE_V4DI)
BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv4di_mask, "__builtin_ia32_vpopcountq_v4di_mask", IX86_BUILTIN_VPOPCOUNTQV4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
@@ -2791,21 +2791,21 @@ BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_v
BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8si_mask, "__builtin_ia32_vpopcountd_v8si_mask", IX86_BUILTIN_VPOPCOUNTDV8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UHI)
/* BITALG */
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi, "__builtin_ia32_vpopcountb_v64qi", IX86_BUILTIN_VPOPCOUNTBV64QI, UNKNOWN, (int) V64QI_FTYPE_V64QI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv64qi_mask, "__builtin_ia32_vpopcountb_v64qi_mask", IX86_BUILTIN_VPOPCOUNTBV64QI_MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_UDI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi, "__builtin_ia32_vpopcountb_v32qi", IX86_BUILTIN_VPOPCOUNTBV32QI, UNKNOWN, (int) V32QI_FTYPE_V32QI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv32qi_mask, "__builtin_ia32_vpopcountb_v32qi_mask", IX86_BUILTIN_VPOPCOUNTBV32QI_MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi, "__builtin_ia32_vpopcountb_v16qi", IX86_BUILTIN_VPOPCOUNTBV16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16qi_mask, "__builtin_ia32_vpopcountb_v16qi_mask", IX86_BUILTIN_VPOPCOUNTBV16QI_MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi, "__builtin_ia32_vpopcountw_v32hi", IX86_BUILTIN_VPOPCOUNTWV32HI, UNKNOWN, (int) V32HI_FTYPE_V32HI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_vpopcountv32hi_mask, "__builtin_ia32_vpopcountw_v32hi_mask", IX86_BUILTIN_VPOPCOUNTQV32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V32HI_V32HI_USI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi, "__builtin_ia32_vpopcountw_v16hi", IX86_BUILTIN_VPOPCOUNTWV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv16hi_mask, "__builtin_ia32_vpopcountw_v16hi_mask", IX86_BUILTIN_VPOPCOUNTQV16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi, "__builtin_ia32_vpopcountw_v8hi", IX86_BUILTIN_VPOPCOUNTWV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpopcountv8hi_mask, "__builtin_ia32_vpopcountw_v8hi_mask", IX86_BUILTIN_VPOPCOUNTQV8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512BITALG, 0, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BITALG, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512vl_vpshufbitqmbv64qi_mask, "__builtin_ia32_vpshufbitqmb512_mask", IX86_BUILTIN_VPSHUFBITQMB512_MASK, UNKNOWN, (int) UDI_FTYPE_V64QI_V64QI_UDI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv32qi_mask, "__builtin_ia32_vpshufbitqmb256_mask", IX86_BUILTIN_VPSHUFBITQMB256_MASK, UNKNOWN, (int) USI_FTYPE_V32QI_V32QI_USI)
BDESC (OPTION_MASK_ISA_AVX512BITALG | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpshufbitqmbv16qi_mask, "__builtin_ia32_vpshufbitqmb128_mask", IX86_BUILTIN_VPSHUFBITQMB128_MASK, UNKNOWN, (int) UHI_FTYPE_V16QI_V16QI_UHI)
@@ -2829,39 +2829,39 @@ BDESC (0, OPTION_MASK_ISA2_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_B
/* VAES. */
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v16qi, "__builtin_ia32_vaesdec_v16qi", IX86_BUILTIN_VAESDEC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v32qi, "__builtin_ia32_vaesdec_v32qi", IX86_BUILTIN_VAESDEC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdec_v64qi, "__builtin_ia32_vaesdec_v64qi", IX86_BUILTIN_VAESDEC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v16qi, "__builtin_ia32_vaesdeclast_v16qi", IX86_BUILTIN_VAESDECLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v32qi, "__builtin_ia32_vaesdeclast_v32qi", IX86_BUILTIN_VAESDECLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesdeclast_v64qi, "__builtin_ia32_vaesdeclast_v64qi", IX86_BUILTIN_VAESDECLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v16qi, "__builtin_ia32_vaesenc_v16qi", IX86_BUILTIN_VAESENC16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v32qi, "__builtin_ia32_vaesenc_v32qi", IX86_BUILTIN_VAESENC32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenc_v64qi, "__builtin_ia32_vaesenc_v64qi", IX86_BUILTIN_VAESENC64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v16qi, "__builtin_ia32_vaesenclast_v16qi", IX86_BUILTIN_VAESENCLAST16, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v32qi, "__builtin_ia32_vaesenclast_v32qi", IX86_BUILTIN_VAESENCLAST32, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
+BDESC (0, OPTION_MASK_ISA2_VAES | OPTION_MASK_ISA2_EVEX512, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
/* BF16 */
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf, "__builtin_ia32_cvtne2ps2bf16_v16bf", IX86_BUILTIN_CVTNE2PS2BF16_V16BF, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_mask, "__builtin_ia32_cvtne2ps2bf16_v16bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASK, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_V16BF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v16bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf, "__builtin_ia32_cvtne2ps2bf16_v8bf", IX86_BUILTIN_CVTNE2PS2BF16_V8BF, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_mask, "__builtin_ia32_cvtne2ps2bf16_v8bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_V8BF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v8bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v8sf, "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2BF16_V8SF, UNKNOWN, (int) V8BF_FTYPE_V8SF)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V8SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V8SF_V8BF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVXNECONVERT | OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_vcvtneps2bf16_v4sf, "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2BF16_V4SF, UNKNOWN, (int) V8BF_FTYPE_V4SF)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V4SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V8BF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V4SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPBF16PS_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask, "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPBF16PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPBF16PS_V8SF_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
--
2.31.1
* [PATCH 11/18] [PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (9 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 10/18] [PATCH 4/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 Hu, Lin1
` (8 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
---
gcc/config/i386/i386-builtin.def | 156 +++++++++++++++----------------
1 file changed, 78 insertions(+), 78 deletions(-)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 8250e2998cd..b90d5ccc969 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1568,9 +1568,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_copysignv8df3
BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_sqrtv8df2, "__builtin_ia32_sqrtpd512", IX86_BUILTIN_SQRTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF)
BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_sqrtv16sf2, "__builtin_ia32_sqrtps512", IX86_BUILTIN_SQRTPS_NR512, UNKNOWN, (int) V16SF_FTYPE_V16SF)
BDESC (OPTION_MASK_ISA_AVX512ER, 0, CODE_FOR_avx512er_exp2v16sf, "__builtin_ia32_exp2ps", IX86_BUILTIN_EXP2PS, UNKNOWN, (int) V16SF_FTYPE_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_floorph512", IX86_BUILTIN_FLOORPH512, (enum rtx_code) ROUND_FLOOR, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_ceilph512", IX86_BUILTIN_CEILPH512, (enum rtx_code) ROUND_CEIL, (int) V32HF_FTYPE_V32HF_ROUND)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf, "__builtin_ia32_truncph512", IX86_BUILTIN_TRUNCPH512, (enum rtx_code) ROUND_TRUNC, (int) V32HF_FTYPE_V32HF_ROUND)
BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_floorps512", IX86_BUILTIN_FLOORPS512, (enum rtx_code) ROUND_FLOOR, (int) V16SF_FTYPE_V16SF_ROUND)
BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_ceilps512", IX86_BUILTIN_CEILPS512, (enum rtx_code) ROUND_CEIL, (int) V16SF_FTYPE_V16SF_ROUND)
BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512f_roundps512, "__builtin_ia32_truncps512", IX86_BUILTIN_TRUNCPS512, (enum rtx_code) ROUND_TRUNC, (int) V16SF_FTYPE_V16SF_ROUND)
@@ -2874,40 +2874,40 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, "__builtin_ia32_cvtbf2sf
/* AVX512FP16. */
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_addph256_mask", IX86_BUILTIN_ADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask, "__builtin_ia32_addph512_mask", IX86_BUILTIN_ADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_subph128_mask", IX86_BUILTIN_SUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_subph256_mask", IX86_BUILTIN_SUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask, "__builtin_ia32_subph512_mask", IX86_BUILTIN_SUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_mulph128_mask", IX86_BUILTIN_MULPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_mulph256_mask", IX86_BUILTIN_MULPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_mulph512_mask", IX86_BUILTIN_MULPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_divph128_mask", IX86_BUILTIN_DIVPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_divph256_mask", IX86_BUILTIN_DIVPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_divph512_mask", IX86_BUILTIN_DIVPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__builtin_ia32_addsh_mask", IX86_BUILTIN_ADDSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_subsh_mask", IX86_BUILTIN_SUBSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_mulsh_mask", IX86_BUILTIN_MULSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_divsh_mask", IX86_BUILTIN_DIVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv8hf3_mask, "__builtin_ia32_maxph128_mask", IX86_BUILTIN_MAXPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv16hf3_mask, "__builtin_ia32_maxph256_mask", IX86_BUILTIN_MAXPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_maxph512_mask", IX86_BUILTIN_MAXPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv8hf3_mask, "__builtin_ia32_minph128_mask", IX86_BUILTIN_MINPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf3_mask, "__builtin_ia32_minph256_mask", IX86_BUILTIN_MINPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_minph512_mask", IX86_BUILTIN_MINPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_maxsh_mask", IX86_BUILTIN_MAXSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_minsh_mask", IX86_BUILTIN_MINSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_cmpph128_mask", IX86_BUILTIN_CMPPH128_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_cmpph256_mask", IX86_BUILTIN_CMPPH256_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_cmpph512_mask", IX86_BUILTIN_CMPPH512_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_sqrtph128_mask", IX86_BUILTIN_SQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_sqrtph256_mask", IX86_BUILTIN_SQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_rsqrtph128_mask", IX86_BUILTIN_RSQRTPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_rsqrtph256_mask", IX86_BUILTIN_RSQRTPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_rsqrtph512_mask", IX86_BUILTIN_RSQRTPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_rsqrtsh_mask", IX86_BUILTIN_RSQRTSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv8hf2_mask, "__builtin_ia32_rcpph128_mask", IX86_BUILTIN_RCPPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv16hf2_mask, "__builtin_ia32_rcpph256_mask", IX86_BUILTIN_RCPPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_rcpph512_mask", IX86_BUILTIN_RCPPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_rcpsh_mask", IX86_BUILTIN_RCPSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_scalefph128_mask", IX86_BUILTIN_SCALEFPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_scalefph256_mask", IX86_BUILTIN_SCALEFPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
@@ -2917,7 +2917,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_rndscaleph256_mask", IX86_BUILTIN_RNDSCALEPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv16hf_mask, "__builtin_ia32_fpclassph256_mask", IX86_BUILTIN_FPCLASSPH256, UNKNOWN, (int) HI_FTYPE_V16HF_INT_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv8hf_mask, "__builtin_ia32_fpclassph128_mask", IX86_BUILTIN_FPCLASSPH128, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_vmfpclassv8hf_mask, "__builtin_ia32_fpclasssh_mask", IX86_BUILTIN_FPCLASSSH_MASK, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getexpv16hf_mask, "__builtin_ia32_getexpph256_mask", IX86_BUILTIN_GETEXPPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
@@ -3229,50 +3229,50 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_ran
BDESC (OPTION_MASK_ISA_AVX512DQ, OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
/* AVX512FP16. */
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_addph512_mask_round", IX86_BUILTIN_ADDPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_subph512_mask_round", IX86_BUILTIN_SUBPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_mulph512_mask_round", IX86_BUILTIN_MULPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_divph512_mask_round", IX86_BUILTIN_DIVPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round, "__builtin_ia32_addsh_mask_round", IX86_BUILTIN_ADDSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_subsh_mask_round", IX86_BUILTIN_SUBSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_mulsh_mask_round", IX86_BUILTIN_MULSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_divsh_mask_round", IX86_BUILTIN_DIVSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_maxph512_mask_round", IX86_BUILTIN_MAXPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_minph512_mask_round", IX86_BUILTIN_MINPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_maxsh_mask_round", IX86_BUILTIN_MAXSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_minsh_mask_round", IX86_BUILTIN_MINSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_cmpph512_mask_round", IX86_BUILTIN_CMPPH512_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_cmpsh_mask_round", IX86_BUILTIN_CMPSH_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_sqrtph512_mask_round", IX86_BUILTIN_SQRTPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_sqrtsh_mask_round", IX86_BUILTIN_SQRTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_scalefph512_mask_round", IX86_BUILTIN_SCALEFPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_scalefsh_mask_round", IX86_BUILTIN_SCALEFSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_reduceph512_mask_round", IX86_BUILTIN_REDUCEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_reducesh_mask_round", IX86_BUILTIN_REDUCESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_rndscaleph512_mask_round", IX86_BUILTIN_RNDSCALEPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_rndscalesh_mask_round", IX86_BUILTIN_RNDSCALESH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq512_mask_round", IX86_BUILTIN_VCVTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq512_mask_round", IX86_BUILTIN_VCVTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq512_mask_round", IX86_BUILTIN_VCVTTPH2DQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq512_mask_round", IX86_BUILTIN_VCVTTPH2UDQ512_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq512_mask_round", IX86_BUILTIN_VCVTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq512_mask_round", IX86_BUILTIN_VCVTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq512_mask_round", IX86_BUILTIN_VCVTTPH2QQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq512_mask_round", IX86_BUILTIN_VCVTTPH2UQQ512_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w512_mask_round", IX86_BUILTIN_VCVTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw512_mask_round", IX86_BUILTIN_VCVTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w512_mask_round", IX86_BUILTIN_VCVTTPH2W512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw512_mask_round", IX86_BUILTIN_VCVTTPH2UW512_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph512_mask_round", IX86_BUILTIN_VCVTW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph512_mask_round", IX86_BUILTIN_VCVTUW2PH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph512_mask_round", IX86_BUILTIN_VCVTDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph512_mask_round", IX86_BUILTIN_VCVTUDQ2PH512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph512_mask_round", IX86_BUILTIN_VCVTQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph512_mask_round", IX86_BUILTIN_VCVTUQQ2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT)
BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
@@ -3285,32 +3285,32 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__b
BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd512_mask_round", IX86_BUILTIN_VCVTPH2PD512_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2psx512_mask_round", IX86_BUILTIN_VCVTPH2PSX512_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph512_mask_round", IX86_BUILTIN_VCVTPD2PH512_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2phx512_mask_round", IX86_BUILTIN_VCVTPS2PHX512_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, "__builtin_ia32_vcvtsh2ss_mask_round", IX86_BUILTIN_VCVTSH2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
@@ -3318,18 +3318,18 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
-BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph512_round", IX86_BUILTIN_VFMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask1_round, "__builtin_ia32_vfmaddcph512_mask_round", IX86_BUILTIN_VFMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph512_mask3_round", IX86_BUILTIN_VFMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph512_maskz_round", IX86_BUILTIN_VFMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph512_round", IX86_BUILTIN_VFCMADDCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask1_round, "__builtin_ia32_vfcmaddcph512_mask_round", IX86_BUILTIN_VFCMADDCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph512_mask3_round", IX86_BUILTIN_VFCMADDCPH512_MASK3_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph512_maskz_round", IX86_BUILTIN_VFCMADDCPH512_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph512_round", IX86_BUILTIN_VFCMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph512_mask_round", IX86_BUILTIN_VFCMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph512_round", IX86_BUILTIN_VFMULCPH512_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_EVEX512, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph512_mask_round", IX86_BUILTIN_VFMULCPH512_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fcmaddcsh_v8hf_round, "__builtin_ia32_vfcmaddcsh_round", IX86_BUILTIN_VFCMADDCSH_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask1_round, "__builtin_ia32_vfcmaddcsh_mask_round", IX86_BUILTIN_VFCMADDCSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask3_round, "__builtin_ia32_vfcmaddcsh_mask3_round", IX86_BUILTIN_VFCMADDCSH_MASK3_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (10 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 11/18] [PATCH 5/5] " Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 13/18] Support -mevex512 for AVX512F intrins Hu, Lin1
` (7 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Disable zmm broadcast for !TARGET_EVEX512.
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not use PVW_512 when no-evex512.
(ix86_simd_clone_adjust): Add evex512 target into string.
* config/i386/i386.cc (type_natural_mode): Report ABI warning
when using zmm register w/o evex512.
(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
libmvec call when !TARGET_EVEX512.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
when !TARGET_EVEX512.
(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
(STORE_MAX_PIECES): Ditto.
---
gcc/config/i386/i386-expand.cc | 1 +
gcc/config/i386/i386-options.cc | 14 +++++----
gcc/config/i386/i386.cc | 53 ++++++++++++++++++---------------
gcc/config/i386/i386.h | 7 +++--
4 files changed, 42 insertions(+), 33 deletions(-)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27c6ef..6eedcb384c0 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -611,6 +611,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
avx512 embed broadcast is available. */
if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
&& (!TARGET_AVX512F
+ || (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512)
|| (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
return nullptr;
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index a1a7a92da9f..e2a90d7d9e2 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2845,7 +2845,8 @@ ix86_option_override_internal (bool main_args_p,
opts->x_ix86_move_max = opts->x_prefer_vector_width_type;
if (opts_set->x_ix86_move_max == PVW_NONE)
{
- if (TARGET_AVX512F_P (opts->x_ix86_isa_flags))
+ if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+ && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
opts->x_ix86_move_max = PVW_AVX512;
else
opts->x_ix86_move_max = PVW_AVX128;
@@ -2866,7 +2867,8 @@ ix86_option_override_internal (bool main_args_p,
opts->x_ix86_store_max = opts->x_prefer_vector_width_type;
if (opts_set->x_ix86_store_max == PVW_NONE)
{
- if (TARGET_AVX512F_P (opts->x_ix86_isa_flags))
+ if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+ && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
opts->x_ix86_store_max = PVW_AVX512;
else
opts->x_ix86_store_max = PVW_AVX128;
@@ -3145,13 +3147,13 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
case 'e':
if (TARGET_PREFER_AVX256)
{
- if (!TARGET_AVX512F)
- str = "avx512f,prefer-vector-width=512";
+ if (!TARGET_AVX512F || !TARGET_EVEX512)
+ str = "avx512f,evex512,prefer-vector-width=512";
else
str = "prefer-vector-width=512";
}
- else if (!TARGET_AVX512F)
- str = "avx512f";
+ else if (!TARGET_AVX512F || !TARGET_EVEX512)
+ str = "avx512f,evex512";
break;
default:
gcc_unreachable ();
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..0df3bf10547 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1924,7 +1924,8 @@ type_natural_mode (const_tree type, const CUMULATIVE_ARGS *cum,
if (GET_MODE_NUNITS (mode) == TYPE_VECTOR_SUBPARTS (type)
&& GET_MODE_INNER (mode) == innermode)
{
- if (size == 64 && !TARGET_AVX512F && !TARGET_IAMCU)
+ if (size == 64 && (!TARGET_AVX512F || !TARGET_EVEX512)
+ && !TARGET_IAMCU)
{
static bool warnedavx512f;
static bool warnedavx512f_ret;
@@ -4347,7 +4348,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
/* AVX512F values are returned in ZMM0 if available. */
if (size == 64)
- return !TARGET_AVX512F;
+ return !TARGET_AVX512F || !TARGET_EVEX512;
}
if (mode == XFmode)
@@ -20286,7 +20287,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
- any of 512-bit wide vector mode
- any scalar mode. */
if (TARGET_AVX512F
- && (VALID_AVX512F_REG_OR_XI_MODE (mode)
+ && ((VALID_AVX512F_REG_OR_XI_MODE (mode) && TARGET_EVEX512)
|| VALID_AVX512F_SCALAR_MODE (mode)))
return true;
@@ -20538,7 +20539,7 @@ ix86_set_reg_reg_cost (machine_mode mode)
case MODE_VECTOR_INT:
case MODE_VECTOR_FLOAT:
- if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+ if ((TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode))
|| (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
|| (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
|| (TARGET_SSE && VALID_SSE_REG_MODE (mode))
@@ -21267,7 +21268,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
{
/* (ior (not ...) ...) can be a single insn in AVX512. */
if (GET_CODE (XEXP (x, 0)) == NOT && TARGET_AVX512F
- && (GET_MODE_SIZE (mode) == 64
+ && ((TARGET_EVEX512
+ && GET_MODE_SIZE (mode) == 64)
|| (TARGET_AVX512VL
&& (GET_MODE_SIZE (mode) == 32
|| GET_MODE_SIZE (mode) == 16))))
@@ -21315,7 +21317,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
/* (and (not ...) (not ...)) can be a single insn in AVX512. */
if (GET_CODE (right) == NOT && TARGET_AVX512F
- && (GET_MODE_SIZE (mode) == 64
+ && ((TARGET_EVEX512
+ && GET_MODE_SIZE (mode) == 64)
|| (TARGET_AVX512VL
&& (GET_MODE_SIZE (mode) == 32
|| GET_MODE_SIZE (mode) == 16))))
@@ -21385,7 +21388,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
{
/* (not (xor ...)) can be a single insn in AVX512. */
if (GET_CODE (XEXP (x, 0)) == XOR && TARGET_AVX512F
- && (GET_MODE_SIZE (mode) == 64
+ && ((TARGET_EVEX512
+ && GET_MODE_SIZE (mode) == 64)
|| (TARGET_AVX512VL
&& (GET_MODE_SIZE (mode) == 32
|| GET_MODE_SIZE (mode) == 16))))
@@ -23000,7 +23004,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
return true;
if (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
return true;
- if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+ if (TARGET_AVX512F && TARGET_EVEX512 && VALID_AVX512F_REG_MODE (mode))
return true;
if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
&& VALID_MMX_REG_MODE (mode))
@@ -23690,7 +23694,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
switch (mode)
{
case E_QImode:
- if (TARGET_AVX512BW && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V64QImode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V32QImode;
@@ -23698,7 +23702,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
return V16QImode;
case E_HImode:
- if (TARGET_AVX512BW && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512BW && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V32HImode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V16HImode;
@@ -23706,7 +23710,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
return V8HImode;
case E_SImode:
- if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V16SImode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V8SImode;
@@ -23714,7 +23718,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
return V4SImode;
case E_DImode:
- if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V8DImode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V4DImode;
@@ -23728,15 +23732,16 @@ ix86_preferred_simd_mode (scalar_mode mode)
{
if (TARGET_PREFER_AVX128)
return V8HFmode;
- else if (TARGET_PREFER_AVX256)
+ else if (TARGET_PREFER_AVX256 || !TARGET_EVEX512)
return V16HFmode;
}
- return V32HFmode;
+ if (TARGET_EVEX512)
+ return V32HFmode;
}
return word_mode;
case E_SFmode:
- if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V16SFmode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V8SFmode;
@@ -23744,7 +23749,7 @@ ix86_preferred_simd_mode (scalar_mode mode)
return V4SFmode;
case E_DFmode:
- if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
return V8DFmode;
else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V4DFmode;
@@ -23764,13 +23769,13 @@ ix86_preferred_simd_mode (scalar_mode mode)
static unsigned int
ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
{
- if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
+ if (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)
{
modes->safe_push (V64QImode);
modes->safe_push (V32QImode);
modes->safe_push (V16QImode);
}
- else if (TARGET_AVX512F && all)
+ else if (TARGET_AVX512F && TARGET_EVEX512 && all)
{
modes->safe_push (V32QImode);
modes->safe_push (V16QImode);
@@ -23808,7 +23813,7 @@ ix86_get_mask_mode (machine_mode data_mode)
unsigned elem_size = vector_size / nunits;
/* Scalar mask case. */
- if ((TARGET_AVX512F && vector_size == 64)
+ if ((TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
|| (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
{
if (elem_size == 4
@@ -24306,7 +24311,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
{
/* If the function isn't exported, we can pick up just one ISA
for the clones. */
- if (TARGET_AVX512F)
+ if (TARGET_AVX512F && TARGET_EVEX512)
clonei->vecsize_mangle = 'e';
else if (TARGET_AVX2)
clonei->vecsize_mangle = 'd';
@@ -24398,17 +24403,17 @@ ix86_simd_clone_usable (struct cgraph_node *node)
return -1;
if (!TARGET_AVX)
return 0;
- return TARGET_AVX512F ? 3 : TARGET_AVX2 ? 2 : 1;
+ return (TARGET_AVX512F && TARGET_EVEX512) ? 3 : TARGET_AVX2 ? 2 : 1;
case 'c':
if (!TARGET_AVX)
return -1;
- return TARGET_AVX512F ? 2 : TARGET_AVX2 ? 1 : 0;
+ return (TARGET_AVX512F && TARGET_EVEX512) ? 2 : TARGET_AVX2 ? 1 : 0;
case 'd':
if (!TARGET_AVX2)
return -1;
- return TARGET_AVX512F ? 1 : 0;
+ return (TARGET_AVX512F && TARGET_EVEX512) ? 1 : 0;
case 'e':
- if (!TARGET_AVX512F)
+ if (!TARGET_AVX512F || !TARGET_EVEX512)
return -1;
return 0;
default:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3e8488f2ae8..aac972f5caf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -770,7 +770,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
TARGET_ABSOLUTE_BIGGEST_ALIGNMENT. */
#define BIGGEST_ALIGNMENT \
- (TARGET_IAMCU ? 32 : (TARGET_AVX512F ? 512 : (TARGET_AVX ? 256 : 128)))
+ (TARGET_IAMCU ? 32 : ((TARGET_AVX512F && TARGET_EVEX512) \
+ ? 512 : (TARGET_AVX ? 256 : 128)))
/* Maximum stack alignment. */
#define MAX_STACK_ALIGNMENT MAX_OFILE_ALIGNMENT
@@ -1807,7 +1808,7 @@ typedef struct ix86_args {
MOVE_MAX_PIECES defaults to MOVE_MAX. */
#define MOVE_MAX \
- ((TARGET_AVX512F \
+ ((TARGET_AVX512F && TARGET_EVEX512 \
&& (ix86_move_max == PVW_AVX512 \
|| ix86_store_max == PVW_AVX512)) \
? 64 \
@@ -1826,7 +1827,7 @@ typedef struct ix86_args {
store_by_pieces of 16/32/64 bytes. */
#define STORE_MAX_PIECES \
(TARGET_INTER_UNIT_MOVES_TO_VEC \
- ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \
+ ? ((TARGET_AVX512F && TARGET_EVEX512 && ix86_store_max == PVW_AVX512) \
? 64 \
: ((TARGET_AVX \
&& ix86_store_max >= PVW_AVX256) \
--
2.31.1
* [PATCH 13/18] Support -mevex512 for AVX512F intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
2023-09-21 7:20 ` [PATCH 12/18] Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 14/18] Support -mevex512 for AVX512DQ intrins Hu, Lin1
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Disable 512 bit gather
when !TARGET_EVEX512.
* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
Add TARGET_EVEX512.
(ix86_expand_int_sse_cmp): Ditto.
(ix86_expand_vector_init_one_nonzero): Disable subroutine
when !TARGET_EVEX512.
(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
(ix86_vectorize_vec_perm_const): Disable subroutine when
!TARGET_EVEX512.
* config/i386/i386.cc
(standard_sse_constant_p): Add TARGET_EVEX512.
(standard_sse_constant_opcode): Ditto.
(ix86_get_ssemov): Ditto.
(ix86_legitimate_constant_p): Ditto.
(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
when !TARGET_EVEX512.
* config/i386/i386.md (avx512f_512): New.
(movxi): Add TARGET_EVEX512.
(*movxi_internal_avx512f): Ditto.
(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
for alternative 13.
(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
alternative 9.
(*movhi_internal): Change alternative 11 to *Yv.
(*movdf_internal): Change alternative 12 to Yv.
(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
alternatives 5 and 6.
(*mov<mode>_internal): Change alternative 4 to Yv.
(define_split for convert SF to DF): Add TARGET_EVEX512.
(extendbfsf2_1): Ditto.
* config/i386/predicates.md (bcst_mem_operand): Disable predicate
for 512 bit when !TARGET_EVEX512.
* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
(V48_AVX512VL): Ditto.
(V48_256_512_AVX512VL): Ditto.
(V48H_AVX512VL): Ditto.
(VI12_AVX512VL): Ditto.
(V): Ditto.
(V_512): Ditto.
(V_256_512): Ditto.
(VF): Ditto.
(VF1_VF2_AVX512DQ): Ditto.
(VFH): Ditto.
(VFB): Ditto.
(VF1): Ditto.
(VF1_AVX2): Ditto.
(VF2): Ditto.
(VF2H): Ditto.
(VF2_512_256): Ditto.
(VF2_512_256VL): Ditto.
(VF_512): Ditto.
(VFB_512): Ditto.
(VI48_AVX512VL): Ditto.
(VI1248_AVX512VLBW): Ditto.
(VF_AVX512VL): Ditto.
(VFH_AVX512VL): Ditto.
(VF1_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI8): Ditto.
(VI8_AVX512VL): Ditto.
(VI2_AVX512F): Ditto.
(VI4_AVX512F): Ditto.
(VI4_AVX512VL): Ditto.
(VI48_AVX512F_AVX512VL): Ditto.
(VI8_AVX2_AVX512F): Ditto.
(VI8_AVX_AVX512F): Ditto.
(V8FI): Ditto.
(V16FI): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI48_AVX512F): Ditto.
(VI48_AVX_AVX512F): Ditto.
(VI12_AVX_AVX512F): Ditto.
(VI148_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI48_512): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI4F_256_512): Ditto.
(VI48F_256_512): Ditto.
(VI48F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(V32_512): Ditto.
(AVX512MODE2P): Ditto.
(STORENT_MODE): Ditto.
(REDUC_PLUS_MODE): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(*andnot<mode>3): Change isa attribute to avx512f_512.
(*andnot<mode>3): Ditto.
(<code><mode>3): Ditto.
(<code>tf3): Ditto.
(FMAMODEM): Add TARGET_EVEX512.
(FMAMODE_AVX512): Ditto.
(VFH_SF_AVX512VL): Ditto.
(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
Ditto.
(avx512f_cvtdq2pd512_2): Ditto.
(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
Ditto.
(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
(vec_unpacks_lo_v16sf): Ditto.
(vec_unpacks_hi_v16sf): Ditto.
(vec_unpacks_float_hi_v16si): Ditto.
(vec_unpacks_float_lo_v16si): Ditto.
(vec_unpacku_float_hi_v16si): Ditto.
(vec_unpacku_float_lo_v16si): Ditto.
(vec_pack_sfix_trunc_v8df): Ditto.
(avx512f_vec_pack_sfix_v8df): Ditto.
(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(vec_extract_lo_v64qi): Ditto.
(vec_extract_hi_v64qi): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
(avx512f_movddup512<mask_name>): Ditto.
(avx512f_unpcklpd512<mask_name>): Ditto.
(*<avx512>_vternlog<mode>_all): Ditto.
(*<avx512>_vpternlog<mode>_1): Ditto.
(*<avx512>_vpternlog<mode>_2): Ditto.
(*<avx512>_vpternlog<mode>_3): Ditto.
(avx512f_shufps512_mask): Ditto.
(avx512f_shufps512_1<mask_name>): Ditto.
(avx512f_shufpd512_mask): Ditto.
(avx512f_shufpd512_1<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
(vec_dupv2df<mask_name>): Ditto.
(trunc<pmov_src_lower><mode>2): Ditto.
(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
(*avx512f_vpermvar_truncv8div8si_1): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
(truncv8div8qi2): Ditto.
(avx512f_<code>v8div16qi2): Ditto.
(*avx512f_<code>v8div16qi2_store_1): Ditto.
(*avx512f_<code>v8div16qi2_store_2): Ditto.
(avx512f_<code>v8div16qi2_mask): Ditto.
(*avx512f_<code>v8div16qi2_mask_1): Ditto.
(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
(vec_widen_umult_even_v16si<mask_name>): Ditto.
(*vec_widen_umult_even_v16si<mask_name>): Ditto.
(vec_widen_smult_even_v16si<mask_name>): Ditto.
(*vec_widen_smult_even_v16si<mask_name>): Ditto.
(VEC_PERM_AVX2): Ditto.
(one_cmpl<mode>2): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
(define_split to xor): Ditto.
(*andnot<mode>3): Ditto.
(define_split for ior): Ditto.
(*iornot<mode>3): Ditto.
(*xnor<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
(avx512f_pshufdv3_mask): Ditto.
(avx512f_pshufd_1<mask_name>): Ditto.
(*vec_extractv4ti): Ditto.
(VEXTRACTI128_MODE): Ditto.
(define_split to vec_extract): Ditto.
(VI1248_AVX512VL_AVX512BW): Ditto.
(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
(<insn>v16qiv16si2): Ditto.
(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
(<insn>v16hiv16si2): Ditto.
(avx512f_zero_extendv16hiv16si2_1): Ditto.
(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
(<insn>v8qiv8di2): Ditto.
(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
(<insn>v8hiv8di2): Ditto.
(avx512f_<code>v8siv8di2<mask_name>): Ditto.
(*avx512f_zero_extendv8siv8di2_1): Ditto.
(*avx512f_zero_extendv8siv8di2_2): Ditto.
(<insn>v8siv8di2): Ditto.
(avx512f_roundps512_sfix): Ditto.
(vashrv8di3): Ditto.
(vashrv16si3): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
(vec_dupv4sf): Add TARGET_EVEX512.
(*vec_dupv4si): Ditto.
(*vec_dupv2di): Ditto.
(vec_dup<mode>): Change isa attribute to avx512f_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
Ditto.
(avx512f_vcvtps2ph512_mask_sae): Ditto.
(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
Ditto.
(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
(INT_BROADCAST_MODE): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr89229-5b.c: Adjust scan-assembler
message.
* gcc.target/i386/pr89229-6b.c: Ditto.
* gcc.target/i386/pr89229-7b.c: Ditto.
---
gcc/config/i386/i386-builtins.cc | 24 +-
gcc/config/i386/i386-expand.cc | 12 +-
gcc/config/i386/i386.cc | 101 ++--
gcc/config/i386/i386.md | 81 ++-
gcc/config/i386/predicates.md | 3 +-
gcc/config/i386/sse.md | 553 +++++++++++----------
gcc/testsuite/gcc.target/i386/pr89229-5b.c | 2 +-
gcc/testsuite/gcc.target/i386/pr89229-6b.c | 2 +-
gcc/testsuite/gcc.target/i386/pr89229-7b.c | 2 +-
9 files changed, 445 insertions(+), 335 deletions(-)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index e1d1dac2ba2..70538fbe17b 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -1675,6 +1675,10 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
{
bool si;
enum ix86_builtins code;
+ const machine_mode mode = TYPE_MODE (TREE_TYPE (mem_vectype));
+
+ if ((!TARGET_AVX512F || !TARGET_EVEX512) && GET_MODE_SIZE (mode) == 64)
+ return NULL_TREE;
if (! TARGET_AVX2
|| (known_eq (TYPE_VECTOR_SUBPARTS (mem_vectype), 2u)
@@ -1755,28 +1759,16 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
code = si ? IX86_BUILTIN_GATHERSIV8SI : IX86_BUILTIN_GATHERALTDIV8SI;
break;
case E_V8DFmode:
- if (TARGET_AVX512F)
- code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
- else
- return NULL_TREE;
+ code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
break;
case E_V8DImode:
- if (TARGET_AVX512F)
- code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI;
- else
- return NULL_TREE;
+ code = si ? IX86_BUILTIN_GATHER3ALTSIV8DI : IX86_BUILTIN_GATHER3DIV8DI;
break;
case E_V16SFmode:
- if (TARGET_AVX512F)
- code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF;
- else
- return NULL_TREE;
+ code = si ? IX86_BUILTIN_GATHER3SIV16SF : IX86_BUILTIN_GATHER3ALTDIV16SF;
break;
case E_V16SImode:
- if (TARGET_AVX512F)
- code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI;
- else
- return NULL_TREE;
+ code = si ? IX86_BUILTIN_GATHER3SIV16SI : IX86_BUILTIN_GATHER3ALTDIV16SI;
break;
default:
return NULL_TREE;
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 6eedcb384c0..0705e08d38c 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3943,7 +3943,7 @@ ix86_valid_mask_cmp_mode (machine_mode mode)
if ((inner_mode == QImode || inner_mode == HImode) && !TARGET_AVX512BW)
return false;
- return vector_size == 64 || TARGET_AVX512VL;
+ return (vector_size == 64 && TARGET_EVEX512) || TARGET_AVX512VL;
}
/* Return true if integer mask comparison should be used. */
@@ -4773,7 +4773,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
&& GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
/* Don't do it if not using integer masks and we'd end up with
the right values in the registers though. */
- && (GET_MODE_SIZE (mode) == 64
+ && ((GET_MODE_SIZE (mode) == 64 && TARGET_EVEX512)
|| !vector_all_ones_operand (optrue, data_mode)
|| opfalse != CONST0_RTX (data_mode))))
{
@@ -15668,6 +15668,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
bool use_vector_set = false;
rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
+ if (GET_MODE_SIZE (mode) == 64 && !TARGET_EVEX512)
+ return false;
+
switch (mode)
{
case E_V2DImode:
@@ -18288,7 +18291,7 @@ ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
unsigned vector_size = GET_MODE_SIZE (mode);
if (TARGET_FMA
- || (TARGET_AVX512F && vector_size == 64)
+ || (TARGET_AVX512F && TARGET_EVEX512 && vector_size == 64)
|| (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
emit_insn (gen_rtx_SET (e2,
gen_rtx_FMA (mode, e0, x0, mthree)));
@@ -23005,6 +23008,9 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
unsigned int i, nelt, which;
bool two_args;
+ if (GET_MODE_SIZE (vmode) == 64 && !TARGET_EVEX512)
+ return false;
+
/* For HF mode vector, convert it to HI using subreg. */
if (GET_MODE_INNER (vmode) == HFmode)
{
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0df3bf10547..635dd85e764 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5263,7 +5263,7 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
switch (GET_MODE_SIZE (mode))
{
case 64:
- if (TARGET_AVX512F)
+ if (TARGET_AVX512F && TARGET_EVEX512)
return 2;
break;
case 32:
@@ -5313,9 +5313,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
case MODE_XI:
case MODE_OI:
if (EXT_REX_SSE_REG_P (operands[0]))
- return (TARGET_AVX512VL
- ? "vpxord\t%x0, %x0, %x0"
- : "vpxord\t%g0, %g0, %g0");
+ {
+ if (TARGET_AVX512VL)
+ return "vpxord\t%x0, %x0, %x0";
+ else if (TARGET_EVEX512)
+ return "vpxord\t%g0, %g0, %g0";
+ else
+ gcc_unreachable ();
+ }
return "vpxor\t%x0, %x0, %x0";
case MODE_V2DF:
@@ -5324,16 +5329,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
/* FALLTHRU */
case MODE_V8DF:
case MODE_V4DF:
- if (!EXT_REX_SSE_REG_P (operands[0]))
- return "vxorpd\t%x0, %x0, %x0";
- else if (TARGET_AVX512DQ)
- return (TARGET_AVX512VL
- ? "vxorpd\t%x0, %x0, %x0"
- : "vxorpd\t%g0, %g0, %g0");
- else
- return (TARGET_AVX512VL
- ? "vpxorq\t%x0, %x0, %x0"
- : "vpxorq\t%g0, %g0, %g0");
+ if (EXT_REX_SSE_REG_P (operands[0]))
+ {
+ if (TARGET_AVX512DQ)
+ return (TARGET_AVX512VL
+ ? "vxorpd\t%x0, %x0, %x0"
+ : "vxorpd\t%g0, %g0, %g0");
+ else
+ {
+ if (TARGET_AVX512VL)
+ return "vpxorq\t%x0, %x0, %x0";
+ else if (TARGET_EVEX512)
+ return "vpxorq\t%g0, %g0, %g0";
+ else
+ gcc_unreachable ();
+ }
+ }
+ return "vxorpd\t%x0, %x0, %x0";
case MODE_V4SF:
if (!EXT_REX_SSE_REG_P (operands[0]))
@@ -5341,16 +5353,23 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
/* FALLTHRU */
case MODE_V16SF:
case MODE_V8SF:
- if (!EXT_REX_SSE_REG_P (operands[0]))
- return "vxorps\t%x0, %x0, %x0";
- else if (TARGET_AVX512DQ)
- return (TARGET_AVX512VL
- ? "vxorps\t%x0, %x0, %x0"
- : "vxorps\t%g0, %g0, %g0");
- else
- return (TARGET_AVX512VL
- ? "vpxord\t%x0, %x0, %x0"
- : "vpxord\t%g0, %g0, %g0");
+ if (EXT_REX_SSE_REG_P (operands[0]))
+ {
+ if (TARGET_AVX512DQ)
+ return (TARGET_AVX512VL
+ ? "vxorps\t%x0, %x0, %x0"
+ : "vxorps\t%g0, %g0, %g0");
+ else
+ {
+ if (TARGET_AVX512VL)
+ return "vpxord\t%x0, %x0, %x0";
+ else if (TARGET_EVEX512)
+ return "vpxord\t%g0, %g0, %g0";
+ else
+ gcc_unreachable ();
+ }
+ }
+ return "vxorps\t%x0, %x0, %x0";
default:
gcc_unreachable ();
@@ -5368,7 +5387,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
case MODE_XI:
case MODE_V8DF:
case MODE_V16SF:
- gcc_assert (TARGET_AVX512F);
+ gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
case MODE_OI:
@@ -5380,14 +5399,18 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
case MODE_V2DF:
case MODE_V4SF:
gcc_assert (TARGET_SSE2);
- if (!EXT_REX_SSE_REG_P (operands[0]))
- return (TARGET_AVX
- ? "vpcmpeqd\t%0, %0, %0"
- : "pcmpeqd\t%0, %0");
- else if (TARGET_AVX512VL)
- return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}";
- else
- return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
+ if (EXT_REX_SSE_REG_P (operands[0]))
+ {
+ if (TARGET_AVX512VL)
+ return "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}";
+ else if (TARGET_EVEX512)
+ return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
+ else
+ gcc_unreachable ();
+ }
+ return (TARGET_AVX
+ ? "vpcmpeqd\t%0, %0, %0"
+ : "pcmpeqd\t%0, %0");
default:
gcc_unreachable ();
@@ -5397,7 +5420,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
{
if (GET_MODE_SIZE (mode) == 64)
{
- gcc_assert (TARGET_AVX512F);
+ gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
return "vpcmpeqd\t%t0, %t0, %t0";
}
else if (GET_MODE_SIZE (mode) == 32)
@@ -5409,7 +5432,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
}
else if (vector_all_ones_zero_extend_quarter_operand (x, mode))
{
- gcc_assert (TARGET_AVX512F);
+ gcc_assert (TARGET_AVX512F && TARGET_EVEX512);
return "vpcmpeqd\t%x0, %x0, %x0";
}
@@ -5511,6 +5534,8 @@ ix86_get_ssemov (rtx *operands, unsigned size,
|| memory_operand (operands[1], mode))
gcc_unreachable ();
size = 64;
+ /* We need TARGET_EVEX512 to move into zmm register. */
+ gcc_assert (TARGET_EVEX512);
switch (type)
{
case opcode_int:
@@ -10727,7 +10752,7 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
case E_OImode:
case E_XImode:
if (!standard_sse_constant_p (x, mode)
- && GET_MODE_SIZE (TARGET_AVX512F
+ && GET_MODE_SIZE (TARGET_AVX512F && TARGET_EVEX512
? XImode
: (TARGET_AVX
? OImode
@@ -19195,10 +19220,14 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
{
bool si;
enum ix86_builtins code;
+ const machine_mode mode = TYPE_MODE (TREE_TYPE (vectype));
if (!TARGET_AVX512F)
return NULL_TREE;
+ if (!TARGET_EVEX512 && GET_MODE_SIZE (mode) == 64)
+ return NULL_TREE;
+
if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 2u)
? !TARGET_USE_SCATTER_2PARTS
: (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e01eb..6eb4e540140 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -535,10 +535,11 @@
(define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
x64_avx,x64_avx512bw,x64_avx512dq,aes,
sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
- avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
- avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
- avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
- avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+ avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
+ noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
+ fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl,
+ avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl,
+ vpclmulqdqvl"
(const_string "base"))
;; The (bounding maximum) length of an instruction immediate.
@@ -899,6 +900,8 @@
(eq_attr "isa" "fma_or_avx512vl")
(symbol_ref "TARGET_FMA || TARGET_AVX512VL")
(eq_attr "isa" "avx512f") (symbol_ref "TARGET_AVX512F")
+ (eq_attr "isa" "avx512f_512")
+ (symbol_ref "TARGET_AVX512F && TARGET_EVEX512")
(eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
(eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
(eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
@@ -2281,7 +2284,7 @@
(define_expand "movxi"
[(set (match_operand:XI 0 "nonimmediate_operand")
(match_operand:XI 1 "general_operand"))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"ix86_expand_vector_move (XImode, operands); DONE;")
(define_expand "movoi"
@@ -2357,7 +2360,7 @@
(define_insn "*movxi_internal_avx512f"
[(set (match_operand:XI 0 "nonimmediate_operand" "=v,v ,v ,m")
(match_operand:XI 1 "nonimmediate_or_sse_const_operand" " C,BC,vm,v"))]
- "TARGET_AVX512F
+ "TARGET_AVX512F && TARGET_EVEX512
&& (register_operand (operands[0], XImode)
|| register_operand (operands[1], XImode))"
{
@@ -2485,9 +2488,9 @@
(define_insn "*movdi_internal"
[(set (match_operand:DI 0 "nonimmediate_operand"
- "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k ,*r,*m,*k")
+ "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,m,?r ,?*Yd,?r,?v,?*y,?*x,*k,*k ,*r,*m,*k")
(match_operand:DI 1 "general_operand"
- "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,v,*Yd,r ,?v,r ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
+ "riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,v,*Yd,r ,?v,r ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
{
@@ -2605,7 +2608,7 @@
(set (attr "mode")
(cond [(eq_attr "alternative" "2")
(const_string "SI")
- (eq_attr "alternative" "12,13")
+ (eq_attr "alternative" "12")
(cond [(match_test "TARGET_AVX")
(const_string "TI")
(ior (not (match_test "TARGET_SSE2"))
@@ -2613,6 +2616,18 @@
(const_string "V4SF")
]
(const_string "TI"))
+ (eq_attr "alternative" "13")
+ (cond [(match_test "TARGET_AVX512VL")
+ (const_string "TI")
+ (match_test "TARGET_AVX512F")
+ (const_string "DF")
+ (match_test "TARGET_AVX")
+ (const_string "TI")
+ (ior (not (match_test "TARGET_SSE2"))
+ (match_test "optimize_function_for_size_p (cfun)"))
+ (const_string "V4SF")
+ ]
+ (const_string "TI"))
(and (eq_attr "alternative" "14,15,16")
(not (match_test "TARGET_SSE2")))
@@ -2706,9 +2721,9 @@
(define_insn "*movsi_internal"
[(set (match_operand:SI 0 "nonimmediate_operand"
- "=r,m ,*y,*y,?*y,?m,?r,?*y,?v,?v,?v,m ,?r,?v,*k,*k ,*rm,*k")
+ "=r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,?r,?v,*k,*k ,*rm,*k")
(match_operand:SI 1 "general_operand"
- "g ,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,?v,r ,*r,*kBk,*k ,CBC"))]
+ "g ,re,C ,*y,Bk ,*y,*y,r ,C ,?v,Bk,?v,?v,r ,*r,*kBk,*k ,CBC"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
{
@@ -2793,7 +2808,7 @@
(set (attr "mode")
(cond [(eq_attr "alternative" "2,3")
(const_string "DI")
- (eq_attr "alternative" "8,9")
+ (eq_attr "alternative" "8")
(cond [(match_test "TARGET_AVX")
(const_string "TI")
(ior (not (match_test "TARGET_SSE2"))
@@ -2801,6 +2816,18 @@
(const_string "V4SF")
]
(const_string "TI"))
+ (eq_attr "alternative" "9")
+ (cond [(match_test "TARGET_AVX512VL")
+ (const_string "TI")
+ (match_test "TARGET_AVX512F")
+ (const_string "SF")
+ (match_test "TARGET_AVX")
+ (const_string "TI")
+ (ior (not (match_test "TARGET_SSE2"))
+ (match_test "optimize_function_for_size_p (cfun)"))
+ (const_string "V4SF")
+ ]
+ (const_string "TI"))
(and (eq_attr "alternative" "10,11")
(not (match_test "TARGET_SSE2")))
@@ -2849,9 +2876,9 @@
(define_insn "*movhi_internal"
[(set (match_operand:HI 0 "nonimmediate_operand"
- "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+ "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*Yv,*v,*v,m")
(match_operand:HI 1 "general_operand"
- "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r ,C ,*v,m ,*v"))]
+ "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r ,C ,*v,m ,*v"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
{
@@ -3993,9 +4020,9 @@
;; Possible store forwarding (partial memory) stall in alternatives 4, 6 and 7.
(define_insn "*movdf_internal"
[(set (match_operand:DF 0 "nonimmediate_operand"
- "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,v,v,v,m,*x,*x,*x,m ,?r,?v,r ,o ,r ,m")
+ "=Yf*f,m ,Yf*f,?r ,!o,?*r ,!o,!o,?r,?m,?r,?r,Yv,v,v,m,*x,*x,*x,m ,?r,?v,r ,o ,r ,m")
(match_operand:DF 1 "general_operand"
- "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))]
+ "Yf*fm,Yf*f,G ,roF,r ,*roF,*r,F ,rm,rC,C ,F ,C ,v,m,v,C ,*x,m ,*x, v, r,roF,rF,rmF,rC"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (lra_in_progress || reload_completed
|| !CONST_DOUBLE_P (operands[1])
@@ -4170,9 +4197,9 @@
(define_insn "*movsf_internal"
[(set (match_operand:SF 0 "nonimmediate_operand"
- "=Yf*f,m ,Yf*f,?r ,?m,v,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r ,m")
+ "=Yf*f,m ,Yf*f,?r ,?m,Yv,v,v,m,?r,?v,!*y,!*y,!m,!r,!*y,r ,m")
(match_operand:SF 1 "general_operand"
- "Yf*fm,Yf*f,G ,rmF,rF,C,v,m,v,v ,r ,*y ,m ,*y,*y,r ,rmF,rF"))]
+ "Yf*fm,Yf*f,G ,rmF,rF,C ,v,m,v,v ,r ,*y ,m ,*y,*y,r ,rmF,rF"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (lra_in_progress || reload_completed
|| !CONST_DOUBLE_P (operands[1])
@@ -4247,7 +4274,7 @@
(eq_attr "alternative" "11")
(const_string "DI")
(eq_attr "alternative" "5")
- (cond [(and (match_test "TARGET_AVX512F")
+ (cond [(and (match_test "TARGET_AVX512F && TARGET_EVEX512")
(not (match_test "TARGET_PREFER_AVX256")))
(const_string "V16SF")
(match_test "TARGET_AVX")
@@ -4271,7 +4298,11 @@
better to maintain the whole registers in single format
to avoid problems on using packed logical operations. */
(eq_attr "alternative" "6")
- (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (cond [(match_test "TARGET_AVX512VL")
+ (const_string "V4SF")
+ (match_test "TARGET_AVX512F")
+ (const_string "SF")
+ (ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(match_test "TARGET_SSE_SPLIT_REGS"))
(const_string "V4SF")
]
@@ -4301,9 +4332,9 @@
(define_insn "*mov<mode>_internal"
[(set (match_operand:HFBF 0 "nonimmediate_operand"
- "=?r,?r,?r,?m,v,v,?r,m,?v,v")
+ "=?r,?r,?r,?m ,Yv,v,?r,m,?v,v")
(match_operand:HFBF 1 "general_operand"
- "r ,F ,m ,r<hfbfconstf>,C,v, v,v,r ,m"))]
+ "r ,F ,m ,r<hfbfconstf>,C ,v, v,v,r ,m"))]
"!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (lra_in_progress
|| reload_completed
@@ -5144,7 +5175,7 @@
&& optimize_insn_for_speed_p ()
&& reload_completed
&& (!EXT_REX_SSE_REG_P (operands[0])
- || TARGET_AVX512VL)"
+ || TARGET_AVX512VL || TARGET_EVEX512)"
[(set (match_dup 2)
(float_extend:V2DF
(vec_select:V2SF
@@ -5287,8 +5318,8 @@
(set_attr "memory" "none")
(set (attr "enabled")
(if_then_else (eq_attr "alternative" "2")
- (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
- && !TARGET_PREFER_AVX256")
+ (symbol_ref "TARGET_AVX512F && TARGET_EVEX512
+ && !TARGET_AVX512VL && !TARGET_PREFER_AVX256")
(const_string "*")))])
(define_expand "extend<mode>xf2"
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 37d20c6303a..ef49efdbde5 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1276,7 +1276,8 @@
(and (match_code "vec_duplicate")
(and (match_test "TARGET_AVX512F")
(ior (match_test "TARGET_AVX512VL")
- (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64")))
+ (and (match_test "GET_MODE_SIZE (GET_MODE (op)) == 64")
+ (match_test "TARGET_EVEX512"))))
(match_test "VALID_BCST_MODE_P (GET_MODE_INNER (GET_MODE (op)))")
(match_test "GET_MODE (XEXP (op, 0))
== GET_MODE_INNER (GET_MODE (op))")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 80b43fd7db7..8d1b75b43e0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -253,43 +253,43 @@
;; All vector modes including V?TImode, used in move patterns.
(define_mode_iterator VMOVE
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
- (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
- (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
- (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
+ (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX") V1TI
+ (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
;; All AVX-512{F,VL} vector modes without HF. Supposed TARGET_AVX512F baseline.
(define_mode_iterator V48_AVX512VL
- [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
- V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
- V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator V48_256_512_AVX512VL
- [V16SI (V8SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL")
- V16SF (V8SF "TARGET_AVX512VL")
- V8DF (V4DF "TARGET_AVX512VL")])
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
+ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")
+ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")])
;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline.
(define_mode_iterator V48H_AVX512VL
- [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
(V32HF "TARGET_AVX512FP16")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
- V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline.
(define_mode_iterator VI12_AVX512VL
- [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
- V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+ [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+ (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
(define_mode_iterator VI12HFBF_AVX512VL
[V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
@@ -302,13 +302,13 @@
;; All vector modes
(define_mode_iterator V
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
- (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
+ (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
;; All 128bit vector modes
(define_mode_iterator V_128
@@ -324,22 +324,32 @@
V16HF V8HF V8SF V4SF V4DF V2DF])
;; All 512bit vector modes
-(define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF V32HF V32BF])
+(define_mode_iterator V_512
+ [(V64QI "TARGET_EVEX512") (V32HI "TARGET_EVEX512")
+ (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+ (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
+ (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")])
;; All 256bit and 512bit vector modes
(define_mode_iterator V_256_512
[V32QI V16HI V16HF V16BF V8SI V4DI V8SF V4DF
- (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V32HF "TARGET_AVX512F")
- (V32BF "TARGET_AVX512F") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
- (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+ (V64QI "TARGET_AVX512F && TARGET_EVEX512")
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512")
+ (V32HF "TARGET_AVX512F && TARGET_EVEX512")
+ (V32BF "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
;; All vector float modes
(define_mode_iterator VF
- [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+ [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+ (V2DF "TARGET_SSE2")])
(define_mode_iterator VF1_VF2_AVX512DQ
- [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+ [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
(V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
(V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
@@ -347,14 +357,17 @@
[(V32HF "TARGET_AVX512FP16")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+ (V2DF "TARGET_SSE2")])
;; 128-, 256- and 512-bit float vector modes for bitwise operations
(define_mode_iterator VFB
- [(V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2")
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+ [(V32HF "TARGET_AVX512F && TARGET_EVEX512")
+ (V16HF "TARGET_AVX") (V8HF "TARGET_SSE2")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
;; 128- and 256-bit float vector modes
(define_mode_iterator VF_128_256
@@ -369,10 +382,10 @@
;; All SFmode vector float modes
(define_mode_iterator VF1
- [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF])
+ [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF])
(define_mode_iterator VF1_AVX2
- [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX2") V4SF])
+ [(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX2") V4SF])
;; 128- and 256-bit SF vector modes
(define_mode_iterator VF1_128_256
@@ -383,24 +396,24 @@
;; All DFmode vector float modes
(define_mode_iterator VF2
- [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+ [(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
;; All DFmode & HFmode vector float modes
(define_mode_iterator VF2H
[(V32HF "TARGET_AVX512FP16")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
;; 128- and 256-bit DF vector modes
(define_mode_iterator VF2_128_256
[(V4DF "TARGET_AVX") V2DF])
(define_mode_iterator VF2_512_256
- [(V8DF "TARGET_AVX512F") V4DF])
+ [(V8DF "TARGET_AVX512F && TARGET_EVEX512") V4DF])
(define_mode_iterator VF2_512_256VL
- [V8DF (V4DF "TARGET_AVX512VL")])
+ [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL")])
;; All 128bit vector SF/DF modes
(define_mode_iterator VF_128
@@ -417,30 +430,30 @@
;; All 512bit vector float modes
(define_mode_iterator VF_512
- [V16SF V8DF])
+ [(V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
;; All 512bit vector float modes for bitwise operations
(define_mode_iterator VFB_512
- [V32HF V16SF V8DF])
+ [(V32HF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
(define_mode_iterator V4SF_V8HF
[V4SF V8HF])
(define_mode_iterator VI48_AVX512VL
- [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI1248_AVX512VLBW
[(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
(V16QI "TARGET_AVX512VL && TARGET_AVX512BW")
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
- V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+ (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VF_AVX512VL
- [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
- V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
;; AVX512ER SF plus 128- and 256-bit SF vector modes
(define_mode_iterator VF1_AVX512ER_128_256
@@ -450,14 +463,14 @@
[(V32HF "TARGET_AVX512FP16")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
- V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator VF2_AVX512VL
[V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator VF1_AVX512VL
- [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
+ [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
(define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF])
(define_mode_iterator VHFBF_256 [V16HF V16BF])
@@ -472,7 +485,8 @@
;; All vector integer modes
(define_mode_iterator VI
- [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
(V8SI "TARGET_AVX") V4SI
@@ -480,7 +494,8 @@
;; All vector integer and HF modes
(define_mode_iterator VIHFBF
- [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
(V8SI "TARGET_AVX") V4SI
@@ -491,8 +506,8 @@
(define_mode_iterator VI_AVX2
[(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
;; All QImode vector integer modes
(define_mode_iterator VI1
@@ -510,13 +525,13 @@
(V8SI "TARGET_AVX") (V4DI "TARGET_AVX")])
(define_mode_iterator VI8
- [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
+ [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
(define_mode_iterator VI8_FVL
[(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI8_AVX512VL
- [V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+ [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI8_256_512
[V8DI (V4DI "TARGET_AVX512VL")])
@@ -544,7 +559,7 @@
[(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI2_AVX512F
- [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI])
+ [(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI2_AVX512VNNIBW
[(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI")
@@ -557,14 +572,15 @@
[(V8SI "TARGET_AVX2") V4SI])
(define_mode_iterator VI4_AVX512F
- [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI])
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
(define_mode_iterator VI4_AVX512VL
- [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
(define_mode_iterator VI48_AVX512F_AVX512VL
- [V4SI V8SI (V16SI "TARGET_AVX512F")
- (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V8DI "TARGET_AVX512F")])
+ [V4SI V8SI (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI2_AVX512VL
[(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
@@ -589,21 +605,21 @@
[(V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI8_AVX2_AVX512F
- [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+ [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI8_AVX_AVX512F
- [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")])
+ [(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX")])
(define_mode_iterator VI4_128_8_256
[V4SI V4DI])
;; All V8D* modes
(define_mode_iterator V8FI
- [V8DF V8DI])
+ [(V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
;; All V16S* modes
(define_mode_iterator V16FI
- [V16SF V16SI])
+ [(V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
;; ??? We should probably use TImode instead.
(define_mode_iterator VIMAX_AVX2_AVX512BW
@@ -630,8 +646,8 @@
(define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW
[(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI])
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
(define_mode_iterator VI124_AVX2
[(V32QI "TARGET_AVX2") V16QI
@@ -648,8 +664,8 @@
[(V32HI "TARGET_AVX512BW")
(V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
- V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+ (V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI48_AVX2
[(V8SI "TARGET_AVX2") V4SI
@@ -663,14 +679,15 @@
(define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
[(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
(V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI248_AVX512BW
- [(V32HI "TARGET_AVX512BW") V16SI V8DI])
+ [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512")
+ (V8DI "TARGET_EVEX512")])
(define_mode_iterator VI248_AVX512BW_AVX512VL
[(V32HI "TARGET_AVX512BW")
- (V4DI "TARGET_AVX512VL") V16SI V8DI])
+ (V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
;; Suppose TARGET_AVX512VL as baseline
(define_mode_iterator VI248_AVX512BW_1
@@ -684,16 +701,16 @@
V4DI V2DI])
(define_mode_iterator VI48_AVX512F
- [(V16SI "TARGET_AVX512F") V8SI V4SI
- (V8DI "TARGET_AVX512F") V4DI V2DI])
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512") V8SI V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI V2DI])
(define_mode_iterator VI48_AVX_AVX512F
- [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
(define_mode_iterator VI12_AVX_AVX512F
- [ (V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI])
(define_mode_iterator V48_128_256
[V4SF V2DF
@@ -834,7 +851,8 @@
(define_mode_iterator VI248_256 [V16HI V8SI V4DI])
(define_mode_iterator VI248_512 [V32HI V16SI V8DI])
(define_mode_iterator VI48_128 [V4SI V2DI])
-(define_mode_iterator VI148_512 [V64QI V16SI V8DI])
+(define_mode_iterator VI148_512
+ [(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
(define_mode_iterator VI148_256 [V32QI V8SI V4DI])
(define_mode_iterator VI148_128 [V16QI V4SI V2DI])
@@ -844,15 +862,18 @@
[V32QI V16HI V8SI
(V64QI "TARGET_AVX512BW")
(V32HI "TARGET_AVX512BW")
- (V16SI "TARGET_AVX512F")])
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI48_256 [V8SI V4DI])
-(define_mode_iterator VI48_512 [V16SI V8DI])
+(define_mode_iterator VI48_512
+ [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
(define_mode_iterator VI4_256_8_512 [V8SI V8DI])
(define_mode_iterator VI_AVX512BW
- [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+ [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+ (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
(define_mode_iterator VIHFBF_AVX512BW
- [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
- (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
+ [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
+ (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
+ (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
;; Int-float size matches
(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
@@ -862,12 +883,15 @@
(define_mode_iterator VI8F_256 [V4DI V4DF])
(define_mode_iterator VI4F_256_512
[V8SI V8SF
- (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI48F_256_512
[V8SI V8SF
- (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
- (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
- (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
(define_mode_iterator VF48_I1248
[V16SI V16SF V8DI V8DF V32HI V64QI])
(define_mode_iterator VF48H_AVX512VL
@@ -877,14 +901,17 @@
[V2DF V4SF])
(define_mode_iterator VI48F
- [V16SI V16SF V8DI V8DF
+ [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512")
+ (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
(V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator VI12_VI48F_AVX512VL
- [(V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
- (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+ [(V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
(V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -901,7 +928,8 @@
(define_mode_iterator V8_128 [V8HI V8HF V8BF])
(define_mode_iterator V16_256 [V16HI V16HF V16BF])
-(define_mode_iterator V32_512 [V32HI V32HF V32BF])
+(define_mode_iterator V32_512
+ [(V32HI "TARGET_EVEX512") (V32HF "TARGET_EVEX512") (V32BF "TARGET_EVEX512")])
;; Mapping from float mode to required SSE level
(define_mode_attr sse
@@ -1295,7 +1323,8 @@
;; Mix-n-match
(define_mode_iterator AVX256MODE2P [V8SI V8SF V4DF])
-(define_mode_iterator AVX512MODE2P [V16SI V16SF V8DF])
+(define_mode_iterator AVX512MODE2P
+ [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8DF "TARGET_EVEX512")])
;; Mapping for dbpsadbw modes
(define_mode_attr dbpsadbwmode
@@ -1897,9 +1926,11 @@
(define_mode_iterator STORENT_MODE
[(DI "TARGET_SSE2 && TARGET_64BIT") (SI "TARGET_SSE2")
(SF "TARGET_SSE4A") (DF "TARGET_SSE4A")
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2")
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+ (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
(define_expand "storent<mode>"
[(set (match_operand:STORENT_MODE 0 "memory_operand")
@@ -3377,9 +3408,11 @@
(define_mode_iterator REDUC_PLUS_MODE
[(V4DF "TARGET_AVX") (V8SF "TARGET_AVX")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
(V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")])
+ (V32QI "TARGET_AVX")
+ (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
(define_expand "reduc_plus_scal_<mode>"
[(plus:REDUC_PLUS_MODE
@@ -3423,9 +3456,11 @@
(V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
(V64QI "TARGET_AVX512BW")
(V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F")
- (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
- (V8DF "TARGET_AVX512F")])
+ (V32HI "TARGET_AVX512BW")
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
(define_expand "reduc_<code>_scal_<mode>"
[(smaxmin:REDUC_SMINMAX_MODE
@@ -5035,7 +5070,7 @@
output_asm_insn (buf, operands);
return "";
}
- [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+ [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
(set_attr "type" "sselog")
(set_attr "prefix" "orig,vex,evex,evex")
(set (attr "mode")
@@ -5092,7 +5127,7 @@
output_asm_insn (buf, operands);
return "";
}
- [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+ [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
(set_attr "type" "sselog")
(set (attr "prefix_data16")
(if_then_else
@@ -5161,7 +5196,7 @@
output_asm_insn (buf, operands);
return "";
}
- [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+ [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
(set_attr "type" "sselog")
(set_attr "prefix" "orig,vex,evex,evex")
(set (attr "mode")
@@ -5223,7 +5258,7 @@
output_asm_insn (buf, operands);
return "";
}
- [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+ [(set_attr "isa" "noavx,avx,avx512vl,avx512f_512")
(set_attr "type" "sselog")
(set (attr "prefix_data16")
(if_then_else
@@ -5269,8 +5304,8 @@
(V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
(V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
(V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
- (V16SF "TARGET_AVX512F")
- (V8DF "TARGET_AVX512F")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
(HF "TARGET_AVX512FP16")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
@@ -5312,8 +5347,8 @@
(V2DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
(V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
(V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
- (V16SF "TARGET_AVX512F")
- (V8DF "TARGET_AVX512F")])
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator FMAMODE
[SF DF V4SF V2DF V8SF V4DF])
@@ -5387,8 +5422,10 @@
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(HF "TARGET_AVX512FP16")
- SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
- DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ SF (V16SF "TARGET_EVEX512")
+ (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ DF (V8DF "TARGET_EVEX512")
+ (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
[(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
@@ -8028,7 +8065,7 @@
(unspec:V16SI
[(match_operand:V16SF 1 "<round_nimm_predicate>" "<round_constraint>")]
UNSPEC_FIX_NOTRUNC))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -8095,7 +8132,7 @@
[(set (match_operand:V16SI 0 "register_operand" "=v")
(any_fix:V16SI
(match_operand:V16SF 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvttps2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -8595,7 +8632,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtdq2pd\t{%t1, %0|%0, %t1}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -8631,7 +8668,7 @@
(unspec:V8SI
[(match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")]
UNSPEC_FIX_NOTRUNC))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -8789,7 +8826,7 @@
[(set (match_operand:V8SI 0 "register_operand" "=v")
(any_fix:V8SI
(match_operand:V8DF 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvttpd2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -9193,7 +9230,7 @@
[(set (match_operand:V8SF 0 "register_operand" "=v")
(float_truncate:V8SF
(match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -9355,7 +9392,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtps2pd\t{%t1, %0|%0, %t1}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -9540,7 +9577,7 @@
(set (match_operand:V8DF 0 "register_operand")
(float_extend:V8DF
(match_dup 2)))]
-"TARGET_AVX512F"
+"TARGET_AVX512F && TARGET_EVEX512"
"operands[2] = gen_reg_rtx (V8SFmode);")
(define_expand "vec_unpacks_lo_v4sf"
@@ -9678,7 +9715,7 @@
(set (match_operand:V8DF 0 "register_operand")
(float:V8DF
(match_dup 2)))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"operands[2] = gen_reg_rtx (V8SImode);")
(define_expand "vec_unpacks_float_lo_v16si"
@@ -9690,7 +9727,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_expand "vec_unpacku_float_hi_v4si"
[(set (match_dup 5)
@@ -9786,7 +9823,7 @@
(define_expand "vec_unpacku_float_hi_v16si"
[(match_operand:V8DF 0 "register_operand")
(match_operand:V16SI 1 "register_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
REAL_VALUE_TYPE TWO32r;
rtx k, x, tmp[4];
@@ -9835,7 +9872,7 @@
(define_expand "vec_unpacku_float_lo_v16si"
[(match_operand:V8DF 0 "register_operand")
(match_operand:V16SI 1 "nonimmediate_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
REAL_VALUE_TYPE TWO32r;
rtx k, x, tmp[3];
@@ -9929,7 +9966,7 @@
[(match_operand:V16SI 0 "register_operand")
(match_operand:V8DF 1 "nonimmediate_operand")
(match_operand:V8DF 2 "nonimmediate_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
rtx r1, r2;
@@ -10044,7 +10081,7 @@
[(match_operand:V16SI 0 "register_operand")
(match_operand:V8DF 1 "nonimmediate_operand")
(match_operand:V8DF 2 "nonimmediate_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
rtx r1, r2;
@@ -10237,7 +10274,7 @@
(const_int 11) (const_int 27)
(const_int 14) (const_int 30)
(const_int 15) (const_int 31)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vunpckhps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -10325,7 +10362,7 @@
(const_int 9) (const_int 25)
(const_int 12) (const_int 28)
(const_int 13) (const_int 29)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vunpcklps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -10465,7 +10502,7 @@
(const_int 11) (const_int 11)
(const_int 13) (const_int 13)
(const_int 15) (const_int 15)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "sse")
(set_attr "prefix" "evex")
@@ -10518,7 +10555,7 @@
(const_int 10) (const_int 10)
(const_int 12) (const_int 12)
(const_int 14) (const_int 14)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "sse")
(set_attr "prefix" "evex")
@@ -11429,7 +11466,8 @@
(V8SF "32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")])
(define_mode_iterator AVX512_VEC
- [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ") V16SF V16SI])
+ [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ")
+ (V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
(define_expand "<extract_type>_vextract<shuffletype><extract_suf>_mask"
[(match_operand:<ssequartermode> 0 "nonimmediate_operand")
@@ -11598,7 +11636,8 @@
[(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")])
(define_mode_iterator AVX512_VEC_2
- [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ") V8DF V8DI])
+ [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ")
+ (V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
(define_expand "<extract_type_2>_vextract<shuffletype><extract_suf_2>_mask"
[(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
@@ -12155,7 +12194,8 @@
(const_int 26) (const_int 27)
(const_int 28) (const_int 29)
(const_int 30) (const_int 31)])))]
- "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+ "TARGET_AVX512F && TARGET_EVEX512
+ && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
{
if (TARGET_AVX512VL
|| REG_P (operands[0])
@@ -12203,7 +12243,7 @@
(const_int 58) (const_int 59)
(const_int 60) (const_int 61)
(const_int 62) (const_int 63)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
[(set_attr "type" "sselog1")
(set_attr "length_immediate" "1")
@@ -12299,13 +12339,13 @@
(define_mode_iterator VEC_EXTRACT_MODE
[(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
(V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
(V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
- (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF
+ (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
(define_expand "vec_extract<mode><ssescalarmodelower>"
[(match_operand:<ssescalarmode> 0 "register_operand")
@@ -12347,7 +12387,7 @@
(const_int 3) (const_int 11)
(const_int 5) (const_int 13)
(const_int 7) (const_int 15)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -12461,7 +12501,7 @@
(const_int 2) (const_int 10)
(const_int 4) (const_int 12)
(const_int 6) (const_int 14)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "sselog1")
(set_attr "prefix" "evex")
@@ -12477,7 +12517,7 @@
(const_int 2) (const_int 10)
(const_int 4) (const_int 12)
(const_int 6) (const_int 14)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -12689,7 +12729,7 @@
(match_operand:SI 4 "const_0_to_255_operand")]
UNSPEC_VTERNLOG))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
/* Disallow embedded broadcast for vector HFmode since
it's not real AVX512FP16 instruction. */
&& (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4
@@ -12781,7 +12821,7 @@
(match_operand:V 3 "regmem_or_bitnot_regmem_operand")
(match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& ix86_pre_reload_split ()
&& (rtx_equal_p (STRIP_UNARY (operands[1]),
STRIP_UNARY (operands[4]))
@@ -12866,7 +12906,7 @@
(match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
(match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& ix86_pre_reload_split ()
&& (rtx_equal_p (STRIP_UNARY (operands[1]),
STRIP_UNARY (operands[4]))
@@ -12950,7 +12990,7 @@
(match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
(match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& ix86_pre_reload_split ()"
"#"
"&& 1"
@@ -13074,7 +13114,7 @@
(match_operand:SI 3 "const_0_to_255_operand")
(match_operand:V16SF 4 "register_operand")
(match_operand:HI 5 "register_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
int mask = INTVAL (operands[3]);
emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2],
@@ -13261,7 +13301,7 @@
(match_operand 16 "const_12_to_15_operand")
(match_operand 17 "const_28_to_31_operand")
(match_operand 18 "const_28_to_31_operand")])))]
- "TARGET_AVX512F
+ "TARGET_AVX512F && TARGET_EVEX512
&& (INTVAL (operands[3]) == (INTVAL (operands[7]) - 4)
&& INTVAL (operands[4]) == (INTVAL (operands[8]) - 4)
&& INTVAL (operands[5]) == (INTVAL (operands[9]) - 4)
@@ -13296,7 +13336,7 @@
(match_operand:SI 3 "const_0_to_255_operand")
(match_operand:V8DF 4 "register_operand")
(match_operand:QI 5 "register_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
int mask = INTVAL (operands[3]);
emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2],
@@ -13326,7 +13366,7 @@
(match_operand 8 "const_12_to_13_operand")
(match_operand 9 "const_6_to_7_operand")
(match_operand 10 "const_14_to_15_operand")])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
int mask;
mask = INTVAL (operands[3]);
@@ -13458,7 +13498,7 @@
(const_int 3) (const_int 11)
(const_int 5) (const_int 13)
(const_int 7) (const_int 15)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpunpckhqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -13508,7 +13548,7 @@
(const_int 2) (const_int 10)
(const_int 4) (const_int 12)
(const_int 6) (const_int 14)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpunpcklqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -13872,8 +13912,8 @@
(set_attr "mode" "V2DF,DF,V8DF")
(set (attr "enabled")
(cond [(eq_attr "alternative" "2")
- (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
- && !TARGET_PREFER_AVX256")
+ (symbol_ref "TARGET_AVX512F && TARGET_EVEX512
+ && !TARGET_AVX512VL && !TARGET_PREFER_AVX256")
(match_test "<mask_avx512vl_condition>")
(const_string "*")
]
@@ -13957,13 +13997,13 @@
[(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand")
(truncate:PMOV_DST_MODE_1
(match_operand:<pmov_src_mode> 1 "register_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_insn "*avx512f_<code><pmov_src_lower><mode>2"
[(set (match_operand:PMOV_DST_MODE_1 0 "nonimmediate_operand" "=v,m")
(any_truncate:PMOV_DST_MODE_1
(match_operand:<pmov_src_mode> 1 "register_operand" "v,v")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix><pmov_suff_1>\t{%1, %0|%0, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "none,store")
@@ -14094,7 +14134,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)])))]
- "TARGET_AVX512F && ix86_pre_reload_split ()"
+ "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -14110,7 +14150,7 @@
(match_operand:<pmov_src_mode> 1 "register_operand" "v,v"))
(match_operand:PMOV_DST_MODE_1 2 "nonimm_or_0_operand" "0C,0")
(match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix><pmov_suff_1>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "none,store")
@@ -14124,7 +14164,7 @@
(match_operand:<pmov_src_mode> 1 "register_operand"))
(match_dup 0)
(match_operand:<avx512fmaskmode> 2 "register_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_expand "truncv32hiv32qi2"
[(set (match_operand:V32QI 0 "nonimmediate_operand")
@@ -15072,7 +15112,7 @@
[(set (match_operand:V8QI 0 "register_operand")
(truncate:V8QI
(match_operand:V8DI 1 "register_operand")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
rtx op0 = gen_reg_rtx (V16QImode);
@@ -15092,7 +15132,7 @@
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix>qb\t{%1, %0|%0, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -15102,7 +15142,7 @@
[(set (match_operand:V8QI 0 "memory_operand" "=m")
(any_truncate:V8QI
(match_operand:V8DI 1 "register_operand" "v")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix>qb\t{%1, %0|%0, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "store")
@@ -15114,7 +15154,7 @@
(subreg:DI
(any_truncate:V8QI
(match_operand:V8DI 1 "register_operand")) 0))]
- "TARGET_AVX512F && ix86_pre_reload_split ()"
+ "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -15138,7 +15178,7 @@
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix>qb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -15159,7 +15199,7 @@
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)
(const_int 0) (const_int 0)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix>qb\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -15172,7 +15212,7 @@
(match_operand:V8DI 1 "register_operand" "v"))
(match_dup 0)
(match_operand:QI 2 "register_operand" "Yk")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<trunsuffix>qb\t{%1, %0%{%2%}|%0%{%2%}, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "store")
@@ -15195,7 +15235,7 @@
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))
(match_operand:QI 2 "register_operand")) 0))]
- "TARGET_AVX512F && ix86_pre_reload_split ()"
+ "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -15453,7 +15493,7 @@
(const_int 4) (const_int 6)
(const_int 8) (const_int 10)
(const_int 12) (const_int 14)])))))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
(define_insn "*vec_widen_umult_even_v16si<mask_name>"
@@ -15473,7 +15513,8 @@
(const_int 4) (const_int 6)
(const_int 8) (const_int 10)
(const_int 12) (const_int 14)])))))]
- "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+ "TARGET_AVX512F && TARGET_EVEX512
+ && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
"vpmuludq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sseimul")
(set_attr "prefix" "evex")
@@ -15568,7 +15609,7 @@
(const_int 4) (const_int 6)
(const_int 8) (const_int 10)
(const_int 12) (const_int 14)])))))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
(define_insn "*vec_widen_smult_even_v16si<mask_name>"
@@ -15588,7 +15629,8 @@
(const_int 4) (const_int 6)
(const_int 8) (const_int 10)
(const_int 12) (const_int 14)])))))]
- "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+ "TARGET_AVX512F && TARGET_EVEX512
+ && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
"vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sseimul")
(set_attr "prefix" "evex")
@@ -17263,8 +17305,10 @@
(V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
(V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
(V16HF "TARGET_AVX512FP16")
- (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
- (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")
(V32HF "TARGET_AVX512FP16")])
@@ -17293,7 +17337,7 @@
{
operands[2] = CONSTM1_RTX (<MODE>mode);
- if (!TARGET_AVX512F)
+ if (!TARGET_AVX512F || (!TARGET_AVX512VL && !TARGET_EVEX512))
operands[2] = force_reg (<MODE>mode, operands[2]);
})
@@ -17302,6 +17346,7 @@
(xor:VI (match_operand:VI 1 "bcst_vector_operand" " 0, m,Br")
(match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))]
"TARGET_AVX512F
+ && (<MODE_SIZE> == 64 || TARGET_AVX512VL || TARGET_EVEX512)
&& (!<mask_applied>
|| <ssescalarmode>mode == SImode
|| <ssescalarmode>mode == DImode)"
@@ -17368,7 +17413,7 @@
(match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))
(unspec [(match_operand:VI 3 "register_operand" "0,0,0")]
UNSPEC_INSN_FALSE_DEP)]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && (<MODE_SIZE> == 64 || TARGET_AVX512VL || TARGET_EVEX512)"
{
if (TARGET_AVX512VL)
return "vpternlog<ternlogsuffix>\t{$0x55, %1, %0, %0<mask_operand3>|%0<mask_operand3>, %0, %1, 0x55}";
@@ -17392,7 +17437,7 @@
(not:<ssescalarmode>
(match_operand:<ssescalarmode> 1 "nonimmediate_operand"))))]
"<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
[(set (match_dup 0)
(xor:VI48_AVX512F
(vec_duplicate:VI48_AVX512F (match_dup 1))
@@ -17538,7 +17583,8 @@
(symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL")
(eq_attr "alternative" "4")
(symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
+ || (TARGET_AVX512F && TARGET_EVEX512
+ && !TARGET_PREFER_AVX256)")
]
(const_string "*")))])
@@ -17582,7 +17628,7 @@
(match_operand:<ssescalarmode> 1 "nonimmediate_operand")))
(match_operand:VI 2 "vector_operand")))]
"<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
[(set (match_dup 3)
(vec_duplicate:VI (match_dup 1)))
(set (match_dup 0)
@@ -17597,7 +17643,7 @@
(match_operand:<ssescalarmode> 1 "nonimmediate_operand")))
(match_operand:VI 2 "vector_operand")))]
"<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256)"
[(set (match_dup 3)
(vec_duplicate:VI (match_dup 1)))
(set (match_dup 0)
@@ -17883,7 +17929,7 @@
(match_operand:VI 1 "bcst_vector_operand" "0,m, 0,vBr"))
(match_operand:VI 2 "bcst_vector_operand" "m,0,vBr, 0")))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& (register_operand (operands[1], <MODE>mode)
|| register_operand (operands[2], <MODE>mode))"
{
@@ -17916,7 +17962,7 @@
(match_operand:VI 1 "bcst_vector_operand" "%0, 0")
(match_operand:VI 2 "bcst_vector_operand" " m,vBr"))))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& (register_operand (operands[1], <MODE>mode)
|| register_operand (operands[2], <MODE>mode))"
{
@@ -17947,7 +17993,7 @@
(not:VI (match_operand:VI 1 "bcst_vector_operand" "%0, 0"))
(not:VI (match_operand:VI 2 "bcst_vector_operand" "m,vBr"))))]
"(<MODE_SIZE> == 64 || TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
+ || (TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256))
&& (register_operand (operands[1], <MODE>mode)
|| register_operand (operands[2], <MODE>mode))"
{
@@ -18669,7 +18715,7 @@
(const_int 11) (const_int 27)
(const_int 14) (const_int 30)
(const_int 15) (const_int 31)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpunpckhdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -18724,7 +18770,7 @@
(const_int 9) (const_int 25)
(const_int 12) (const_int 28)
(const_int 13) (const_int 29)])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpunpckldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -19418,7 +19464,7 @@
(match_operand:SI 2 "const_0_to_255_operand")
(match_operand:V16SI 3 "register_operand")
(match_operand:HI 4 "register_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
int mask = INTVAL (operands[2]);
emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1],
@@ -19462,7 +19508,7 @@
(match_operand 15 "const_12_to_15_operand")
(match_operand 16 "const_12_to_15_operand")
(match_operand 17 "const_12_to_15_operand")])))]
- "TARGET_AVX512F
+ "TARGET_AVX512F && TARGET_EVEX512
&& INTVAL (operands[2]) + 4 == INTVAL (operands[6])
&& INTVAL (operands[3]) + 4 == INTVAL (operands[7])
&& INTVAL (operands[4]) + 4 == INTVAL (operands[8])
@@ -20315,7 +20361,7 @@
(match_operand:V4TI 1 "register_operand" "v")
(parallel
[(match_operand:SI 2 "const_0_to_3_operand")])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vextracti32x4\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "length_immediate" "1")
@@ -20323,7 +20369,7 @@
(set_attr "mode" "XI")])
(define_mode_iterator VEXTRACTI128_MODE
- [(V4TI "TARGET_AVX512F") V2TI])
+ [(V4TI "TARGET_AVX512F && TARGET_EVEX512") V2TI])
(define_split
[(set (match_operand:TI 0 "nonimmediate_operand")
@@ -20346,7 +20392,8 @@
&& VECTOR_MODE_P (GET_MODE (operands[1]))
&& ((TARGET_SSE && GET_MODE_SIZE (GET_MODE (operands[1])) == 16)
|| (TARGET_AVX && GET_MODE_SIZE (GET_MODE (operands[1])) == 32)
- || (TARGET_AVX512F && GET_MODE_SIZE (GET_MODE (operands[1])) == 64))
+ || (TARGET_AVX512F && TARGET_EVEX512
+ && GET_MODE_SIZE (GET_MODE (operands[1])) == 64))
&& (<MODE>mode == SImode || TARGET_64BIT || MEM_P (operands[0]))"
[(set (match_dup 0) (vec_select:SWI48x (match_dup 1)
(parallel [(const_int 0)])))]
@@ -21994,8 +22041,9 @@
(define_mode_iterator VI1248_AVX512VL_AVX512BW
[(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
+ (V2DI "TARGET_AVX512VL")])
(define_insn "*abs<mode>2"
[(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=<v_Yw>")
@@ -22840,7 +22888,7 @@
[(set (match_operand:V16SI 0 "register_operand" "=v")
(any_extend:V16SI
(match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -22850,7 +22898,7 @@
[(set (match_operand:V16SI 0 "register_operand")
(any_extend:V16SI
(match_operand:V16QI 1 "nonimmediate_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_insn "avx2_<code>v8qiv8si2<mask_name>"
[(set (match_operand:V8SI 0 "register_operand" "=v")
@@ -22982,7 +23030,7 @@
[(set (match_operand:V16SI 0 "register_operand" "=v")
(any_extend:V16SI
(match_operand:V16HI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -22992,7 +23040,7 @@
[(set (match_operand:V16SI 0 "register_operand")
(any_extend:V16SI
(match_operand:V16HI 1 "nonimmediate_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_insn_and_split "avx512f_zero_extendv16hiv16si2_1"
[(set (match_operand:V32HI 0 "register_operand" "=v")
@@ -23002,7 +23050,7 @@
(match_operand:V32HI 2 "const0_operand"))
(match_parallel 3 "pmovzx_parallel"
[(match_operand 4 "const_int_operand")])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"#"
"&& reload_completed"
[(set (match_dup 0) (zero_extend:V16SI (match_dup 1)))]
@@ -23223,7 +23271,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -23233,7 +23281,7 @@
[(set (match_operand:V8DI 0 "register_operand" "=v")
(any_extend:V8DI
(match_operand:V8QI 1 "memory_operand" "m")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -23251,7 +23299,7 @@
(const_int 2) (const_int 3)
(const_int 4) (const_int 5)
(const_int 6) (const_int 7)]))))]
- "TARGET_AVX512F && ix86_pre_reload_split ()"
+ "TARGET_AVX512F && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -23262,7 +23310,7 @@
[(set (match_operand:V8DI 0 "register_operand")
(any_extend:V8DI
(match_operand:V8QI 1 "nonimmediate_operand")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
if (!MEM_P (operands[1]))
{
@@ -23402,7 +23450,7 @@
[(set (match_operand:V8DI 0 "register_operand" "=v")
(any_extend:V8DI
(match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -23412,7 +23460,7 @@
[(set (match_operand:V8DI 0 "register_operand")
(any_extend:V8DI
(match_operand:V8HI 1 "nonimmediate_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_insn "avx2_<code>v4hiv4di2<mask_name>"
[(set (match_operand:V4DI 0 "register_operand" "=v")
@@ -23538,7 +23586,7 @@
[(set (match_operand:V8DI 0 "register_operand" "=v")
(any_extend:V8DI
(match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -23552,7 +23600,7 @@
(match_operand:V16SI 2 "const0_operand"))
(match_parallel 3 "pmovzx_parallel"
[(match_operand 4 "const_int_operand")])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"#"
"&& reload_completed"
[(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))]
@@ -23571,7 +23619,7 @@
(match_operand:V16SI 3 "const0_operand"))
(match_parallel 4 "pmovzx_parallel"
[(match_operand 5 "const_int_operand")])))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"#"
"&& reload_completed"
[(set (match_dup 0) (zero_extend:V8DI (match_dup 1)))]
@@ -23583,7 +23631,7 @@
[(set (match_operand:V8DI 0 "register_operand" "=v")
(any_extend:V8DI
(match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_insn "avx2_<code>v4siv4di2<mask_name>"
[(set (match_operand:V4DI 0 "register_operand" "=v")
@@ -23977,7 +24025,7 @@
[(match_operand:V16SI 0 "register_operand")
(match_operand:V16SF 1 "nonimmediate_operand")
(match_operand:SI 2 "const_0_to_15_operand")]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
rtx tmp = gen_reg_rtx (V16SFmode);
emit_insn (gen_avx512f_rndscalev16sf (tmp, operands[1], operands[2]));
@@ -25394,7 +25442,7 @@
(ashiftrt:V8DI
(match_operand:V8DI 1 "register_operand")
(match_operand:V8DI 2 "nonimmediate_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_expand "vashrv4di3"
[(set (match_operand:V4DI 0 "register_operand")
@@ -25485,7 +25533,7 @@
[(set (match_operand:V16SI 0 "register_operand")
(ashiftrt:V16SI (match_operand:V16SI 1 "register_operand")
(match_operand:V16SI 2 "nonimmediate_operand")))]
- "TARGET_AVX512F")
+ "TARGET_AVX512F && TARGET_EVEX512")
(define_expand "vashrv8si3"
[(set (match_operand:V8SI 0 "register_operand")
@@ -26058,8 +26106,8 @@
(define_mode_attr pbroadcast_evex_isa
[(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
(V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
- (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
- (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
+ (V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f")
+ (V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f")
(V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
(V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
@@ -26602,7 +26650,7 @@
(set (attr "enabled")
(if_then_else (eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
- && !TARGET_PREFER_AVX256")
+ && TARGET_EVEX512 && !TARGET_PREFER_AVX256")
(const_string "*")))])
(define_insn "*vec_dupv4si"
@@ -26630,7 +26678,7 @@
(set (attr "enabled")
(if_then_else (eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
- && !TARGET_PREFER_AVX256")
+ && TARGET_EVEX512 && !TARGET_PREFER_AVX256")
(const_string "*")))])
(define_insn "*vec_dupv2di"
@@ -26661,7 +26709,8 @@
(if_then_else
(eq_attr "alternative" "2")
(symbol_ref "TARGET_AVX512VL
- || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
+ || (TARGET_AVX512F && TARGET_EVEX512
+ && !TARGET_PREFER_AVX256)")
(const_string "*")))])
(define_insn "avx2_vbroadcasti128_<mode>"
@@ -26741,7 +26790,7 @@
[(set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "maybe_evex")
- (set_attr "isa" "avx2,noavx2,avx2,avx512f,noavx2")
+ (set_attr "isa" "avx2,noavx2,avx2,avx512f_512,noavx2")
(set_attr "mode" "<sseinsnmode>,V8SF,<sseinsnmode>,<sseinsnmode>,V8SF")])
(define_split
@@ -26908,7 +26957,8 @@
(set_attr "mode" "<sseinsnmode>")])
(define_mode_iterator VPERMI2
- [V16SI V16SF V8DI V8DF
+ [(V16SI "TARGET_EVEX512") (V16SF "TARGET_EVEX512")
+ (V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
(V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -26919,7 +26969,7 @@
(V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
(define_mode_iterator VPERMI2I
- [V16SI V8DI
+ [(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
(V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
@@ -27543,28 +27593,29 @@
;; Modes handled by vec_init expanders.
(define_mode_iterator VEC_INIT_MODE
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
- (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
- (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
- (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
+ (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512")
+ (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
+ (V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
;; Likewise, but for initialization from half sized vectors.
;; Thus, these are all VEC_INIT_MODE modes except V2??.
(define_mode_iterator VEC_INIT_HALF_MODE
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
- (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
- (V32BF "TARGET_AVX512F") (V16BF "TARGET_AVX") V8BF
- (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
- (V4TI "TARGET_AVX512F")])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX")
+ (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
+ (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
+ (V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX")
+ (V4TI "TARGET_AVX512F && TARGET_EVEX512")])
(define_expand "vec_init<mode><ssescalarmodelower>"
[(match_operand:VEC_INIT_MODE 0 "register_operand")
@@ -27817,7 +27868,7 @@
(unspec:V16SF
[(match_operand:V16HI 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
UNSPEC_VCVTPH2PS))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtph2ps\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -27907,7 +27958,7 @@
UNSPEC_VCVTPS2PH)
(match_operand:V16HI 3 "nonimm_or_0_operand")
(match_operand:HI 4 "register_operand")))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
{
int round = INTVAL (operands[2]);
/* Separate {sae} from rounding control imm,
@@ -27926,7 +27977,7 @@
[(match_operand:V16SF 1 "register_operand" "v")
(match_operand:SI 2 "const_0_to_255_operand")]
UNSPEC_VCVTPS2PH))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtps2ph\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -27938,7 +27989,7 @@
[(match_operand:V16SF 1 "register_operand" "v")
(match_operand:SI 2 "const_0_to_255_operand")]
UNSPEC_VCVTPS2PH))]
- "TARGET_AVX512F"
+ "TARGET_AVX512F && TARGET_EVEX512"
"vcvtps2ph\t{%2, %1, %0<merge_mask_operand3>|%0<merge_mask_operand3>, %1, %2}"
[(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
@@ -30285,10 +30336,10 @@
;; vinserti64x4 $0x1, %ymm15, %zmm15, %zmm15
(define_mode_iterator INT_BROADCAST_MODE
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
- (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
- (V8DI "TARGET_AVX512F && TARGET_64BIT")
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+ (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+ (V8DI "TARGET_AVX512F && TARGET_EVEX512 && TARGET_64BIT")
(V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")])
;; Broadcast from an integer. NB: Enable broadcast only if we can move
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
index 261f2e12e8d..8a81585e790 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-5b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
@@ -3,4 +3,4 @@
#include "pr89229-5a.c"
-/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsd\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
index a74f7169e6e..0c27daa4f74 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-6b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
@@ -3,4 +3,4 @@
#include "pr89229-6a.c"
-/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
index d3a56e6e2b7..baba99ec775 100644
--- a/gcc/testsuite/gcc.target/i386/pr89229-7b.c
+++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
@@ -3,4 +3,4 @@
#include "pr89229-7a.c"
-/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 14/18] Support -mevex512 for AVX512DQ intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (12 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 13/18] Support -mevex512 for AVX512F intrins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 15/18] Support -mevex512 for AVX512BW intrins Hu, Lin1
` (5 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
Add TARGET_EVEX512 for 512 bit usage.
* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
(VF1_128_256VL): Ditto.
(VF2_AVX512VL): Ditto.
(VI8_256_512): Ditto.
(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(VI4F_BRCST32x2): Ditto.
(VI8F_BRCST64x2): Ditto.
---
gcc/config/i386/i386-expand.cc | 2 +-
gcc/config/i386/i386.cc | 22 ++++++++++++++++------
gcc/config/i386/sse.md | 24 ++++++++++++++----------
3 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0705e08d38c..063561e1265 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24008,7 +24008,7 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
machine_mode mode = GET_MODE (op0);
rtx t1, t2, t3, t4, t5, t6;
- if (TARGET_AVX512DQ && mode == V8DImode)
+ if (TARGET_AVX512DQ && TARGET_EVEX512 && mode == V8DImode)
emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2));
else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V4DImode)
emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 635dd85e764..589b29a324d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5332,9 +5332,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
if (EXT_REX_SSE_REG_P (operands[0]))
{
if (TARGET_AVX512DQ)
- return (TARGET_AVX512VL
- ? "vxorpd\t%x0, %x0, %x0"
- : "vxorpd\t%g0, %g0, %g0");
+ {
+ if (TARGET_AVX512VL)
+ return "vxorpd\t%x0, %x0, %x0";
+ else if (TARGET_EVEX512)
+ return "vxorpd\t%g0, %g0, %g0";
+ else
+ gcc_unreachable ();
+ }
else
{
if (TARGET_AVX512VL)
@@ -5356,9 +5361,14 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
if (EXT_REX_SSE_REG_P (operands[0]))
{
if (TARGET_AVX512DQ)
- return (TARGET_AVX512VL
- ? "vxorps\t%x0, %x0, %x0"
- : "vxorps\t%g0, %g0, %g0");
+ {
+ if (TARGET_AVX512VL)
+ return "vxorps\t%x0, %x0, %x0";
+ else if (TARGET_EVEX512)
+ return "vxorps\t%g0, %g0, %g0";
+ else
+ gcc_unreachable ();
+ }
else
{
if (TARGET_AVX512VL)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8d1b75b43e0..a8f93ceddc5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -350,7 +350,8 @@
(define_mode_iterator VF1_VF2_AVX512DQ
[(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
- (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
+ (V8DF "TARGET_AVX512DQ && TARGET_EVEX512")
+ (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
(V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
(define_mode_iterator VFH
@@ -392,7 +393,7 @@
[(V8SF "TARGET_AVX") V4SF])
(define_mode_iterator VF1_128_256VL
- [V8SF (V4SF "TARGET_AVX512VL")])
+ [(V8SF "TARGET_EVEX512") (V4SF "TARGET_AVX512VL")])
;; All DFmode vector float modes
(define_mode_iterator VF2
@@ -467,7 +468,7 @@
(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator VF2_AVX512VL
- [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+ [(V8DF "TARGET_EVEX512") (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
(define_mode_iterator VF1_AVX512VL
[(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
@@ -534,7 +535,7 @@
[(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI8_256_512
- [V8DI (V4DI "TARGET_AVX512VL")])
+ [(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL")])
(define_mode_iterator VI1_AVX2
[(V32QI "TARGET_AVX2") V16QI])
@@ -9075,7 +9076,7 @@
(define_insn "<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>"
[(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
(unsigned_fix:<sseintvecmode>
- (match_operand:VF1_128_256VL 1 "nonimmediate_operand" "vm")))]
+ (match_operand:VF1_128_256 1 "nonimmediate_operand" "vm")))]
"TARGET_AVX512VL"
"vcvttps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssecvt")
@@ -11466,7 +11467,8 @@
(V8SF "32x4") (V8SI "32x4") (V4DF "64x2") (V4DI "64x2")])
(define_mode_iterator AVX512_VEC
- [(V8DF "TARGET_AVX512DQ") (V8DI "TARGET_AVX512DQ")
+ [(V8DF "TARGET_AVX512DQ && TARGET_EVEX512")
+ (V8DI "TARGET_AVX512DQ && TARGET_EVEX512")
(V16SF "TARGET_EVEX512") (V16SI "TARGET_EVEX512")])
(define_expand "<extract_type>_vextract<shuffletype><extract_suf>_mask"
@@ -11636,7 +11638,8 @@
[(V16SF "32x8") (V16SI "32x8") (V8DF "64x4") (V8DI "64x4")])
(define_mode_iterator AVX512_VEC_2
- [(V16SF "TARGET_AVX512DQ") (V16SI "TARGET_AVX512DQ")
+ [(V16SF "TARGET_AVX512DQ && TARGET_EVEX512")
+ (V16SI "TARGET_AVX512DQ && TARGET_EVEX512")
(V8DF "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
(define_expand "<extract_type_2>_vextract<shuffletype><extract_suf_2>_mask"
@@ -26850,8 +26853,8 @@
;; For broadcast[i|f]32x2. Yes there is no v4sf version, only v4si.
(define_mode_iterator VI4F_BRCST32x2
- [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
- V16SF (V8SF "TARGET_AVX512VL")])
+ [(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+ (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
(define_mode_attr 64x2mode
[(V8DF "V2DF") (V8DI "V2DI") (V4DI "V2DI") (V4DF "V2DF")])
@@ -26901,7 +26904,8 @@
;; For broadcast[i|f]64x2
(define_mode_iterator VI8F_BRCST64x2
- [V8DI V8DF (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
+ [(V8DI "TARGET_EVEX512") (V8DF "TARGET_EVEX512")
+ (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
(define_insn "<mask_codefor>avx512dq_broadcast<mode><mask_name>_1"
[(set (match_operand:VI8F_BRCST64x2 0 "register_operand" "=v,v")
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 15/18] Support -mevex512 for AVX512BW intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (13 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 14/18] Support -mevex512 for AVX512DQ intrins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins Hu, Lin1
` (4 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Make sure there is EVEX512 enabled.
(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
when !TARGET_EVEX512.
* config/i386/i386.md (avx512bw_512): New.
(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
(*zero_extendsidi2): Change isa to avx512bw_512.
(kmov_isa): Ditto.
(*anddi_1): Ditto.
(*andn<mode>_1): Change isa to kmov_isa.
(*<code><mode>_1): Ditto.
(*notxor<mode>_1): Ditto.
(*one_cmpl<mode>2_1): Ditto.
(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
(*ashl<mode>3_1): Change isa to kmov_isa.
(*lshr<mode>3_1): Ditto.
* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
(VI1248_AVX512VLBW): Ditto.
(VHFBF_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI1_AVX512): Ditto.
(VI12_256_512_AVX512VL): Ditto.
(VI2_AVX2_AVX512BW): Ditto.
(VI2_AVX512VNNIBW): Ditto.
(VI2_AVX512VL): Ditto.
(VI2HFBF_AVX512VL): Ditto.
(VI8_AVX2_AVX512BW): Ditto.
(VIMAX_AVX2_AVX512BW): Ditto.
(VIMAX_AVX512VL): Ditto.
(VI12_AVX2_AVX512BW): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VL): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI248_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(SWI1248_AVX512BWDQ): Ditto.
(SWI1248_AVX512BW): Ditto.
(SWI1248_AVX512BWDQ2): Ditto.
(*knotsi_1_zext): Ditto.
(define_split for zero_extend + not): Ditto.
(kunpckdi): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
(truncv32hiv32qi2): Ditto.
(avx512bw_<code>v32hiv32qi2): Ditto.
(avx512bw_<code>v32hiv32qi2_mask): Ditto.
(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
(usadv64qi): Ditto.
(VEC_PERM_AVX2): Ditto.
(AVX512ZEXTMASK): Ditto.
(SWI24_MASK): New.
(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
(avx512bw_packssdw<mask_name>): Ditto.
(avx512bw_interleave_highv64qi<mask_name>): Ditto.
(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
(vec_unpacks_lo_di): Ditto.
(SWI48x_MASK): New.
(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
(VI1248_AVX512VL_AVX512BW): Ditto.
(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
(<insn>v32qiv32hi2): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.
---
gcc/config/i386/i386-expand.cc | 3 +-
gcc/config/i386/i386.cc | 4 +-
gcc/config/i386/i386.md | 54 ++++-----
gcc/config/i386/sse.md | 193 ++++++++++++++++++---------------
4 files changed, 128 insertions(+), 126 deletions(-)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 063561e1265..ff2423f91ed 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15617,6 +15617,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
case E_V32HFmode:
case E_V32BFmode:
case E_V64QImode:
+ gcc_assert (TARGET_EVEX512);
if (TARGET_AVX512BW)
return ix86_vector_duplicate_value (mode, target, val);
else
@@ -23512,7 +23513,7 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
bool uns_p = code != ASHIFTRT;
if ((qimode == V16QImode && !TARGET_AVX2)
- || (qimode == V32QImode && !TARGET_AVX512BW)
+ || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
/* There are no V64HImode instructions. */
|| qimode == V64QImode)
return false;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 589b29a324d..03c96ff048d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20308,8 +20308,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
return MASK_PAIR_REGNO_P(regno);
return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode))
- || (TARGET_AVX512BW
- && VALID_MASK_AVX512BW_MODE (mode)));
+ || (TARGET_AVX512BW && mode == SImode)
+ || (TARGET_AVX512BW && TARGET_EVEX512 && mode == DImode));
}
if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6eb4e540140..bdececc2309 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -536,10 +536,10 @@
x64_avx,x64_avx512bw,x64_avx512dq,aes,
sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
- noavx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
- fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,avx512vnnivl,
- avx512fp16,avxifma,avx512ifmavl,avxneconvert,avx512bf16vl,
- vpclmulqdqvl"
+ noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
+ noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
+ avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
+ avx512bf16vl,vpclmulqdqvl"
(const_string "base"))
;; The (bounding maximum) length of an instruction immediate.
@@ -904,6 +904,8 @@
(symbol_ref "TARGET_AVX512F && TARGET_EVEX512")
(eq_attr "isa" "noavx512f") (symbol_ref "!TARGET_AVX512F")
(eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
+ (eq_attr "isa" "avx512bw_512")
+ (symbol_ref "TARGET_AVX512BW && TARGET_EVEX512")
(eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
(eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ")
(eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
@@ -1440,7 +1442,8 @@
(define_mode_iterator SWI1248_AVX512BWDQ_64
[(QI "TARGET_AVX512DQ") HI
- (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")])
+ (SI "TARGET_AVX512BW")
+ (DI "TARGET_AVX512BW && TARGET_EVEX512 && TARGET_64BIT")])
(define_insn "*cmp<mode>_ccz_1"
[(set (reg FLAGS_REG)
@@ -4580,7 +4583,7 @@
(eq_attr "alternative" "12")
(const_string "x64_avx512bw")
(eq_attr "alternative" "13")
- (const_string "avx512bw")
+ (const_string "avx512bw_512")
]
(const_string "*")))
(set (attr "mmx_isa")
@@ -4657,7 +4660,7 @@
"split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
(define_mode_attr kmov_isa
- [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw")])
+ [(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw_512")])
(define_insn "zero_extend<mode>di2"
[(set (match_operand:DI 0 "register_operand" "=r,*r,*k")
@@ -11124,7 +11127,7 @@
and{q}\t{%2, %0|%0, %2}
#
#"
- [(set_attr "isa" "x64,x64,x64,x64,avx512bw")
+ [(set_attr "isa" "x64,x64,x64,x64,avx512bw_512")
(set_attr "type" "alu,alu,alu,imovx,msklog")
(set_attr "length_immediate" "*,*,*,0,*")
(set (attr "prefix_rex")
@@ -11647,12 +11650,13 @@
(not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k"))
(match_operand:SWI48 2 "nonimmediate_operand" "r,m,k")))
(clobber (reg:CC FLAGS_REG))]
- "TARGET_BMI || TARGET_AVX512BW"
+ "TARGET_BMI
+ || (TARGET_AVX512BW && (<MODE>mode == SImode || TARGET_EVEX512))"
"@
andn\t{%2, %1, %0|%0, %1, %2}
andn\t{%2, %1, %0|%0, %1, %2}
#"
- [(set_attr "isa" "bmi,bmi,avx512bw")
+ [(set_attr "isa" "bmi,bmi,<kmov_isa>")
(set_attr "type" "bitmanip,bitmanip,msklog")
(set_attr "btver2_decode" "direct, double,*")
(set_attr "mode" "<MODE>")])
@@ -11880,13 +11884,7 @@
<logic>{<imodesuffix>}\t{%2, %0|%0, %2}
<logic>{<imodesuffix>}\t{%2, %0|%0, %2}
#"
- [(set (attr "isa")
- (cond [(eq_attr "alternative" "2")
- (if_then_else (eq_attr "mode" "SI,DI")
- (const_string "avx512bw")
- (const_string "avx512f"))
- ]
- (const_string "*")))
+ [(set_attr "isa" "*,*,<kmov_isa>")
(set_attr "type" "alu, alu, msklog")
(set_attr "mode" "<MODE>")])
@@ -11913,13 +11911,7 @@
DONE;
}
}
- [(set (attr "isa")
- (cond [(eq_attr "alternative" "2")
- (if_then_else (eq_attr "mode" "SI,DI")
- (const_string "avx512bw")
- (const_string "avx512f"))
- ]
- (const_string "*")))
+ [(set_attr "isa" "*,*,<kmov_isa>")
(set_attr "type" "alu, alu, msklog")
(set_attr "mode" "<MODE>")])
@@ -13300,13 +13292,7 @@
"@
not{<imodesuffix>}\t%0
#"
- [(set (attr "isa")
- (cond [(eq_attr "alternative" "1")
- (if_then_else (eq_attr "mode" "SI,DI")
- (const_string "avx512bw")
- (const_string "avx512f"))
- ]
- (const_string "*")))
+ [(set_attr "isa" "*,<kmov_isa>")
(set_attr "type" "negnot,msklog")
(set_attr "mode" "<MODE>")])
@@ -13318,7 +13304,7 @@
"@
not{l}\t%k0
#"
- [(set_attr "isa" "x64,avx512bw")
+ [(set_attr "isa" "x64,avx512bw_512")
(set_attr "type" "negnot,msklog")
(set_attr "mode" "SI,SI")])
@@ -13943,7 +13929,7 @@
return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,*,bmi2,avx512bw")
+ [(set_attr "isa" "*,*,bmi2,<kmov_isa>")
(set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
@@ -14995,7 +14981,7 @@
return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,bmi2,avx512bw")
+ [(set_attr "isa" "*,bmi2,<kmov_isa>")
(set_attr "type" "ishift,ishiftx,msklog")
(set (attr "length_immediate")
(if_then_else
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a8f93ceddc5..e59f6bf4410 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -292,10 +292,10 @@
(V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
(define_mode_iterator VI12HFBF_AVX512VL
- [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
- V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
- V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
- V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+ [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+ (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
+ (V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+ (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
(define_mode_iterator VI1_AVX512VL
[V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
@@ -445,9 +445,11 @@
(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI1248_AVX512VLBW
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
(V16QI "TARGET_AVX512VL && TARGET_AVX512BW")
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -481,15 +483,15 @@
[V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
(define_mode_iterator VHFBF_AVX512VL
- [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
- V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+ [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
+ (V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
;; All vector integer modes
(define_mode_iterator VI
[(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
- (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
(V8SI "TARGET_AVX") V4SI
(V4DI "TARGET_AVX") V2DI])
@@ -497,16 +499,16 @@
(define_mode_iterator VIHFBF
[(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
- (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
(V8SI "TARGET_AVX") V4SI
(V4DI "TARGET_AVX") V2DI
- (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
- (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF])
+ (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF])
(define_mode_iterator VI_AVX2
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
@@ -541,7 +543,7 @@
[(V32QI "TARGET_AVX2") V16QI])
(define_mode_iterator VI1_AVX512
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI])
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
(define_mode_iterator VI1_AVX512F
[(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
@@ -550,20 +552,20 @@
[(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI])
(define_mode_iterator VI12_256_512_AVX512VL
- [V64QI (V32QI "TARGET_AVX512VL")
- V32HI (V16HI "TARGET_AVX512VL")])
+ [(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL")
+ (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL")])
(define_mode_iterator VI2_AVX2
[(V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI2_AVX2_AVX512BW
- [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
+ [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI2_AVX512F
[(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI2_AVX512VNNIBW
- [(V32HI "TARGET_AVX512BW || TARGET_AVX512VNNI")
+ [(V32HI "(TARGET_AVX512BW || TARGET_AVX512VNNI) && TARGET_EVEX512")
(V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI4_AVX
@@ -584,12 +586,12 @@
(V8DI "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI2_AVX512VL
- [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
+ [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")])
(define_mode_iterator VI2HFBF_AVX512VL
- [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
- (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") V32HF
- (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") V32BF])
+ [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")
+ (V8HF "TARGET_AVX512VL") (V16HF "TARGET_AVX512VL") (V32HF "TARGET_EVEX512")
+ (V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")])
(define_mode_iterator VI2H_AVX512VL
[(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
@@ -600,7 +602,7 @@
[V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
(define_mode_iterator VI8_AVX2_AVX512BW
- [(V8DI "TARGET_AVX512BW") (V4DI "TARGET_AVX2") V2DI])
+ [(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI8_AVX2
[(V4DI "TARGET_AVX2") V2DI])
@@ -624,11 +626,11 @@
;; ??? We should probably use TImode instead.
(define_mode_iterator VIMAX_AVX2_AVX512BW
- [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
+ [(V4TI "TARGET_AVX512BW && TARGET_EVEX512") (V2TI "TARGET_AVX2") V1TI])
;; Suppose TARGET_AVX512BW as baseline
(define_mode_iterator VIMAX_AVX512VL
- [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
+ [(V4TI "TARGET_EVEX512") (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
(define_mode_iterator VIMAX_AVX2
[(V2TI "TARGET_AVX2") V1TI])
@@ -638,15 +640,15 @@
(V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI12_AVX2_AVX512BW
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI])
(define_mode_iterator VI24_AVX2
[(V16HI "TARGET_AVX2") V8HI
(V8SI "TARGET_AVX2") V4SI])
(define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
(V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI])
@@ -656,13 +658,13 @@
(V8SI "TARGET_AVX2") V4SI])
(define_mode_iterator VI248_AVX512VL
- [V32HI V16SI V8DI
+ [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
(V16HI "TARGET_AVX512VL") (V8SI "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI248_AVX512VLBW
- [(V32HI "TARGET_AVX512BW")
+ [(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V8HI "TARGET_AVX512VL && TARGET_AVX512BW")
(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
@@ -678,16 +680,16 @@
(V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW
- [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
- (V16SI "TARGET_AVX512BW") (V8SI "TARGET_AVX2") V4SI
+ [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
+ (V16SI "TARGET_AVX512BW && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
(define_mode_iterator VI248_AVX512BW
- [(V32HI "TARGET_AVX512BW") (V16SI "TARGET_EVEX512")
+ [(V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16SI "TARGET_EVEX512")
(V8DI "TARGET_EVEX512")])
(define_mode_iterator VI248_AVX512BW_AVX512VL
- [(V32HI "TARGET_AVX512BW")
+ [(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V4DI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
;; Suppose TARGET_AVX512VL as baseline
@@ -850,7 +852,8 @@
(define_mode_iterator VI24_128 [V8HI V4SI])
(define_mode_iterator VI248_128 [V8HI V4SI V2DI])
(define_mode_iterator VI248_256 [V16HI V8SI V4DI])
-(define_mode_iterator VI248_512 [V32HI V16SI V8DI])
+(define_mode_iterator VI248_512
+ [(V32HI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
(define_mode_iterator VI48_128 [V4SI V2DI])
(define_mode_iterator VI148_512
[(V64QI "TARGET_EVEX512") (V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")])
@@ -861,8 +864,8 @@
(define_mode_iterator VI124_256 [V32QI V16HI V8SI])
(define_mode_iterator VI124_256_AVX512F_AVX512BW
[V32QI V16HI V8SI
- (V64QI "TARGET_AVX512BW")
- (V32HI "TARGET_AVX512BW")
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16SI "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI48_256 [V8SI V4DI])
(define_mode_iterator VI48_512
@@ -870,11 +873,14 @@
(define_mode_iterator VI4_256_8_512 [V8SI V8DI])
(define_mode_iterator VI_AVX512BW
[(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
- (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512")])
(define_mode_iterator VIHFBF_AVX512BW
[(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
- (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
- (V32HF "TARGET_AVX512BW") (V32BF "TARGET_AVX512BW")])
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V32HF "TARGET_AVX512BW && TARGET_EVEX512")
+ (V32BF "TARGET_AVX512BW && TARGET_EVEX512")])
;; Int-float size matches
(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
@@ -1948,17 +1954,19 @@
;; All integer modes with AVX512BW/DQ.
(define_mode_iterator SWI1248_AVX512BWDQ
- [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+ [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW")
+ (DI "TARGET_AVX512BW && TARGET_EVEX512")])
;; All integer modes with AVX512BW, where HImode operation
;; can be used instead of QImode.
(define_mode_iterator SWI1248_AVX512BW
- [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+ [QI HI (SI "TARGET_AVX512BW")
+ (DI "TARGET_AVX512BW && TARGET_EVEX512")])
;; All integer modes with AVX512BW/DQ, even HImode requires DQ.
(define_mode_iterator SWI1248_AVX512BWDQ2
[(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
- (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+ (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_EVEX512")])
(define_expand "kmov<mskmodesuffix>"
[(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
@@ -2097,7 +2105,7 @@
(zero_extend:DI
(not:SI (match_operand:SI 1 "register_operand" "k"))))
(unspec [(const_int 0)] UNSPEC_MASKOP)]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"knotd\t{%1, %0|%0, %1}";
[(set_attr "type" "msklog")
(set_attr "prefix" "vex")
@@ -2107,7 +2115,7 @@
[(set (match_operand:DI 0 "mask_reg_operand")
(zero_extend:DI
(not:SI (match_operand:SI 1 "mask_reg_operand"))))]
- "TARGET_AVX512BW && reload_completed"
+ "TARGET_AVX512BW && TARGET_EVEX512 && reload_completed"
[(parallel
[(set (match_dup 0)
(zero_extend:DI
@@ -2213,7 +2221,7 @@
(const_int 32))
(zero_extend:DI (match_operand:SI 2 "register_operand" "k"))))
(unspec [(const_int 0)] UNSPEC_MASKOP)]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"kunpckdq\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "mode" "DI")])
@@ -3455,9 +3463,9 @@
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
(V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
- (V64QI "TARGET_AVX512BW")
+ (V64QI "TARGET_AVX512BW && TARGET_EVEX512")
(V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V32HI "TARGET_AVX512BW")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V16SF "TARGET_AVX512F && TARGET_EVEX512")
@@ -12340,12 +12348,12 @@
;; Modes handled by vec_extract patterns.
(define_mode_iterator VEC_EXTRACT_MODE
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI
- (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
- (V32BF "TARGET_AVX512BW") (V16BF "TARGET_AVX") V8BF
+ (V32HF "TARGET_AVX512BW && TARGET_EVEX512") (V16HF "TARGET_AVX") V8HF
+ (V32BF "TARGET_AVX512BW && TARGET_EVEX512") (V16BF "TARGET_AVX") V8BF
(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF
(V4TI "TARGET_AVX512F && TARGET_EVEX512") (V2TI "TARGET_AVX")])
@@ -14028,7 +14036,7 @@
(const_int 10) (const_int 11)
(const_int 12) (const_int 13)
(const_int 14) (const_int 15)])))]
- "TARGET_AVX512BW && ix86_pre_reload_split ()"
+ "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -14053,7 +14061,7 @@
(const_int 10) (const_int 11)
(const_int 12) (const_int 13)
(const_int 14) (const_int 15)])))]
- "TARGET_AVX512BW && ix86_pre_reload_split ()"
+ "TARGET_AVX512BW && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -14173,13 +14181,13 @@
[(set (match_operand:V32QI 0 "nonimmediate_operand")
(truncate:V32QI
(match_operand:V32HI 1 "register_operand")))]
- "TARGET_AVX512BW")
+ "TARGET_AVX512BW && TARGET_EVEX512")
(define_insn "avx512bw_<code>v32hiv32qi2"
[(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m")
(any_truncate:V32QI
(match_operand:V32HI 1 "register_operand" "v,v")))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpmov<trunsuffix>wb\t{%1, %0|%0, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "none,store")
@@ -14225,7 +14233,7 @@
(match_operand:V32HI 1 "register_operand" "v,v"))
(match_operand:V32QI 2 "nonimm_or_0_operand" "0C,0")
(match_operand:SI 3 "register_operand" "Yk,Yk")))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpmov<trunsuffix>wb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
[(set_attr "type" "ssemov")
(set_attr "memory" "none,store")
@@ -14239,7 +14247,7 @@
(match_operand:V32HI 1 "register_operand"))
(match_dup 0)
(match_operand:SI 2 "register_operand")))]
- "TARGET_AVX512BW")
+ "TARGET_AVX512BW && TARGET_EVEX512")
(define_mode_iterator PMOV_DST_MODE_2
[V4SI V8HI (V16QI "TARGET_AVX512BW")])
@@ -16126,7 +16134,7 @@
(match_operand:V64QI 1 "register_operand")
(match_operand:V64QI 2 "nonimmediate_operand")
(match_operand:V16SI 3 "nonimmediate_operand")]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
{
rtx t1 = gen_reg_rtx (V8DImode);
rtx t2 = gen_reg_rtx (V16SImode);
@@ -17312,7 +17320,7 @@
(V8DF "TARGET_AVX512F && TARGET_EVEX512")
(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
- (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI")
(V32HF "TARGET_AVX512FP16")])
(define_expand "vec_perm<mode>"
@@ -18018,7 +18026,7 @@
(const_string "*")))])
(define_mode_iterator AVX512ZEXTMASK
- [(DI "TARGET_AVX512BW") (SI "TARGET_AVX512BW") HI])
+ [(DI "TARGET_AVX512BW && TARGET_EVEX512") (SI "TARGET_AVX512BW") HI])
(define_insn "<avx512>_testm<mode>3<mask_scalar_merge_name>"
[(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
@@ -18130,16 +18138,18 @@
(unspec [(const_int 0)] UNSPEC_MASKOP)])]
"TARGET_AVX512F")
+(define_mode_iterator SWI24_MASK [HI (SI "TARGET_EVEX512")])
+
(define_expand "vec_pack_trunc_<mode>"
[(parallel
[(set (match_operand:<DOUBLEMASKMODE> 0 "register_operand")
(ior:<DOUBLEMASKMODE>
(ashift:<DOUBLEMASKMODE>
(zero_extend:<DOUBLEMASKMODE>
- (match_operand:SWI24 2 "register_operand"))
+ (match_operand:SWI24_MASK 2 "register_operand"))
(match_dup 3))
(zero_extend:<DOUBLEMASKMODE>
- (match_operand:SWI24 1 "register_operand"))))
+ (match_operand:SWI24_MASK 1 "register_operand"))))
(unspec [(const_int 0)] UNSPEC_MASKOP)])]
"TARGET_AVX512BW"
{
@@ -18267,7 +18277,7 @@
(const_int 60) (const_int 61)
(const_int 62) (const_int 63)])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpacksswb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "<mask_prefix>")
@@ -18336,7 +18346,7 @@
(const_int 14) (const_int 15)
(const_int 28) (const_int 29)
(const_int 30) (const_int 31)])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpackssdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "<mask_prefix>")
@@ -18398,7 +18408,7 @@
(const_int 61) (const_int 125)
(const_int 62) (const_int 126)
(const_int 63) (const_int 127)])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpunpckhbw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -18494,7 +18504,7 @@
(const_int 53) (const_int 117)
(const_int 54) (const_int 118)
(const_int 55) (const_int 119)])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpunpcklbw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -19677,7 +19687,7 @@
[(match_operand:V32HI 1 "nonimmediate_operand" "vm")
(match_operand:SI 2 "const_0_to_255_operand")]
UNSPEC_PSHUFLW))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpshuflw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -19853,7 +19863,7 @@
[(match_operand:V32HI 1 "nonimmediate_operand" "vm")
(match_operand:SI 2 "const_0_to_255_operand")]
UNSPEC_PSHUFHW))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpshufhw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sselog")
(set_attr "prefix" "evex")
@@ -20735,7 +20745,7 @@
(define_expand "vec_unpacks_lo_di"
[(set (match_operand:SI 0 "register_operand")
(subreg:SI (match_operand:DI 1 "register_operand") 0))]
- "TARGET_AVX512BW")
+ "TARGET_AVX512BW && TARGET_EVEX512")
(define_expand "vec_unpacku_hi_<mode>"
[(match_operand:<sseunpackmode> 0 "register_operand")
@@ -20774,12 +20784,15 @@
(unspec [(const_int 0)] UNSPEC_MASKOP)])]
"TARGET_AVX512F")
+(define_mode_iterator SWI48x_MASK [SI (DI "TARGET_EVEX512")])
+
(define_expand "vec_unpacks_hi_<mode>"
[(parallel
- [(set (subreg:SWI48x
+ [(set (subreg:SWI48x_MASK
(match_operand:<HALFMASKMODE> 0 "register_operand") 0)
- (lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand")
- (match_dup 2)))
+ (lshiftrt:SWI48x_MASK
+ (match_operand:SWI48x_MASK 1 "register_operand")
+ (match_dup 2)))
(unspec [(const_int 0)] UNSPEC_MASKOP)])]
"TARGET_AVX512BW"
"operands[2] = GEN_INT (GET_MODE_BITSIZE (<HALFMASKMODE>mode));")
@@ -21534,7 +21547,7 @@
(const_int 1) (const_int 1)
(const_int 1) (const_int 1)]))
(const_int 1))))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpmulhrsw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
[(set_attr "type" "sseimul")
(set_attr "prefix" "evex")
@@ -22042,8 +22055,8 @@
;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
;; modes for abs instruction on pre AVX-512 targets.
(define_mode_iterator VI1248_AVX512VL_AVX512BW
- [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
+ [(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V16HI "TARGET_AVX2") V8HI
(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL")])
@@ -22702,7 +22715,7 @@
[(set (match_operand:V32HI 0 "register_operand" "=v")
(any_extend:V32HI
(match_operand:V32QI 1 "nonimmediate_operand" "vm")))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
[(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -22716,7 +22729,7 @@
(match_operand:V64QI 2 "const0_operand"))
(match_parallel 3 "pmovzx_parallel"
[(match_operand 4 "const_int_operand")])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"#"
"&& reload_completed"
[(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))]
@@ -22736,7 +22749,7 @@
(match_operand:V64QI 3 "const0_operand"))
(match_parallel 4 "pmovzx_parallel"
[(match_operand 5 "const_int_operand")])))]
- "TARGET_AVX512BW"
+ "TARGET_AVX512BW && TARGET_EVEX512"
"#"
"&& reload_completed"
[(set (match_dup 0) (zero_extend:V32HI (match_dup 1)))]
@@ -22749,7 +22762,7 @@
[(set (match_operand:V32HI 0 "register_operand")
(any_extend:V32HI
(match_operand:V32QI 1 "nonimmediate_operand")))]
- "TARGET_AVX512BW")
+ "TARGET_AVX512BW && TARGET_EVEX512")
(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
[(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,Yw")
@@ -26107,12 +26120,12 @@
(set_attr "mode" "OI")])
(define_mode_attr pbroadcast_evex_isa
- [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
- (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
+ [(V64QI "avx512bw_512") (V32QI "avx512bw") (V16QI "avx512bw")
+ (V32HI "avx512bw_512") (V16HI "avx512bw") (V8HI "avx512bw")
(V16SI "avx512f_512") (V8SI "avx512f") (V4SI "avx512f")
(V8DI "avx512f_512") (V4DI "avx512f") (V2DI "avx512f")
- (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")
- (V32BF "avx512bw") (V16BF "avx512bw") (V8BF "avx512bw")])
+ (V32HF "avx512bw_512") (V16HF "avx512bw") (V8HF "avx512bw")
+ (V32BF "avx512bw_512") (V16BF "avx512bw") (V8BF "avx512bw")])
(define_insn "avx2_pbroadcast<mode>"
[(set (match_operand:VIHFBF 0 "register_operand" "=x,v")
@@ -26967,7 +26980,8 @@
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
(V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
@@ -26976,7 +26990,8 @@
[(V16SI "TARGET_EVEX512") (V8DI "TARGET_EVEX512")
(V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
(V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
- (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
(V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (14 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 15/18] Support -mevex512 for AVX512BW intrins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins Hu, Lin1
` (3 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512.
(VI8_FVL): Ditto.
(VI1_AVX512F): Ditto.
(VI1_AVX512VNNI): Ditto.
(VI1_AVX512VL_F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(*avx512f_permvar_truncv32hiv32qi_1): Ditto.
(sdot_prod<mode>): Ditto.
(VEC_PERM_AVX2): Ditto.
(VPERMI2): Ditto.
(VPERMI2I): Ditto.
(vpmadd52<vpmadd52type>v8di): Ditto.
(usdot_prod<mode>): Ditto.
(vpdpbusd_v16si): Ditto.
(vpdpbusds_v16si): Ditto.
(vpdpwssd_v16si): Ditto.
(vpdpwssds_v16si): Ditto.
(VI48_AVX512VP2VL): Ditto.
(avx512vp2intersect_2intersectv16si): Ditto.
(VF_AVX512BF16VL): Ditto.
(VF1_AVX512_256): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr90096.c: Adjust error message.
Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
---
gcc/config/i386/sse.md | 56 +++++++++++++------------
gcc/testsuite/gcc.target/i386/pr90096.c | 2 +-
2 files changed, 31 insertions(+), 27 deletions(-)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e59f6bf4410..a5a95b9de66 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -298,7 +298,7 @@
(V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
(define_mode_iterator VI1_AVX512VL
- [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
+ [(V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
;; All vector modes
(define_mode_iterator V
@@ -531,7 +531,7 @@
[(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
(define_mode_iterator VI8_FVL
- [(V8DI "TARGET_AVX512F") V4DI (V2DI "TARGET_AVX512VL")])
+ [(V8DI "TARGET_AVX512F && TARGET_EVEX512") V4DI (V2DI "TARGET_AVX512VL")])
(define_mode_iterator VI8_AVX512VL
[(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -546,10 +546,10 @@
[(V64QI "TARGET_AVX512BW && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
(define_mode_iterator VI1_AVX512F
- [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
+ [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI])
(define_mode_iterator VI1_AVX512VNNI
- [(V64QI "TARGET_AVX512VNNI") (V32QI "TARGET_AVX2") V16QI])
+ [(V64QI "TARGET_AVX512VNNI && TARGET_EVEX512") (V32QI "TARGET_AVX2") V16QI])
(define_mode_iterator VI12_256_512_AVX512VL
[(V64QI "TARGET_EVEX512") (V32QI "TARGET_AVX512VL")
@@ -599,7 +599,7 @@
V8DI ])
(define_mode_iterator VI1_AVX512VL_F
- [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
+ [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
(define_mode_iterator VI8_AVX2_AVX512BW
[(V8DI "TARGET_AVX512BW && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
@@ -923,8 +923,8 @@
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
- V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
- V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+ (V64QI "TARGET_EVEX512") (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+ (V32HI "TARGET_EVEX512") (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
(define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
@@ -14217,7 +14217,7 @@
(const_int 26) (const_int 27)
(const_int 28) (const_int 29)
(const_int 30) (const_int 31)])))]
- "TARGET_AVX512VBMI && ix86_pre_reload_split ()"
+ "TARGET_AVX512VBMI && TARGET_EVEX512 && ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 0)
@@ -16040,7 +16040,7 @@
"TARGET_SSE2"
{
/* Try with vnni instructions. */
- if ((<MODE_SIZE> == 64 && TARGET_AVX512VNNI)
+ if ((<MODE_SIZE> == 64 && TARGET_AVX512VNNI && TARGET_EVEX512)
|| (<MODE_SIZE> < 64
&& ((TARGET_AVX512VNNI && TARGET_AVX512VL) || TARGET_AVXVNNI)))
{
@@ -17320,7 +17320,8 @@
(V8DF "TARGET_AVX512F && TARGET_EVEX512")
(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
- (V32HI "TARGET_AVX512BW && TARGET_EVEX512") (V64QI "TARGET_AVX512VBMI")
+ (V32HI "TARGET_AVX512BW && TARGET_EVEX512")
+ (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
(V32HF "TARGET_AVX512FP16")])
(define_expand "vec_perm<mode>"
@@ -26983,7 +26984,8 @@
(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
- (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+ (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+ (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
(V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
(define_mode_iterator VPERMI2I
@@ -26993,7 +26995,8 @@
(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
(V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
- (V64QI "TARGET_AVX512VBMI") (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
+ (V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
+ (V32QI "TARGET_AVX512VBMI && TARGET_AVX512VL")
(V16QI "TARGET_AVX512VBMI && TARGET_AVX512VL")])
(define_expand "<avx512>_vpermi2var<mode>3_mask"
@@ -28977,7 +28980,7 @@
(match_operand:V8DI 2 "register_operand" "v")
(match_operand:V8DI 3 "nonimmediate_operand" "vm")]
VPMADD52))]
- "TARGET_AVX512IFMA"
+ "TARGET_AVX512IFMA && TARGET_EVEX512"
"vpmadd52<vpmadd52type>\t{%3, %2, %0|%0, %2, %3}"
[(set_attr "type" "ssemuladd")
(set_attr "prefix" "evex")
@@ -29579,9 +29582,9 @@
(match_operand:VI1_AVX512VNNI 1 "register_operand")
(match_operand:VI1_AVX512VNNI 2 "register_operand")
(match_operand:<ssedvecmode> 3 "register_operand")]
- "(<MODE_SIZE> == 64
- ||((TARGET_AVX512VNNI && TARGET_AVX512VL)
- || TARGET_AVXVNNI))"
+ "((<MODE_SIZE> == 64 && TARGET_EVEX512)
+ || ((TARGET_AVX512VNNI && TARGET_AVX512VL)
+ || TARGET_AVXVNNI))"
{
operands[1] = lowpart_subreg (<ssedvecmode>mode,
force_reg (<MODE>mode, operands[1]),
@@ -29602,7 +29605,7 @@
(match_operand:V16SI 2 "register_operand" "v")
(match_operand:V16SI 3 "nonimmediate_operand" "vm")]
UNSPEC_VPDPBUSD))]
- "TARGET_AVX512VNNI"
+ "TARGET_AVX512VNNI && TARGET_EVEX512"
"vpdpbusd\t{%3, %2, %0|%0, %2, %3}"
[(set_attr ("prefix") ("evex"))])
@@ -29670,7 +29673,7 @@
(match_operand:V16SI 2 "register_operand" "v")
(match_operand:V16SI 3 "nonimmediate_operand" "vm")]
UNSPEC_VPDPBUSDS))]
- "TARGET_AVX512VNNI"
+ "TARGET_AVX512VNNI && TARGET_EVEX512"
"vpdpbusds\t{%3, %2, %0|%0, %2, %3}"
[(set_attr ("prefix") ("evex"))])
@@ -29738,7 +29741,7 @@
(match_operand:V16SI 2 "register_operand" "v")
(match_operand:V16SI 3 "nonimmediate_operand" "vm")]
UNSPEC_VPDPWSSD))]
- "TARGET_AVX512VNNI"
+ "TARGET_AVX512VNNI && TARGET_EVEX512"
"vpdpwssd\t{%3, %2, %0|%0, %2, %3}"
[(set_attr ("prefix") ("evex"))])
@@ -29806,7 +29809,7 @@
(match_operand:V16SI 2 "register_operand" "v")
(match_operand:V16SI 3 "nonimmediate_operand" "vm")]
UNSPEC_VPDPWSSDS))]
- "TARGET_AVX512VNNI"
+ "TARGET_AVX512VNNI && TARGET_EVEX512"
"vpdpwssds\t{%3, %2, %0|%0, %2, %3}"
[(set_attr ("prefix") ("evex"))])
@@ -29929,9 +29932,9 @@
(set_attr "mode" "<sseinsnmode>")])
(define_mode_iterator VI48_AVX512VP2VL
- [V8DI
- (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
- (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
+ [(V8DI "TARGET_EVEX512")
+ (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+ (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
(define_mode_iterator MASK_DWI [P2QI P2HI])
@@ -29972,12 +29975,12 @@
(unspec:P2HI [(match_operand:V16SI 1 "register_operand" "v")
(match_operand:V16SI 2 "vector_operand" "vm")]
UNSPEC_VP2INTERSECT))]
- "TARGET_AVX512VP2INTERSECT"
+ "TARGET_AVX512VP2INTERSECT && TARGET_EVEX512"
"vp2intersectd\t{%2, %1, %0|%0, %1, %2}"
[(set_attr ("prefix") ("evex"))])
(define_mode_iterator VF_AVX512BF16VL
- [V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
+ [(V32BF "TARGET_EVEX512") (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
;; Converting from BF to SF
(define_mode_attr bf16_cvt_2sf
[(V32BF "V16SF") (V16BF "V8SF") (V8BF "V4SF")])
@@ -30070,7 +30073,8 @@
"TARGET_AVX512BF16 && TARGET_AVX512VL"
"vcvtneps2bf16{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}")
-(define_mode_iterator VF1_AVX512_256 [V16SF (V8SF "TARGET_AVX512VL")])
+(define_mode_iterator VF1_AVX512_256
+ [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
(define_expand "avx512f_cvtneps2bf16_<mode>_maskz"
[(match_operand:<sf_cvt_bf16> 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr90096.c b/gcc/testsuite/gcc.target/i386/pr90096.c
index 871e0ffc691..74f052ea8e5 100644
--- a/gcc/testsuite/gcc.target/i386/pr90096.c
+++ b/gcc/testsuite/gcc.target/i386/pr90096.c
@@ -10,7 +10,7 @@ volatile __mmask64 m64;
void
foo (int i)
{
- x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3); /* { dg-error "needs isa option -mgfni -mavx512f" } */
+ x1 = _mm512_gf2p8affineinv_epi64_epi8 (x1, x2, 3); /* { dg-error "needs isa option -mevex512 -mgfni -mavx512f" } */
}
#ifdef __x86_64__
--
2.31.1
* [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (15 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 16/18] Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-21 7:20 ` [PATCH 18/18] Allow -mno-evex512 usage Hu, Lin1
` (2 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512.
(VFH): Ditto.
(VF2H): Ditto.
(VFH_AVX512VL): Ditto.
(VHFBF): Ditto.
(VHF_AVX512VL): Ditto.
(VI2H_AVX512VL): Ditto.
(VI2F_256_512): Ditto.
(VF48_I1248): Remove unused iterator.
(VF48H_AVX512VL): Add TARGET_EVEX512.
(VF_AVX512): Remove unused iterator.
(REDUC_PLUS_MODE): Add TARGET_EVEX512.
(REDUC_SMINMAX_MODE): Ditto.
(FMAMODEM): Ditto.
(VFH_SF_AVX512VL): Ditto.
(VEC_PERM_AVX2): Ditto.
Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
---
gcc/config/i386/sse.md | 44 ++++++++++++++++++++----------------------
1 file changed, 21 insertions(+), 23 deletions(-)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a5a95b9de66..25d53e15dce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -280,7 +280,7 @@
(define_mode_iterator V48H_AVX512VL
[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
(V8DI "TARGET_EVEX512") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
- (V32HF "TARGET_AVX512FP16")
+ (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -355,7 +355,7 @@
(V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
(define_mode_iterator VFH
- [(V32HF "TARGET_AVX512FP16")
+ [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX") V4SF
@@ -401,7 +401,7 @@
;; All DFmode & HFmode vector float modes
(define_mode_iterator VF2H
- [(V32HF "TARGET_AVX512FP16")
+ [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8DF "TARGET_AVX512F && TARGET_EVEX512") (V4DF "TARGET_AVX") V2DF])
@@ -463,7 +463,7 @@
[(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF])
(define_mode_iterator VFH_AVX512VL
- [(V32HF "TARGET_AVX512FP16")
+ [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
@@ -475,12 +475,14 @@
(define_mode_iterator VF1_AVX512VL
[(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
-(define_mode_iterator VHFBF [V32HF V16HF V8HF V32BF V16BF V8BF])
+(define_mode_iterator VHFBF
+ [(V32HF "TARGET_EVEX512") V16HF V8HF
+ (V32BF "TARGET_EVEX512") V16BF V8BF])
(define_mode_iterator VHFBF_256 [V16HF V16BF])
(define_mode_iterator VHFBF_128 [V8HF V8BF])
(define_mode_iterator VHF_AVX512VL
- [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
+ [(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
(define_mode_iterator VHFBF_AVX512VL
[(V32HF "TARGET_EVEX512") (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")
@@ -594,9 +596,9 @@
(V8BF "TARGET_AVX512VL") (V16BF "TARGET_AVX512VL") (V32BF "TARGET_EVEX512")])
(define_mode_iterator VI2H_AVX512VL
- [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
- (V8SI "TARGET_AVX512VL") V16SI
- V8DI ])
+ [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") (V32HI "TARGET_EVEX512")
+ (V8SI "TARGET_AVX512VL") (V16SI "TARGET_EVEX512")
+ (V8DI "TARGET_EVEX512")])
(define_mode_iterator VI1_AVX512VL_F
[V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
@@ -883,7 +885,10 @@
(V32BF "TARGET_AVX512BW && TARGET_EVEX512")])
;; Int-float size matches
-(define_mode_iterator VI2F_256_512 [V16HI V32HI V16HF V32HF V16BF V32BF])
+(define_mode_iterator VI2F_256_512
+ [V16HI (V32HI "TARGET_EVEX512")
+ V16HF (V32HF "TARGET_EVEX512")
+ V16BF (V32BF "TARGET_EVEX512")])
(define_mode_iterator VI4F_128 [V4SI V4SF])
(define_mode_iterator VI8F_128 [V2DI V2DF])
(define_mode_iterator VI4F_256 [V8SI V8SF])
@@ -899,10 +904,8 @@
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V8DF "TARGET_AVX512F && TARGET_EVEX512")
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
-(define_mode_iterator VF48_I1248
- [V16SI V16SF V8DI V8DF V32HI V64QI])
(define_mode_iterator VF48H_AVX512VL
- [V8DF V16SF (V8SF "TARGET_AVX512VL")])
+ [(V8DF "TARGET_EVEX512") (V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL")])
(define_mode_iterator VF48_128
[V2DF V4SF])
@@ -928,11 +931,6 @@
(define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
-(define_mode_iterator VF_AVX512
- [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
- (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
- V16SF V8DF])
-
(define_mode_iterator V8_128 [V8HI V8HF V8BF])
(define_mode_iterator V16_256 [V16HI V16HF V16BF])
(define_mode_iterator V32_512
@@ -3419,7 +3417,7 @@
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8DF "TARGET_AVX512F && TARGET_EVEX512")
(V16SF "TARGET_AVX512F && TARGET_EVEX512")
- (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+ (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
(V32QI "TARGET_AVX")
(V64QI "TARGET_AVX512F && TARGET_EVEX512")])
@@ -3464,7 +3462,7 @@
(V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
(V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
(V64QI "TARGET_AVX512BW && TARGET_EVEX512")
- (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+ (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V16SI "TARGET_AVX512F && TARGET_EVEX512")
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
@@ -5318,7 +5316,7 @@
(HF "TARGET_AVX512FP16")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
- (V32HF "TARGET_AVX512FP16")])
+ (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])
(define_expand "fma<mode>4"
[(set (match_operand:FMAMODEM 0 "register_operand")
@@ -5427,7 +5425,7 @@
;; Suppose AVX-512F as baseline
(define_mode_iterator VFH_SF_AVX512VL
- [(V32HF "TARGET_AVX512FP16")
+ [(V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")
(V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
(HF "TARGET_AVX512FP16")
@@ -17322,7 +17320,7 @@
(V8DI "TARGET_AVX512F && TARGET_EVEX512")
(V32HI "TARGET_AVX512BW && TARGET_EVEX512")
(V64QI "TARGET_AVX512VBMI && TARGET_EVEX512")
- (V32HF "TARGET_AVX512FP16")])
+ (V32HF "TARGET_AVX512FP16 && TARGET_EVEX512")])
(define_expand "vec_perm<mode>"
[(match_operand:VEC_PERM_AVX2 0 "register_operand")
--
2.31.1
* [PATCH 18/18] Allow -mno-evex512 usage
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (16 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 17/18] Support -mevex512 for AVX512FP16 intrins Hu, Lin1
@ 2023-09-21 7:20 ` Hu, Lin1
2023-09-22 3:30 ` [PATCH 00/18] Support -mevex512 for AVX512 Hongtao Liu
2023-09-28 0:32 ` ZiNgA BuRgA
19 siblings, 0 replies; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-21 7:20 UTC (permalink / raw)
To: gcc-patches; +Cc: hongtao.liu, ubizjak, haochen.jiang
From: Haochen Jiang <haochen.jiang@intel.com>
gcc/ChangeLog:
* config/i386/i386.opt: Allow -mno-evex512.
gcc/testsuite/ChangeLog:
* gcc.target/i386/noevex512-1.c: New test.
* gcc.target/i386/noevex512-2.c: Ditto.
* gcc.target/i386/noevex512-3.c: Ditto.
---
gcc/config/i386/i386.opt | 2 +-
gcc/testsuite/gcc.target/i386/noevex512-1.c | 13 +++++++++++++
gcc/testsuite/gcc.target/i386/noevex512-2.c | 13 +++++++++++++
gcc/testsuite/gcc.target/i386/noevex512-3.c | 13 +++++++++++++
4 files changed, 40 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 6d8601b1f75..34fc167af82 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1312,5 +1312,5 @@ Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
Enable vectorization for scatter instruction.
mevex512
-Target RejectNegative Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
+Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
Support 512 bit vector built-in functions and code generation.
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-1.c b/gcc/testsuite/gcc.target/i386/noevex512-1.c
new file mode 100644
index 00000000000..7fd45f15be6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64 -mavx512f -mno-evex512 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512d
+foo ()
+{
+ __m512d a, b;
+ a = a + b;
+ return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-2.c b/gcc/testsuite/gcc.target/i386/noevex512-2.c
new file mode 100644
index 00000000000..1c206e385d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx512bw -mno-evex512" } */
+
+#include <immintrin.h>
+
+long long
+foo (long long c)
+{
+ register long long a __asm ("k7") = c;
+ long long b = foo (a);
+ asm volatile ("" : "+k" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
+ return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/noevex512-3.c b/gcc/testsuite/gcc.target/i386/noevex512-3.c
new file mode 100644
index 00000000000..10e00c2d61c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/noevex512-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -Wno-psabi -mavx512f" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("no-evex512"))) __m512d
+foo ()
+{
+ __m512d a, b;
+ a = a + b;
+ return a;
+}
--
2.31.1
* Re: [PATCH 00/18] Support -mevex512 for AVX512
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (17 preceding siblings ...)
2023-09-21 7:20 ` [PATCH 18/18] Allow -mno-evex512 usage Hu, Lin1
@ 2023-09-22 3:30 ` Hongtao Liu
2023-09-28 0:32 ` ZiNgA BuRgA
19 siblings, 0 replies; 25+ messages in thread
From: Hongtao Liu @ 2023-09-22 3:30 UTC (permalink / raw)
To: Hu, Lin1; +Cc: gcc-patches, hongtao.liu, ubizjak, haochen.jiang
On Thu, Sep 21, 2023 at 3:22 PM Hu, Lin1 <lin1.hu@intel.com> wrote:
>
> Hi all,
>
> After previous discussion, instead of supporting option -mavx10.1, we
> will first introduce option -m[no-]evex512, which will enable/disable
> 512 bit register and 64 bit mask register.
>
> It will not change the current option behavior since if AVX512F is
> enabled with no evex512 option specified, it will automatically enable
> 512 bit register and 64 bit mask register.
>
> How the patches go comes following:
>
> Patch 1 added initial support for option -mevex512.
>
> Patch 2-6 refined current intrin file to push evex512 target for all
> 512 bit intrins. Those scalar intrins remained untouched.
>
> Patch 7-11 added OPTION_MASK_ISA2_EVEX512 for all related builtins.
>
> Patch 12 disabled zmm register, 512 bit libmvec call for no-evex512,
> also requested evex512 for vectorization when using 512 bit register.
>
> Patch 13-17 supported evex512 in related patterns.
>
> Patch 18 added testcases for -mno-evex512 and allowed its usage.
>
> The patches currently cause scan-asm fail for pr89229-{5,6,7}b.c since
> we will emit scalar vmovss here. When trying to use x/ymm 16+ w/o
> avx512vl but with avx512f+evex512, I suppose we could either emit scalar
> or zmm instructions. It is quite a rare case on HW since there is no
> HW w/o avx512vl but with avx512f, so I prefer not to add maintenance
> effort here to get a slight perf improvement. But it could be changed
> to former behavior.
To make it easier for people to test before committing, I pushed the
patch to the vendor branch
refs/vendors/ix86/heads/evex512.
Feel free to try it out.
>
> Discussions are welcomed for all the patches.
>
> Thx,
> Haochen
>
> Haochen Jiang (18):
> Initial support for -mevex512
> Push evex512 target for 512 bit intrins
> Push evex512 target for 512 bit intrins
> Push evex512 target for 512 bit intrins
> Push evex512 target for 512 bit intrins
> Push evex512 target for 512 bit intrins
> Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
> Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
> Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
> Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
> Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins
> Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512
> Support -mevex512 for AVX512F intrins
> Support -mevex512 for AVX512DQ intrins
> Support -mevex512 for AVX512BW intrins
> Support -mevex512 for
> AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ
> intrins
> Support -mevex512 for AVX512FP16 intrins
> Allow -mno-evex512 usage
>
> gcc/common/config/i386/i386-common.cc | 15 +
> gcc/config.gcc | 19 +-
> gcc/config/i386/avx5124fmapsintrin.h | 2 +-
> gcc/config/i386/avx5124vnniwintrin.h | 2 +-
> gcc/config/i386/avx512bf16intrin.h | 31 +-
> gcc/config/i386/avx512bitalgintrin.h | 155 +-
> gcc/config/i386/avx512bitalgvlintrin.h | 180 +
> gcc/config/i386/avx512bwintrin.h | 291 +-
> gcc/config/i386/avx512dqintrin.h | 1840 +-
> gcc/config/i386/avx512erintrin.h | 2 +-
> gcc/config/i386/avx512fintrin.h | 19663 +++++++++---------
> gcc/config/i386/avx512fp16intrin.h | 8925 ++++----
> gcc/config/i386/avx512ifmaintrin.h | 4 +-
> gcc/config/i386/avx512pfintrin.h | 2 +-
> gcc/config/i386/avx512vbmi2intrin.h | 4 +-
> gcc/config/i386/avx512vbmiintrin.h | 4 +-
> gcc/config/i386/avx512vnniintrin.h | 4 +-
> gcc/config/i386/avx512vp2intersectintrin.h | 4 +-
> gcc/config/i386/avx512vpopcntdqintrin.h | 4 +-
> gcc/config/i386/gfniintrin.h | 76 +-
> gcc/config/i386/i386-builtin.def | 1312 +-
> gcc/config/i386/i386-builtins.cc | 96 +-
> gcc/config/i386/i386-c.cc | 2 +
> gcc/config/i386/i386-expand.cc | 18 +-
> gcc/config/i386/i386-options.cc | 33 +-
> gcc/config/i386/i386.cc | 168 +-
> gcc/config/i386/i386.h | 7 +-
> gcc/config/i386/i386.md | 127 +-
> gcc/config/i386/i386.opt | 4 +
> gcc/config/i386/immintrin.h | 2 +
> gcc/config/i386/predicates.md | 3 +-
> gcc/config/i386/sse.md | 854 +-
> gcc/config/i386/vaesintrin.h | 4 +-
> gcc/config/i386/vpclmulqdqintrin.h | 4 +-
> gcc/testsuite/gcc.target/i386/noevex512-1.c | 13 +
> gcc/testsuite/gcc.target/i386/noevex512-2.c | 13 +
> gcc/testsuite/gcc.target/i386/noevex512-3.c | 13 +
> gcc/testsuite/gcc.target/i386/pr89229-5b.c | 2 +-
> gcc/testsuite/gcc.target/i386/pr89229-6b.c | 2 +-
> gcc/testsuite/gcc.target/i386/pr89229-7b.c | 2 +-
> gcc/testsuite/gcc.target/i386/pr90096.c | 2 +-
> 41 files changed, 17170 insertions(+), 16738 deletions(-)
> create mode 100644 gcc/config/i386/avx512bitalgvlintrin.h
> create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/noevex512-3.c
>
> --
> 2.31.1
>
--
BR,
Hongtao
* Re: [PATCH 00/18] Support -mevex512 for AVX512
2023-09-21 7:19 [PATCH 00/18] Support -mevex512 for AVX512 Hu, Lin1
` (18 preceding siblings ...)
2023-09-22 3:30 ` [PATCH 00/18] Support -mevex512 for AVX512 Hongtao Liu
@ 2023-09-28 0:32 ` ZiNgA BuRgA
2023-09-28 2:26 ` Hu, Lin1
19 siblings, 1 reply; 25+ messages in thread
From: ZiNgA BuRgA @ 2023-09-28 0:32 UTC (permalink / raw)
To: lin1.hu, gcc-patches
Thanks for the new patch!
I see that there's a new __EVEX512__ define. Will there be some
__EVEX256__ (or maybe some max EVEX width) define, so that code can
detect whether the compiler supports AVX10.1/256 without resorting to
version checks?
* RE: [PATCH 00/18] Support -mevex512 for AVX512
2023-09-28 0:32 ` ZiNgA BuRgA
@ 2023-09-28 2:26 ` Hu, Lin1
2023-09-28 3:23 ` ZiNgA BuRgA
0 siblings, 1 reply; 25+ messages in thread
From: Hu, Lin1 @ 2023-09-28 2:26 UTC (permalink / raw)
To: ZiNgA BuRgA, gcc-patches
Hi,
Thanks for your reply.
I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches.
For example:
I have a segment of code like:
#if defined(__EVEX512__):
__mm512.*__;
#else
__mm256.*__;
#endif
But if __EVEX512__ is undefined, that doesn't mean I only need 256-bit; maybe I am using GCC 13, so I can still use 512-bit.
So the code should be:
#if defined(__EVEX512__):
__mm512.*__;
#elif defined(__EVEX256__):
__mm256.*__;
#else
__mm512.*__;
#endif
If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower.
BRs,
Lin
-----Original Message-----
From: ZiNgA BuRgA <zingaburga@hotmail.com>
Sent: Thursday, September 28, 2023 8:32 AM
To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
Thanks for the new patch!
I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
* Re: [PATCH 00/18] Support -mevex512 for AVX512
2023-09-28 2:26 ` Hu, Lin1
@ 2023-09-28 3:23 ` ZiNgA BuRgA
2023-10-07 2:33 ` Hongtao Liu
0 siblings, 1 reply; 25+ messages in thread
From: ZiNgA BuRgA @ 2023-09-28 3:23 UTC (permalink / raw)
To: Hu, Lin1, gcc-patches
That sounds about right. The code I had in mind would perhaps look like:
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#if defined(__EVEX256__) && !defined(__EVEX512__)
// compiled code is AVX10.1/256 and AVX512 compatible
#else
// compiled code is only AVX512 compatible
#endif
// some code which only uses 256b instructions
__m256i...
#endif
The '__EVEX256__' define would avoid needing to check compiler versions.
Hopefully you can align it with whatever Clang does:
https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18
Thanks!
On 28/09/2023 12:26 pm, Hu, Lin1 wrote:
> Hi,
>
> Thanks for your reply.
>
> I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches.
>
> For example:
>
> I have a segment of code like:
> #if defined(__EVEX512__):
> __mm512.*__;
> #else
> __mm256.*__;
> #endif
>
> But if __EVEX512__ is undefined, that doesn't mean I only need 256-bit; maybe I am using GCC 13, so I can still use 512-bit.
>
> So the code should be:
> #if defined(__EVEX512__):
> __mm512.*__;
> #elif defined(__EVEX256__):
> __mm256.*__;
> #else
> __mm512.*__;
> #endif
>
> If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower.
>
> BRs,
> Lin
>
> -----Original Message-----
> From: ZiNgA BuRgA <zingaburga@hotmail.com>
> Sent: Thursday, September 28, 2023 8:32 AM
> To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
>
> Thanks for the new patch!
>
> I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
>
>
* Re: [PATCH 00/18] Support -mevex512 for AVX512
2023-09-28 3:23 ` ZiNgA BuRgA
@ 2023-10-07 2:33 ` Hongtao Liu
0 siblings, 0 replies; 25+ messages in thread
From: Hongtao Liu @ 2023-10-07 2:33 UTC (permalink / raw)
To: ZiNgA BuRgA; +Cc: Hu, Lin1, gcc-patches
On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA <zingaburga@hotmail.com> wrote:
>
> That sounds about right. The code I had in mind would perhaps look like:
>
>
> #if defined(__AVX512BW__) && defined(__AVX512VL__)
> #if defined(__EVEX256__) && !defined(__EVEX512__)
> // compiled code is AVX10.1/256 and AVX512 compatible
> #else
> // compiled code is only AVX512 compatible
> #endif
>
> // some code which only uses 256b instructions
> __m256i...
> #endif
>
>
> The '__EVEX256__' define would avoid needing to check compiler versions.
Sounds reasonable. Regarding how to set __EVEX256__, I think it should
be set/unset along with __AVX512VL__, and __EVEX512__ should not unset
__EVEX256__.
> Hopefully you can align it with whatever Clang does:
> https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18
>
> Thanks!
>
> On 28/09/2023 12:26 pm, Hu, Lin1 wrote:
> > Hi,
> >
> > Thanks for your reply.
> >
> > I'd like to verify that our understanding of your requirements is correct, and that __EVEX256__ can be considered a default macro to determine whether the compiler supports the __EVEX***__ series of switches.
> >
> > For example:
> >
> > I have a segment of code like:
> > #if defined(__EVEX512__):
> > __mm512.*__;
> > #else
> > __mm256.*__;
> > #endif
> >
> > But if __EVEX512__ is undefined, that doesn't mean I only need 256-bit; maybe I am using GCC 13, so I can still use 512-bit.
> >
> > So the code should be:
> > #if defined(__EVEX512__):
> > __mm512.*__;
> > #elif defined(__EVEX256__):
> > __mm256.*__;
> > #else
> > __mm512.*__;
> > #endif
> >
> > If we understand correctly, we'll consider the request. But since we're about to have a vacation, follow-up replies may be a bit slower.
> >
> > BRs,
> > Lin
> >
> > -----Original Message-----
> > From: ZiNgA BuRgA <zingaburga@hotmail.com>
> > Sent: Thursday, September 28, 2023 8:32 AM
> > To: Hu, Lin1 <lin1.hu@intel.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
> >
> > Thanks for the new patch!
> >
> > I see that there's a new __EVEX512__ define. Will there be some __EVEX256__ (or maybe some max EVEX width) define, so that code can detect whether the compiler supports AVX10.1/256 without resorting to version checks?
> >
> >
>
--
BR,
Hongtao