public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] Add AVX512 k-mask intrinsics
@ 2016-11-11 14:34 Uros Bizjak
  2016-11-11 17:39 ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-11-11 14:34 UTC
  To: gcc-patches; +Cc: Andrew Senkevich

Some quick remarks:

+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
+ (unspec:QI
+  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
+ (unspec:SI
+  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
+ (unspec:DI
+  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])

- kmovd (and existing kmovw) should be using register_operand for
operand 0. In this case, there is no need for MEM_P checks at all.
- In the insn constraint, please check TARGET_AVX before checking MEM_P.
- Please put these definitions above corresponding *mov??_internal patterns.
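
A minimal sketch of how the first two remarks might apply to the quoted
kmovd pattern. This is only an illustration under the assumption that the
operand constraints, output templates and attributes stay exactly as
posted; it is not the committed code.

```
;; Hypothetical revision of the quoted kmovd pattern (a sketch only).
;; Operand 0 is now a register_operand, so it can never be a MEM and
;; the MEM_P test drops out of the insn condition, leaving just the
;; ISA check.
(define_insn "kmovd"
  [(set (match_operand:SI 0 "register_operand" "=k,k")
	(unspec:SI
	  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
	  UNSPEC_KMOV))]
  "TARGET_AVX512BW"
  "@
   kmovd\t{%k1, %0|%0, %k1}
   kmovd\t{%1, %0|%0, %1}"
  [(set_attr "mode" "SI")
   (set_attr "type" "mskmov")
   (set_attr "prefix" "vex")])
```

For a pattern that keeps a memory destination alternative, such as the
quoted kmovq with its "km" output constraint, the MEM_P test would
presumably remain, with the cheap ISA test placed first:
"TARGET_AVX512BW && !(MEM_P (operands[0]) && MEM_P (operands[1]))".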

+//    case USI_FTYPE_UQI:
+//    case USI_FTYPE_UHI:

No commented-out code without a good reason, please.

Uros.


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 14:34 [PATCH] Add AVX512 k-mask intrinsics Uros Bizjak
@ 2016-11-11 17:39 ` Andrew Senkevich
  2016-11-11 17:50   ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-11-11 17:39 UTC
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> Some quick remarks:
>
> +(define_insn "kmovb"
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
> + (unspec:QI
> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
> +  UNSPEC_KMOV))]
> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
> +  "@
> +   kmovb\t{%k1, %0|%0, %k1}
> +   kmovb\t{%1, %0|%0, %1}";
> +  [(set_attr "mode" "QI")
> +   (set_attr "type" "mskmov")
> +   (set_attr "prefix" "vex")])
> +
> +(define_insn "kmovd"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
> + (unspec:SI
> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
> +  UNSPEC_KMOV))]
> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
> +  "@
> +   kmovd\t{%k1, %0|%0, %k1}
> +   kmovd\t{%1, %0|%0, %1}";
> +  [(set_attr "mode" "SI")
> +   (set_attr "type" "mskmov")
> +   (set_attr "prefix" "vex")])
> +
> +(define_insn "kmovq"
> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
> + (unspec:DI
> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
> +  UNSPEC_KMOV))]
> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
> +  "@
> +   kmovq\t{%k1, %0|%0, %k1}
> +   kmovq\t{%1, %0|%0, %1}
> +   kmovq\t{%1, %0|%0, %1}";
> +  [(set_attr "mode" "DI")
> +   (set_attr "type" "mskmov")
> +   (set_attr "prefix" "vex")])
>
> - kmovd (and existing kmovw) should be using register_operand for
> operand 0. In this case, there is no need for MEM_P checks at all.
> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
> - Please put these definitions above corresponding *mov??_internal patterns.

Do you mean to put them below the *mov??_internal patterns? The attached
patch is corrected that way.


--
WBR,
Andrew

[-- Attachment #2: add_k-mask_intrinsics_11.11.patch --]
[-- Type: application/octet-stream, Size: 72979 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a87a17f..a3456f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,46 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
+	* config/i386/avx512dqintrin.h: Ditto.
+	* config/i386/avx512fintrin.h: Ditto.
+	* config/i386/i386-builtin-types.def (UCHAR_FTYPE_UQI_UQI_PUCHAR,
+	UCHAR_FTYPE_UHI_UHI_PUCHAR, UCHAR_FTYPE_USI_USI_PUCHAR,
+	UCHAR_FTYPE_UDI_UDI_PUCHAR, UCHAR_FTYPE_UQI_UQI, UCHAR_FTYPE_UHI_UHI,
+	UCHAR_FTYPE_USI_USI, UCHAR_FTYPE_UDI_UDI, UQI_FTYPE_UQI_INT,
+	UHI_FTYPE_UHI_INT, USI_FTYPE_USI_INT, UDI_FTYPE_UDI_INT,
+	UQI_FTYPE_UQI, USI_FTYPE_USI, UDI_FTYPE_UDI, UQI_FTYPE_UQI_UQI): New
+	function types.
+	* config/i386/i386-builtin.def (__builtin_ia32_kortest_mask8_u8qi,
+	__builtin_ia32_kortest_mask16_u8hi,
+	__builtin_ia32_kortest_mask32_u8si,
+	__builtin_ia32_kortest_mask64_u8di,
+	__builtin_ia32_kortestz_mask8_u8qi,
+	__builtin_ia32_kortestz_mask16_u8hi,
+	__builtin_ia32_kortestz_mask32_u8si,
+	__builtin_ia32_kortestz_mask64_u8di,
+	__builtin_ia32_kortestc_mask8_u8qi,
+	__builtin_ia32_kortestc_mask16_u8hi,
+	__builtin_ia32_kortestc_mask32_u8si,
+	__builtin_ia32_kortestc_mask64_u8di,
+	__builtin_ia32_kshiftliqi, __builtin_ia32_kshiftlihi,
+	__builtin_ia32_kshiftlisi, __builtin_ia32_kshiftlidi,
+	__builtin_ia32_kshiftriqi, __builtin_ia32_kshiftrihi,
+	__builtin_ia32_kshiftrisi, __builtin_ia32_kshiftridi,
+	__builtin_ia32_knotqi, __builtin_ia32_knotsi, __builtin_ia32_knotdi,
+	__builtin_ia32_korqi, __builtin_ia32_korsi, __builtin_ia32_kordi,
+	__builtin_ia32_kxnorqi, __builtin_ia32_kxnorsi,
+	__builtin_ia32_kxnordi, __builtin_ia32_kxorqi, __builtin_ia32_kxorsi,
+	__builtin_ia32_kxordi, __builtin_ia32_kaddqi, __builtin_ia32_kaddhi,
+	__builtin_ia32_kaddsi, __builtin_ia32_kadddi, __builtin_ia32_kandqi,
+	__builtin_ia32_kandsi, __builtin_ia32_kanddi, __builtin_ia32_kandnqi,
+	__builtin_ia32_kandnsi, __builtin_ia32_kandndi, __builtin_ia32_kmov8,
+	__builtin_ia32_kmov32, __builtin_ia32_kmov64): New.
+	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
+	* config/i386/i386.md (define_insn "kmovb"): New.
+	(define_insn "kmovd"): Ditto.
+	(define_insn "kmovq"): Ditto.
+	(define_insn "kadd<mode>"): Ditto.
+
 2016-11-10  Vladimir Makarov  <vmakarov@redhat.com>
 
 	* target.def (additional_allocno_class_p): New.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..dfd35bf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,55 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx512bw-kaddd-1.c: New test.
+	* gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-4.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-4.c: Ditto.
+	* gcc.target/i386/avx512bw-knotd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-korq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftld-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckdq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandnb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-2.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-3.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-4.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
+	* gcc.target/i386/avx512dq-knotb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-korb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxnorb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxorb-1.c: Ditto.
+	* gcc.target/i386/avx512f-kaddw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-2.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-3.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-4.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-5.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kunpckbw-3.c: Ditto.
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>
 
 	* gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 8f03249..0829af3 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,238 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask32_u8si ((__mmask32) __A,
+							     (__mmask32) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask64_u8di ((__mmask64) __A,
+							     (__mmask64) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*(__mmask32 *) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_qi (void)
@@ -138,6 +370,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 					      (__mmask32) __B);
 }
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -146,6 +386,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
 					      (__mmask64) __B);
 }
 
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+					      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 1dbb6b0..87681f7 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,122 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask8_u8 (__mmask8 __A, __mmask8 __B, unsigned char* __C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask8_u8qi ((__mmask8) __A,
+							    (__mmask8) __B,
+							    (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..8787da8 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,62 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }
 
 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +10044,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
 extern __inline __mmask16
@@ -9998,6 +10055,31 @@ _mm512_kor (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B);
+}
+
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kortestz (__mmask16 __A, __mmask16 __B)
@@ -10042,6 +10124,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..125fa94 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)
 
+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -527,7 +533,23 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -540,6 +562,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..5dae57d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,16 +1436,75 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortest_mask8_u8qi", IX86_BUILTIN_KORTEST8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortest_mask16_u8hi", IX86_BUILTIN_KORTEST16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortest_mask32_u8si", IX86_BUILTIN_KORTEST32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortest_mask64_u8di", IX86_BUILTIN_KORTEST64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI_PUCHAR)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestz_mask8_u8qi", IX86_BUILTIN_KORTESTZ8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestz_mask16_u8hi", IX86_BUILTIN_KORTESTZ16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestz_mask32_u8si", IX86_BUILTIN_KORTESTZ32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestz_mask64_u8di", IX86_BUILTIN_KORTESTZ64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestc_mask8_u8qi", IX86_BUILTIN_KORTESTC8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestc_mask16_u8hi", IX86_BUILTIN_KORTESTC16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestc_mask32_u8si", IX86_BUILTIN_KORTESTC32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestc_mask64_u8di", IX86_BUILTIN_KORTESTC64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftlqi3_1, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftlhi3_1, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftlsi3_1, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftldi3_1, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftrqi3_1, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftrhi3_1, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrsi3_1, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrdi3_1, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_one_cmplqi2, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_one_cmplhi2, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmplsi2, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmpldi2, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_iorqi3, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_iorhi3, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iorsi3, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iordi3, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_xorqi3, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xorsi3, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xordi3, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_andqi3, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_andsi3, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_anddi3, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..be91e19 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34638,7 +34638,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34772,6 +34775,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -34819,6 +34823,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a5650a1..0f15ed1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2484,20 +2484,6 @@
 	   ]
 	   (const_string "SI")))])
 
-(define_insn "kmovw"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
-	(unspec:HI
-	  [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
-	  UNSPEC_KMOV))]
-  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
-  "@
-   kmovw\t{%k1, %0|%0, %k1}
-   kmovw\t{%1, %0|%0, %1}";
-  [(set_attr "mode" "HI")
-   (set_attr "type" "mskmov")
-   (set_attr "prefix" "vex")])
-
-
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k, r,m")
 	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,r,km,k,k"))]
@@ -2657,6 +2643,59 @@
 	   ]
 	   (const_string "QI")))])
 
+(define_insn "kmovw"
+  [(set (match_operand:HI 0 "register_operand" "=k,k")
+	(unspec:HI
+	  [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512F"
+  "@
+   kmovw\t{%k1, %0|%0, %k1}
+   kmovw\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "HI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "register_operand" "=k,k")
+	(unspec:QI
+	  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "register_operand" "=k,k")
+	(unspec:SI
+	  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
+	(unspec:DI
+	  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
 ;; Stores and loads of ax to arbitrary constant address.
 ;; We fake an second form of instruction to force reload to load address
 ;; into register when rax is not available
@@ -8304,11 +8343,11 @@
    (set_attr "mode" "QI")])
 
 (define_insn "kandn<mode>"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
-	(and:SWI12
-	  (not:SWI12
-	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
-	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(and:SWI1248x
+	  (not:SWI1248x
+	    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_AVX512F"
 {
@@ -8319,10 +8358,50 @@
     case 1:
       return "#";
     case 2:
-      if (TARGET_AVX512DQ && <MODE>mode == QImode)
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kandnq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kandnd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
 	return "kandnb\t{%2, %1, %0|%0, %1, %2}";
       else
 	return "kandnw\t{%2, %1, %0|%0, %1, %2}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(plus:SWI1248x
+	  (not:SWI1248x
+	    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
+    case 1:
+      return "#";
+    case 2:
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kaddq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kaddd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
+	return "kaddb\t{%2, %1, %0|%0, %1, %2}";
+      else
+	return "kaddw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_unreachable ();
     }
@@ -9687,7 +9766,7 @@
 ;; shift pair, instead using moves and sign extension for counts greater
 ;; than 31.
 
-(define_insn "*<mshift><mode>3"
+(define_insn "<mshift><mode>3_1"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
 				       (match_operand:QI 2 "immediate_operand" "i")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..0b38850
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..5b7b417
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..6b68ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..35f1c12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..a89b2d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+volatile __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask32 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..dcb65fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask32 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..fe5e1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern unsigned int m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask32_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..8a085d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu32_mask32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..51d547d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+volatile __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask64 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..9baf200
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask64 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..3a02d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern unsigned long long m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask64_u64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..1cc16ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned long long m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu64_mask64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..dd6b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..5b94358
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..163c46e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..77b1b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..85be9b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..cd5707e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..91b6313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..c10fa4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..ccf4b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..b9c0979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..ce03ab4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..d6366dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..a84d8ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd ();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..ff50610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..3832853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+volatile __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _load_mask8 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..8d06674
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  _store_mask8 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..2da4719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern unsigned int m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtmask8_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..d3f8c5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtu32_mask8 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..8bb9249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..22b727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..422d0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..f87cf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..ee21aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..63a1ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..9faf4ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2, k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..77c8ddc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+volatile __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _load_mask16 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..740ea9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  _store_mask16 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..127a4ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern unsigned int m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtmask16_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..d729e8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtu32_mask16 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..7a9de12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..641d307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 17:39 ` Andrew Senkevich
@ 2016-11-11 17:50   ` Uros Bizjak
  2016-11-11 17:56     ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-11-11 17:50 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches

On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> Some quick remarks:
>>
>> +(define_insn "kmovb"
>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>> + (unspec:QI
>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>> +  UNSPEC_KMOV))]
>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>> +  "@
>> +   kmovb\t{%k1, %0|%0, %k1}
>> +   kmovb\t{%1, %0|%0, %1}";
>> +  [(set_attr "mode" "QI")
>> +   (set_attr "type" "mskmov")
>> +   (set_attr "prefix" "vex")])
>> +
>> +(define_insn "kmovd"
>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>> + (unspec:SI
>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>> +  UNSPEC_KMOV))]
>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>> +  "@
>> +   kmovd\t{%k1, %0|%0, %k1}
>> +   kmovd\t{%1, %0|%0, %1}";
>> +  [(set_attr "mode" "SI")
>> +   (set_attr "type" "mskmov")
>> +   (set_attr "prefix" "vex")])
>> +
>> +(define_insn "kmovq"
>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>> + (unspec:DI
>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>> +  UNSPEC_KMOV))]
>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>> +  "@
>> +   kmovq\t{%k1, %0|%0, %k1}
>> +   kmovq\t{%1, %0|%0, %1}
>> +   kmovq\t{%1, %0|%0, %1}";
>> +  [(set_attr "mode" "DI")
>> +   (set_attr "type" "mskmov")
>> +   (set_attr "prefix" "vex")])
>>
>> - kmovd (and existing kmovw) should be using register_operand for
>> operand 0. In this case, there is no need for MEM_P checks at all.
>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>> - please put these definitions above corresponding *mov??_internal patterns.
>
> Do you mean to put them below the *mov??_internal patterns? A patch corrected that way is attached.

No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
etc. It doesn't matter if they are above or below their respective
*mov??_internal patterns, as long as they are positioned in some
consistent way. IOW, new patterns shouldn't be grouped together, as is
the case with your patch.

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 17:50   ` Uros Bizjak
@ 2016-11-11 17:56     ` Uros Bizjak
  2016-11-11 18:23       ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-11-11 17:56 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches

On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>> Some quick remarks:
>>>
>>> +(define_insn "kmovb"
>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>> + (unspec:QI
>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>> +  UNSPEC_KMOV))]
>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>> +  "@
>>> +   kmovb\t{%k1, %0|%0, %k1}
>>> +   kmovb\t{%1, %0|%0, %1}";
>>> +  [(set_attr "mode" "QI")
>>> +   (set_attr "type" "mskmov")
>>> +   (set_attr "prefix" "vex")])
>>> +
>>> +(define_insn "kmovd"
>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>> + (unspec:SI
>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>> +  UNSPEC_KMOV))]
>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>> +  "@
>>> +   kmovd\t{%k1, %0|%0, %k1}
>>> +   kmovd\t{%1, %0|%0, %1}";
>>> +  [(set_attr "mode" "SI")
>>> +   (set_attr "type" "mskmov")
>>> +   (set_attr "prefix" "vex")])
>>> +
>>> +(define_insn "kmovq"
>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>> + (unspec:DI
>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>> +  UNSPEC_KMOV))]
>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>> +  "@
>>> +   kmovq\t{%k1, %0|%0, %k1}
>>> +   kmovq\t{%1, %0|%0, %1}
>>> +   kmovq\t{%1, %0|%0, %1}";
>>> +  [(set_attr "mode" "DI")
>>> +   (set_attr "type" "mskmov")
>>> +   (set_attr "prefix" "vex")])
>>>
>>> - kmovd (and existing kmovw) should be using register_operand for
>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>> - please put these definitions above corresponding *mov??_internal patterns.
>>
>> Do you mean to put them below the *mov??_internal patterns? A patch corrected that way is attached.
>
> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
> etc. It doesn't matter if they are above or below their respective
> *mov??_internal patterns, as long as they are positioned in some
> consistent way. IOW, new patterns shouldn't be grouped together, as is
> the case with your patch.

+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "register_operand" "=k,k")
+    (unspec:QI
+      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+      UNSPEC_KMOV))]
+  "TARGET_AVX512DQ && !MEM_P (operands[1])"

There is no need for !MEM_P here: it would reject a memory operand,
which the "m" constraint allows.

+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
+    (unspec:DI
+      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+      UNSPEC_KMOV))]
+  "TARGET_AVX512BW && !MEM_P (operands[1])"

Operand 0 should have the "nonimmediate_operand" predicate. Here you
also need && !(MEM_P (operands[0]) && MEM_P (operands[1])) in the insn
condition to prevent mem->mem moves.
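
To illustrate the two remarks above, here is a toy C model of the insn
conditions. It is only a sketch and not GCC code: the enum, the helper
names and the self-check function are made up for illustration, the
TARGET_AVX512* checks are left out, and GCC itself evaluates these
conditions on RTL operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand kinds.  Hypothetical names; GCC works on RTL instead.  */
enum operand_kind { OP_GPR, OP_MASK, OP_MEM };

/* Stand-in for GCC's MEM_P test.  */
static bool is_mem (enum operand_kind op) { return op == OP_MEM; }

/* kmovb condition as posted: the "km" alternative permits a memory
   source, but "!MEM_P (operands[1])" rejects it anyway.  */
static bool kmovb_cond_as_posted (enum operand_kind src)
{
  return !is_mem (src);
}

/* kmovb condition as requested: operand 0 is a register_operand, so a
   memory destination is already excluded and no MEM_P test is needed.  */
static bool kmovb_cond_requested (enum operand_kind src)
{
  (void) src;
  return true;
}

/* kmovq condition as requested: both operands are nonimmediate_operand,
   so the condition has to reject the mem->mem combination itself.  */
static bool kmovq_cond_requested (enum operand_kind dst,
                                  enum operand_kind src)
{
  return !(is_mem (dst) && is_mem (src));
}

/* Walks through the cases discussed in the review.  */
void model_selfcheck (void)
{
  assert (!kmovb_cond_as_posted (OP_MEM));  /* load from memory lost */
  assert (kmovb_cond_requested (OP_MEM));   /* load from memory kept */
  assert (kmovq_cond_requested (OP_MEM, OP_MASK));  /* store: fine */
  assert (kmovq_cond_requested (OP_MASK, OP_MEM));  /* load: fine */
  assert (!kmovq_cond_requested (OP_MEM, OP_MEM));  /* mem->mem: no */
}
```

In this model the only combination the kmovq condition turns away is
memory to memory, which none of the three alternatives can encode.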

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 17:56     ` Uros Bizjak
@ 2016-11-11 18:23       ` Andrew Senkevich
  2016-11-11 19:14         ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-11-11 18:23 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]

2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>> Some quick remarks:
>>>>
>>>> +(define_insn "kmovb"
>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>> + (unspec:QI
>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>> +  UNSPEC_KMOV))]
>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>> +  "@
>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>> +  [(set_attr "mode" "QI")
>>>> +   (set_attr "type" "mskmov")
>>>> +   (set_attr "prefix" "vex")])
>>>> +
>>>> +(define_insn "kmovd"
>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>> + (unspec:SI
>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>> +  UNSPEC_KMOV))]
>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>> +  "@
>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>> +  [(set_attr "mode" "SI")
>>>> +   (set_attr "type" "mskmov")
>>>> +   (set_attr "prefix" "vex")])
>>>> +
>>>> +(define_insn "kmovq"
>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>> + (unspec:DI
>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>> +  UNSPEC_KMOV))]
>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>> +  "@
>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>> +   kmovq\t{%1, %0|%0, %1}
>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>> +  [(set_attr "mode" "DI")
>>>> +   (set_attr "type" "mskmov")
>>>> +   (set_attr "prefix" "vex")])
>>>>
>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>
>>> Do you mean to put them below the *mov??_internal patterns? A patch corrected that way is attached.
>>
>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>> etc. It doesn't matter if they are above or below their respective
>> *mov??_internal patterns, as long as they are positioned in some
>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>> the case with your patch.
>
> +(define_insn "kmovb"
> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
> +    (unspec:QI
> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
> +      UNSPEC_KMOV))]
> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>
> There is no need for !MEM_P, this will prevent memory operand, which
> is allowed by constraint "m".
>
> +(define_insn "kmovq"
> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
> +    (unspec:DI
> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
> +      UNSPEC_KMOV))]
> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>
> Operand 0 should have "nonimmediate_operand" predicate. And here you
> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
> mem->mem moves.

Changed according to your comments; the updated patch is attached.
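
For anyone reading the new tests without AVX-512 hardware at hand, the
8-bit mask operations they exercise can be approximated in portable C.
This is a rough sketch only: the model_* names are hypothetical and are
not the intrinsics from the patch, and the behaviour follows the
documented instruction semantics (shift counts above 7 give zero, kandn
inverts its first operand, kunpckbw puts its first operand in the high
byte).

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for the 8-bit k-mask operations (hypothetical
   names; the real intrinsics operate on mask registers).  */
static uint8_t model_kand (uint8_t a, uint8_t b) { return a & b; }
static uint8_t model_kandn (uint8_t a, uint8_t b) { return (uint8_t) (~a & b); }
static uint8_t model_kor (uint8_t a, uint8_t b) { return a | b; }
static uint8_t model_kxor (uint8_t a, uint8_t b) { return a ^ b; }
static uint8_t model_kxnor (uint8_t a, uint8_t b) { return (uint8_t) ~(a ^ b); }
static uint8_t model_knot (uint8_t a) { return (uint8_t) ~a; }
static uint8_t model_kadd (uint8_t a, uint8_t b) { return (uint8_t) (a + b); }

/* Shift counts larger than 7 clear the whole 8-bit mask.  */
static uint8_t model_kshiftl (uint8_t a, unsigned n)
{
  return n > 7 ? 0 : (uint8_t) (a << n);
}

static uint8_t model_kshiftr (uint8_t a, unsigned n)
{
  return n > 7 ? 0 : (uint8_t) (a >> n);
}

/* kunpckbw: first operand becomes bits 15:8, second becomes bits 7:0.  */
static uint16_t model_kunpackb (uint8_t a, uint8_t b)
{
  return (uint16_t) (((unsigned) a << 8) | b);
}

/* Applies the model to the constants loaded in the tests (1, 2, 45
   and a shift count of 5).  */
void model_check_test_inputs (void)
{
  assert (model_kand (1, 2) == 0);
  assert (model_kandn (1, 2) == 2);
  assert (model_kor (1, 2) == 3);
  assert (model_kxor (1, 2) == 3);
  assert (model_kxnor (1, 2) == 0xfc);
  assert (model_knot (45) == 0xd2);
  assert (model_kadd (1, 2) == 3);
  assert (model_kshiftl (1, 5) == 32);
  assert (model_kshiftr (1, 5) == 0);
  assert (model_kunpackb (1, 2) == 0x0102);
}
```

The tests themselves only scan the generated assembly, so none of these
values is checked by the testsuite; the model is just a reading aid.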


--
WBR,
Andrew

[-- Attachment #2: add_k-mask_intrinsics_11.11_1.patch --]
[-- Type: application/octet-stream, Size: 73264 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a87a17f..a3456f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,46 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
+	* config/i386/avx512dqintrin.h: Ditto.
+	* config/i386/avx512fintrin.h: Ditto.
+	* config/i386/i386-builtin-types.def (UCHAR_FTYPE_UQI_UQI_PUCHAR,
+	UCHAR_FTYPE_UHI_UHI_PUCHAR, UCHAR_FTYPE_USI_USI_PUCHAR,
+	UCHAR_FTYPE_UDI_UDI_PUCHAR, UCHAR_FTYPE_UQI_UQI, UCHAR_FTYPE_UHI_UHI,
+	UCHAR_FTYPE_USI_USI, UCHAR_FTYPE_UDI_UDI, UQI_FTYPE_UQI_INT,
+	UHI_FTYPE_UHI_INT, USI_FTYPE_USI_INT, UDI_FTYPE_UDI_INT,
+	UQI_FTYPE_UQI, USI_FTYPE_USI, UDI_FTYPE_UDI, UQI_FTYPE_UQI_UQI): New
+	function types.
+	* config/i386/i386-builtin.def (__builtin_ia32_kortest_mask8_u8qi,
+	__builtin_ia32_kortest_mask16_u8hi,
+	__builtin_ia32_kortest_mask32_u8si,
+	__builtin_ia32_kortest_mask64_u8di,
+	__builtin_ia32_kortestz_mask8_u8qi,
+	__builtin_ia32_kortestz_mask16_u8hi,
+	__builtin_ia32_kortestz_mask32_u8si,
+	__builtin_ia32_kortestz_mask64_u8di,
+	__builtin_ia32_kortestc_mask8_u8qi,
+	__builtin_ia32_kortestc_mask16_u8hi,
+	__builtin_ia32_kortestc_mask32_u8si,
+	__builtin_ia32_kortestc_mask64_u8di,
+	__builtin_ia32_kshiftliqi, __builtin_ia32_kshiftlihi,
+	__builtin_ia32_kshiftlisi, __builtin_ia32_kshiftlidi,
+	__builtin_ia32_kshiftriqi, __builtin_ia32_kshiftrihi,
+	__builtin_ia32_kshiftrisi, __builtin_ia32_kshiftridi,
+	__builtin_ia32_knotqi, __builtin_ia32_knotsi, __builtin_ia32_knotdi,
+	__builtin_ia32_korqi, __builtin_ia32_korsi, __builtin_ia32_kordi,
+	__builtin_ia32_kxnorqi, __builtin_ia32_kxnorsi,
+	__builtin_ia32_kxnordi, __builtin_ia32_kxorqi, __builtin_ia32_kxorsi,
+	__builtin_ia32_kxordi, __builtin_ia32_kaddqi, __builtin_ia32_kaddhi,
+	__builtin_ia32_kaddsi, __builtin_ia32_kadddi, __builtin_ia32_kandqi,
+	__builtin_ia32_kandsi, __builtin_ia32_kanddi, __builtin_ia32_kandnqi,
+	__builtin_ia32_kandnsi, __builtin_ia32_kandndi, __builtin_ia32_kmov8,
+	__builtin_ia32_kmov32, __builtin_ia32_kmov64): New.
+	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
+	* config/i386/i386.md (define_insn "kmovb"): New.
+	(define_insn "kmovd"): Ditto.
+	(define_insn "kmovq"): Ditto.
+	(define_insn "kadd<mode>"): Ditto.
+
 2016-11-10  Vladimir Makarov  <vmakarov@redhat.com>
 
 	* target.def (additional_allocno_class_p): New.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..dfd35bf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,55 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx512bw-kaddd-1.c: New test.
+	* gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-4.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-4.c: Ditto.
+	* gcc.target/i386/avx512bw-knotd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-korq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftld-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckdq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandnb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-2.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-3.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-4.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
+	* gcc.target/i386/avx512dq-knotb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-korb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxnorb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxorb-1.c: Ditto.
+	* gcc.target/i386/avx512f-kaddw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-2.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-3.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-4.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-5.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kunpckbw-3.c: Ditto.
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>
 
 	* gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 8f03249..0829af3 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,238 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask32_u8si ((__mmask32) __A,
+							     (__mmask32) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask64_u8di ((__mmask64) __A,
+							     (__mmask64) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*(__mmask32 *) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_qi (void)
@@ -138,6 +370,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 					      (__mmask32) __B);
 }
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -146,6 +386,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
 					      (__mmask64) __B);
 }
 
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+					      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 1dbb6b0..87681f7 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,122 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask8_u8 (__mmask8 __A, __mmask8 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask8_u8qi ((__mmask8) __A,
+							    (__mmask8) __B,
+							    (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
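
For reference while reviewing the 8-bit wrappers above, here is a minimal
plain-C sketch of the semantics they are expected to expose.  The ref_*
helper names are hypothetical and are not part of the patch; the behaviour
is assumed from the documented KANDNB/KXNORB/KADDB/KSHIFTLB/KSHIFTRB
instructions, not taken from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers (not part of the patch): plain-C models
   of what the 8-bit mask intrinsics are expected to compute.  */

/* KANDNB inverts the FIRST operand: (~a) & b.  */
static inline uint8_t
ref_kandn8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (~a & b);
}

/* KXNORB: bitwise NOT of (a ^ b).  */
static inline uint8_t
ref_kxnor8 (uint8_t a, uint8_t b)
{
  return (uint8_t) ~(a ^ b);
}

/* KADDB: plain addition, truncated to 8 bits.  */
static inline uint8_t
ref_kadd8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

/* KSHIFTLB: shift counts above 7 clear the whole mask.  */
static inline uint8_t
ref_kshiftl8 (uint8_t a, unsigned int count)
{
  return count > 7 ? 0 : (uint8_t) (a << count);
}

/* KSHIFTRB: logical right shift, counts above 7 clear the whole mask.  */
static inline uint8_t
ref_kshiftr8 (uint8_t a, unsigned int count)
{
  return count > 7 ? 0 : (uint8_t) (a >> count);
}
```

A runtime test could compare _kandn_mask8, _kadd_mask8 and friends against
helpers of this kind on hardware that supports AVX512DQ.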
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..8787da8 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,62 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }
 
 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +10044,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
 extern __inline __mmask16
@@ -9998,6 +10055,31 @@ _mm512_kor (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B);
+}
+
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kortestz (__mmask16 __A, __mmask16 __B)
@@ -10042,6 +10124,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
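
As a reading aid for the kortest and kunpack wrappers above, a minimal
plain-C sketch of the expected 16-bit semantics follows.  The ref_* names
are hypothetical and not part of the patch; the behaviour is assumed from
the documented KORTESTW (ZF/CF) and KUNPCKBW instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of _kortest_mask16_u8 (not part of the patch):
   returns 1 when (a | b) is all zeros and stores 1 to *all_ones when
   (a | b) is all ones.  */
static inline unsigned char
ref_kortest16 (uint16_t a, uint16_t b, unsigned char *all_ones)
{
  uint16_t t = (uint16_t) (a | b);
  *all_ones = (t == 0xFFFF);
  return t == 0;
}

/* Hypothetical model of _kunpackb_mask16: the low byte of A becomes the
   high byte of the result, the low byte of B becomes the low byte.  */
static inline uint16_t
ref_kunpackb16 (uint8_t a, uint8_t b)
{
  return (uint16_t) (((unsigned int) a << 8) | b);
}
```

Under this model, _kortestz_mask16_u8 corresponds to the return value alone
and _kortestc_mask16_u8 to the value stored through the pointer.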
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..125fa94 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)
 
+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -527,7 +533,23 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -540,6 +562,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..5dae57d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,16 +1436,75 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortest_mask8_u8qi", IX86_BUILTIN_KORTEST8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortest_mask16_u8hi", IX86_BUILTIN_KORTEST16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortest_mask32_u8si", IX86_BUILTIN_KORTEST32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortest_mask64_u8di", IX86_BUILTIN_KORTEST64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI_PUCHAR)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestz_mask8_u8qi", IX86_BUILTIN_KORTESTZ8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestz_mask16_u8hi", IX86_BUILTIN_KORTESTZ16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestz_mask32_u8si", IX86_BUILTIN_KORTESTZ32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestz_mask64_u8di", IX86_BUILTIN_KORTESTZ64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestc_mask8_u8qi", IX86_BUILTIN_KORTESTC8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestc_mask16_u8hi", IX86_BUILTIN_KORTESTC16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestc_mask32_u8si", IX86_BUILTIN_KORTESTC32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestc_mask64_u8di", IX86_BUILTIN_KORTESTC64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftlqi3_1, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftlhi3_1, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftlsi3_1, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftldi3_1, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftrqi3_1, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftrhi3_1, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrsi3_1, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrdi3_1, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_one_cmplqi2, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_one_cmplhi2, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmplsi2, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmpldi2, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_iorqi3, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_iorhi3, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iorsi3, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iordi3, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_xorqi3, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xorsi3, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xordi3, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_andqi3, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_andsi3, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_anddi3, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..be91e19 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34638,7 +34638,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34772,6 +34775,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -34819,6 +34823,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a5650a1..50da7df 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2168,6 +2168,20 @@
   [(const_int 0)]
   "ix86_split_long_move (operands); DONE;")
 
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
+	(unspec:DI
+	  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
 (define_insn "*movdi_internal"
   [(set (match_operand:DI 0 "nonimmediate_operand"
     "=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m ,m,?r ,?r,?*Yi,?*Ym,?*Yi,*k,*k ,*r ,*m")
@@ -2355,6 +2369,19 @@
   [(const_int 0)]
   "ix86_split_long_move (operands); DONE;")
 
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "register_operand" "=k,k")
+	(unspec:SI
+	  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
 			"=r,m ,*y,*y,?rm,?*y,*v,*v,*v,m ,?r ,?r,?*Yi,*k  ,*rm")
@@ -2485,11 +2512,11 @@
 	   (const_string "SI")))])
 
 (define_insn "kmovw"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
+  [(set (match_operand:HI 0 "register_operand" "=k,k")
 	(unspec:HI
 	  [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
 	  UNSPEC_KMOV))]
-  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
+  "TARGET_AVX512F"
   "@
    kmovw\t{%k1, %0|%0, %k1}
    kmovw\t{%1, %0|%0, %1}";
@@ -2497,7 +2524,6 @@
    (set_attr "type" "mskmov")
    (set_attr "prefix" "vex")])
 
-
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k, r,m")
 	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,r,km,k,k"))]
@@ -2561,6 +2587,19 @@
 	    ]
 	    (const_string "HI")))])
 
+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "register_operand" "=k,k")
+	(unspec:QI
+	  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
 ;; Situation is quite tricky about when to choose full sized (SImode) move
 ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
 ;; partial register dependency machines (such as AMD Athlon), where QImode
@@ -8304,11 +8343,11 @@
    (set_attr "mode" "QI")])
 
 (define_insn "kandn<mode>"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
-	(and:SWI12
-	  (not:SWI12
-	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
-	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(and:SWI1248x
+	  (not:SWI1248x
+	    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_AVX512F"
 {
@@ -8319,10 +8358,50 @@
     case 1:
       return "#";
     case 2:
-      if (TARGET_AVX512DQ && <MODE>mode == QImode)
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kandnq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kandnd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
 	return "kandnb\t{%2, %1, %0|%0, %1, %2}";
       else
 	return "kandnw\t{%2, %1, %0|%0, %1, %2}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(plus:SWI1248x
+	  ;; Plain sum of the two masks; operand 1 must not be inverted.
+	  (match_operand:SWI1248x 1 "register_operand" "r,0,k")
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
+    case 1:
+      return "#";
+    case 2:
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kaddq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kaddd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
+	return "kaddb\t{%2, %1, %0|%0, %1, %2}";
+      else
+	return "kaddw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_unreachable ();
     }
@@ -9687,7 +9766,7 @@
 ;; shift pair, instead using moves and sign extension for counts greater
 ;; than 31.
 
-(define_insn "*<mshift><mode>3"
+(define_insn "<mshift><mode>3_1"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
 				       (match_operand:QI 2 "immediate_operand" "i")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..0b38850
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..5b7b417
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..6b68ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..35f1c12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..a89b2d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+volatile __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask32 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..dcb65fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask32 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..fe5e1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern unsigned int m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask32_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..8a085d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu32_mask32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..51d547d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+volatile __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask64 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..9baf200
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask64 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..3a02d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern unsigned long long m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask64_u64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..1cc16ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned long long m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu64_mask64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..dd6b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..5b94358
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..163c46e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..77b1b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..85be9b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..cd5707e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..91b6313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..c10fa4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..ccf4b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..b9c0979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..ce03ab4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..d6366dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..a84d8ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..ff50610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..3832853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+volatile __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _load_mask8 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..8d06674
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  _store_mask8 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..2da4719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern unsigned int m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtmask8_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..d3f8c5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtu32_mask8 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..8bb9249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..22b727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..422d0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..f87cf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..ee21aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..63a1ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..9faf4ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2, k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..77c8ddc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+volatile __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _load_mask16 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..740ea9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  _store_mask16 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..127a4ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern unsigned int m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtmask16_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..d729e8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtu32_mask16 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..7a9de12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..641d307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 18:23       ` Andrew Senkevich
@ 2016-11-11 19:14         ` Uros Bizjak
  2016-12-02 17:45           ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-11-11 19:14 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches

On Fri, Nov 11, 2016 at 7:23 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>>> <andrew.n.senkevich@gmail.com> wrote:
>>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>> Some quick remarks:
>>>>>
>>>>> +(define_insn "kmovb"
>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>>> + (unspec:QI
>>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>> +  UNSPEC_KMOV))]
>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>>> +  "@
>>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>>> +  [(set_attr "mode" "QI")
>>>>> +   (set_attr "type" "mskmov")
>>>>> +   (set_attr "prefix" "vex")])
>>>>> +
>>>>> +(define_insn "kmovd"
>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>>> + (unspec:SI
>>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>>> +  UNSPEC_KMOV))]
>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>> +  "@
>>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>>> +  [(set_attr "mode" "SI")
>>>>> +   (set_attr "type" "mskmov")
>>>>> +   (set_attr "prefix" "vex")])
>>>>> +
>>>>> +(define_insn "kmovq"
>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>>> + (unspec:DI
>>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>> +  UNSPEC_KMOV))]
>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>> +  "@
>>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>>> +   kmovq\t{%1, %0|%0, %1}
>>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>>> +  [(set_attr "mode" "DI")
>>>>> +   (set_attr "type" "mskmov")
>>>>> +   (set_attr "prefix" "vex")])
>>>>>
>>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>>
>>>> Do you mean put them below the *mov??_internal patterns? Attached is a version corrected that way.
>>>
>>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>>> etc. It doesn't matter if they are above or below their respective
>>> *mov??_internal patterns, as long as they are positioned in some
>>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>>> the case with your patch.
>>
>> +(define_insn "kmovb"
>> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
>> +    (unspec:QI
>> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>> +      UNSPEC_KMOV))]
>> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>>
>> There is no need for !MEM_P; it would reject the memory operand
>> that constraint "m" allows.
>>
>> +(define_insn "kmovq"
>> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
>> +    (unspec:DI
>> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>> +      UNSPEC_KMOV))]
>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>
>> Operand 0 should have "nonimmediate_operand" predicate. And here you
>> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
>> mem->mem moves.
>
> Changed according to your comments and attached.

Still not good.

+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "register_operand" "=k,k")
+    (unspec:SI
+      [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+      UNSPEC_KMOV))]
+  "TARGET_AVX512BW && !MEM_P (operands[1])"

Remove !MEM_P in the above pattern.

 (define_insn "kmovw"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
+  [(set (match_operand:HI 0 "register_operand" "=k,k")
     (unspec:HI
       [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
       UNSPEC_KMOV))]
-  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
+  "TARGET_AVX512F && !MEM_P (operands[1])"

Also remove !MEM_P here.
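
To spell out why the extra !MEM_P check is harmful, here is a small
stand-alone C model of the insn conditions discussed above. It is an
illustration only, not GCC code: the enum and the function names are
invented for this sketch. With a register_operand destination, operand 0
can never be memory, so the old mem->mem guard is redundant, while
!MEM_P (operands[1]) rejects exactly the memory source that the "km"
alternative is supposed to accept.

```c
#include <stdbool.h>

/* Hypothetical operand kinds for this sketch; GCC itself works on RTL.  */
enum operand_kind { OP_REG, OP_MEM };

/* Models the original condition: only a mem->mem move is rejected.  */
bool
cond_no_mem_to_mem (enum operand_kind dst, enum operand_kind src)
{
  return !(dst == OP_MEM && src == OP_MEM);
}

/* Models "!MEM_P (operands[1])": every memory source is rejected,
   including the reg <- mem case that constraint "m" allows.  */
bool
cond_no_mem_source (enum operand_kind dst, enum operand_kind src)
{
  (void) dst;
  return src != OP_MEM;
}

/* Models a register_operand destination with no extra MEM_P check:
   the predicate alone guarantees that dst is a register.  */
bool
cond_reg_dest_only (enum operand_kind dst, enum operand_kind src)
{
  (void) src;
  return dst == OP_REG;
}
```

In this model a register destination with a memory source passes
cond_reg_dest_only but fails cond_no_mem_source, which is the case the
review comment is about.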

+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+    (plus:SWI1248x
+      (not:SWI1248x
+        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
+    case 1:
+      return "#";
+    case 2:
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+    return "kaddq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+    return "kaddd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
+    return "kaddb\t{%2, %1, %0|%0, %1, %2}";
+      else
+    return "kaddw\t{%2, %1, %0|%0, %1, %2}";
+

The above pattern is wrong. Is there really a NOT RTX present,
effectively implying a kaddn?

If this is a plain add, then you need to change the other add patterns;
see how the logic patterns are amended with the "k" constraint. The
added pattern should look like the *k<logic><mode> pattern.
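
As a sanity check on the semantics: the RTL as written computes
(~op1) + op2, not op1 + op2. A minimal C model on 16-bit values shows
that the two differ. The helper names are invented for this sketch, and
it assumes the instruction is meant to be a plain add.

```c
#include <stdint.h>

/* What (plus (not op1) op2) computes in HImode, modulo 2^16.  */
uint16_t
model_not_plus (uint16_t op1, uint16_t op2)
{
  return (uint16_t) ((uint16_t) ~op1 + op2);
}

/* What a plain (plus op1 op2) computes in HImode, modulo 2^16.  */
uint16_t
model_plain_plus (uint16_t op1, uint16_t op2)
{
  return (uint16_t) (op1 + op2);
}
```

For op1 = 1 and op2 = 2 the first form wraps around to 0, while the
plain add gives 3, so the optimizers would be told the wrong thing about
the result if the NOT stays in the pattern.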

 (define_insn "kandn<mode>"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
-    (and:SWI12
-      (not:SWI12
-        (match_operand:SWI12 1 "register_operand" "r,0,k"))
-      (match_operand:SWI12 2 "register_operand" "r,r,k")))
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+    (and:SWI1248x
+      (not:SWI1248x
+        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_AVX512F"
 {
@@ -8319,10 +8358,50 @@
     case 1:
       return "#";
     case 2:
-      if (TARGET_AVX512DQ && <MODE>mode == QImode)
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+    return "kandnq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+    return "kandnd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
     return "kandnb\t{%2, %1, %0|%0, %1, %2}";
       else
     return "kandnw\t{%2, %1, %0|%0, %1, %2}";

The above should use the SWI1248_AVX512BW mode iterator; see the
*k<logic><mode> pattern.
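
For reference, the mnemonic selection in alternative 2 of the quoted
hunk can be modelled in plain C as below. This is only a sketch of the
quoted switch: the function name and the flag parameters are invented
for illustration. Assuming the iterator enables SImode and DImode only
under TARGET_AVX512BW, the has_bw tests are always true for those modes
and serve only to mirror the quoted code.

```c
/* Returns the suffix letter of the k-mask mnemonic (e.g. 'q' for
   kandnq) for a mask of MODE_BITS bits.  HAS_BW and HAS_DQ stand in
   for TARGET_AVX512BW and TARGET_AVX512DQ.  */
char
kmask_suffix (int mode_bits, int has_bw, int has_dq)
{
  if (has_bw && mode_bits == 64)
    return 'q';
  else if (has_bw && mode_bits == 32)
    return 'd';
  else if (has_dq && mode_bits == 8)
    return 'b';
  else
    return 'w';
}
```

Note the fallback: as in the quoted code, a QImode mask without
AVX512DQ is handled with the word form of the instruction.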

Uros.


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 19:14         ` Uros Bizjak
@ 2016-12-02 17:45           ` Andrew Senkevich
  2016-12-02 18:31             ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-12-02 17:45 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6745 bytes --]

2016-11-11 22:14 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Nov 11, 2016 at 7:23 PM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>> 2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>> Some quick remarks:
>>>>>>
>>>>>> +(define_insn "kmovb"
>>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>>>> + (unspec:QI
>>>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>>> +  UNSPEC_KMOV))]
>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>>>> +  "@
>>>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>>>> +  [(set_attr "mode" "QI")
>>>>>> +   (set_attr "type" "mskmov")
>>>>>> +   (set_attr "prefix" "vex")])
>>>>>> +
>>>>>> +(define_insn "kmovd"
>>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>>>> + (unspec:SI
>>>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>>>> +  UNSPEC_KMOV))]
>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>> +  "@
>>>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>>>> +  [(set_attr "mode" "SI")
>>>>>> +   (set_attr "type" "mskmov")
>>>>>> +   (set_attr "prefix" "vex")])
>>>>>> +
>>>>>> +(define_insn "kmovq"
>>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>>>> + (unspec:DI
>>>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>>> +  UNSPEC_KMOV))]
>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>> +  "@
>>>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>>>> +   kmovq\t{%1, %0|%0, %1}
>>>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>>>> +  [(set_attr "mode" "DI")
>>>>>> +   (set_attr "type" "mskmov")
>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>
>>>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>>>
>>>>> Do you mean put them below the *mov??_internal patterns? Attached is a version corrected that way.
>>>>
>>>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>>>> etc. It doesn't matter if they are above or below their respective
>>>> *mov??_internal patterns, as long as they are positioned in some
>>>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>>>> the case with your patch.
>>>
>>> +(define_insn "kmovb"
>>> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
>>> +    (unspec:QI
>>> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>> +      UNSPEC_KMOV))]
>>> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>>>
>>> There is no need for !MEM_P; it would reject the memory operand
>>> that constraint "m" allows.
>>>
>>> +(define_insn "kmovq"
>>> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
>>> +    (unspec:DI
>>> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>> +      UNSPEC_KMOV))]
>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>
>>> Operand 0 should have "nonimmediate_operand" predicate. And here you
>>> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
>>> mem->mem moves.
>>
>> Changed according to your comments and attached.
>
> Still not good.
>
> +(define_insn "kmovd"
> +  [(set (match_operand:SI 0 "register_operand" "=k,k")
> +    (unspec:SI
> +      [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
> +      UNSPEC_KMOV))]
> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>
> Remove !MEM_P in the above pattern.
>
>  (define_insn "kmovw"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
> +  [(set (match_operand:HI 0 "register_operand" "=k,k")
>      (unspec:HI
>        [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
>        UNSPEC_KMOV))]
> -  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
> +  "TARGET_AVX512F && !MEM_P (operands[1])"
>
> Also remove !MEM_P here.
>
> +(define_insn "kadd<mode>"
> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
> +    (plus:SWI1248x
> +      (not:SWI1248x
> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F"
> +{
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
> +    case 1:
> +      return "#";
> +    case 2:
> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
> +    return "kaddq\t{%2, %1, %0|%0, %1, %2}";
> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
> +    return "kaddd\t{%2, %1, %0|%0, %1, %2}";
> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
> +    return "kaddb\t{%2, %1, %0|%0, %1, %2}";
> +      else
> +    return "kaddw\t{%2, %1, %0|%0, %1, %2}";
> +
>
> The above pattern is wrong. Is there really a NOT RTX present,
> effectively implying a kaddn?
>
> If this is a plain add, then you need to change the other add patterns;
> see how the logic patterns are amended with the "k" constraint. The
> added pattern should look like the *k<logic><mode> pattern.
>
>  (define_insn "kandn<mode>"
> -  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> -    (and:SWI12
> -      (not:SWI12
> -        (match_operand:SWI12 1 "register_operand" "r,0,k"))
> -      (match_operand:SWI12 2 "register_operand" "r,r,k")))
> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
> +    (and:SWI1248x
> +      (not:SWI1248x
> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_AVX512F"
>  {
> @@ -8319,10 +8358,50 @@
>      case 1:
>        return "#";
>      case 2:
> -      if (TARGET_AVX512DQ && <MODE>mode == QImode)
> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
> +    return "kandnq\t{%2, %1, %0|%0, %1, %2}";
> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
> +    return "kandnd\t{%2, %1, %0|%0, %1, %2}";
> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>      return "kandnb\t{%2, %1, %0|%0, %1, %2}";
>        else
>      return "kandnw\t{%2, %1, %0|%0, %1, %2}";
>
> The above should use the SWI1248_AVX512BW mode iterator; see the
> *k<logic><mode> pattern.

I split this patch after the latest updates to the md files; here is the
first part, which doesn't change the md files.
Regtested on x86_64-linux-gnu.  Is this part OK?


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part1.patch --]
[-- Type: application/octet-stream, Size: 30476 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 4069802..9e6e0ce 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,90 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_mov_epi16 (__m512i __W, __mmask32 __U, __m512i __A)
@@ -114,6 +198,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 					      (__mmask32) __B);
 }
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -122,6 +214,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
 					      (__mmask64) __B);
 }
 
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+					      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 4b954f9..d2405c3 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,48 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..ab1704b 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,13 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }
 
 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +9995,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
 extern __inline __mmask16
@@ -10042,6 +10050,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 4a38c12..6e938eb 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)
 
+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -536,7 +542,28 @@ DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
 
 
 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (USI, UQI)
+DEF_FUNCTION_TYPE (USI, UHI)
+DEF_FUNCTION_TYPE (UQI, USI)
+DEF_FUNCTION_TYPE (UHI, USI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -549,6 +576,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index a9c272a..83a5089 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,15 +1436,33 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_knotqi, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_knothi, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_knotsi, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
 
 /* SHA */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5018ccb..e0ab145 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34793,7 +34793,12 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+    case USI_FTYPE_UQI:
+    case USI_FTYPE_UHI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34927,6 +34932,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -34974,6 +34980,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..6b68ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..35f1c12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..dd6b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..5b94358
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..163c46e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..77b1b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..ccf4b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..b9c0979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..ce03ab4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..d6366dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..ff50610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..8bb9249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..22b727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..ee21aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..63a1ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-02 17:45           ` Andrew Senkevich
@ 2016-12-02 18:31             ` Uros Bizjak
  2016-12-05 14:59               ` Andrew Senkevich
  2016-12-14 19:33               ` Andrew Senkevich
  0 siblings, 2 replies; 48+ messages in thread
From: Uros Bizjak @ 2016-12-02 18:31 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches

On Fri, Dec 2, 2016 at 6:44 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-11-11 22:14 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Fri, Nov 11, 2016 at 7:23 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>> 2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>>> Some quick remarks:
>>>>>>>
>>>>>>> +(define_insn "kmovb"
>>>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>>>>> + (unspec:QI
>>>>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>>>> +  UNSPEC_KMOV))]
>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>>>>> +  "@
>>>>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>>>>> +  [(set_attr "mode" "QI")
>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>> +
>>>>>>> +(define_insn "kmovd"
>>>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>>>>> + (unspec:SI
>>>>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>>>>> +  UNSPEC_KMOV))]
>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>> +  "@
>>>>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>>>>> +  [(set_attr "mode" "SI")
>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>> +
>>>>>>> +(define_insn "kmovq"
>>>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>>>>> + (unspec:DI
>>>>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>>>> +  UNSPEC_KMOV))]
>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>> +  "@
>>>>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>>>>> +   kmovq\t{%1, %0|%0, %1}
>>>>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>>>>> +  [(set_attr "mode" "DI")
>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>
>>>>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>>>>
>>>>>> Do you mean put them below the *mov??_internal patterns? Attached a version corrected that way.
>>>>>
>>>>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>>>>> etc. It doesn't matter if they are above or below their respective
>>>>> *mov??_internal patterns, as long as they are positioned in some
>>>>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>>>>> the case with your patch.
>>>>
>>>> +(define_insn "kmovb"
>>>> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
>>>> +    (unspec:QI
>>>> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>> +      UNSPEC_KMOV))]
>>>> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>>>>
>>>> There is no need for !MEM_P, this will prevent memory operand, which
>>>> is allowed by constraint "m".
>>>>
>>>> +(define_insn "kmovq"
>>>> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
>>>> +    (unspec:DI
>>>> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>> +      UNSPEC_KMOV))]
>>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>>
>>>> Operand 0 should have "nonimmediate_operand" predicate. And here you
>>>> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
>>>> mem->mem moves.
>>>
>>> Changed according to your comments and attached.
>>
>> Still not good.
>>
>> +(define_insn "kmovd"
>> +  [(set (match_operand:SI 0 "register_operand" "=k,k")
>> +    (unspec:SI
>> +      [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>> +      UNSPEC_KMOV))]
>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>
>> Remove !MEM_P in the above pattern.
>>
>>  (define_insn "kmovw"
>> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
>> +  [(set (match_operand:HI 0 "register_operand" "=k,k")
>>      (unspec:HI
>>        [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
>>        UNSPEC_KMOV))]
>> -  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
>> +  "TARGET_AVX512F && !MEM_P (operands[1])"
>>
>> Also remove !MEM_P here.
>>
>> +(define_insn "kadd<mode>"
>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>> +    (plus:SWI1248x
>> +      (not:SWI1248x
>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>> +   (clobber (reg:CC FLAGS_REG))]
>> +  "TARGET_AVX512F"
>> +{
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
>> +    case 1:
>> +      return "#";
>> +    case 2:
>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>> +    return "kaddq\t{%2, %1, %0|%0, %1, %2}";
>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>> +    return "kaddd\t{%2, %1, %0|%0, %1, %2}";
>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>> +    return "kaddb\t{%2, %1, %0|%0, %1, %2}";
>> +      else
>> +    return "kaddw\t{%2, %1, %0|%0, %1, %2}";
>> +
>>
>> The above pattern is wrong. Is there really a NOT RTX present,
>> implying effectively a kaddn?
>>
>> If this is plain add, then you need to change other add patterns, see
>> how logic patterns are amended with "k" constraint, added pattern
>> should look like *k<logic><mode> pattern.
>>
>>  (define_insn "kandn<mode>"
>> -  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
>> -    (and:SWI12
>> -      (not:SWI12
>> -        (match_operand:SWI12 1 "register_operand" "r,0,k"))
>> -      (match_operand:SWI12 2 "register_operand" "r,r,k")))
>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>> +    (and:SWI1248x
>> +      (not:SWI1248x
>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>>     (clobber (reg:CC FLAGS_REG))]
>>    "TARGET_AVX512F"
>>  {
>> @@ -8319,10 +8358,50 @@
>>      case 1:
>>        return "#";
>>      case 2:
>> -      if (TARGET_AVX512DQ && <MODE>mode == QImode)
>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>> +    return "kandnq\t{%2, %1, %0|%0, %1, %2}";
>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>> +    return "kandnd\t{%2, %1, %0|%0, %1, %2}";
>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>      return "kandnb\t{%2, %1, %0|%0, %1, %2}";
>>        else
>>      return "kandnw\t{%2, %1, %0|%0, %1, %2}";
>>
>> The above should use SWI1248_AVX512BW mode iterator, see
>> *k<logic><mode> pattern.
>
> I split this patch after the last updates to the md files; here is the
> first part, which doesn't change the md files.
> Regtested on x86_64-linux-gnu.  Is this part ok?

There is no point in scanning for kmovX insns in e.g.:

+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );

since you emit it from inline asm.

Please remove these pointless kmovX scan-assembler-times directives from
the testcases, and please also remove the one in the avx512f-kandnw-1.c
testcase.

The patch is OK with this change.
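
For readers checking what the remaining scan directives are meant to
exercise, here is a minimal plain-C sketch of the 32-bit mask operations
used by the testcases. It is only a reference model, not the code from
the patch: the ref_* names and typedefs are hypothetical, and the
unpack ordering (first argument in the high half) is stated as an
assumption based on the documented behaviour of the kunpck instructions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for __mmask32 and __mmask16; not GCC types.  */
typedef uint32_t ref_mask32;
typedef uint16_t ref_mask16;

/* knotd: bitwise complement of the mask.  */
static inline ref_mask32 ref_knot32 (ref_mask32 a) { return ~a; }

/* kandd / kord / kxord: plain bitwise logic.  */
static inline ref_mask32 ref_kand32 (ref_mask32 a, ref_mask32 b) { return a & b; }
static inline ref_mask32 ref_kor32 (ref_mask32 a, ref_mask32 b) { return a | b; }
static inline ref_mask32 ref_kxor32 (ref_mask32 a, ref_mask32 b) { return a ^ b; }

/* kxnord: complement of the exclusive or.  */
static inline ref_mask32 ref_kxnor32 (ref_mask32 a, ref_mask32 b) { return ~(a ^ b); }

/* kandnd: the FIRST operand is complemented, then ANDed with the second.  */
static inline ref_mask32 ref_kandn32 (ref_mask32 a, ref_mask32 b) { return ~a & b; }

/* kunpckwd (assumed ordering): a goes to bits 31:16, b to bits 15:0.  */
static inline ref_mask32
ref_kunpackw32 (ref_mask16 a, ref_mask16 b)
{
  return ((ref_mask32) a << 16) | b;
}
```

With the constants the testcases load (1 and 2), ref_kandn32 (1, 2)
gives 2 and ref_kand32 (1, 2) gives 0, so the two operations are easy
to tell apart when checking results by hand.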

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-02 18:31             ` Uros Bizjak
@ 2016-12-05 14:59               ` Andrew Senkevich
  2016-12-05 17:19                 ` H.J. Lu
  2016-12-14 19:33               ` Andrew Senkevich
  1 sibling, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-12-05 14:59 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 7918 bytes --]

2016-12-02 21:31 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Dec 2, 2016 at 6:44 PM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>> 2016-11-11 22:14 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>> On Fri, Nov 11, 2016 at 7:23 PM, Andrew Senkevich
>>> <andrew.n.senkevich@gmail.com> wrote:
>>>> 2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>>>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>>>> Some quick remarks:
>>>>>>>>
>>>>>>>> +(define_insn "kmovb"
>>>>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>>>>>> + (unspec:QI
>>>>>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>>>>>> +  "@
>>>>>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>>>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>>>>>> +  [(set_attr "mode" "QI")
>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>> +
>>>>>>>> +(define_insn "kmovd"
>>>>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>>>>>> + (unspec:SI
>>>>>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>>> +  "@
>>>>>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>>>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>>>>>> +  [(set_attr "mode" "SI")
>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>> +
>>>>>>>> +(define_insn "kmovq"
>>>>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>>>>>> + (unspec:DI
>>>>>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>>> +  "@
>>>>>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>>>>>> +   kmovq\t{%1, %0|%0, %1}
>>>>>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>>>>>> +  [(set_attr "mode" "DI")
>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>>
>>>>>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>>>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>>>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>>>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>>>>>
>>>>>>> Do you mean put them below the *mov??_internal patterns? Attached a version corrected that way.
>>>>>>
>>>>>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>>>>>> etc. It doesn't matter if they are above or below their respective
>>>>>> *mov??_internal patterns, as long as they are positioned in some
>>>>>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>>>>>> the case with your patch.
>>>>>
>>>>> +(define_insn "kmovb"
>>>>> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
>>>>> +    (unspec:QI
>>>>> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>> +      UNSPEC_KMOV))]
>>>>> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>>>>>
>>>>> There is no need for !MEM_P, this will prevent memory operand, which
>>>>> is allowed by constraint "m".
>>>>>
>>>>> +(define_insn "kmovq"
>>>>> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
>>>>> +    (unspec:DI
>>>>> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>> +      UNSPEC_KMOV))]
>>>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>>>
>>>>> Operand 0 should have "nonimmediate_operand" predicate. And here you
>>>>> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
>>>>> mem->mem moves.
>>>>
>>>> Changed according to your comments and attached.
>>>
>>> Still not good.
>>>
>>> +(define_insn "kmovd"
>>> +  [(set (match_operand:SI 0 "register_operand" "=k,k")
>>> +    (unspec:SI
>>> +      [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>> +      UNSPEC_KMOV))]
>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>
>>> Remove !MEM_P in the above pattern.
>>>
>>>  (define_insn "kmovw"
>>> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
>>> +  [(set (match_operand:HI 0 "register_operand" "=k,k")
>>>      (unspec:HI
>>>        [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
>>>        UNSPEC_KMOV))]
>>> -  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
>>> +  "TARGET_AVX512F && !MEM_P (operands[1])"
>>>
>>> Also remove !MEM_P here.
>>>
>>> +(define_insn "kadd<mode>"
>>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>>> +    (plus:SWI1248x
>>> +      (not:SWI1248x
>>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>>> +   (clobber (reg:CC FLAGS_REG))]
>>> +  "TARGET_AVX512F"
>>> +{
>>> +  switch (which_alternative)
>>> +    {
>>> +    case 0:
>>> +      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
>>> +    case 1:
>>> +      return "#";
>>> +    case 2:
>>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>>> +    return "kaddq\t{%2, %1, %0|%0, %1, %2}";
>>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>>> +    return "kaddd\t{%2, %1, %0|%0, %1, %2}";
>>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>> +    return "kaddb\t{%2, %1, %0|%0, %1, %2}";
>>> +      else
>>> +    return "kaddw\t{%2, %1, %0|%0, %1, %2}";
>>> +
>>>
>>> The above pattern is wrong. Is there really a NOT RTX present,
>>> implying effectively a kaddn?
>>>
>>> If this is plain add, then you need to change other add patterns, see
>>> how logic patterns are amended with "k" constraint, added pattern
>>> should look like *k<logic><mode> pattern.
>>>
>>>  (define_insn "kandn<mode>"
>>> -  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
>>> -    (and:SWI12
>>> -      (not:SWI12
>>> -        (match_operand:SWI12 1 "register_operand" "r,0,k"))
>>> -      (match_operand:SWI12 2 "register_operand" "r,r,k")))
>>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>>> +    (and:SWI1248x
>>> +      (not:SWI1248x
>>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>>>     (clobber (reg:CC FLAGS_REG))]
>>>    "TARGET_AVX512F"
>>>  {
>>> @@ -8319,10 +8358,50 @@
>>>      case 1:
>>>        return "#";
>>>      case 2:
>>> -      if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>>> +    return "kandnq\t{%2, %1, %0|%0, %1, %2}";
>>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>>> +    return "kandnd\t{%2, %1, %0|%0, %1, %2}";
>>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>>      return "kandnb\t{%2, %1, %0|%0, %1, %2}";
>>>        else
>>>      return "kandnw\t{%2, %1, %0|%0, %1, %2}";
>>>
>>> The above should use SWI1248_AVX512BW mode iterator, see
>>> *k<logic><mode> pattern.
>>
>> I split this patch after the last updates to the md files; here is the
>> first part, which doesn't change the md files.
>> Regtested on x86_64-linux-gnu.  Is this part ok?
>
> There is no point in scanning for kmovX insns in e.g.:
>
> +/* { dg-final { scan-assembler-times "kmovq" 2 } } */
> +
> +#include <immintrin.h>
> +
> +void
> +avx512bw_test ()
> +{
> +  __mmask64 k1, k2, k3;
> +  volatile __m512i x = _mm512_setzero_si512 ();
> +
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
>
> since you emit it from inline asm.
>
> Please remove these pointless kmovX scan-assembler-times directives from
> the testcases, and please also remove the one in the avx512f-kandnw-1.c
> testcase.
>
> The patch is OK with this change.

Attached is the fixed patch with updated ChangeLogs.

HJ, could you commit please?


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part1_v2.patch --]
[-- Type: application/octet-stream, Size: 32933 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1ace8b0..02d560d
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,25 @@
+2016-12-05  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
+	* config/i386/avx512dqintrin.h: Ditto.
+	* config/i386/avx512fintrin.h: Ditto.
+	* config/i386/i386-builtin-types.def (UCHAR_FTYPE_UQI_UQI_PUCHAR,
+	UCHAR_FTYPE_UHI_UHI_PUCHAR, UCHAR_FTYPE_USI_USI_PUCHAR,
+	UCHAR_FTYPE_UDI_UDI_PUCHAR, UCHAR_FTYPE_UQI_UQI, UCHAR_FTYPE_UHI_UHI,
+	UCHAR_FTYPE_USI_USI, UCHAR_FTYPE_UDI_UDI, UQI_FTYPE_UQI_INT,
+	UHI_FTYPE_UHI_INT, USI_FTYPE_USI_INT, UDI_FTYPE_UDI_INT,
+	UQI_FTYPE_UQI, USI_FTYPE_USI, UDI_FTYPE_UDI, UQI_FTYPE_UQI_UQI): New
+	function types.
+	* config/i386/i386-builtin.def (__builtin_ia32_knotqi,
+	__builtin_ia32_knotsi, __builtin_ia32_knotdi,
+	__builtin_ia32_korqi, __builtin_ia32_korsi, __builtin_ia32_kordi,
+	__builtin_ia32_kxnorqi, __builtin_ia32_kxnorsi,
+	__builtin_ia32_kxnordi, __builtin_ia32_kxorqi, __builtin_ia32_kxorsi,
+	__builtin_ia32_kxordi, __builtin_ia32_kandqi,
+	__builtin_ia32_kandsi, __builtin_ia32_kanddi, __builtin_ia32_kandnqi,
+	__builtin_ia32_kandnsi, __builtin_ia32_kandndi): New.
+	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
+
 2016-12-05  Segher Boessenkool  <segher@kernel.crashing.org>
 
 	* combine.c: Revert r243162.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d9edb52..3b0a8fa
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,28 @@
+2016-12-05  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx512bw-kandd-1.c: New.
+	* gcc.target/i386/avx512bw-kandnd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-knotd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-korq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckdq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandnb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-knotb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-korb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxnorb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxorb-1.c: Ditto.
+	* gcc.target/i386/avx512f-kunpckbw-3.c: Ditto.
+	* gcc.target/i386/avx512f-kandnw-1.c: Remove unneeded check.
+
 2016-12-05  Paolo Bonzini  <bonzini@gnu.org>
 
 	* gcc.dg/fold-and-lshift.c, gcc.dg/fold-and-rshift-1.c,
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 4069802..9e6e0ce 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,90 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_mov_epi16 (__m512i __W, __mmask32 __U, __m512i __A)
@@ -114,6 +198,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 					      (__mmask32) __B);
 }
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -122,6 +214,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
 					      (__mmask64) __B);
 }
 
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+					      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 4b954f9..d2405c3 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,48 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..ab1704b 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,13 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }
 
 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +9995,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
 extern __inline __mmask16
@@ -10042,6 +10050,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 4a38c12..6e938eb 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)
 
+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -536,7 +542,28 @@ DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
 
 
 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (USI, UQI)
+DEF_FUNCTION_TYPE (USI, UHI)
+DEF_FUNCTION_TYPE (UQI, USI)
+DEF_FUNCTION_TYPE (UHI, USI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -549,6 +576,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index a9c272a..83a5089 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,15 +1436,33 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kanddi, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_knotqi, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_knothi, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_knotsi, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_knotdi, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
 
 /* SHA */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 41717da..003439f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34842,7 +34842,12 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+    case USI_FTYPE_UQI:
+    case USI_FTYPE_UHI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34976,6 +34981,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -35023,6 +35029,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..69cbe04
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..e8b7a5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..8a7e033
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..deb6579
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..4c35a81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..89753f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..d93d61e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..ba72e1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..97ea291
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..abf4280
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..a0e96fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..03bbf83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..7717aee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..faa974f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..a21830b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c
index 727a589..17b7b29 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
 /* { dg-final { scan-assembler-times "kandnw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "kmovw" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps(); 
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
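
For readers following the thread, here is a minimal scalar sketch of what
the new mask intrinsics are expected to compute, assuming the usual
documented behaviour of the k-mask instructions (KANDN complements its
first operand; KUNPCKBW places the low byte of the first source in bits
15:8 of the result). The model_* names are hypothetical helpers invented
for illustration. They are not part of the patch and do not touch the
__builtin_ia32_* builtins, so the sketch builds without -mavx512*.

```c
#include <stdint.h>

/* Hypothetical scalar models of the 32-bit mask intrinsics added by the
   patch.  Plain integer arithmetic only; no AVX-512 support is needed.  */

/* Models _kand_mask32: bitwise AND of the two masks.  */
static uint32_t model_kand32 (uint32_t a, uint32_t b) { return a & b; }

/* Models _kandn_mask32: the FIRST operand is complemented, then ANDed.  */
static uint32_t model_kandn32 (uint32_t a, uint32_t b) { return (uint32_t) ~a & b; }

/* Models _kor_mask32: bitwise OR.  */
static uint32_t model_kor32 (uint32_t a, uint32_t b) { return a | b; }

/* Models _kxor_mask32: bitwise XOR.  */
static uint32_t model_kxor32 (uint32_t a, uint32_t b) { return a ^ b; }

/* Models _kxnor_mask32: complement of the XOR.  */
static uint32_t model_kxnor32 (uint32_t a, uint32_t b) { return (uint32_t) ~(a ^ b); }

/* Models _knot_mask32: bitwise complement.  */
static uint32_t model_knot32 (uint32_t a) { return (uint32_t) ~a; }

/* Models _kunpackb_mask16: low 8 bits of A go to bits 15:8 of the result,
   low 8 bits of B go to bits 7:0.  */
static uint16_t model_kunpackb16 (uint8_t a, uint8_t b)
{
  return (uint16_t) (((unsigned int) a << 8) | b);
}
```

With the inputs the new tests load through kmovd (1 and 2), the model
gives 0 for the AND, 2 for the ANDN and 3 for both OR and XOR. The 8-bit
and 64-bit variants differ only in operand width.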


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-05 14:59               ` Andrew Senkevich
@ 2016-12-05 17:19                 ` H.J. Lu
  0 siblings, 0 replies; 48+ messages in thread
From: H.J. Lu @ 2016-12-05 17:19 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Uros Bizjak, GCC Patches

On Mon, Dec 5, 2016 at 6:59 AM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-12-02 21:31 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Fri, Dec 2, 2016 at 6:44 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>> 2016-11-11 22:14 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>> On Fri, Nov 11, 2016 at 7:23 PM, Andrew Senkevich
>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>> 2016-11-11 20:56 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>> On Fri, Nov 11, 2016 at 6:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>>>> On Fri, Nov 11, 2016 at 6:38 PM, Andrew Senkevich
>>>>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>>>>> 2016-11-11 17:34 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>>>>> Some quick remarks:
>>>>>>>>>
>>>>>>>>> +(define_insn "kmovb"
>>>>>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
>>>>>>>>> + (unspec:QI
>>>>>>>>> +  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
>>>>>>>>> +  "@
>>>>>>>>> +   kmovb\t{%k1, %0|%0, %k1}
>>>>>>>>> +   kmovb\t{%1, %0|%0, %1}";
>>>>>>>>> +  [(set_attr "mode" "QI")
>>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>>> +
>>>>>>>>> +(define_insn "kmovd"
>>>>>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
>>>>>>>>> + (unspec:SI
>>>>>>>>> +  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>>>> +  "@
>>>>>>>>> +   kmovd\t{%k1, %0|%0, %k1}
>>>>>>>>> +   kmovd\t{%1, %0|%0, %1}";
>>>>>>>>> +  [(set_attr "mode" "SI")
>>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>>> +
>>>>>>>>> +(define_insn "kmovq"
>>>>>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
>>>>>>>>> + (unspec:DI
>>>>>>>>> +  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>>>>>> +  UNSPEC_KMOV))]
>>>>>>>>> +  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
>>>>>>>>> +  "@
>>>>>>>>> +   kmovq\t{%k1, %0|%0, %k1}
>>>>>>>>> +   kmovq\t{%1, %0|%0, %1}
>>>>>>>>> +   kmovq\t{%1, %0|%0, %1}";
>>>>>>>>> +  [(set_attr "mode" "DI")
>>>>>>>>> +   (set_attr "type" "mskmov")
>>>>>>>>> +   (set_attr "prefix" "vex")])
>>>>>>>>>
>>>>>>>>> - kmovd (and existing kmovw) should be using register_operand for
>>>>>>>>> operand 0. In this case, there is no need for MEM_P checks at all.
>>>>>>>>> - In the insn constraint, please check TARGET_AVX before checking MEM_P.
>>>>>>>>> - please put these definitions above corresponding *mov??_internal patterns.
>>>>>>>>
>>>>>>>> Do you mean to put them below the *mov??_internal patterns? Attached is a version corrected that way.
>>>>>>>
>>>>>>> No, please put kmovq near *movdi_internal, kmovd near *movsi_internal,
>>>>>>> etc. It doesn't matter if they are above or below their respective
>>>>>>> *mov??_internal patterns, as long as they are positioned in some
>>>>>>> consistent way. IOW, new patterns shouldn't be grouped together, as is
>>>>>>> the case with your patch.
>>>>>>
>>>>>> +(define_insn "kmovb"
>>>>>> +  [(set (match_operand:QI 0 "register_operand" "=k,k")
>>>>>> +    (unspec:QI
>>>>>> +      [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
>>>>>> +      UNSPEC_KMOV))]
>>>>>> +  "TARGET_AVX512DQ && !MEM_P (operands[1])"
>>>>>>
>>>>>> There is no need for !MEM_P; it would reject a memory operand, which
>>>>>> the "m" constraint allows.
>>>>>>
>>>>>> +(define_insn "kmovq"
>>>>>> +  [(set (match_operand:DI 0 "register_operand" "=k,k,km")
>>>>>> +    (unspec:DI
>>>>>> +      [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
>>>>>> +      UNSPEC_KMOV))]
>>>>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>>>>
>>>>>> Operand 0 should have "nonimmediate_operand" predicate. And here you
>>>>>> need  && !(MEM_P (op0) && MEM_P (op1)) in insn constraint to prevent
>>>>>> mem->mem moves.
>>>>>
>>>>> Changed according to your comments and attached.
>>>>
>>>> Still not good.
>>>>
>>>> +(define_insn "kmovd"
>>>> +  [(set (match_operand:SI 0 "register_operand" "=k,k")
>>>> +    (unspec:SI
>>>> +      [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
>>>> +      UNSPEC_KMOV))]
>>>> +  "TARGET_AVX512BW && !MEM_P (operands[1])"
>>>>
>>>> Remove !MEM_P in the above pattern.
>>>>
>>>>  (define_insn "kmovw"
>>>> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=k,k")
>>>> +  [(set (match_operand:HI 0 "register_operand" "=k,k")
>>>>      (unspec:HI
>>>>        [(match_operand:HI 1 "nonimmediate_operand" "r,km")]
>>>>        UNSPEC_KMOV))]
>>>> -  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512F"
>>>> +  "TARGET_AVX512F && !MEM_P (operands[1])"
>>>>
>>>> Also remove !MEM_P here.
>>>>
>>>> +(define_insn "kadd<mode>"
>>>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>>>> +    (plus:SWI1248x
>>>> +      (not:SWI1248x
>>>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>>>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>>>> +   (clobber (reg:CC FLAGS_REG))]
>>>> +  "TARGET_AVX512F"
>>>> +{
>>>> +  switch (which_alternative)
>>>> +    {
>>>> +    case 0:
>>>> +      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
>>>> +    case 1:
>>>> +      return "#";
>>>> +    case 2:
>>>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>>>> +    return "kaddq\t{%2, %1, %0|%0, %1, %2}";
>>>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>>>> +    return "kaddd\t{%2, %1, %0|%0, %1, %2}";
>>>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>>> +    return "kaddb\t{%2, %1, %0|%0, %1, %2}";
>>>> +      else
>>>> +    return "kaddw\t{%2, %1, %0|%0, %1, %2}";
>>>> +
>>>>
>>>> The above pattern is wrong. Is there really a NOT RTX present,
>>>> implying effectively a kaddn?
>>>>
>>>> If this is plain add, then you need to change other add patterns, see
>>>> how logic patterns are amended with "k" constraint, added pattern
>>>> should look like *k<logic><mode> pattern.
>>>>
>>>>  (define_insn "kandn<mode>"
>>>> -  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
>>>> -    (and:SWI12
>>>> -      (not:SWI12
>>>> -        (match_operand:SWI12 1 "register_operand" "r,0,k"))
>>>> -      (match_operand:SWI12 2 "register_operand" "r,r,k")))
>>>> +  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
>>>> +    (and:SWI1248x
>>>> +      (not:SWI1248x
>>>> +        (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
>>>> +      (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
>>>>     (clobber (reg:CC FLAGS_REG))]
>>>>    "TARGET_AVX512F"
>>>>  {
>>>> @@ -8319,10 +8358,50 @@
>>>>      case 1:
>>>>        return "#";
>>>>      case 2:
>>>> -      if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>>> +      if (TARGET_AVX512BW && <MODE>mode == DImode)
>>>> +    return "kandnq\t{%2, %1, %0|%0, %1, %2}";
>>>> +      else if (TARGET_AVX512BW && <MODE>mode == SImode)
>>>> +    return "kandnd\t{%2, %1, %0|%0, %1, %2}";
>>>> +      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
>>>>      return "kandnb\t{%2, %1, %0|%0, %1, %2}";
>>>>        else
>>>>      return "kandnw\t{%2, %1, %0|%0, %1, %2}";
>>>>
>>>> The above should use SWI1248_AVX512BW mode iterator, see
>>>> *k<logic><mode> pattern.
>>>
>>> I split this patch after the latest updates to the md files; here is the
>>> first part, which doesn't change md files.
>>> Regtested on x86_64-linux-gnu.  Is this part OK?
>>
>> There is no point in scanning for kmovX insns in e.g.:
>>
>> +/* { dg-final { scan-assembler-times "kmovq" 2 } } */
>> +
>> +#include <immintrin.h>
>> +
>> +void
>> +avx512bw_test ()
>> +{
>> +  __mmask64 k1, k2, k3;
>> +  volatile __m512i x = _mm512_setzero_si512 ();
>> +
>> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
>> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
>>
>> since you emit it from inline asm.
>>
>> Please remove these pointless kmovX scan-assembler-times directives from the
>> testcases, and please also remove it from the avx512f-kandnw-1.c
>> testcase.
>>
>> The patch is OK with this change.
>
> Fixed version attached, with updated ChangeLogs.
>
> HJ, could you commit please?
>

Done.


-- 
H.J.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-02 18:31             ` Uros Bizjak
  2016-12-05 14:59               ` Andrew Senkevich
@ 2016-12-14 19:33               ` Andrew Senkevich
  2016-12-14 20:35                 ` Uros Bizjak
  1 sibling, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-12-14 19:33 UTC (permalink / raw)
  To: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 17248 bytes --]

2016-12-02 21:31 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
. . . . .
>>
>> I split this patch after the latest updates to the md files; here is the
>> first part, which doesn't change md files.
>> Regtested on x86_64-linux-gnu.  Is this part OK?
>
> There is no point in scanning for kmovX insns in e.g.:
>
> +/* { dg-final { scan-assembler-times "kmovq" 2 } } */
> +
> +#include <immintrin.h>
> +
> +void
> +avx512bw_test ()
> +{
> +  __mmask64 k1, k2, k3;
> +  volatile __m512i x = _mm512_setzero_si512 ();
> +
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
>
> since you emit it from inline asm.
>
> Please remove these pointless kmovX scan-assembler-times directives from the
> testcases, and please also remove it from the avx512f-kandnw-1.c
> testcase.
>
> The patch is OK with this change.

Hi,

Here is the second part of the k-mask intrinsics patch. Is it OK?
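
For readers following along, a minimal portable sketch of what the
conversion, load and store intrinsics in this part are expected to compute
may help. It assumes the mask types are plain unsigned integers of the
matching width (the `typedef unsigned long long __mmask64;` context line
below shows this for the 64-bit case). The `model_*` names are hypothetical
stand-ins and not part of the patch, and the sketch says nothing about which
kmov instruction the compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for __mmask8/16/32/64 (assumed plain unsigned ints). */
typedef uint8_t  model_mask8;
typedef uint16_t model_mask16;
typedef uint32_t model_mask32;
typedef uint64_t model_mask64;

/* Models _cvtu32_mask8 / _cvtu32_mask16: the cast keeps only the low bits. */
static inline model_mask8  model_cvtu32_mask8  (unsigned int a) { return (model_mask8) a; }
static inline model_mask16 model_cvtu32_mask16 (unsigned int a) { return (model_mask16) a; }

/* Models _cvtmask8_u32 / _cvtmask16_u32: zero-extend the mask to 32 bits. */
static inline unsigned int model_cvtmask8_u32  (model_mask8 k)  { return (unsigned int) k; }
static inline unsigned int model_cvtmask16_u32 (model_mask16 k) { return (unsigned int) k; }

/* Models _cvtu64_mask64 / _cvtmask64_u64: a same-width move. */
static inline model_mask64 model_cvtu64_mask64 (unsigned long long a) { return (model_mask64) a; }
static inline unsigned long long model_cvtmask64_u64 (model_mask64 k) { return (unsigned long long) k; }

/* Models _load_mask32 / _store_mask32: a plain load and store through the pointer. */
static inline model_mask32 model_load_mask32 (const model_mask32 *p) { return *p; }
static inline void model_store_mask32 (model_mask32 *p, model_mask32 k) { *p = k; }

/* Small demonstration mirroring the constant 11 used in the testcases. */
void
model_demo (void)
{
  model_mask32 slot = 0;

  model_store_mask32 (&slot, 11u);
  assert (model_load_mask32 (&slot) == 11u);
  assert (model_cvtmask16_u32 (model_cvtu32_mask16 (11u)) == 11u);
}
```

In this model every intrinsic is a value-preserving move apart from the
truncation in the u32-to-mask8/mask16 conversions, which matches the casts
written in the header bodies below.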

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 9e6e0ce..7f40808 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,62 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));

 typedef unsigned long long __mmask64;

+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*__A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _knot_mask32 (__mmask32 __A)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index d2405c3..d15d35d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,34 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */

+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8 ) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _knot_mask8 (__mmask8 __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index ab1704b..45e1949 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9984,6 +9984,34 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor

+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 83a5089..8030083 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1463,7 +1463,10 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)

 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 454aeca..c7456d5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1309,12 +1309,30 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])

+(define_expand "kmovb"
+  [(set (match_operand:QI 0 "nonimmediate_operand")
+ (match_operand:QI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512DQ
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
 (define_expand "kmovw"
   [(set (match_operand:HI 0 "nonimmediate_operand")
  (match_operand:HI 1 "nonimmediate_operand"))]
   "TARGET_AVX512F
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))")

+(define_expand "kmovd"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+ (match_operand:SI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512BW
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
+(define_expand "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+ (match_operand:DI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512BW
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
 (define_insn "k<code><mode>"
   [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
  (any_logic:SWI1248_AVX512BW
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..2fbdafd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = _cvtu32_mask32 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..581affe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k0 = 11;
+  __mmask32 k = _load_mask32 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..4cf22fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k0, k;
+
+  _store_mask32 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..d61f944
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = 11;
+
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask32_u32 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..20586b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = _cvtu64_mask64 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..1a5f94c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k0 = 11;
+  __mmask64 k = _load_mask64 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..53c6a17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k0, k;
+
+  _store_mask64 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..0122c6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned long long i;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = 11;
+
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask64_u64 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..162ce38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = _cvtu32_mask8 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..c10dd1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k0 = 11;
+  __mmask8 k = _load_mask8 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..b3120dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask8 k0, k;
+
+  _store_mask8 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..f4fbc49
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = 11;
+
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask8_u32 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..95d203b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1;
+
+void
+avx512f_test ()
+{
+  __mmask16 k = _cvtu32_mask16 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..82d1b30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1;
+
+void
+avx512f_test ()
+{
+  __mmask16 k0 = 11;
+  __mmask16 k = _load_mask16 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..c1221e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1 = 11;
+
+void
+avx512f_test ()
+{
+  __mmask16 k0, k;
+
+  _store_mask16 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..21ad934
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512f_test ()
+{
+  __mmask16 k = 11;
+
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask16_u32 (k);
+}



--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part2.patch --]
[-- Type: application/octet-stream, Size: 16357 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 9e6e0ce..7f40808 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,62 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*__A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _knot_mask32 (__mmask32 __A)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index d2405c3..d15d35d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,34 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8 ) __A);
+}
+	
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _knot_mask8 (__mmask8 __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index ab1704b..45e1949 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9984,6 +9984,34 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor
 
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 83a5089..8030083 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1463,7 +1463,10 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxorqi, "__builtin_ia32_kxorqi", IX86_
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxorhi, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxorsi, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxordi, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 454aeca..c7456d5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1309,12 +1309,30 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
 
+(define_expand "kmovb"
+  [(set (match_operand:QI 0 "nonimmediate_operand")
+	(match_operand:QI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512DQ
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
 (define_expand "kmovw"
   [(set (match_operand:HI 0 "nonimmediate_operand")
 	(match_operand:HI 1 "nonimmediate_operand"))]
   "TARGET_AVX512F
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
 
+(define_expand "kmovd"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+	(match_operand:SI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512BW
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
+(define_expand "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+	(match_operand:DI 1 "nonimmediate_operand"))]
+  "TARGET_AVX512BW
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
+
 (define_insn "k<code><mode>"
   [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
 	(any_logic:SWI1248_AVX512BW
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..2fbdafd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = _cvtu32_mask32 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..581affe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k0 = 11; 
+  __mmask32 k = _load_mask32 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..4cf22fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask32 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k0, k;
+ 
+  _store_mask32 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..d61f944
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = 11;
+  
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask32_u32 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..20586b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = _cvtu64_mask64 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..1a5f94c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k0 = 11; 
+  __mmask64 k = _load_mask64 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..53c6a17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask64 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k0, k;
+ 
+  _store_mask64 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..0122c6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned long long i;
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = 11;
+  
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask64_u64 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..162ce38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = _cvtu32_mask8 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..c10dd1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k0 = 11; 
+  __mmask8 k = _load_mask8 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..b3120dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask8 k1 = 11;
+
+void
+avx512bw_test ()
+{
+  __mmask8 k0, k;
+ 
+  _store_mask8 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..f4fbc49
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = 11;
+
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask8_u32 (k);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..95d203b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1;
+
+void
+avx512f_test ()
+{
+  __mmask16 k = _cvtu32_mask16 (11);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..82d1b30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1;
+
+void
+avx512f_test ()
+{
+  __mmask16 k0 = 11; 
+  __mmask16 k = _load_mask16 (&k0);
+
+  asm volatile ("" : "+k" (k));
+  k1 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..c1221e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile __mmask16 k1 = 11;
+
+void
+avx512f_test ()
+{
+  __mmask16 k0, k;
+ 
+  _store_mask16 (&k, k1);
+
+  asm volatile ("" : "+k" (k));
+  k0 = k;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..21ad934
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+volatile unsigned int i;
+
+void
+avx512f_test ()
+{
+  __mmask16 k = 11;
+  
+  asm volatile ("" : "+k" (k));
+  i = _cvtmask16_u32 (k);
+}

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-14 19:33               ` Andrew Senkevich
@ 2016-12-14 20:35                 ` Uros Bizjak
       [not found]                   ` <CAMXFM3vC-3bMgQaQ2bnjDU7oQMPdvhurzgOFftZHqzNXAw=WgA@mail.gmail.com>
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-12-14 20:35 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: GCC Patches

On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:

> here is the second part of k-mask intrinsics, is it Ok?

> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1309,12 +1309,30 @@
>  ;; Mask variant shift mnemonics
>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>
> +(define_expand "kmovb"
> +  [(set (match_operand:QI 0 "nonimmediate_operand")
> + (match_operand:QI 1 "nonimmediate_operand"))]
> +  "TARGET_AVX512DQ
> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
> +
>  (define_expand "kmovw"
>    [(set (match_operand:HI 0 "nonimmediate_operand")
>   (match_operand:HI 1 "nonimmediate_operand"))]
>    "TARGET_AVX512F
>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>
> +(define_expand "kmovd"
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> + (match_operand:SI 1 "nonimmediate_operand"))]
> +  "TARGET_AVX512BW
> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
> +
> +(define_expand "kmovq"
> +  [(set (match_operand:DI 0 "nonimmediate_operand")
> + (match_operand:DI 1 "nonimmediate_operand"))]
> +  "TARGET_AVX512BW
> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
> +
>  (define_insn "k<code><mode>"
>    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>   (any_logic:SWI1248_AVX512BW

All the above patterns can be macroized with the following patch:

--cut here--
Index: sse.md
===================================================================
--- sse.md      (revision 243651)
+++ sse.md      (working copy)
@@ -1309,9 +1309,9 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])

-(define_expand "kmovw"
-  [(set (match_operand:HI 0 "nonimmediate_operand")
-       (match_operand:HI 1 "nonimmediate_operand"))]
+(define_expand "kmov<mskmodesuffix>"
+  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
+       (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
   "TARGET_AVX512F
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))")

--cut here--
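
As a rough picture of what the iterator buys: one template is instantiated once per mask mode, and the name suffix and the ISA gate come from per-mode tables. The C sketch below is only an analogy built on an X-macro. The names MASK_MODES, DEFINE_MODEL_KMOV, model_kmov* and model_gate_* are hypothetical and exist nowhere in GCC; the suffix/gate pairs simply restate the four hand-written expanders quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mode iterator: each row is
   (name suffix, C type of that mask width, ISA gate).  The rows restate
   the conditions of the hand-written kmovb/kmovw/kmovd/kmovq expanders.  */
#define MASK_MODES(X)            \
  X (b, uint8_t,  "AVX512DQ")    \
  X (w, uint16_t, "AVX512F")     \
  X (d, uint32_t, "AVX512BW")    \
  X (q, uint64_t, "AVX512BW")

/* One template, stamped out once per row, in the way a single
   define_expand written against an iterator yields one pattern per mode.
   The "move" is modeled as the identity on the mask value.  */
#define DEFINE_MODEL_KMOV(suffix, type, gate)                  \
  static inline type model_kmov##suffix (type src)             \
  { return src; }                                              \
  static inline const char *model_gate_##suffix (void)         \
  { return gate; }

/* Expands to model_kmovb, model_kmovw, model_kmovd and model_kmovq,
   plus the matching model_gate_* helpers.  */
MASK_MODES (DEFINE_MODEL_KMOV)
```

Adding a fifth width in this analogy means adding one row to the table, which is the same maintenance argument as for the macroized expander.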

Please also post ChangeLog entry.

Uros.

* Re: [PATCH] Add AVX512 k-mask intrinsics
       [not found]                   ` <CAMXFM3vC-3bMgQaQ2bnjDU7oQMPdvhurzgOFftZHqzNXAw=WgA@mail.gmail.com>
@ 2016-12-15 16:51                     ` Uros Bizjak
  2016-12-15 19:04                       ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-12-15 16:51 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: GCC Patches

On Thu, Dec 15, 2016 at 2:31 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-12-14 22:55 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>
>>> here is the second part of k-mask intrinsics, is it Ok?
>>
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -1309,12 +1309,30 @@
>>>  ;; Mask variant shift mnemonics
>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>
>>> +(define_expand "kmovb"
>>> +  [(set (match_operand:QI 0 "nonimmediate_operand")
>>> + (match_operand:QI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512DQ
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>>  (define_expand "kmovw"
>>>    [(set (match_operand:HI 0 "nonimmediate_operand")
>>>   (match_operand:HI 1 "nonimmediate_operand"))]
>>>    "TARGET_AVX512F
>>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>
>>> +(define_expand "kmovd"
>>> +  [(set (match_operand:SI 0 "nonimmediate_operand")
>>> + (match_operand:SI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512BW
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>> +(define_expand "kmovq"
>>> +  [(set (match_operand:DI 0 "nonimmediate_operand")
>>> + (match_operand:DI 1 "nonimmediate_operand"))]
>>> +  "TARGET_AVX512BW
>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>> +
>>>  (define_insn "k<code><mode>"
>>>    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>>>   (any_logic:SWI1248_AVX512BW
>>
>> All the above patterns can be macroized with the following patch:
>>
>> --cut here--
>> Index: sse.md
>> ===================================================================
>> --- sse.md      (revision 243651)
>> +++ sse.md      (working copy)
>> @@ -1309,9 +1309,9 @@
>>  ;; Mask variant shift mnemonics
>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>
>> -(define_expand "kmovw"
>> -  [(set (match_operand:HI 0 "nonimmediate_operand")
>> -       (match_operand:HI 1 "nonimmediate_operand"))]
>> +(define_expand "kmov<mskmodesuffix>"
>> +  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
>> +       (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
>>    "TARGET_AVX512F
>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>
>> --cut here--
>>
>> Please also post ChangeLog entry.
>
> Thanks,
>
> here is the patch with ChangeLogs and with the internal __builtin_ia32_kmov*
> builtins renamed to match instruction names.
> For the __builtin_ia32_kmov16 change I will follow up with an update for the branches.
>
> Regtested on x86_64-linux-gnu, Ok for trunk?

OK.

Thanks,
Uros.

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-15 16:51                     ` Uros Bizjak
@ 2016-12-15 19:04                       ` Andrew Senkevich
  2016-12-16 12:45                         ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-12-15 19:04 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 10002 bytes --]

2016-12-15 19:51 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Dec 15, 2016 at 2:31 PM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>> 2016-12-14 22:55 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>> On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
>>> <andrew.n.senkevich@gmail.com> wrote:
>>>
>>>> here is the second part of k-mask intrinsics, is it Ok?
>>>
>>>> --- a/gcc/config/i386/sse.md
>>>> +++ b/gcc/config/i386/sse.md
>>>> @@ -1309,12 +1309,30 @@
>>>>  ;; Mask variant shift mnemonics
>>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>>
>>>> +(define_expand "kmovb"
>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand")
>>>> + (match_operand:QI 1 "nonimmediate_operand"))]
>>>> +  "TARGET_AVX512DQ
>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>> +
>>>>  (define_expand "kmovw"
>>>>    [(set (match_operand:HI 0 "nonimmediate_operand")
>>>>   (match_operand:HI 1 "nonimmediate_operand"))]
>>>>    "TARGET_AVX512F
>>>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>
>>>> +(define_expand "kmovd"
>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand")
>>>> + (match_operand:SI 1 "nonimmediate_operand"))]
>>>> +  "TARGET_AVX512BW
>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>> +
>>>> +(define_expand "kmovq"
>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand")
>>>> + (match_operand:DI 1 "nonimmediate_operand"))]
>>>> +  "TARGET_AVX512BW
>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>> +
>>>>  (define_insn "k<code><mode>"
>>>>    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>>>>   (any_logic:SWI1248_AVX512BW
>>>
>>> All the above patterns can be macroized with the following patch:
>>>
>>> --cut here--
>>> Index: sse.md
>>> ===================================================================
>>> --- sse.md      (revision 243651)
>>> +++ sse.md      (working copy)
>>> @@ -1309,9 +1309,9 @@
>>>  ;; Mask variant shift mnemonics
>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>
>>> -(define_expand "kmovw"
>>> -  [(set (match_operand:HI 0 "nonimmediate_operand")
>>> -       (match_operand:HI 1 "nonimmediate_operand"))]
>>> +(define_expand "kmov<mskmodesuffix>"
>>> +  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
>>> +       (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
>>>    "TARGET_AVX512F
>>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>
>>> --cut here--
>>>
>>> Please also post ChangeLog entry.
>>
>> Thanks,
>>
>> here is the patch with ChangeLogs and with the internal __builtin_ia32_kmov*
>> builtins renamed to match instruction names.
>> For the __builtin_ia32_kmov16 change I will follow up with an update for the branches.
>>
>> Regtested on x86_64-linux-gnu, Ok for trunk?
>
> OK.

Thanks,

Here is one more part, for kadd{b,w,d,q}. Is it OK?

gcc/
    * config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386-builtin.def (__builtin_ia32_kaddqi,
    __builtin_ia32_kaddhi, __builtin_ia32_kaddsi,
    __builtin_ia32_kadddi): New.
    * config/i386/sse.md (kadd<mode>): New.

gcc/testsuite/
    * gcc.target/i386/avx512bw-kaddd-1.c: New test.
    * gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
    * gcc.target/i386/avx512f-kaddw-1.c: Ditto.
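
For readers checking the new tests by hand, the intended semantics can be sketched in portable C. This is a minimal reference model, assuming that kadd{b,w,d,q} performs an unsigned addition truncated to the mask width. The model_kadd* names are hypothetical helpers for illustration only and are not part of the patch or of any GCC header.

```c
#include <stdint.h>

/* Hypothetical reference models, not part of the patch: a k-mask add is
   modeled as an unsigned add whose result is truncated to the mask width.
   The explicit casts undo the integer promotion of the narrow types.  */
static inline uint8_t
model_kadd8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (a + b);
}

static inline uint16_t
model_kadd16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (a + b);
}

static inline uint32_t
model_kadd32 (uint32_t a, uint32_t b)
{
  return a + b;	/* Unsigned arithmetic wraps modulo 2^32.  */
}

static inline uint64_t
model_kadd64 (uint64_t a, uint64_t b)
{
  return a + b;	/* Unsigned arithmetic wraps modulo 2^64.  */
}
```

Under this model the operands used in the tests, 11 and 12, give 23 at every width, and a carry out of the top bit is dropped.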

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index b35ae2b..e38055c 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,20 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));

 typedef unsigned long long __mmask64;

+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask32_u32 (__mmask32 __A)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 4db44e4..ccc6a4d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,13 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */

+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask8_u32 (__mmask8 __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index a889c83..820741c 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9984,6 +9984,13 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor

+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask16_u32 (__mmask16 __A)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 71382c8..7d86008 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1471,6 +1471,10 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmovb", IX86_BU
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmovw", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)

 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6dc57aa..4c9bdec 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1309,6 +1309,18 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])

+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
+ (plus:SWI1248_AVX512BWDQ
+  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
+  (match_operand:SWI1248_AVX512BWDQ 2 "register_operand" "k")))
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
+  "TARGET_AVX512F"
+  "kadd<mskmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
 (define_expand "kmov<mskmodesuffix>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
  (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..1f6c61f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = _kadd_mask32 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..9e9aaae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = _kadd_mask64 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..4be7b0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = _kadd_mask8 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..957a395
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k = _kadd_mask16 (11, 12);
+  asm volatile ("" : "+k" (k));
+}


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part3.patch --]
[-- Type: application/octet-stream, Size: 6591 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index b35ae2b..e38055c 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,20 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask32_u32 (__mmask32 __A)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 4db44e4..ccc6a4d 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,13 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask8_u32 (__mmask8 __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index a889c83..820741c 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9984,6 +9984,13 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask16_u32 (__mmask16 __A)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 71382c8..7d86008 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1471,6 +1471,10 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmovb", IX86_BU
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmovw", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6dc57aa..4c9bdec 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1309,6 +1309,18 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
 
+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
+	(plus:SWI1248_AVX512BWDQ
+	  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
+	  (match_operand:SWI1248_AVX512BWDQ 2 "register_operand" "k")))
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
+  "TARGET_AVX512F"
+  "kadd<mskmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
 (define_expand "kmov<mskmodesuffix>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
 	(match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..1f6c61f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k = _kadd_mask32 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..9e9aaae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k = _kadd_mask64 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..4be7b0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k = _kadd_mask8 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..957a395
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k = _kadd_mask16 (11, 12);
+  asm volatile ("" : "+k" (k));
+}

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-15 19:04                       ` Andrew Senkevich
@ 2016-12-16 12:45                         ` Uros Bizjak
  2017-01-16 22:30                           ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2016-12-16 12:45 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: GCC Patches

On Thu, Dec 15, 2016 at 7:55 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-12-15 19:51 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Thu, Dec 15, 2016 at 2:31 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>> 2016-12-14 22:55 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>> On Wed, Dec 14, 2016 at 8:04 PM, Andrew Senkevich
>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>
>>>>> here is the second part of k-mask intrinsics, is it Ok?
>>>>
>>>>> --- a/gcc/config/i386/sse.md
>>>>> +++ b/gcc/config/i386/sse.md
>>>>> @@ -1309,12 +1309,30 @@
>>>>>  ;; Mask variant shift mnemonics
>>>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>>>
>>>>> +(define_expand "kmovb"
>>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand")
>>>>> + (match_operand:QI 1 "nonimmediate_operand"))]
>>>>> +  "TARGET_AVX512DQ
>>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>> +
>>>>>  (define_expand "kmovw"
>>>>>    [(set (match_operand:HI 0 "nonimmediate_operand")
>>>>>   (match_operand:HI 1 "nonimmediate_operand"))]
>>>>>    "TARGET_AVX512F
>>>>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>>
>>>>> +(define_expand "kmovd"
>>>>> +  [(set (match_operand:SI 0 "nonimmediate_operand")
>>>>> + (match_operand:SI 1 "nonimmediate_operand"))]
>>>>> +  "TARGET_AVX512BW
>>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>> +
>>>>> +(define_expand "kmovq"
>>>>> +  [(set (match_operand:DI 0 "nonimmediate_operand")
>>>>> + (match_operand:DI 1 "nonimmediate_operand"))]
>>>>> +  "TARGET_AVX512BW
>>>>> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>> +
>>>>>  (define_insn "k<code><mode>"
>>>>>    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
>>>>>   (any_logic:SWI1248_AVX512BW
>>>>
>>>> All the above patterns can be macroized with the following patch:
>>>>
>>>> --cut here--
>>>> Index: sse.md
>>>> ===================================================================
>>>> --- sse.md      (revision 243651)
>>>> +++ sse.md      (working copy)
>>>> @@ -1309,9 +1309,9 @@
>>>>  ;; Mask variant shift mnemonics
>>>>  (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
>>>>
>>>> -(define_expand "kmovw"
>>>> -  [(set (match_operand:HI 0 "nonimmediate_operand")
>>>> -       (match_operand:HI 1 "nonimmediate_operand"))]
>>>> +(define_expand "kmov<mskmodesuffix>"
>>>> +  [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
>>>> +       (match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
>>>>    "TARGET_AVX512F
>>>>     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
>>>>
>>>> --cut here--
>>>>
>>>> Please also post ChangeLog entry.
>>>
>>> Thanks,
>>>
>>> here is with ChangeLogs and renamed internal __builtin_ia32_kmov* to
>>> match instruction names.
>>> For __builtin_ia32_kmov16 change I will follow up for update in branches.
>>>
>>> Regtested on x86_64-linux-gnu, Ok for trunk?
>>
>> OK.
>
> Thanks,
>
> here is one more part for kadd{b,w,d,q}, is it ok?
>
> gcc/
>     * config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
>     * config/i386/avx512dqintrin.h: Ditto.
>     * config/i386/avx512fintrin.h: Ditto.
>     * config/i386/i386-builtin.def (__builtin_ia32_kaddqi,
>     __builtin_ia32_kaddhi, __builtin_ia32_kaddsi,
>     __builtin_ia32_kadddi): New.
>     * config/i386/sse.md (kadd<mode>): New.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512bw-kaddd-1.c: New test.
>     * gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kaddw-1.c: Ditto.

OK.

I'll commit the patch to mainline later today.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-12-16 12:45                         ` Uros Bizjak
@ 2017-01-16 22:30                           ` Andrew Senkevich
  2017-01-16 22:55                             ` Jakub Jelinek
  2017-01-17  8:12                             ` Uros Bizjak
  0 siblings, 2 replies; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-16 22:30 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

Hi,

here is one more part of the intrinsics, for k-mask register shifts:

gcc/
    * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386-builtin-types.def: Add new types.
    * config/i386/i386.c: Handle new types.
    * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
    __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
    __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
    __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
    __builtin_ia32_kshiftridi): New.
    * config/i386/sse.md (k<code><mode>2): Rename from *k<code><mode>.

gcc/testsuite/
    * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
    * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.


Is it Ok for trunk?


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part4.patch --]
[-- Type: application/octet-stream, Size: 12702 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 21bec73..d6adaf2 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -42,6 +42,34 @@ typedef unsigned long long __mmask64;
 
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask32 (__mmask32 __A, __mmask32 __B)
 {
   return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 1fc2f68..9a6cf72 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -36,6 +36,20 @@
 
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask8 (__mmask8 __A, __mmask8 __B)
 {
   return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 346cb00..9256f49 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9986,6 +9986,20 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask16 (__mmask16 __A, __mmask16 __B)
 {
   return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index c351335..0649b3b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1440,6 +1440,14 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kashiftqi2, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kashifthi2, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftsi2, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftdi2, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_klshiftrtqi2, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_klshiftrthi2, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtsi2, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UINT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtdi2, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UINT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f754994..bc504eb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1410,7 +1410,7 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
 
-(define_insn "*k<code><mode>"
+(define_insn "k<code><mode>2"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ
 	  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..85be9b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..cd5707e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..91b6313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..c10fa4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..422d0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..f87cf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..7a9de12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..641d307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 18b3d4c..e7a815e
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -558,10 +558,10 @@ DEF_FUNCTION_TYPE (USI, UHI)
 DEF_FUNCTION_TYPE (UQI, USI)
 DEF_FUNCTION_TYPE (UHI, USI)
 
-DEF_FUNCTION_TYPE (UQI, UQI, INT)
-DEF_FUNCTION_TYPE (UHI, UHI, INT)
-DEF_FUNCTION_TYPE (USI, USI, INT)
-DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI, UINT)
+DEF_FUNCTION_TYPE (UHI, UHI, UINT)
+DEF_FUNCTION_TYPE (USI, USI, UINT)
+DEF_FUNCTION_TYPE (UDI, UDI, UINT)
 DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3327036..df0d14b
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35073,10 +35073,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
-    case UQI_FTYPE_UQI_INT:
-    case UHI_FTYPE_UHI_INT:
-    case USI_FTYPE_USI_INT:
-    case UDI_FTYPE_UDI_INT:
+    case UQI_FTYPE_UQI_UINT:
+    case UHI_FTYPE_UHI_UINT:
+    case USI_FTYPE_USI_UINT:
+    case UDI_FTYPE_UDI_UINT:
       nargs = 2;
       nargs_constant = 1;
       break;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-16 22:30                           ` Andrew Senkevich
@ 2017-01-16 22:55                             ` Jakub Jelinek
  2017-01-17 11:05                               ` Andrew Senkevich
  2017-01-17  8:12                             ` Uros Bizjak
  1 sibling, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2017-01-16 22:55 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Uros Bizjak, GCC Patches, Kirill Yukhin

On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
> here is one more part of intrinsics for k-mask registers shifts:

The software developer manuals describe KSHIFT{L,R}* like:
KSHIFTLW
COUNT <- imm8[7:0]
DEST[MAX_KL-1:0] <- 0
IF COUNT <=15
THEN DEST[15:0] <- SRC1[15:0] << COUNT;
FI;

What is the behavior when src1 == dest, like:
  kshiftld $3, %k3, %k3
?  Is it just a bug in the SDM and will it actually do the expected thing
(set %k3 to %k3 << 3 and clear just the upper bits), or do we need
an early-clobber on the destination to make sure GCC never emits these
insns with the same register as both input and output?
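
To make the concern concrete, here is a minimal C sketch of the SDM
pseudocode read literally, with the destination cleared before the source
is read. The pointer-based register model and all function names are
illustrative assumptions; this says nothing about what the hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a mask register is a 64-bit cell (MAX_KL = 64).
   The steps follow the SDM pseudocode in order, so DEST is zeroed
   before SRC1 is read.  */
static void
kshiftlw_literal (uint64_t *dest, const uint64_t *src1, uint8_t count)
{
  *dest = 0;                        /* DEST[MAX_KL-1:0] <- 0 */
  if (count <= 15)                  /* IF COUNT <= 15 */
    *dest = (uint16_t) ((*src1 & 0xffff) << count);
}

/* Shift with source and destination being the same modelled register.  */
static uint64_t
shift_in_place_literal (uint64_t value, uint8_t count)
{
  uint64_t k = value;
  kshiftlw_literal (&k, &k, count);
  return k;
}

/* Same shift with distinct source and destination registers.  */
static uint64_t
shift_distinct_literal (uint64_t value, uint8_t count)
{
  uint64_t src = value, dst = ~(uint64_t) 0;
  kshiftlw_literal (&dst, &src, count);
  return dst;
}

/* Self-check: only the aliased case loses the source value.  */
static void
check_literal_model (void)
{
  assert (shift_distinct_literal (0x5555, 3) == 0xaaa8);
  assert (shift_in_place_literal (0x5555, 3) == 0);
}
```

Under this literal reading, a shift with src1 == dest always yields zero,
which is why the question of an early-clobber comes up at all.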

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-16 22:30                           ` Andrew Senkevich
  2017-01-16 22:55                             ` Jakub Jelinek
@ 2017-01-17  8:12                             ` Uros Bizjak
  1 sibling, 0 replies; 48+ messages in thread
From: Uros Bizjak @ 2017-01-17  8:12 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: GCC Patches, Kirill Yukhin

On Mon, Jan 16, 2017 at 11:30 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> Hi,
>
> here is one more part of intrinsics for k-mask registers shifts:
>
> gcc/
>     * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
>     * config/i386/avx512dqintrin.h: Ditto.
>     * config/i386/avx512fintrin.h: Ditto.
>     * config/i386/i386-builtin-types.def: Add new types.
>     * config/i386/i386.c: Handle new types.
>     * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
>     __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
>     __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
>     __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
>     __builtin_ia32_kshiftridi): New.
>     * config/i386/sse.md (k<code><mode>2): Rename from *k<code><mode>.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
>     * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
>
>
> Is it Ok for trunk?

-(define_insn "*k<code><mode>"
+(define_insn "k<code><mode>2"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
  (any_lshift:SWI1248_AVX512BWDQ
   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")

Please do not add "2" to the insn name, so that it follows the de facto
convention of the other mask insn names.

Otherwise, OK - but please check Jakub's question first.

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-16 22:55                             ` Jakub Jelinek
@ 2017-01-17 11:05                               ` Andrew Senkevich
  2017-01-17 11:06                                 ` Uros Bizjak
  2017-01-17 12:30                                 ` Kirill Yukhin
  0 siblings, 2 replies; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-17 11:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, GCC Patches, Kirill Yukhin

2017-01-17 1:55 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
>> here is one more part of intrinsics for k-mask registers shifts:
>
> The software developer manuals describe KSHIFT{L,R}* like:
> KSHIFTLW
> COUNT <- imm8[7:0]
> DEST[MAX_KL-1:0] <- 0
> IF COUNT <=15
> THEN DEST[15:0] <- SRC1[15:0] << COUNT;
> FI;
>
> What is the behavior when src1 == dest, like:
>   kshiftld $3, %k3, %k3
> ?  Is it just a bug in the SDM and will it actually do the expected thing
> (set %k3 to %k3 << 3 and clear just the upper bits), or do we need
> an early-clobber on the destination to make sure GCC never emits these
> insns with the same register as both input and output?

Indeed, they should be different registers. How do I do that?


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-17 11:05                               ` Andrew Senkevich
@ 2017-01-17 11:06                                 ` Uros Bizjak
  2017-01-17 12:30                                 ` Kirill Yukhin
  1 sibling, 0 replies; 48+ messages in thread
From: Uros Bizjak @ 2017-01-17 11:06 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Jakub Jelinek, GCC Patches, Kirill Yukhin

On Tue, Jan 17, 2017 at 12:04 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2017-01-17 1:55 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
>>> here is one more part of intrinsics for k-mask registers shifts:
>>
>> The software developer manuals describe KSHIFT{L,R}* like:
>> KSHIFTLW
>> COUNT <- imm8[7:0]
>> DEST[MAX_KL-1:0] <- 0
>> IF COUNT <=15
>> THEN DEST[15:0] <- SRC1[15:0] << COUNT;
>> FI;
>>
>> What is the behavior when src1 == dest, like:
>>   kshiftld $3, %k3, %k3
>> ?  Is it just a bug in the SDM and will it actually do the expected thing
>> (set %k3 to %k3 << 3 and clear just the upper bits), or do we need
>> an early-clobber on the destination to make sure GCC never emits these
>> insns with the same register as both input and output?
>
> Indeed, it should be different registers, how to do it?

"=&k" as operand 0 constraint.
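
For readers less familiar with the "&" modifier: it marks an operand as
early-clobber, i.e. written before all inputs have been consumed, so the
register allocator must not give it a register that also holds an input.
The same modifier is available in extended asm, which allows a small
sketch of the effect. The helper name below is made up for illustration,
and the asm path is guarded for x86 with a plain C fallback elsewhere.

```c
#include <assert.h>

/* Illustrative helper (hypothetical name).  The asm zeroes the output
   first and only then reads the input, so the output must not share a
   register with the input.  "=&r" tells the compiler exactly that.  */
static unsigned int
copy_via_zeroed_output (unsigned int in)
{
  unsigned int out;
#if (defined (__x86_64__) || defined (__i386__)) && defined (__GNUC__)
  __asm__ ("xorl %0, %0\n\t"     /* out is written here ...  */
           "addl %1, %0"         /* ... before in is read here.  */
           : "=&r" (out)
           : "r" (in)
           : "cc");
#else
  out = 0;                       /* portable fallback, same result */
  out += in;
#endif
  return out;
}

/* Self-check: the input value survives the zeroing of the output.  */
static void
check_earlyclobber (void)
{
  assert (copy_via_zeroed_output (0x5555u) == 0x5555u);
}
```

Without the "&", the allocator would be free to pick one register for
both operands, in which case the helper would return 0.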

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-17 11:05                               ` Andrew Senkevich
  2017-01-17 11:06                                 ` Uros Bizjak
@ 2017-01-17 12:30                                 ` Kirill Yukhin
  2017-01-17 13:03                                   ` Andrew Senkevich
  1 sibling, 1 reply; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-17 12:30 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

Hi Andrey,
On 17 Jan 14:04, Andrew Senkevich wrote:
> 2017-01-17 1:55 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> > On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
> >> here is one more part of intrinsics for k-mask registers shifts:
> >
> > The software developer manuals describe KSHIFT{L,R}* like:
> > KSHIFTLW
> > COUNT <- imm8[7:0]
> > DEST[MAX_KL-1:0] <- 0
> > IF COUNT <=15
> > THEN DEST[15:0] <- SRC1[15:0] << COUNT;
> > FI;
> >
> > What is the behavior when src1 == dest, like:
> >   kshiftld $3, %k3, %k3
> > ?  Is it just a bug in the SDM and will it actually do the expected thing
> > (set %k3 to %k3 << 3 and clear just the upper bits), or do we need
> > an early-clobber on the destination to make sure GCC never emits these
> > insns with the same register as both input and output?
>
> Indeed, it should be different registers, how to do it?
Are you sure?

I've played a bit with SDE, and it looks like the operands are not early-clobber:
TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
TID0:   k0 := 00000000_ffffffff
...
TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
TID0:   k0 := 00000000_0000fff8

You can see that same dest and source works just fine.

--
Thanks, K
>
>
> --
> WBR,
> Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-17 12:30                                 ` Kirill Yukhin
@ 2017-01-17 13:03                                   ` Andrew Senkevich
  2017-01-17 13:51                                     ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-17 13:03 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

2017-01-17 15:30 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> Hi Andrey,
> On 17 Jan 14:04, Andrew Senkevich wrote:
>> 2017-01-17 1:55 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> > On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
>> >> here is one more part of intrinsics for k-mask registers shifts:
>> >
>> > The software developer manuals describe KSHIFT{L,R}* like:
>> > KSHIFTLW
>> > COUNT <- imm8[7:0]
>> > DEST[MAX_KL-1:0] <- 0
>> > IF COUNT <=15
>> > THEN DEST[15:0] <- SRC1[15:0] << COUNT;
>> > FI;
>> >
>> > What is the behavior when src1 == dest, like:
>> >   kshiftld $3, %k3, %k3
>> > ?  Is it just a bug in the SDM and will it actually do the expected thing
>> > (set %k3 to %k3 << 3 and clear just the upper bits), or do we need
>> > an early-clobber on the destination to make sure GCC never emits these
>> > insns with the same register as both input and output?
>>
>> Indeed, it should be different registers, how to do it?
> Are you sure?
>
> I've played a bit w/ SDE. And looks like operands are not early clobber:
> TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
> TID0:   k0 := 00000000_ffffffff
> ...
> TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
> TID0:   k0 := 00000000_0000fff8
>
> You can see that same dest and source works just fine.

Hmm, I only looked at what ICC generates, and that was not the right way to check.

Thanks Kirill!


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-17 13:03                                   ` Andrew Senkevich
@ 2017-01-17 13:51                                     ` Jakub Jelinek
  2017-01-18 12:48                                       ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2017-01-17 13:51 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Kirill Yukhin, Uros Bizjak, GCC Patches

On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> > I've played a bit w/ SDE. And looks like operands are not early clobber:
> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
> > TID0:   k0 := 00000000_ffffffff
> > ...
> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
> > TID0:   k0 := 00000000_0000fff8
> >
> > You can see that same dest and source works just fine.
> 
> Hmm, I looked only on what ICC generates, and it was not correct way.

I've just tried
int
main ()
{
  unsigned int a = 0x5555;
  asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
  __builtin_printf ("%x\n", a);
  return 0;
}
on KNL and got 0xaaaa.
Are you going to report to the SDM authors so that they fix it up?
E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
instead of SRC1[0:...] would fix it, or filling up TEMP first and only
at the end assigning DEST <- TEMP etc. would do.
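
The first suggested wording, reading the source into TEMP before DEST is
cleared, can be sketched in C as follows. The register model and the
function names are illustrative assumptions; the checks reuse the input
values from the SDE trace and the KNL run quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit mask register (MAX_KL = 64).  SRC1 is
   copied to a temporary first, so DEST may alias SRC1.  */
static void
kshiftlw_temp (uint64_t *dest, const uint64_t *src1, uint8_t count)
{
  uint16_t temp = (uint16_t) *src1;   /* TEMP <- SRC1[15:0] */
  *dest = 0;                          /* DEST[MAX_KL-1:0] <- 0 */
  if (count <= 15)                    /* DEST[15:0] <- TEMP << COUNT */
    *dest = (uint16_t) ((uint32_t) temp << count);
}

/* Shift with source and destination being the same modelled register.  */
static uint64_t
shift_in_place_temp (uint64_t value, uint8_t count)
{
  uint64_t k = value;
  kshiftlw_temp (&k, &k, count);
  return k;
}

/* Self-check against the values observed in the thread.  */
static void
check_temp_model (void)
{
  assert (shift_in_place_temp (0xffffffffu, 3) == 0xfff8);  /* SDE trace */
  assert (shift_in_place_temp (0x5555, 1) == 0xaaaa);       /* KNL run */
  assert (shift_in_place_temp (0x5555, 16) == 0);           /* COUNT > 15 */
}
```

With the source captured first, the modelled results agree with both
observations, so pseudocode written this way would describe the
behaviour seen on SDE and KNL.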

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-17 13:51                                     ` Jakub Jelinek
@ 2017-01-18 12:48                                       ` Andrew Senkevich
  2017-01-18 21:45                                         ` Uros Bizjak
  2017-01-19 10:46                                         ` Kirill Yukhin
  0 siblings, 2 replies; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-18 12:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kirill Yukhin, Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2391 bytes --]

2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
>> > I've played a bit w/ SDE. And looks like operands are not early clobber:
>> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
>> > TID0:   k0 := 00000000_ffffffff
>> > ...
>> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
>> > TID0:   k0 := 00000000_0000fff8
>> >
>> > You can see that same dest and source works just fine.
>>
>> Hmm, I looked only on what ICC generates, and it was not correct way.
>
> I've just tried
> int
> main ()
> {
>   unsigned int a = 0x5555;
>   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
>   __builtin_printf ("%x\n", a);
>   return 0;
> }
> on KNL and got 0xaaaa.
> Are you going to report to the SDM authors so that they fix it up?
> E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
> instead of SRC1[0:...] would fix it, or filling up TEMP first and only
> at the end assigning DEST <- TEMP etc. would do.

Yes, we will work on it.

The attached patch is refactored in the builtin declarations and tests; is it OK?

gcc/
    * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386-builtin-types.def: Add new types.
    * config/i386/i386.c: Handle new types.
    * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
    __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
    __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
    __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
    __builtin_ia32_kshiftridi): New.
    * config/i386/sse.md (k<code><mode>): Rename *k<code><mode>.

gcc/testsuite/
    * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
    * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
    * gcc.target/i386/avx-1.c: Test new intrinsics.
    * gcc.target/i386/sse-13.c: Ditto.
    * gcc.target/i386/sse-23.c: Ditto.


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part4.patch --]
[-- Type: application/octet-stream, Size: 20697 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
old mode 100644
new mode 100755
index 21bec73..e41428a
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -2569,6 +2569,38 @@ _mm512_cmple_epi16_mask (__m512i __X, __m512i __Y)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A,
+						(__mmask8) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_alignr_epi8 (__m512i __A, __m512i __B, const int __N)
@@ -2972,6 +3004,18 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
 }
 
 #else
+#define _kshiftli_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftli_mask64(X, Y)							\
+  ((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask64(X, Y)							\
+  ((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y)))
+
 #define _mm512_alignr_epi8(X, Y, N)						    \
   ((__m512i) __builtin_ia32_palignr512 ((__v8di)(__m512i)(X),			    \
 					(__v8di)(__m512i)(Y),			    \
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
old mode 100644
new mode 100755
index 1fc2f68..bcb4a32
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -997,6 +997,20 @@ _mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_range_pd (__m512d __A, __m512d __B, int __C)
@@ -2008,6 +2022,12 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
 }
 
 #else
+#define _kshiftli_mask8(X, Y)						\
+  ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask8(X, Y)						\
+  ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
+
 #define _mm_range_sd(A, B, C)						\
   ((__m128d) __builtin_ia32_rangesd128_round ((__v2df)(__m128d)(A),	\
     (__v2df)(__m128d)(B), (int)(C),					\
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
old mode 100644
new mode 100755
index af6880e..810ac23
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -8966,6 +8966,22 @@ _mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
 #define _MM_CMPINT_GT	    0x6
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
+						(__mmask8) __B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
@@ -9120,6 +9136,12 @@ _mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
 }
 
 #else
+#define _kshiftli_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
+
 #define _mm512_cmp_epi64_mask(X, Y, P)					\
   ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
 					   (__v8di)(__m512i)(Y), (int)(P),\
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
old mode 100644
new mode 100755
index f287ca0..2922324
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -558,10 +558,9 @@ DEF_FUNCTION_TYPE (USI, UHI)
 DEF_FUNCTION_TYPE (UQI, USI)
 DEF_FUNCTION_TYPE (UHI, USI)
 
-DEF_FUNCTION_TYPE (UQI, UQI, INT)
-DEF_FUNCTION_TYPE (UHI, UHI, INT)
-DEF_FUNCTION_TYPE (USI, USI, INT)
-DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, UQI)
+DEF_FUNCTION_TYPE (USI, USI, UQI)
+DEF_FUNCTION_TYPE (UDI, UDI, UQI)
 DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI)
@@ -619,6 +618,8 @@ DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT)
 DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT, UQI)
 DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT, UQI, INT)
 
+DEF_FUNCTION_TYPE_ALIAS (UQI_FTYPE_UQI_UQI, CONST)
+
 DEF_FUNCTION_TYPE (V16SI, UHI)
 DEF_FUNCTION_TYPE (V8DI, UQI)
 DEF_FUNCTION_TYPE (V16QI, UHI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
old mode 100644
new mode 100755
index c351335..08ce2c9
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1440,6 +1440,14 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kashifthi, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftsi, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_klshiftrtqi, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_klshiftrthi, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtsi, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
old mode 100644
new mode 100755
index eb4781d..46d1c44
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35073,10 +35073,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
-    case UQI_FTYPE_UQI_INT:
-    case UHI_FTYPE_UHI_INT:
-    case USI_FTYPE_USI_INT:
-    case UDI_FTYPE_UDI_INT:
+    case UQI_FTYPE_UQI_UQI_CONST:
+    case UHI_FTYPE_UHI_UQI:
+    case USI_FTYPE_USI_UQI:
+    case UDI_FTYPE_UDI_UQI:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f754994..bc504eb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1410,7 +1410,7 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
 
-(define_insn "*k<code><mode>"
+(define_insn "k<code><mode>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ
 	  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0418d07..2a0df23 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -168,6 +168,8 @@
 #define __builtin_ia32_xabort(I) __builtin_ia32_xabort(0)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -372,6 +374,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -388,6 +394,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..03714a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..70a4b67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..b99a713
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..b0051b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..2d72c0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..c5ae199
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..3782d90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..6d537ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index b23480a..ff0051b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -185,6 +185,8 @@
 #define __builtin_ia32_xabort(N) __builtin_ia32_xabort(1)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -389,6 +391,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -405,6 +411,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index bf1cba0..f4fcb00 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -186,6 +186,8 @@
 #define __builtin_ia32_xabort(M) __builtin_ia32_xabort(1)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -388,6 +390,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -404,6 +410,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-18 12:48                                       ` Andrew Senkevich
@ 2017-01-18 21:45                                         ` Uros Bizjak
  2017-01-19 10:46                                         ` Kirill Yukhin
  1 sibling, 0 replies; 48+ messages in thread
From: Uros Bizjak @ 2017-01-18 21:45 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Jakub Jelinek, Kirill Yukhin, GCC Patches

On Wed, Jan 18, 2017 at 1:45 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
>>> > I've played a bit w/ SDE. And looks like operands are not early clobber:
>>> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
>>> > TID0:   k0 := 00000000_ffffffff
>>> > ...
>>> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
>>> > TID0:   k0 := 00000000_0000fff8
>>> >
>>> > You can see that same dest and source works just fine.
>>>
>>> Hmm, I looked only at what ICC generates, and that was not the correct approach.
>>
>> I've just tried
>> int
>> main ()
>> {
>>   unsigned int a = 0x5555;
>>   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
>>   __builtin_printf ("%x\n", a);
>>   return 0;
>> }
>> on KNL and got 0xaaaa.
>> Are you going to report to the SDM authors so that they fix it up?
>> E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
>> instead of SRC1[0:...] would fix it, or filling up TEMP first and only
>> at the end assigning DEST <- TEMP etc. would do.
>
> Yes, we will work on it.
>
> The attached patch has been refactored in the builtin declarations and tests. Is it OK?
>
> gcc/
>     * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
>     * config/i386/avx512dqintrin.h: Ditto.
>     * config/i386/avx512fintrin.h: Ditto.
>     * config/i386/i386-builtin-types.def: Add new types.
>     * config/i386/i386.c: Handle new types.
>     * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
>     __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
>     __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
>     __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
>     __builtin_ia32_kshiftridi): New.
>     * config/i386/sse.md (k<code><mode>): Rename *k<code><mode>.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
>     * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
>     * gcc.target/i386/avx-1.c: Test new intrinsics.
>     * gcc.target/i386/sse-13.c: Ditto.
>     * gcc.target/i386/sse-23.c: Ditto.

OK.

Thanks,
Uros.


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-18 12:48                                       ` Andrew Senkevich
  2017-01-18 21:45                                         ` Uros Bizjak
@ 2017-01-19 10:46                                         ` Kirill Yukhin
  2017-01-19 16:45                                           ` Andrew Senkevich
  1 sibling, 1 reply; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-19 10:46 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

Hi Andrew,
On 18 Jan 15:45, Andrew Senkevich wrote:
> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> >> > I've played a bit w/ SDE. And looks like operands are not early clobber:
> >> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
> >> > TID0:   k0 := 00000000_ffffffff
> >> > ...
> >> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
> >> > TID0:   k0 := 00000000_0000fff8
> >> >
> >> > You can see that same dest and source works just fine.
> >>
> >> Hmm, I looked only at what ICC generates, and that was not the correct approach.
> >
> > I've just tried
> > int
> > main ()
> > {
> >   unsigned int a = 0x5555;
> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
> >   __builtin_printf ("%x\n", a);
> >   return 0;
> > }
> > on KNL and got 0xaaaa.
> > Are you going to report to the SDM authors so that they fix it up?
> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
> > at the end assigning DEST <- TEMP etc. would do.
>
> Yes, we will work on it.
>
> The attached patch has been refactored in the builtin declarations and tests. Is it OK?

Could you please add runtime tests for the new intrinsics as well?


--
Thanks, K

> --
> WBR,
> Andrew


* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-19 10:46                                         ` Kirill Yukhin
@ 2017-01-19 16:45                                           ` Andrew Senkevich
  2017-01-19 18:04                                             ` Kirill Yukhin
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-19 16:45 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3117 bytes --]

2017-01-19 13:39 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> Hi Andrew,
> On 18 Jan 15:45, Andrew Senkevich wrote:
>> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
>> >> > I've played a bit w/ SDE. And looks like operands are not early clobber:
>> >> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
>> >> > TID0:   k0 := 00000000_ffffffff
>> >> > ...
>> >> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
>> >> > TID0:   k0 := 00000000_0000fff8
>> >> >
>> >> > You can see that same dest and source works just fine.
>> >>
>> >> Hmm, I looked only at what ICC generates, and that was not the correct approach.
>> >
>> > I've just tried
>> > int
>> > main ()
>> > {
>> >   unsigned int a = 0x5555;
>> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
>> >   __builtin_printf ("%x\n", a);
>> >   return 0;
>> > }
>> > on KNL and got 0xaaaa.
>> > Are you going to report to the SDM authors so that they fix it up?
>> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
>> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
>> > at the end assigning DEST <- TEMP etc. would do.
>>
>> Yes, we will work on it.
>>
>> The attached patch has been refactored in the builtin declarations and tests. Is it OK?
>
> Could you please add runtime tests for the new intrinsics as well?

Attached is the patch with runtime tests added.
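
As a rough idea of what a runtime check can compare against (a hypothetical
sketch, not the code in the attached *-2.c tests): a portable C reference
model of the k-mask shifts for the 8/16/32/64-bit widths. The names
ref_kshiftl and ref_kshiftr are made up for illustration. It assumes the
instruction semantics that a count of at least the mask width yields zero and
that only the low 8 bits of the count are used.

```c
#include <stdint.h>

/* All-ones mask for the low WIDTH bits (WIDTH is 8, 16, 32 or 64).  */
static uint64_t
width_mask (unsigned width)
{
  return width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
}

/* Reference model of kshiftl{b,w,d,q}: shift left within WIDTH bits.  */
uint64_t
ref_kshiftl (uint64_t src, unsigned width, unsigned count)
{
  uint64_t mask = width_mask (width);
  count &= 0xff;                 /* the count is an 8-bit immediate */
  if (count >= width)
    return 0;                    /* out-of-range counts clear the result */
  return ((src & mask) << count) & mask;
}

/* Reference model of kshiftr{b,w,d,q}: logical shift right.  */
uint64_t
ref_kshiftr (uint64_t src, unsigned width, unsigned count)
{
  uint64_t mask = width_mask (width);
  count &= 0xff;
  if (count >= width)
    return 0;
  return (src & mask) >> count;
}
```

A runtime test could then abort when, for example, _kshiftli_mask16 (k, 5)
differs from ref_kshiftl (k, 16, 5).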

gcc/
    * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386-builtin-types.def: Add new types.
    * config/i386/i386.c: Handle new types.
    * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
    __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
    __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
    __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
    __builtin_ia32_kshiftridi): New.
    * config/i386/sse.md (k<code><mode>): Rename *k<code><mode>.

gcc/testsuite/
    * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
    * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
    * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftld-2.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftlq-2.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrd-2.c: Ditto.
    * gcc.target/i386/avx512bw-kshiftrq-2.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftlb-2.c: Ditto.
    * gcc.target/i386/avx512dq-kshiftrb-2.c: Ditto.
    * gcc.target/i386/avx512f-kshiftlw-2.c: Ditto.
    * gcc.target/i386/avx512f-kshiftrw-2.c: Ditto.
    * gcc.target/i386/avx-1.c: Test new intrinsics.
    * gcc.target/i386/sse-13.c: Ditto.
    * gcc.target/i386/sse-23.c: Ditto.


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part4.patch --]
[-- Type: application/octet-stream, Size: 25131 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
old mode 100644
new mode 100755
index 21bec73..e41428a
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -2569,6 +2569,38 @@ _mm512_cmple_epi16_mask (__m512i __X, __m512i __Y)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, unsigned int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, unsigned int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A,
+						(__mmask8) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_alignr_epi8 (__m512i __A, __m512i __B, const int __N)
@@ -2972,6 +3004,18 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N)
 }
 
 #else
+#define _kshiftli_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftli_mask64(X, Y)							\
+  ((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask32(X, Y)							\
+  ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask64(X, Y)							\
+  ((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y)))
+
 #define _mm512_alignr_epi8(X, Y, N)						    \
   ((__m512i) __builtin_ia32_palignr512 ((__v8di)(__m512i)(X),			    \
 					(__v8di)(__m512i)(Y),			    \
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
old mode 100644
new mode 100755
index 1fc2f68..bcb4a32
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -997,6 +997,20 @@ _mm512_maskz_cvtepu64_pd (__mmask8 __U, __m512i __A)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, unsigned int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_range_pd (__m512d __A, __m512d __B, int __C)
@@ -2008,6 +2022,12 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
 }
 
 #else
+#define _kshiftli_mask8(X, Y)						\
+  ((__mmask8) __builtin_ia32_kshiftliqi ((__mmask8)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask8(X, Y)						\
+  ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
+
 #define _mm_range_sd(A, B, C)						\
   ((__m128d) __builtin_ia32_rangesd128_round ((__v2df)(__m128d)(A),	\
     (__v2df)(__m128d)(B), (int)(C),					\
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
old mode 100644
new mode 100755
index af6880e..810ac23
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -8966,6 +8966,22 @@ _mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y)
 #define _MM_CMPINT_GT	    0x6
 
 #ifdef __OPTIMIZE__
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A,
+						(__mmask8) __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, unsigned int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A,
+						(__mmask8) __B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P)
@@ -9120,6 +9136,12 @@ _mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
 }
 
 #else
+#define _kshiftli_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftlihi ((__mmask16)(X), (__mmask8)(Y)))
+
+#define _kshiftri_mask16(X, Y)						\
+  ((__mmask16) __builtin_ia32_kshiftrihi ((__mmask16)(X), (__mmask8)(Y)))
+
 #define _mm512_cmp_epi64_mask(X, Y, P)					\
   ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X),	\
 					   (__v8di)(__m512i)(Y), (int)(P),\
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
old mode 100644
new mode 100755
index f287ca0..2922324
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -558,10 +558,9 @@ DEF_FUNCTION_TYPE (USI, UHI)
 DEF_FUNCTION_TYPE (UQI, USI)
 DEF_FUNCTION_TYPE (UHI, USI)
 
-DEF_FUNCTION_TYPE (UQI, UQI, INT)
-DEF_FUNCTION_TYPE (UHI, UHI, INT)
-DEF_FUNCTION_TYPE (USI, USI, INT)
-DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, UQI)
+DEF_FUNCTION_TYPE (USI, USI, UQI)
+DEF_FUNCTION_TYPE (UDI, UDI, UQI)
 DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI)
@@ -619,6 +618,8 @@ DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT)
 DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT, UQI)
 DEF_FUNCTION_TYPE (UQI, V4SF, V4SF, INT, UQI, INT)
 
+DEF_FUNCTION_TYPE_ALIAS (UQI_FTYPE_UQI_UQI, CONST)
+
 DEF_FUNCTION_TYPE (V16SI, UHI)
 DEF_FUNCTION_TYPE (V8DI, UQI)
 DEF_FUNCTION_TYPE (V16QI, UHI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
old mode 100644
new mode 100755
index c351335..08ce2c9
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1440,6 +1440,14 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kashiftqi, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kashifthi, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftsi, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kashiftdi, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_klshiftrtqi, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI_CONST)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_klshiftrthi, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtsi, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_klshiftrtdi, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandqi, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandhi, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandsi, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
old mode 100644
new mode 100755
index eb4781d..46d1c44
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35073,10 +35073,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
-    case UQI_FTYPE_UQI_INT:
-    case UHI_FTYPE_UHI_INT:
-    case USI_FTYPE_USI_INT:
-    case UDI_FTYPE_UDI_INT:
+    case UQI_FTYPE_UQI_UQI_CONST:
+    case UHI_FTYPE_UHI_UQI:
+    case USI_FTYPE_USI_UQI:
+    case UDI_FTYPE_UDI_UQI:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f754994..bc504eb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1410,7 +1410,7 @@
 ;; Mask variant shift mnemonics
 (define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
 
-(define_insn "*k<code><mode>"
+(define_insn "k<code><mode>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ
 	  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0418d07..2a0df23 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -168,6 +168,8 @@
 #define __builtin_ia32_xabort(I) __builtin_ia32_xabort(0)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -372,6 +374,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -388,6 +394,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..03714a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..70a4b67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..b99a713
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..b0051b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..2d72c0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..c5ae199
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  unsigned int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..3782d90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..6d537ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  unsigned int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index b23480a..ff0051b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -185,6 +185,8 @@
 #define __builtin_ia32_xabort(N) __builtin_ia32_xabort(1)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -389,6 +391,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -405,6 +411,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index bf1cba0..f4fcb00 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -186,6 +186,8 @@
 #define __builtin_ia32_xabort(M) __builtin_ia32_xabort(1)
 
 /* avx512fintrin.h */
+#define __builtin_ia32_kshiftlihi(A, B) __builtin_ia32_kshiftlihi(A, 8)
+#define __builtin_ia32_kshiftrihi(A, B) __builtin_ia32_kshiftrihi(A, 8)
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 8)
@@ -388,6 +390,10 @@
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
 /* avx512bwintrin.h */
+#define __builtin_ia32_kshiftlisi(A, B) __builtin_ia32_kshiftlisi(A, 8)
+#define __builtin_ia32_kshiftlidi(A, B) __builtin_ia32_kshiftlidi(A, 8)
+#define __builtin_ia32_kshiftrisi(A, B) __builtin_ia32_kshiftrisi(A, 8)
+#define __builtin_ia32_kshiftridi(A, B) __builtin_ia32_kshiftridi(A, 8)
 #define __builtin_ia32_ucmpw512_mask(A, B, E, D) __builtin_ia32_ucmpw512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb512_mask(A, B, E, D) __builtin_ia32_ucmpb512_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi512_mask(A, E, C, D) __builtin_ia32_psrlwi512_mask(A, 1, C, D)
@@ -404,6 +410,8 @@
 #define __builtin_ia32_pslldq512(A, B) __builtin_ia32_pslldq512(A, 8)
 
 /* avx512dqintrin.h */
+#define __builtin_ia32_kshiftliqi(A, B) __builtin_ia32_kshiftliqi(A, 8)
+#define __builtin_ia32_kshiftriqi(A, B) __builtin_ia32_kshiftriqi(A, 8)
 #define __builtin_ia32_reducess(A, B, F) __builtin_ia32_reducess(A, B, 1)
 #define __builtin_ia32_reducesd(A, B, F) __builtin_ia32_reducesd(A, B, 1)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-2.c
new file mode 100644
index 0000000..7fdc01a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1 = 1;
+  unsigned int i = 25;
+
+  volatile __mmask32 r = _kshiftli_mask32 (k1, i);
+  if (r != 1 << i)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-2.c
new file mode 100644
index 0000000..4dabb4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1 = 1;
+  unsigned int i = 53;
+
+  volatile __mmask64 r = _kshiftli_mask64 (k1, i);
+  if (r != 1ULL << i)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-2.c
new file mode 100644
index 0000000..ce3707f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  unsigned int i = 25;
+  __mmask32 k1 = 1 << i;
+
+  volatile __mmask32 r = _kshiftri_mask32 (k1, i);
+  if (r != 1)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-2.c
new file mode 100644
index 0000000..655f926
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  unsigned int i = 53;
+  __mmask64 k1 = 1ULL << i;
+
+  volatile __mmask64 r = _kshiftri_mask64 (k1, i);
+  if (r != 1)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-2.c
new file mode 100644
index 0000000..bb0f10a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#include "avx512dq-check.h"
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1 = 1;
+  unsigned int i = 5;
+
+  volatile __mmask8 r = _kshiftli_mask8 (k1, i);
+  if (r != 1 << i)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-2.c
new file mode 100644
index 0000000..1b7c3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#include "avx512dq-check.h"
+
+void
+avx512dq_test ()
+{
+  unsigned int i = 5;
+  __mmask8 k1 = 1 << i;
+
+  volatile __mmask8 r = _kshiftri_mask8 (k1, i);
+  if (r != 1)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-2.c
new file mode 100644
index 0000000..89d45fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+
+void
+avx512f_test ()
+{
+  __mmask16 k1 = 1;
+  unsigned int i = 10;
+
+  volatile __mmask16 r = _kshiftli_mask16 (k1, i);
+  if (r != 1 << i)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-2.c
new file mode 100644
index 0000000..5a1483a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+
+void
+avx512f_test ()
+{
+  unsigned int i = 10;
+  __mmask16 k1 = 1 << i;
+
+  volatile __mmask16 r = _kshiftri_mask16 (k1, i);
+  if (r != 1)
+    abort ();
+}
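
A note on the 64-bit runtime tests in this patch: they use a shift count of 53, so the constant they compare against has to be built in a 64-bit type. In C, shifting a plain int `1` by 32 or more bits is undefined. The sketch below uses a hypothetical helper, `mask64_bit`, which is not part of the patch, to show one way to form the expected value.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): the 64-bit mask value with only
   bit I set.  Valid for I in 0..63; the cast makes the shift happen in a
   64-bit unsigned type rather than in int.  */
static uint64_t
mask64_bit (unsigned int i)
{
  return (uint64_t) 1 << i;
}
```

A test could then compare the intrinsic's result against `mask64_bit (53)`.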

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-19 16:45                                           ` Andrew Senkevich
@ 2017-01-19 18:04                                             ` Kirill Yukhin
  2017-01-20 13:41                                               ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-19 18:04 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

On 19 Jan 19:42, Andrew Senkevich wrote:
> 2017-01-19 13:39 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> > Hi Andrew,
> > On 18 Jan 15:45, Andrew Senkevich wrote:
> >> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> >> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> >> >> > I've played a bit with SDE, and it looks like the operands are not early-clobber:
> >> >> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
> >> >> > TID0:   k0 := 00000000_ffffffff
> >> >> > ...
> >> >> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
> >> >> > TID0:   k0 := 00000000_0000fff8
> >> >> >
> >> >> > You can see that same dest and source works just fine.
> >> >>
> >> >> Hmm, I looked only on what ICC generates, and it was not correct way.
> >> >
> >> > I've just tried
> >> > int
> >> > main ()
> >> > {
> >> >   unsigned int a = 0x5555;
> >> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
> >> >   __builtin_printf ("%x\n", a);
> >> >   return 0;
> >> > }
> >> > on KNL and got 0xaaaa.
> >> > Are you going to report to the SDM authors so that they fix it up?
> >> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
> >> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
> >> > at the end assigning DEST <- TEMP etc. would do.
> >>
> >> Yes, we will work on it.
> >>
> >> The attached patch refactors the builtin declarations and tests. Is it OK?
> >
> > Could you please add runtime tests for new intrinsics as well?
>
> Attached with runtime tests.
Great! Thanks. Patch is OK for main trunk.

--
Thanks, K
>
> gcc/
>     * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
>     * config/i386/avx512dqintrin.h: Ditto.
>     * config/i386/avx512fintrin.h: Ditto.
>     * config/i386/i386-builtin-types.def: Add new types.
>     * config/i386/i386.c: Handle new types.
>     * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
>     __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
>     __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
>     __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
>     __builtin_ia32_kshiftridi): New.
>     * config/i386/sse.md (k<code><mode>): Rename *k<code><mode>.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
>     * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftld-2.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftlq-2.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrd-2.c: Ditto.
>     * gcc.target/i386/avx512bw-kshiftrq-2.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftlb-2.c: Ditto.
>     * gcc.target/i386/avx512dq-kshiftrb-2.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftlw-2.c: Ditto.
>     * gcc.target/i386/avx512f-kshiftrw-2.c: Ditto.
>     * gcc.target/i386/avx-1.c: Test new intrinsics.
>     * gcc.target/i386/sse-13.c: Ditto.
>     * gcc.target/i386/sse-23.c: Ditto.
>
>
> --
> WBR,
> Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-19 18:04                                             ` Kirill Yukhin
@ 2017-01-20 13:41                                               ` Andrew Senkevich
  2017-01-20 13:47                                                 ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-20 13:41 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Jakub Jelinek, Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4376 bytes --]

2017-01-19 20:55 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> On 19 Jan 19:42, Andrew Senkevich wrote:
>> 2017-01-19 13:39 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
>> > Hi Andrew,
>> > On 18 Jan 15:45, Andrew Senkevich wrote:
>> >> 2017-01-17 16:51 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> >> > On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
>> >> >> > I've played a bit with SDE, and it looks like the operands are not early-clobber:
>> >> >> > TID0: INS 0x00000000004003ee             AVX512VEX kmovd k0, eax
>> >> >> > TID0:   k0 := 00000000_ffffffff
>> >> >> > ...
>> >> >> > TID0: INS 0x00000000004003f4             AVX512VEX kshiftlw k0, k0, 0x3
>> >> >> > TID0:   k0 := 00000000_0000fff8
>> >> >> >
>> >> >> > You can see that same dest and source works just fine.
>> >> >>
>> >> >> Hmm, I looked only on what ICC generates, and it was not correct way.
>> >> >
>> >> > I've just tried
>> >> > int
>> >> > main ()
>> >> > {
>> >> >   unsigned int a = 0x5555;
>> >> >   asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : "=r" (a) : "r" (a) : "k6");
>> >> >   __builtin_printf ("%x\n", a);
>> >> >   return 0;
>> >> > }
>> >> > on KNL and got 0xaaaa.
>> >> > Are you going to report to the SDM authors so that they fix it up?
>> >> > E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
>> >> > instead of SRC1[0:...] would fix it, or filling up TEMP first and only
>> >> > at the end assigning DEST <- TEMP etc. would do.
>> >>
>> >> Yes, we will work on it.
>> >>
>> >> The attached patch refactors the builtin declarations and tests. Is it OK?
>> >
>> > Could you please add runtime tests for new intrinsics as well?
>>
>> Attached with runtime tests.
> Great! Thanks. Patch is OK for main trunk.
>
> --
> Thanks, K
>>
>> gcc/
>>     * config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
>>     * config/i386/avx512dqintrin.h: Ditto.
>>     * config/i386/avx512fintrin.h: Ditto.
>>     * config/i386/i386-builtin-types.def: Add new types.
>>     * config/i386/i386.c: Handle new types.
>>     * config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
>>     __builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
>>     __builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
>>     __builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
>>     __builtin_ia32_kshiftridi): New.
>>     * config/i386/sse.md (k<code><mode>): Rename *k<code><mode>.
>>
>> gcc/testsuite/
>>     * gcc.target/i386/avx512bw-kshiftld-1.c: New test.
>>     * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
>>     * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
>>     * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
>>     * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
>>     * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftld-2.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftlq-2.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftrd-2.c: Ditto.
>>     * gcc.target/i386/avx512bw-kshiftrq-2.c: Ditto.
>>     * gcc.target/i386/avx512dq-kshiftlb-2.c: Ditto.
>>     * gcc.target/i386/avx512dq-kshiftrb-2.c: Ditto.
>>     * gcc.target/i386/avx512f-kshiftlw-2.c: Ditto.
>>     * gcc.target/i386/avx512f-kshiftrw-2.c: Ditto.
>>     * gcc.target/i386/avx-1.c: Test new intrinsics.
>>     * gcc.target/i386/sse-13.c: Ditto.
>>     * gcc.target/i386/sse-23.c: Ditto.

Hi,

here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is it OK?

gcc/
    * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386.c: Handle new builtins.
    * config/i386/i386-builtin.def: Add new builtins.
    * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
    (UNSPEC_KORTEST, UNSPEC_KTEST): New.

gcc/testsuite/
    * gcc.target/i386/avx512bw-ktestd-1.c: New test.
    * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
    * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
    * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
    * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
    * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
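
For reference, the flag semantics these intrinsics expose can be modelled in
plain C. This is only a sketch: `struct kflags`, `kmask_all_ones`,
`kortest_model` and `ktest_model` are illustrative names that are not part of
the patch, and it assumes the `z` variants return ZF and the `c` variants
return CF, as the CCZmode/CCCmode selection in the i386.c hunk suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Flags produced by a mask test.  Illustrative type, not part of the patch.  */
struct kflags
{
  unsigned char zf;
  unsigned char cf;
};

/* All-ones value for a mask of WIDTH bits (8, 16, 32 or 64).  */
uint64_t
kmask_all_ones (unsigned width)
{
  return width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
}

/* kortest: ZF is set if (a | b) is zero, CF if (a | b) is all ones.  */
struct kflags
kortest_model (uint64_t a, uint64_t b, unsigned width)
{
  uint64_t ones = kmask_all_ones (width);
  uint64_t r = (a | b) & ones;
  struct kflags f = { r == 0, r == ones };
  return f;
}

/* ktest: ZF is set if (a & b) is zero, CF if (~a & b) is zero.  */
struct kflags
ktest_model (uint64_t a, uint64_t b, unsigned width)
{
  uint64_t ones = kmask_all_ones (width);
  struct kflags f = { ((a & b) & ones) == 0, ((~a & b) & ones) == 0 };
  return f;
}

/* Sanity check of the model on a few hand-computed 8-bit cases.  */
void
kmask_model_selfcheck (void)
{
  assert (kortest_model (0x00, 0x00, 8).zf == 1);
  assert (kortest_model (0xf0, 0x0f, 8).cf == 1);
  assert (ktest_model (0xf0, 0x0f, 8).zf == 1);
  assert (ktest_model (0xff, 0x0f, 8).cf == 1);
}
```

Under that assumption, `_kortestz_mask8_u8 (a, b)` would correspond to
`kortest_model (a, b, 8).zf`, and `_ktestc_mask32_u8 (a, b)` to
`ktest_model (a, b, 32).cf`.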


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part5.patch --]
[-- Type: application/octet-stream, Size: 18367 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
old mode 100755
new mode 100644
index e41428a..8a06273
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,62 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+}
+
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask32 (__mmask32 __A, __mmask32 __B)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
old mode 100755
new mode 100644
index bcb4a32..b7dd6cd
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,34 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcqi (__A, __B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask8 (__mmask8 __A, __mmask8 __B)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 810ac23..4af9c0b
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -10006,6 +10006,36 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestchi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
+						    (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
+						    (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask16 (__mmask16 __A, __mmask16 __B)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 08ce2c9..137aa3e
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1464,8 +1464,23 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_B
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kortestqi, "__builtin_ia32_kortestcqi", IX86_BUILTIN_KORTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kortestqi, "__builtin_ia32_kortestzqi", IX86_BUILTIN_KORTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortesthi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortesthi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestsi, "__builtin_ia32_kortestcsi", IX86_BUILTIN_KORTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestsi, "__builtin_ia32_kortestzsi", IX86_BUILTIN_KORTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 46d1c44..65b32e6
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -37642,16 +37642,100 @@ rdseed_step:
       emit_insn (gen_pop (gen_rtx_REG (word_mode, FLAGS_REG)));
       return 0;
 
+    case IX86_BUILTIN_KTESTC8:
+      icode = CODE_FOR_ktestqi;
+      mode0 = QImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ8:
+      icode = CODE_FOR_ktestqi;
+      mode0 = QImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC16:
+      icode = CODE_FOR_ktesthi;
+      mode0 = HImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ16:
+      icode = CODE_FOR_ktesthi;
+      mode0 = HImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC32:
+      icode = CODE_FOR_ktestsi;
+      mode0 = SImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ32:
+      icode = CODE_FOR_ktestsi;
+      mode0 = SImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC64:
+      icode = CODE_FOR_ktestdi;
+      mode0 = DImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ64:
+      icode = CODE_FOR_ktestdi;
+      mode0 = DImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC8:
+      icode = CODE_FOR_kortestqi;
+      mode0 = QImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ8:
+      icode = CODE_FOR_kortestqi;
+      mode0 = QImode;
+      mode1 = CCZmode;
+      goto kortest;
+
     case IX86_BUILTIN_KORTESTC16:
-      icode = CODE_FOR_kortestchi;
+      icode = CODE_FOR_kortesthi;
       mode0 = HImode;
       mode1 = CCCmode;
       goto kortest;
 
     case IX86_BUILTIN_KORTESTZ16:
-      icode = CODE_FOR_kortestzhi;
+      icode = CODE_FOR_kortesthi;
       mode0 = HImode;
       mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC32:
+      icode = CODE_FOR_kortestsi;
+      mode0 = SImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ32:
+      icode = CODE_FOR_kortestsi;
+      mode0 = SImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC64:
+      icode = CODE_FOR_kortestdi;
+      mode0 = DImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ64:
+      icode = CODE_FOR_kortestdi;
+      mode0 = DImode;
+      mode1 = CCZmode;
 
     kortest:
       arg0 = CALL_EXPR_ARG (exp, 0); /* Mask reg src1.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bc504eb..0d074f8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -108,6 +108,8 @@
 
   ;; Mask operations
   UNSPEC_MASKOP
+  UNSPEC_KORTEST
+  UNSPEC_KTEST
 
   ;; For embed. rounding feature
   UNSPEC_EMBEDDED_ROUNDING
@@ -1422,31 +1424,27 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
 
-;;There are kortrest[bdq] but no intrinsics for them.
-;;We probably don't need to implement them.
-(define_insn "kortestzhi"
-  [(set (reg:CCZ FLAGS_REG)
-	(compare:CCZ
-	  (ior:HI
-	    (match_operand:HI 0 "register_operand" "k")
-	    (match_operand:HI 1 "register_operand" "k"))
-	  (const_int 0)))]
-  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
-  "kortestw\t{%1, %0|%0, %1}"
-  [(set_attr "mode" "HI")
+(define_insn "ktest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
+	  UNSPEC_KTEST))]
+  "TARGET_AVX512F"
+  "ktest<mskmodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "<MODE>")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
-(define_insn "kortestchi"
-  [(set (reg:CCC FLAGS_REG)
-	(compare:CCC
-	  (ior:HI
-	    (match_operand:HI 0 "register_operand" "k")
-	    (match_operand:HI 1 "register_operand" "k"))
-	  (const_int -1)))]
-  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
-  "kortestw\t{%1, %0|%0, %1}"
-  [(set_attr "mode" "HI")
+(define_insn "kortest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
+	  UNSPEC_KORTEST))]
+  "TARGET_AVX512F"
+  "kortest<mskmodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "<MODE>")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c
new file mode 100644
index 0000000..9d6235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "kortestd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k1;
+  __mmask32 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask32_u8(k1, k2);
+  r = _kortestz_mask32_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c
new file mode 100644
index 0000000..7f27618
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "kortestq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k1;
+  __mmask64 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask64_u8(k1, k2);
+  r = _kortestz_mask64_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c
new file mode 100644
index 0000000..56d3c4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "ktestd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k1;
+  __mmask32 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask32_u8(k1, k2);
+  r = _ktestz_mask32_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c
new file mode 100644
index 0000000..3d91132
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "ktestq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k1;
+  __mmask64 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask64_u8(k1, k2);
+  r = _ktestz_mask64_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c
new file mode 100644
index 0000000..b743d60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512dq" } */
+/* { dg-final { scan-assembler-times "kortestb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test () {
+  volatile __mmask8 k1;
+  __mmask8 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask8_u8(k1, k2);
+  r = _kortestz_mask8_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c
new file mode 100644
index 0000000..4e13fd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512dq" } */
+/* { dg-final { scan-assembler-times "ktestb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test () {
+  volatile __mmask8 k1;
+  __mmask8 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask8_u8(k1, k2);
+  r = _ktestz_mask8_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
index af6f5f1..7084ada 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O0 -mavx512f" } */
-/* { dg-final { scan-assembler-times "kortestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)"  4 } } */
+/* { dg-final { scan-assembler-times "kortestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 6 } } */
 
 #include <immintrin.h>
 
@@ -19,4 +19,9 @@ avx512f_test () {
 
   r = _mm512_kortestc (k3, k4);
   r = _mm512_kortestz (k3, k4);
+
+  volatile unsigned char r1 __attribute__((unused));	
+
+  r1 = _kortestc_mask16_u8(k1, k2);
+  r1 = _kortestz_mask16_u8(k1, k2);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c
new file mode 100644
index 0000000..f6151d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512f" } */
+/* { dg-final { scan-assembler-times "ktestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  volatile __mmask16 k1;
+  __mmask16 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask16_u8(k1, k2);
+  r = _ktestz_mask16_u8(k1, k2);
+}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 13:41                                               ` Andrew Senkevich
@ 2017-01-20 13:47                                                 ` Uros Bizjak
  2017-01-20 17:26                                                   ` Kirill Yukhin
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2017-01-20 13:47 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Kirill Yukhin, Jakub Jelinek, GCC Patches

On Fri, Jan 20, 2017 at 2:32 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:

> here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is this OK?
>
> gcc/
>     * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
>     * config/i386/avx512dqintrin.h: Ditto.
>     * config/i386/avx512fintrin.h: Ditto.
>     * config/i386/i386.c: Handle new builtins.
>     * config/i386/i386-builtin.def: Add new builtins.
>     * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
>     (UNSPEC_KORTEST, UNSPEC_KTEST): New.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512bw-ktestd-1.c: New test.
>     * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
>     * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
>     * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
>     * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
>     * gcc.target/i386/avx512f-kortestw-1.c: Ditto.

IMO, you should add some runtime tests.
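
A runtime test would typically compute the expected flag in plain C and
compare it with the intrinsic's result. A hardware-independent sketch of that
shape follows. The `ref_*` and `sut_*` functions, `count_mismatches` and
`run_mask8_checks` are hypothetical names; in a real test the `sut_*`
stand-ins would be replaced by thin wrappers around the `_k*_mask8_u8`
intrinsics, assuming the `z` variants return ZF and the `c` variants return CF.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char (*mask8_test_fn) (uint8_t, uint8_t);

/* Reference results, computed directly from the flag definitions.  */
unsigned char ref_kortestz8 (uint8_t a, uint8_t b) { return (uint8_t) (a | b) == 0; }
unsigned char ref_kortestc8 (uint8_t a, uint8_t b) { return (uint8_t) (a | b) == 0xff; }
unsigned char ref_ktestz8 (uint8_t a, uint8_t b) { return (uint8_t) (a & b) == 0; }
unsigned char ref_ktestc8 (uint8_t a, uint8_t b) { return (uint8_t) (~a & b) == 0; }

/* Stand-ins for the functions under test, written with equivalent but
   different expressions so that the comparison is not trivially true.  */
unsigned char sut_kortestz8 (uint8_t a, uint8_t b) { return a == 0 && b == 0; }
unsigned char sut_kortestc8 (uint8_t a, uint8_t b) { return (uint8_t) (~a & ~b) == 0; }
unsigned char sut_ktestz8 (uint8_t a, uint8_t b) { return (a ^ b) == (a | b); }
unsigned char sut_ktestc8 (uint8_t a, uint8_t b) { return (a | b) == a; }

/* Count disagreements between SUT and REF over all 8-bit mask pairs.  */
unsigned
count_mismatches (mask8_test_fn sut, mask8_test_fn ref)
{
  unsigned bad = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      bad += sut ((uint8_t) a, (uint8_t) b) != ref ((uint8_t) a, (uint8_t) b);
  return bad;
}

/* Fail hard on any mismatch.  */
void
run_mask8_checks (void)
{
  assert (count_mismatches (sut_kortestz8, ref_kortestz8) == 0);
  assert (count_mismatches (sut_kortestc8, ref_kortestc8) == 0);
  assert (count_mismatches (sut_ktestz8, ref_ktestz8) == 0);
  assert (count_mismatches (sut_ktestc8, ref_ktestc8) == 0);
}
```

The exhaustive loop is cheap for 8-bit masks; wider masks would need a sample
of inputs instead.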

Otherwise, the patch LGTM, but I'll leave the final approval to Kirill.

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 13:47                                                 ` Uros Bizjak
@ 2017-01-20 17:26                                                   ` Kirill Yukhin
  2017-01-20 20:07                                                     ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-20 17:26 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andrew Senkevich, Jakub Jelinek, GCC Patches

Hi,
On 20 Jan 14:46, Uros Bizjak wrote:
> On Fri, Jan 20, 2017 at 2:32 PM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>
> > here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is this OK?
> >
> > gcc/
> >     * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
> >     * config/i386/avx512dqintrin.h: Ditto.
> >     * config/i386/avx512fintrin.h: Ditto.
> >     * config/i386/i386.c: Handle new builtins.
> >     * config/i386/i386-builtin.def: Add new builtins.
> >     * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
> >     (UNSPEC_KORTEST, UNSPEC_KTEST): New.
> >
> > gcc/testsuite/
> >     * gcc.target/i386/avx512bw-ktestd-1.c: New test.
> >     * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
> >     * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
> >     * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
> >     * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
> >     * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
> >     * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
> >     * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
>
> IMO, you should add some runtime tests.
+1

> Otherwise, the patch LGTM, but I'll leave the final approval to Kirill.
Anyway, trunk is frozen, so I suppose you'll need an OK from the RM.

So there is not much hurry. Please add runtime tests.

--
Thanks, K
>
> Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 17:26                                                   ` Kirill Yukhin
@ 2017-01-20 20:07                                                     ` Andrew Senkevich
  2017-01-21  8:25                                                       ` Richard Biener
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Andrew Senkevich @ 2017-01-20 20:07 UTC (permalink / raw)
  To: Kirill Yukhin, Richard Biener; +Cc: Uros Bizjak, Jakub Jelinek, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

2017-01-20 20:08 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> Hi,
> On 20 Jan 14:46, Uros Bizjak wrote:
>> On Fri, Jan 20, 2017 at 2:32 PM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>
>> > here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is this OK?
>> >
>> > gcc/
>> >     * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
>> >     * config/i386/avx512dqintrin.h: Ditto.
>> >     * config/i386/avx512fintrin.h: Ditto.
>> >     * config/i386/i386.c: Handle new builtins.
>> >     * config/i386/i386-builtin.def: Add new builtins.
>> >     * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
>> >     (UNSPEC_KORTEST, UNSPEC_KTEST): New.
>> >
>> > gcc/testsuite/
>> >     * gcc.target/i386/avx512bw-ktestd-1.c: New test.
>> >     * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
>> >     * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
>> >     * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
>> >     * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
>> >     * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
>> >     * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
>> >     * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
>>
>> IMO, you should add some runtime tests.
> +1
>
>> Otherwise, the patch LGTM, but I'll leave the final approval to Kirill.
> Anyway, trunk is frozen, so I suppose you'll need an OK from the RM.

Kirill, the patch with runtime tests is attached.

Richard, are you OK with approving the commit of this patch?
It is the last part of the k-mask intrinsics, and it would be great to have
all intrinsics of this type available in a single GCC release.

Updated changelog:

gcc/
    * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
    * config/i386/avx512dqintrin.h: Ditto.
    * config/i386/avx512fintrin.h: Ditto.
    * config/i386/i386.c: Handle new builtins.
    * config/i386/i386-builtin.def: Add new builtins.
    * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
    (UNSPEC_KORTEST, UNSPEC_KTEST): New.

gcc/testsuite/
    * gcc.target/i386/avx512bw-ktestd-1.c: New test.
    * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
    * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
    * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
    * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
    * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
    * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
    * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
    * gcc.target/i386/avx512bw-ktestd-2.c: Ditto.
    * gcc.target/i386/avx512bw-ktestq-2.c: Ditto.
    * gcc.target/i386/avx512dq-ktestb-2.c: Ditto.
    * gcc.target/i386/avx512f-ktestw-2.c: Ditto.
    * gcc.target/i386/avx512bw-kortestd-2.c: Ditto.
    * gcc.target/i386/avx512bw-kortestq-2.c: Ditto.
    * gcc.target/i386/avx512dq-kortestb-2.c: Ditto.
    * gcc.target/i386/avx512f-kortestw-2.c: Ditto.
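
The updated patch also contains combined forms such as
`_kortest_mask32_u8 (A, B, &CF)`, which return the Z result and store the C
result through the pointer. A plain-C model of that calling convention might
look as follows. This is a sketch only: the `*_combined_model` names are
illustrative, and the ZF/CF definitions are assumed from the instruction
semantics rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* kortest, combined form: return ZF and store CF through CF_OUT.  */
unsigned char
kortest32_combined_model (uint32_t a, uint32_t b, unsigned char *cf_out)
{
  uint32_t r = a | b;
  *cf_out = (r == UINT32_MAX);	/* CF: the OR is all ones.  */
  return r == 0;		/* ZF: the OR is zero.  */
}

/* ktest, combined form: return ZF and store CF through CF_OUT.  */
unsigned char
ktest32_combined_model (uint32_t a, uint32_t b, unsigned char *cf_out)
{
  *cf_out = ((~a & b) == 0);	/* CF: B has no bit outside A.  */
  return (a & b) == 0;		/* ZF: A and B share no bit.  */
}

/* Two hand-computed cases with disjoint halves of a 32-bit mask.  */
void
combined_model_selfcheck (void)
{
  unsigned char cf;

  /* The OR is all ones, so CF is set and ZF is clear.  */
  assert (kortest32_combined_model (0xffff0000u, 0x0000ffffu, &cf) == 0);
  assert (cf == 1);

  /* The AND is zero (ZF set), but B is not a subset of A (CF clear).  */
  assert (ktest32_combined_model (0xffff0000u, 0x0000ffffu, &cf) == 1);
  assert (cf == 0);
}
```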


--
WBR,
Andrew

[-- Attachment #2: avx512-kmask-intrin-part5.patch --]
[-- Type: application/octet-stream, Size: 26096 bytes --]

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index e41428a..d05eed2
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,94 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask32_u8  (__mmask32 __A,  __mmask32 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask64_u8  (__mmask64 __A,  __mmask64 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzdi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcsi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcdi (__A, __B);
+}
+
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask32 (__mmask32 __A, __mmask32 __B)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index bcb4a32..670e41e
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,50 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktest_mask8_u8  (__mmask8 __A,  __mmask8 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_ktestcqi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestcqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask8_u8  (__mmask8 __A,  __mmask8 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_kortestcqi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzqi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestcqi (__A, __B);
+}
+
 extern __inline __mmask8
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask8 (__mmask8 __A, __mmask8 __B)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 810ac23..6c11453
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -10006,6 +10006,52 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 #define _kxnor_mask16 _mm512_kxnor
 #define _kxor_mask16 _mm512_kxor
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_ktestchi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestchi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
+  return (unsigned char) __builtin_ia32_kortestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestzhi ((__mmask16) __A,
+						    (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestchi ((__mmask16) __A,
+						    (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kadd_mask16 (__mmask16 __A, __mmask16 __B)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 08ce2c9..137aa3e
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1464,8 +1464,23 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kiorqi, "__builtin_ia32_korqi", IX86_B
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kiorhi, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiorsi, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestdi, "__builtin_ia32_ktestzdi", IX86_BUILTIN_KTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kortestqi, "__builtin_ia32_kortestcqi", IX86_BUILTIN_KORTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kortestqi, "__builtin_ia32_kortestzqi", IX86_BUILTIN_KORTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortesthi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortesthi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestsi, "__builtin_ia32_kortestcsi", IX86_BUILTIN_KORTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestsi, "__builtin_ia32_kortestzsi", IX86_BUILTIN_KORTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestdi, "__builtin_ia32_kortestcdi", IX86_BUILTIN_KORTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kortestdi, "__builtin_ia32_kortestzdi", IX86_BUILTIN_KORTESTZ64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 46d1c44..65b32e6
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -37642,16 +37642,100 @@ rdseed_step:
       emit_insn (gen_pop (gen_rtx_REG (word_mode, FLAGS_REG)));
       return 0;
 
+    case IX86_BUILTIN_KTESTC8:
+      icode = CODE_FOR_ktestqi;
+      mode0 = QImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ8:
+      icode = CODE_FOR_ktestqi;
+      mode0 = QImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC16:
+      icode = CODE_FOR_ktesthi;
+      mode0 = HImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ16:
+      icode = CODE_FOR_ktesthi;
+      mode0 = HImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC32:
+      icode = CODE_FOR_ktestsi;
+      mode0 = SImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ32:
+      icode = CODE_FOR_ktestsi;
+      mode0 = SImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTC64:
+      icode = CODE_FOR_ktestdi;
+      mode0 = DImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KTESTZ64:
+      icode = CODE_FOR_ktestdi;
+      mode0 = DImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC8:
+      icode = CODE_FOR_kortestqi;
+      mode0 = QImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ8:
+      icode = CODE_FOR_kortestqi;
+      mode0 = QImode;
+      mode1 = CCZmode;
+      goto kortest;
+
     case IX86_BUILTIN_KORTESTC16:
-      icode = CODE_FOR_kortestchi;
+      icode = CODE_FOR_kortesthi;
       mode0 = HImode;
       mode1 = CCCmode;
       goto kortest;
 
     case IX86_BUILTIN_KORTESTZ16:
-      icode = CODE_FOR_kortestzhi;
+      icode = CODE_FOR_kortesthi;
       mode0 = HImode;
       mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC32:
+      icode = CODE_FOR_kortestsi;
+      mode0 = SImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ32:
+      icode = CODE_FOR_kortestsi;
+      mode0 = SImode;
+      mode1 = CCZmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTC64:
+      icode = CODE_FOR_kortestdi;
+      mode0 = DImode;
+      mode1 = CCCmode;
+      goto kortest;
+
+    case IX86_BUILTIN_KORTESTZ64:
+      icode = CODE_FOR_kortestdi;
+      mode0 = DImode;
+      mode1 = CCZmode;
 
     kortest:
       arg0 = CALL_EXPR_ARG (exp, 0); /* Mask reg src1.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bc504eb..0d074f8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -108,6 +108,8 @@
 
   ;; Mask operations
   UNSPEC_MASKOP
+  UNSPEC_KORTEST
+  UNSPEC_KTEST
 
   ;; For embed. rounding feature
   UNSPEC_EMBEDDED_ROUNDING
@@ -1422,31 +1424,27 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
 
-;;There are kortrest[bdq] but no intrinsics for them.
-;;We probably don't need to implement them.
-(define_insn "kortestzhi"
-  [(set (reg:CCZ FLAGS_REG)
-	(compare:CCZ
-	  (ior:HI
-	    (match_operand:HI 0 "register_operand" "k")
-	    (match_operand:HI 1 "register_operand" "k"))
-	  (const_int 0)))]
-  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
-  "kortestw\t{%1, %0|%0, %1}"
-  [(set_attr "mode" "HI")
+(define_insn "ktest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
+	  UNSPEC_KTEST))]
+  "TARGET_AVX512F"
+  "ktest<mskmodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "<MODE>")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
-(define_insn "kortestchi"
-  [(set (reg:CCC FLAGS_REG)
-	(compare:CCC
-	  (ior:HI
-	    (match_operand:HI 0 "register_operand" "k")
-	    (match_operand:HI 1 "register_operand" "k"))
-	  (const_int -1)))]
-  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
-  "kortestw\t{%1, %0|%0, %1}"
-  [(set_attr "mode" "HI")
+(define_insn "kortest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
+	  UNSPEC_KORTEST))]
+  "TARGET_AVX512F"
+  "kortest<mskmodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "<MODE>")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c
new file mode 100644
index 0000000..9d6235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "kortestd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k1;
+  __mmask32 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask32_u8(k1, k2);
+  r = _kortestz_mask32_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c
new file mode 100644
index 0000000..7f27618
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "kortestq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k1;
+  __mmask64 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask64_u8(k1, k2);
+  r = _kortestz_mask64_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c
new file mode 100644
index 0000000..56d3c4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "ktestd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k1;
+  __mmask32 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask32_u8(k1, k2);
+  r = _ktestz_mask32_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c
new file mode 100644
index 0000000..3d91132
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512bw" } */
+/* { dg-final { scan-assembler-times "ktestq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k1;
+  __mmask64 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask64_u8(k1, k2);
+  r = _ktestz_mask64_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c
new file mode 100644
index 0000000..b743d60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512dq" } */
+/* { dg-final { scan-assembler-times "kortestb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test () {
+  volatile __mmask8 k1;
+  __mmask8 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _kortestc_mask8_u8(k1, k2);
+  r = _kortestz_mask8_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c
new file mode 100644
index 0000000..4e13fd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512dq" } */
+/* { dg-final { scan-assembler-times "ktestb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test () {
+  volatile __mmask8 k1;
+  __mmask8 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask8_u8(k1, k2);
+  r = _ktestz_mask8_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
index af6f5f1..7084ada 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O0 -mavx512f" } */
-/* { dg-final { scan-assembler-times "kortestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)"  4 } } */
+/* { dg-final { scan-assembler-times "kortestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 6 } } */
 
 #include <immintrin.h>
 
@@ -19,4 +19,9 @@ avx512f_test () {
 
   r = _mm512_kortestc (k3, k4);
   r = _mm512_kortestz (k3, k4);
+
+  volatile unsigned char r1 __attribute__((unused));	
+
+  r1 = _kortestc_mask16_u8(k1, k2);
+  r1 = _kortestz_mask16_u8(k1, k2);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c
new file mode 100644
index 0000000..f6151d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512f" } */
+/* { dg-final { scan-assembler-times "ktestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  volatile __mmask16 k1;
+  __mmask16 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask16_u8(k1, k2);
+  r = _ktestz_mask16_u8(k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-2.c
new file mode 100644
index 0000000..741bbbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestd-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  volatile __mmask32 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _kortest_mask32_u8(k1, k2, &r2);
+
+  if ( r1 != 0 || r2 != 1 )
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-2.c
new file mode 100644
index 0000000..9efaac2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kortestq-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned char r1, r2;
+
+  k1 = _cvtu64_mask64(0);
+  k2 = _cvtu64_mask64(-1);
+
+  r1 = _kortest_mask64_u8(k1, k2, &r2);
+
+  if (r1 != 0 || r2 != 1)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-2.c
new file mode 100644
index 0000000..d931f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestd-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  volatile __mmask32 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _ktest_mask32_u8(k1, k2, &r2);
+
+  if (r1 != 1 || r2 != 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-2.c
new file mode 100644
index 0000000..518d829
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-ktestq-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  unsigned char r1, r2;
+
+  k1 = _cvtu64_mask64(0);
+  k2 = _cvtu64_mask64(-1);
+
+  r1 = _ktest_mask64_u8(k1, k2, &r2);
+
+  if (r1 != 1 || r2 != 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-2.c
new file mode 100644
index 0000000..b71346a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kortestb-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#include "avx512dq-check.h"
+
+void
+avx512dq_test ()
+{
+  volatile __mmask8 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _kortest_mask8_u8(k1, k2, &r2);
+
+  if (r1 != 0 || r2 != 1)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-2.c
new file mode 100644
index 0000000..0c6e7c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-ktestb-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#include "avx512dq-check.h"
+
+void
+avx512dq_test ()
+{
+  volatile __mmask8 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _ktest_mask8_u8(k1, k2, &r2);
+
+  if (r1 != 1 || r2 != 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-2.c
index 4b9cadc..d2a56e4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-kortestw-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kortestw-2.c
@@ -9,6 +9,8 @@ avx512f_test () {
   volatile __mmask16 k1;
   __mmask16 k2;
   volatile short r = 0;
+  volatile unsigned char r1 = 0;
+  unsigned char r2;
 
   /* Test kortestc.  */
   __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
@@ -50,4 +52,11 @@ avx512f_test () {
   r += _mm512_kortestz (k1, k2);
   if (!r)
     abort ();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _kortest_mask16_u8 (k1, k2, &r2);
+  if (r1 != 0 || r2 != 1)
+    abort ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
new file mode 100644
index 0000000..6602c7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+
+void
+avx512f_test ()
+{
+  volatile __mmask16 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _ktest_mask16_u8(k1, k2, &r2);
+
+  if (r1 != 1 || r2 != 0)
+    abort ();
+}
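
For readers checking the expected values in the run-time tests above, here is
a minimal software model of the flag results, written against the documented
behaviour of the 16-bit instructions. It is a sketch only, not part of the
patch: the names model_ktest16 and model_kortest16 are hypothetical, and the
functions mirror the intrinsic convention used in the tests (ZF is the return
value, CF is stored through the pointer).

```c
#include <stdint.h>

/* Hypothetical model of ktestw, for illustration only.
   ZF is set when (a & b) has no bits set.
   CF is set when (~a & b) has no bits set.
   Returns ZF and stores CF through *cf.  */
static unsigned char
model_ktest16 (uint16_t a, uint16_t b, unsigned char *cf)
{
  *cf = (uint16_t) (~a & b) == 0;
  return (uint16_t) (a & b) == 0;
}

/* Hypothetical model of kortestw, for illustration only.
   ZF is set when (a | b) is all zeros.
   CF is set when (a | b) is all ones.
   Returns ZF and stores CF through *cf.  */
static unsigned char
model_kortest16 (uint16_t a, uint16_t b, unsigned char *cf)
{
  uint16_t t = (uint16_t) (a | b);

  *cf = t == 0xffff;
  return t == 0;
}
```

With a = 0 and b = 0xffff, the inputs used by the tests, the ktest model gives
ZF = 1 and CF = 0, and the kortest model gives ZF = 0 and CF = 1, which matches
the conditions the tests check before calling abort ().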

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 20:07                                                     ` Andrew Senkevich
@ 2017-01-21  8:25                                                       ` Richard Biener
  2017-01-23 11:33                                                       ` Kirill Yukhin
  2017-01-26  9:38                                                       ` Thomas Schwinge
  2 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2017-01-21  8:25 UTC (permalink / raw)
  To: Andrew Senkevich, Kirill Yukhin; +Cc: Uros Bizjak, Jakub Jelinek, GCC Patches

On January 20, 2017 9:03:53 PM GMT+01:00, Andrew Senkevich <andrew.n.senkevich@gmail.com> wrote:
>2017-01-20 20:08 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
>> Hi,
>> On 20 Jan 14:46, Uros Bizjak wrote:
>>> On Fri, Jan 20, 2017 at 2:32 PM, Andrew Senkevich
>>> <andrew.n.senkevich@gmail.com> wrote:
>>>
>>> > here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is it Ok?
>>> >
>>> > gcc/
>>> >     * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
>>> >     * config/i386/avx512dqintrin.h: Ditto.
>>> >     * config/i386/avx512fintrin.h: Ditto.
>>> >     * gcc/config/i386/i386.c: Handle new builtins.
>>> >     * config/i386/i386-builtin.def: Add new builtins.
>>> >     * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
>>> >     (UNSPEC_KORTEST, UNSPEC_KTEST): New.
>>> >
>>> > gcc/testsuite/
>>> >     * gcc.target/i386/avx512bw-ktestd-1.c: New test.
>>> >     * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
>>> >     * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
>>> >     * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
>>> >     * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
>>> >     * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
>>> >     * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
>>> >     * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
>>>
>>> IMO, you should add some runtime tests.
>> +1
>>
>>> Otherwise, the patch LGTM, but I'll leave the final approval to Kirill.
>> Anyway trunk is frozen, so I suppose you'll need OK from RM.
>
>Kirill, attached with runtime tests.
>
>Richard, are you OK to approve commit of this patch?

OK.  Note that trunk is not frozen; it is being operated in release branch mode now.

Richard.

>It is the last part of the k-mask intrinsics; it would be great to have all
>intrinsics of this type available in a single GCC release.
>
>Updated changelog:
>
>gcc/
>    * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
>    * config/i386/avx512dqintrin.h: Ditto.
>    * config/i386/avx512fintrin.h: Ditto.
>    * gcc/config/i386/i386.c: Handle new builtins.
>    * config/i386/i386-builtin.def: Add new builtins.
>    * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
>    (UNSPEC_KORTEST, UNSPEC_KTEST): New.
>
>gcc/testsuite/
>    * gcc.target/i386/avx512bw-ktestd-1.c: New test.
>    * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
>    * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
>    * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
>    * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
>    * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
>    * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
>    * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
>    * gcc.target/i386/avx512bw-ktestd-2.c: Ditto.
>    * gcc.target/i386/avx512bw-ktestq-2.c: Ditto.
>    * gcc.target/i386/avx512dq-ktestb-2.c: Ditto.
>    * gcc.target/i386/avx512f-ktestw-2.c: Ditto.
>    * gcc.target/i386/avx512bw-kortestd-2.c: Ditto.
>    * gcc.target/i386/avx512bw-kortestq-2.c: Ditto.
>    * gcc.target/i386/avx512dq-kortestb-2.c: Ditto.
>    * gcc.target/i386/avx512f-kortestw-2.c: Ditto.
>
>
>--
>WBR,
>Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 20:07                                                     ` Andrew Senkevich
  2017-01-21  8:25                                                       ` Richard Biener
@ 2017-01-23 11:33                                                       ` Kirill Yukhin
  2017-01-26  9:38                                                       ` Thomas Schwinge
  2 siblings, 0 replies; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-23 11:33 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Richard Biener, Uros Bizjak, Jakub Jelinek, GCC Patches

On 20 Jan 23:03, Andrew Senkevich wrote:
> 2017-01-20 20:08 GMT+03:00 Kirill Yukhin <kirill.yukhin@gmail.com>:
> > Hi,
> > On 20 Jan 14:46, Uros Bizjak wrote:
> >> On Fri, Jan 20, 2017 at 2:32 PM, Andrew Senkevich
> >> <andrew.n.senkevich@gmail.com> wrote:
> >>
> >> > here are the intrinsics for ktest{b,w,d,q} and kortest{b,w,d,q}. Is it Ok?
> >> >
> >> > gcc/
> >> >     * config/i386/avx512bwintrin.h: Add k-mask test, kortest intrinsics.
> >> >     * config/i386/avx512dqintrin.h: Ditto.
> >> >     * config/i386/avx512fintrin.h: Ditto.
> >> >     * gcc/config/i386/i386.c: Handle new builtins.
> >> >     * config/i386/i386-builtin.def: Add new builtins.
> >> >     * config/i386/sse.md (ktest<mode>, kortest<mode>): New.
> >> >     (UNSPEC_KORTEST, UNSPEC_KTEST): New.
> >> >
> >> > gcc/testsuite/
> >> >     * gcc.target/i386/avx512bw-ktestd-1.c: New test.
> >> >     * gcc.target/i386/avx512bw-ktestq-1.c: Ditto.
> >> >     * gcc.target/i386/avx512dq-ktestb-1.c: Ditto.
> >> >     * gcc.target/i386/avx512f-ktestw-1.c: Ditto.
> >> >     * gcc.target/i386/avx512bw-kortestd-1.c: Ditto.
> >> >     * gcc.target/i386/avx512bw-kortestq-1.c: Ditto.
> >> >     * gcc.target/i386/avx512dq-kortestb-1.c: Ditto.
> >> >     * gcc.target/i386/avx512f-kortestw-1.c: Ditto.
> >>
> >> IMO, you should add some runtime tests.
> > +1
> >
> >> Otherwise, the patch LGTM, but I'll leave the final approval to Kirill.
> > Anyway trunk is frozen, so I suppose you'll need OK from RM.
>
> Kirill, attached with runtime tests.
>
> Richard, are you OK to approve commit of this patch?
> It is the last part of the k-mask intrinsics; it would be great to have all
> intrinsics of this type available in a single GCC release.
OK for main trunk. I'll check it in.

--
Thanks, K

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-20 20:07                                                     ` Andrew Senkevich
  2017-01-21  8:25                                                       ` Richard Biener
  2017-01-23 11:33                                                       ` Kirill Yukhin
@ 2017-01-26  9:38                                                       ` Thomas Schwinge
  2017-01-26 10:04                                                         ` Uros Bizjak
  2017-01-26 10:51                                                         ` Kirill Yukhin
  2 siblings, 2 replies; 48+ messages in thread
From: Thomas Schwinge @ 2017-01-26  9:38 UTC (permalink / raw)
  To: Andrew Senkevich, Kirill Yukhin
  Cc: Uros Bizjak, Jakub Jelinek, GCC Patches, Richard Biener

Hi!

On Fri, 20 Jan 2017 23:03:53 +0300, Andrew Senkevich <andrew.n.senkevich@gmail.com> wrote:
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
> new file mode 100644
> index 0000000..6602c7a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#include "avx512f-check.h"
> +
> +void
> +avx512f_test ()
> +{
> +  volatile __mmask16 k1, k2;
> +  unsigned char r1, r2;
> +
> +  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
> +  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
> +
> +  r1 = _ktest_mask16_u8(k1, k2, &r2);
> +
> +  if (r1 != 1 || r2 != 0)
> +    abort ();
> +}

I see:

    {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
    {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}

... because of:

    /tmp/ccjv3mX2.s: Assembler messages:
    /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
    compiler exited with status 1


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26  9:38                                                       ` Thomas Schwinge
@ 2017-01-26 10:04                                                         ` Uros Bizjak
  2017-01-26 10:51                                                         ` Kirill Yukhin
  1 sibling, 0 replies; 48+ messages in thread
From: Uros Bizjak @ 2017-01-26 10:04 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Andrew Senkevich, Kirill Yukhin, Jakub Jelinek, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 10:14 AM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Fri, 20 Jan 2017 23:03:53 +0300, Andrew Senkevich <andrew.n.senkevich@gmail.com> wrote:
>> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
>> new file mode 100644
>> index 0000000..6602c7a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -mavx512f" } */
>> +/* { dg-require-effective-target avx512f } */
>> +
>> +#include "avx512f-check.h"
>> +
>> +void
>> +avx512f_test ()
>> +{
>> +  volatile __mmask16 k1, k2;
>> +  unsigned char r1, r2;
>> +
>> +  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
>> +  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
>> +
>> +  r1 = _ktest_mask16_u8(k1, k2, &r2);
>> +
>> +  if (r1 != 1 || r2 != 0)
>> +    abort ();
>> +}
>
> I see:
>
>     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
>     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
>
> ... because of:
>
>     /tmp/ccjv3mX2.s: Assembler messages:
>     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
>     compiler exited with status 1

The problem is with the __builtin_ia32_ktesthi (and __builtin_ia32_kaddhi)
builtins. These should be enabled only with AVX512DQ, since the
corresponding insns (ktestw, kaddw) are available only in the AVX512DQ
ISA extension.

Andrew, can you please adjust builtins, instruction patterns,
intrinsics and testcases? Also, can you please review if there are any
other inconsistencies w.r.t. ISA throughout mask intrinsics?
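
As a rough aid for that review, the mapping from a mask instruction to the ISA
extension that provides it can be sketched as a small lookup. This is an
illustration only: mask_insn_isa is a hypothetical helper, not something in
GCC, and it deliberately covers only the families with a plain b/w/d/q suffix
(kand, kandn, kor, kxor, kxnor, knot, kmov, kshiftl, kshiftr, kortest, ktest,
kadd), not the kunpck variants.

```c
#include <string.h>

/* Hypothetical helper, for illustration only: return the name of the
   ISA extension that provides mask instruction family OP with width
   SUFFIX ('b', 'w', 'd' or 'q'), or NULL for an unknown suffix.  */
static const char *
mask_insn_isa (const char *op, char suffix)
{
  switch (suffix)
    {
    case 'b':
      /* All byte-width mask instructions come with AVX512DQ.  */
      return "AVX512DQ";

    case 'd':
    case 'q':
      /* Dword and qword mask instructions come with AVX512BW.  */
      return "AVX512BW";

    case 'w':
      /* Word-width instructions are AVX512F, except ktestw and kaddw,
         which were introduced with AVX512DQ.  */
      if (strcmp (op, "ktest") == 0 || strcmp (op, "kadd") == 0)
        return "AVX512DQ";
      return "AVX512F";

    default:
      return NULL;
    }
}
```

For example, mask_insn_isa ("kortest", 'w') yields "AVX512F", while
mask_insn_isa ("ktest", 'w') yields "AVX512DQ", which is the case that trips
the assembler in the report above.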

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26  9:38                                                       ` Thomas Schwinge
  2017-01-26 10:04                                                         ` Uros Bizjak
@ 2017-01-26 10:51                                                         ` Kirill Yukhin
  2017-01-26 10:54                                                           ` Jakub Jelinek
  2017-01-26 11:53                                                           ` Thomas Schwinge
  1 sibling, 2 replies; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-26 10:51 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Andrew Senkevich, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Richard Biener

Hello Thomas,
On 26 Jan 10:14, Thomas Schwinge wrote:
> I see:
>
>     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
>     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
>
> ... because of:
>
>     /tmp/ccjv3mX2.s: Assembler messages:
>     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
>     compiler exited with status 1
Which version of gas do you use?
It should be OK since v2.25.

--
Thanks, K
>
>
> Grüße
>  Thomas

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 10:51                                                         ` Kirill Yukhin
@ 2017-01-26 10:54                                                           ` Jakub Jelinek
  2017-01-26 10:55                                                             ` Uros Bizjak
  2017-01-26 11:53                                                           ` Thomas Schwinge
  1 sibling, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2017-01-26 10:54 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Thomas Schwinge, Andrew Senkevich, Uros Bizjak, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 02:44:56AM -0800, Kirill Yukhin wrote:
> Hello Thomas,
> On 26 Jan 10:14, Thomas Schwinge wrote:
> > I see:
> >
> >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> >
> > ... because of:
> >
> >     /tmp/ccjv3mX2.s: Assembler messages:
> >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> >     compiler exited with status 1
> Which version of gas do you use?
> It should be OK since v2.25.

It is weird, because the test already has:
/* { dg-require-effective-target avx512f } */
Perhaps if there are gas versions with partial avx512f support, we need
to improve the avx512f effective target test.

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 10:54                                                           ` Jakub Jelinek
@ 2017-01-26 10:55                                                             ` Uros Bizjak
  2017-01-26 11:04                                                               ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Uros Bizjak @ 2017-01-26 10:55 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Kirill Yukhin, Thomas Schwinge, Andrew Senkevich, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 11:51 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jan 26, 2017 at 02:44:56AM -0800, Kirill Yukhin wrote:
>> Hello Thomas,
>> On 26 Jan 10:14, Thomas Schwinge wrote:
>> > I see:
>> >
>> >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
>> >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
>> >
>> > ... because of:
>> >
>> >     /tmp/ccjv3mX2.s: Assembler messages:
>> >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
>> >     compiler exited with status 1
>> Which version of gas do you use?
>> It should be OK since v2.25.
>
> It is weird, because the test already has:
> /* { dg-require-effective-target avx512f } */
> Perhaps if there are gas versions with partial avx512f support, we need
> to improve the avx512f effective target test.

This is actually an AVX512DQ instruction, please see [1], 3-509.

[1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 10:55                                                             ` Uros Bizjak
@ 2017-01-26 11:04                                                               ` Jakub Jelinek
  2017-01-26 11:18                                                                 ` Uros Bizjak
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2017-01-26 11:04 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Kirill Yukhin, Thomas Schwinge, Andrew Senkevich, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 11:54:52AM +0100, Uros Bizjak wrote:
> On Thu, Jan 26, 2017 at 11:51 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Jan 26, 2017 at 02:44:56AM -0800, Kirill Yukhin wrote:
> >> Hello Thomas,
> >> On 26 Jan 10:14, Thomas Schwinge wrote:
> >> > I see:
> >> >
> >> >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> >> >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> >> >
> >> > ... because of:
> >> >
> >> >     /tmp/ccjv3mX2.s: Assembler messages:
> >> >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> >> >     compiler exited with status 1
> >> Which version of gas do you use?
> >> It should be OK since v2.25.
> >
> > It is weird, because the test already has:
> > /* { dg-require-effective-target avx512f } */
> > Perhaps if there are gas versions with partial avx512f support, we need
> > to improve the avx512f effective target test.
> 
> This is actually an AVX512DQ instruction, please see [1], 3-509.
> 
> [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf

You're right.  But then the tests should be named avx512dq-ktestw-{1,2}.c,
should use -mavx512dq, avx512dq effective target etc. and indeed the
intrinsics shouldn't be in avx512fintrin.h header, but dq, and should not be
enabled for f, but only dq.

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 11:04                                                               ` Jakub Jelinek
@ 2017-01-26 11:18                                                                 ` Uros Bizjak
  0 siblings, 0 replies; 48+ messages in thread
From: Uros Bizjak @ 2017-01-26 11:18 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Kirill Yukhin, Thomas Schwinge, Andrew Senkevich, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 12:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jan 26, 2017 at 11:54:52AM +0100, Uros Bizjak wrote:
>> On Thu, Jan 26, 2017 at 11:51 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Thu, Jan 26, 2017 at 02:44:56AM -0800, Kirill Yukhin wrote:
>> >> Hello Thomas,
>> >> On 26 Jan 10:14, Thomas Schwinge wrote:
>> >> > I see:
>> >> >
>> >> >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
>> >> >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
>> >> >
>> >> > ... because of:
>> >> >
>> >> >     /tmp/ccjv3mX2.s: Assembler messages:
>> >> >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
>> >> >     compiler exited with status 1
>> >> Which version of gas do you use?
>> >> It should be OK since v2.25.
>> >
>> > It is weird, because the test already has:
>> > /* { dg-require-effective-target avx512f } */
>> > Perhaps if there are gas versions with partial avx512f support, we need
>> > to improve the avx512f effective target test.
>>
>> This is actually an AVX512DQ instruction, please see [1], 3-509.
>>
>> [1] https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf
>
> You're right.  But then the tests should be named avx512dq-ktestw-{1,2}.c,
> should use -mavx512dq, avx512dq effective target etc. and indeed the
> intrinsics shouldn't be in avx512fintrin.h header, but dq, and should not be
> enabled for f, but only dq.

Yes, all this is needed to fix this oversight (and one more with
kaddw), as I proposed a couple of messages earlier.
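
To make the width-to-ISA rule explicit, here is a minimal sketch in
plain C.  The enum and function names are hypothetical, for illustration
only; they are not GCC internals.  It only encodes that the b and w
forms of ktest/kadd need AVX512DQ, while the d and q forms need
AVX512BW.

```c
/* Hypothetical illustration, not GCC code: which ISA extension
   provides ktest/kadd for a mask of a given width in bits.  */
enum mask_isa { ISA_AVX512DQ, ISA_AVX512BW };

/* Store the required ISA in *out and return 1 for a valid mask width
   (8, 16, 32 or 64 bits); return 0 and leave *out untouched otherwise.  */
static int
ktest_kadd_isa (unsigned width, enum mask_isa *out)
{
  switch (width)
    {
    case 8:   /* ktestb, kaddb */
    case 16:  /* ktestw, kaddw: DQ, not plain AVX512F */
      *out = ISA_AVX512DQ;
      return 1;
    case 32:  /* ktestd, kaddd */
    case 64:  /* ktestq, kaddq */
      *out = ISA_AVX512BW;
      return 1;
    default:
      return 0;
    }
}
```

The 16-bit case is the one that was wrongly tied to AVX512F here.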

Uros.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 10:51                                                         ` Kirill Yukhin
  2017-01-26 10:54                                                           ` Jakub Jelinek
@ 2017-01-26 11:53                                                           ` Thomas Schwinge
  2017-01-26 12:04                                                             ` Kirill Yukhin
  1 sibling, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2017-01-26 11:53 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Andrew Senkevich, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Richard Biener

Hi!

On Thu, 26 Jan 2017 02:44:56 -0800, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> On 26 Jan 10:14, Thomas Schwinge wrote:
> > I see:
> >
> >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> >
> > ... because of:
> >
> >     /tmp/ccjv3mX2.s: Assembler messages:
> >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> >     compiler exited with status 1
> Which version of gas do you use?

A rather old one on that Ubuntu 12.10 system:

    $ as --version
    GNU assembler (GNU Binutils for Ubuntu) 2.22.90.20120924
    [...]

> It should be OK since v2.25.

OK, but as done for other tests, on older versions such tests should
then be UNSUPPORTED instead of FAIL/UNRESOLVED (as long as that is
practicable; as I understand the other messages, how to do that has
already been described).


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 11:53                                                           ` Thomas Schwinge
@ 2017-01-26 12:04                                                             ` Kirill Yukhin
  2017-01-26 12:17                                                               ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-26 12:04 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Andrew Senkevich, Uros Bizjak, Jakub Jelinek, GCC Patches,
	Richard Biener

Hi,
On 26 Jan 12:49, Thomas Schwinge wrote:
> Hi!
>
> On Thu, 26 Jan 2017 02:44:56 -0800, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> > On 26 Jan 10:14, Thomas Schwinge wrote:
> > > I see:
> > >
> > >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> > >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> > >
> > > ... because of:
> > >
> > >     /tmp/ccjv3mX2.s: Assembler messages:
> > >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> > >     compiler exited with status 1
> > Which version of gas do you use?
>
> A rather old one on that Ubuntu 12.10 system:
>
>     $ as --version
>     GNU assembler (GNU Binutils for Ubuntu) 2.22.90.20120924
>     [...]
>
> > It should be OK since v2.25.
>
> OK, but as done for other tests, for older versions such testing then
> should be UNSUPPORTED instead of FAIL/UNRESOLVED (as long as that is
> practicable, which has already been described how to do, as I understand
> the other messages).
This is a bug as Uroš properly mentioned. Will fix.

--
Thanks, K

>
>
> Grüße
>  Thomas

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 12:04                                                             ` Kirill Yukhin
@ 2017-01-26 12:17                                                               ` Jakub Jelinek
  2017-01-26 12:23                                                                 ` Kirill Yukhin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2017-01-26 12:17 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Thomas Schwinge, Andrew Senkevich, Uros Bizjak, GCC Patches,
	Richard Biener

On Thu, Jan 26, 2017 at 03:53:44AM -0800, Kirill Yukhin wrote:
> Hi,
> On 26 Jan 12:49, Thomas Schwinge wrote:
> > Hi!
> >
> > On Thu, 26 Jan 2017 02:44:56 -0800, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> > > On 26 Jan 10:14, Thomas Schwinge wrote:
> > > > I see:
> > > >
> > > >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> > > >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> > > >
> > > > ... because of:
> > > >
> > > >     /tmp/ccjv3mX2.s: Assembler messages:
> > > >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> > > >     compiler exited with status 1
> > > Which version of gas do you use?
> >
> > A rather old one on that Ubuntu 12.10 system:
> >
> >     $ as --version
> >     GNU assembler (GNU Binutils for Ubuntu) 2.22.90.20120924
> >     [...]
> >
> > > It should be OK since v2.25.
> >
> > OK, but as done for other tests, for older versions such testing then
> > should be UNSUPPORTED instead of FAIL/UNRESOLVED (as long as that is
> > practicable, which has already been described how to do, as I understand
> > the other messages).
> This is a bug as Uroš properly mentioned. Will fix.

Like this?  Tested on x86_64-linux.  Ok for trunk?

2017-01-26  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/avx512fintrin.h (_ktest_mask16_u8,
	_ktestz_mask16_u8, _ktestc_mask16_u8, _kadd_mask16): Move to ...
	* config/i386/avx512dqintrin.h (_ktest_mask16_u8,
	_ktestz_mask16_u8, _ktestc_mask16_u8, _kadd_mask16): ... here.
	* config/i386/i386-builtin.def (__builtin_ia32_ktestchi,
	__builtin_ia32_ktestzhi, __builtin_ia32_kaddhi): Use
	OPTION_MASK_ISA_AVX512DQ instead of OPTION_MASK_ISA_AVX512F.
	* config/i386/sse.md (SWI1248_AVX512BWDQ2): New mode iterator.
	(kadd<mode>, ktest<mode>): Use it instead of SWI1248_AVX512BWDQ.
testsuite/
	* gcc.target/i386/avx512f-kaddw-1.c: Renamed to ...
	* gcc.target/i386/avx512dq-kaddw-1.c: ... this.  New test.  Replace
	avx512f with avx512dq.
	* gcc.target/i386/avx512f-ktestw-1.c: Renamed to ...
	* gcc.target/i386/avx512dq-ktestw-1.c: ... this.  New test.  Replace
	avx512f with avx512dq.
	* gcc.target/i386/avx512f-ktestw-2.c: Renamed to ...
	* gcc.target/i386/avx512dq-ktestw-2.c: ... this.  New test.  Replace
	avx512f with avx512dq.

--- gcc/config/i386/avx512fintrin.h.jj	2017-01-23 18:09:48.000000000 +0100
+++ gcc/config/i386/avx512fintrin.h	2017-01-26 12:40:10.187825569 +0100
@@ -10008,28 +10008,6 @@ _mm512_maskz_expandloadu_epi32 (__mmask1
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
-{
-  *__CF = (unsigned char) __builtin_ia32_ktestchi (__A, __B);
-  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
-}
-
-extern __inline unsigned char
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
-{
-  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
-}
-
-extern __inline unsigned char
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_ktestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
-{
-  return (unsigned char) __builtin_ia32_ktestchi (__A, __B);
-}
-
-extern __inline unsigned char
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kortest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
 {
   *__CF = (unsigned char) __builtin_ia32_kortestchi (__A, __B);
@@ -10052,13 +10030,6 @@ _kortestc_mask16_u8 (__mmask16 __A, __mm
 						    (__mmask16) __B);
 }
 
-extern __inline __mmask16
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_kadd_mask16 (__mmask16 __A, __mmask16 __B)
-{
-  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
-}
-
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask16_u32 (__mmask16 __A)
--- gcc/config/i386/avx512dqintrin.h.jj	2017-01-23 18:09:48.000000000 +0100
+++ gcc/config/i386/avx512dqintrin.h	2017-01-26 12:41:26.825839239 +0100
@@ -58,6 +58,28 @@ _ktestc_mask8_u8 (__mmask8 __A, __mmask8
 
 extern __inline unsigned char
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktest_mask16_u8  (__mmask16 __A,  __mmask16 __B, unsigned char *__CF)
+{
+  *__CF = (unsigned char) __builtin_ia32_ktestchi (__A, __B);
+  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestzhi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_ktestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_ktestchi (__A, __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _kortest_mask8_u8  (__mmask8 __A,  __mmask8 __B, unsigned char *__CF)
 {
   *__CF = (unsigned char) __builtin_ia32_kortestcqi (__A, __B);
@@ -85,6 +107,13 @@ _kadd_mask8 (__mmask8 __A, __mmask8 __B)
   return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline unsigned int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _cvtmask8_u32 (__mmask8 __A)
--- gcc/config/i386/i386-builtin.def.jj	2017-01-23 18:09:48.000000000 +0100
+++ gcc/config/i386/i386-builtin.def	2017-01-26 12:35:47.564205530 +0100
@@ -1466,8 +1466,8 @@ BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FO
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kiordi, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestcqi", IX86_BUILTIN_KTESTC8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktestqi, "__builtin_ia32_ktestzqi", IX86_BUILTIN_KTESTZ8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktesthi, "__builtin_ia32_ktestchi", IX86_BUILTIN_KTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ktesthi, "__builtin_ia32_ktestzhi", IX86_BUILTIN_KTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestcsi", IX86_BUILTIN_KTESTC32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestsi, "__builtin_ia32_ktestzsi", IX86_BUILTIN_KTESTZ32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_ktestdi, "__builtin_ia32_ktestcdi", IX86_BUILTIN_KTESTC64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
@@ -1495,7 +1495,7 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmovd", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmovq", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
 
--- gcc/config/i386/sse.md.jj	2017-01-23 18:09:48.000000000 +0100
+++ gcc/config/i386/sse.md	2017-01-26 12:35:09.260698495 +0100
@@ -1302,6 +1302,11 @@ (define_mode_iterator SWI1248_AVX512BWDQ
 (define_mode_iterator SWI1248_AVX512BW
   [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
 
+;; All integer modes with AVX512BW/DQ, even HImode requires DQ.
+(define_mode_iterator SWI1248_AVX512BWDQ2
+  [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
+   (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
+
 (define_expand "kmov<mskmodesuffix>"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "nonimmediate_operand")
 	(match_operand:SWI1248_AVX512BWDQ 1 "nonimmediate_operand"))]
@@ -1398,10 +1403,10 @@ (define_insn "knot<mode>"
 	   (const_string "<MODE>")))])
 
 (define_insn "kadd<mode>"
-  [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
-	(plus:SWI1248_AVX512BWDQ
-	  (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
-	  (match_operand:SWI1248_AVX512BWDQ 2 "register_operand" "k")))
+  [(set (match_operand:SWI1248_AVX512BWDQ2 0 "register_operand" "=k")
+	(plus:SWI1248_AVX512BWDQ2
+	  (match_operand:SWI1248_AVX512BWDQ2 1 "register_operand" "k")
+	  (match_operand:SWI1248_AVX512BWDQ2 2 "register_operand" "k")))
    (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512F"
   "kadd<mskmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
@@ -1427,8 +1432,8 @@ (define_insn "k<code><mode>"
 (define_insn "ktest<mode>"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC
-	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
-	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
+	  [(match_operand:SWI1248_AVX512BWDQ2 0 "register_operand" "k")
+	   (match_operand:SWI1248_AVX512BWDQ2 1 "register_operand" "k")]
 	  UNSPEC_KTEST))]
   "TARGET_AVX512F"
   "ktest<mskmodesuffix>\t{%1, %0|%0, %1}"
--- gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c.jj	2016-12-17 20:09:36.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c	2017-01-26 12:28:53.253553230 +0100
@@ -1,12 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
-
-#include <immintrin.h>
-
-void
-avx512f_test ()
-{
-  __mmask16 k = _kadd_mask16 (11, 12);
-  asm volatile ("" : "+k" (k));
-}
--- gcc/testsuite/gcc.target/i386/avx512dq-kaddw-1.c.jj	2017-01-26 12:29:26.760119756 +0100
+++ gcc/testsuite/gcc.target/i386/avx512dq-kaddw-1.c	2017-01-26 12:29:43.395904539 +0100
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask16 k = _kadd_mask16 (11, 12);
+  asm volatile ("" : "+k" (k));
+}
--- gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c.jj	2017-01-23 18:09:35.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-ktestw-1.c	2017-01-26 12:29:17.170243820 +0100
@@ -1,16 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O0 -mavx512f" } */
-/* { dg-final { scan-assembler-times "ktestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
-
-#include <immintrin.h>
-
-void
-avx512f_test () {
-  volatile __mmask16 k1;
-  __mmask16 k2;
-
-  volatile unsigned char r __attribute__((unused));	
-
-  r = _ktestc_mask16_u8(k1, k2);
-  r = _ktestz_mask16_u8(k1, k2);
-}
--- gcc/testsuite/gcc.target/i386/avx512dq-ktestw-1.c.jj	2017-01-26 12:29:53.362775598 +0100
+++ gcc/testsuite/gcc.target/i386/avx512dq-ktestw-1.c	2017-01-26 12:30:07.344594716 +0100
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx512dq" } */
+/* { dg-final { scan-assembler-times "ktestw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test () {
+  volatile __mmask16 k1;
+  __mmask16 k2;
+
+  volatile unsigned char r __attribute__((unused));	
+
+  r = _ktestc_mask16_u8(k1, k2);
+  r = _ktestz_mask16_u8(k1, k2);
+}
--- gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c.jj	2017-01-23 18:09:35.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-ktestw-2.c	2017-01-26 12:29:15.746262242 +0100
@@ -1,20 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-O2 -mavx512f" } */
-/* { dg-require-effective-target avx512f } */
-
-#include "avx512f-check.h"
-
-void
-avx512f_test ()
-{
-  volatile __mmask16 k1, k2;
-  unsigned char r1, r2;
-
-  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
-  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
-
-  r1 = _ktest_mask16_u8(k1, k2, &r2);
-
-  if (r1 != 1 || r2 != 0)
-    abort ();
-}
--- gcc/testsuite/gcc.target/i386/avx512dq-ktestw-2.c.jj	2017-01-26 12:29:56.526734666 +0100
+++ gcc/testsuite/gcc.target/i386/avx512dq-ktestw-2.c	2017-01-26 12:30:23.477386006 +0100
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#include "avx512dq-check.h"
+
+void
+avx512dq_test ()
+{
+  volatile __mmask16 k1, k2;
+  unsigned char r1, r2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (0) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (-1) );
+
+  r1 = _ktest_mask16_u8(k1, k2, &r2);
+
+  if (r1 != 1 || r2 != 0)
+    abort ();
+}


	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2017-01-26 12:17                                                               ` Jakub Jelinek
@ 2017-01-26 12:23                                                                 ` Kirill Yukhin
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill Yukhin @ 2017-01-26 12:23 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Andrew Senkevich, Uros Bizjak, GCC Patches,
	Richard Biener

On 26 Jan 13:05, Jakub Jelinek wrote:
> On Thu, Jan 26, 2017 at 03:53:44AM -0800, Kirill Yukhin wrote:
> > Hi,
> > On 26 Jan 12:49, Thomas Schwinge wrote:
> > > Hi!
> > >
> > > On Thu, 26 Jan 2017 02:44:56 -0800, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> > > > On 26 Jan 10:14, Thomas Schwinge wrote:
> > > > > I see:
> > > > >
> > > > >     {+FAIL: gcc.target/i386/avx512f-ktestw-2.c (test for excess errors)+}
> > > > >     {+UNRESOLVED: gcc.target/i386/avx512f-ktestw-2.c compilation failed to produce executable+}
> > > > >
> > > > > ... because of:
> > > > >
> > > > >     /tmp/ccjv3mX2.s: Assembler messages:
> > > > >     /tmp/ccjv3mX2.s:26: Error: no such instruction: `ktestw %k1,%k0'
> > > > >     compiler exited with status 1
> > > > Which version of gas do you use?
> > >
> > > A rather old one on that Ubuntu 12.10 system:
> > >
> > >     $ as --version
> > >     GNU assembler (GNU Binutils for Ubuntu) 2.22.90.20120924
> > >     [...]
> > >
> > > > It should be OK since v2.25.
> > >
> > > OK, but as done for other tests, for older versions such testing then
> > > should be UNSUPPORTED instead of FAIL/UNRESOLVED (as long as that is
> > > practicable, which has already been described how to do, as I understand
> > > the other messages).
> > This is a bug as Uroš properly mentioned. Will fix.
>
> Like this?  Tested on x86_64-linux.  Ok for trunk?
You're too fast. I did exactly the same.
OK for trunk.
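
For reference, what the run test expects can be modelled in portable C.
This is only a sketch under the assumption that ktestw sets ZF when
(A & B) is all zeros and CF when (~A & B) is all zeros; the model_*
helper names are hypothetical and do not exist in any header.

```c
#include <stdint.h>

/* Hypothetical model of _ktestz_mask16_u8: 1 if (a AND b) is zero.  */
static unsigned char
model_ktestz16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (a & b) == 0;
}

/* Hypothetical model of _ktestc_mask16_u8: 1 if ((NOT a) AND b) is zero.  */
static unsigned char
model_ktestc16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (~a & b) == 0;
}

/* Hypothetical model of _ktest_mask16_u8: returns the ZF result and
   stores the CF result through cf.  */
static unsigned char
model_ktest16 (uint16_t a, uint16_t b, unsigned char *cf)
{
  *cf = model_ktestc16 (a, b);
  return model_ktestz16 (a, b);
}
```

With k1 = 0 and k2 = all ones, as in avx512dq-ktestw-2.c, the model
returns 1 and stores 0, which matches the r1 != 1 || r2 != 0 check.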

--
Thanks, K

>
> 2017-01-26  Jakub Jelinek  <jakub@redhat.com>
>
> 	* config/i386/avx512fintrin.h (_ktest_mask16_u8,
> 	_ktestz_mask16_u8, _ktestc_mask16_u8, _kadd_mask16): Move to ...
> 	* config/i386/avx512dqintrin.h (_ktest_mask16_u8,
> 	_ktestz_mask16_u8, _ktestc_mask16_u8, _kadd_mask16): ... here.
> 	* config/i386/i386-builtin.def (__builtin_ia32_ktestchi,
> 	__builtin_ia32_ktestzhi, __builtin_ia32_kaddhi): Use
> 	OPTION_MASK_ISA_AVX512DQ instead of OPTION_MASK_ISA_AVX512F.
> 	* config/i386/sse.md (SWI1248_AVX512BWDQ2): New mode iterator.
> 	(kadd<mode>, ktest<mode>): Use it instead of SWI1248_AVX512BWDQ.
> testsuite/
> 	* gcc.target/i386/avx512f-kaddw-1.c: Renamed to ...
> 	* gcc.target/i386/avx512dq-kaddw-1.c: ... this.  New test.  Replace
> 	avx512f with avx512dq.
> 	* gcc.target/i386/avx512f-ktestw-1.c: Renamed to ...
> 	* gcc.target/i386/avx512dq-ktestw-1.c: ... this.  New test.  Replace
> 	avx512f with avx512dq.
> 	* gcc.target/i386/avx512f-ktestw-2.c: Renamed to ...
> 	* gcc.target/i386/avx512dq-ktestw-2.c: ... this.  New test.  Replace
> 	avx512f with avx512dq.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 15:26 ` Marc Glisse
@ 2016-11-11 18:28   ` Andrew Senkevich
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Senkevich @ 2016-11-11 18:28 UTC (permalink / raw)
  To: gcc-patches, Marc Glisse

2016-11-11 18:26 GMT+03:00 Marc Glisse <marc.glisse@inria.fr>:
> On Fri, 11 Nov 2016, Andrew Senkevich wrote:
>
>> +extern __inline __mmask32
>> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>> +_kand_mask32 (__mmask32 __A, __mmask32 __B)
>> +{
>> +  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32)
>> __B);
>> +}
>
>
> (picking one random example)
> Is a builtin really needed here? What would happen if you used
>
>   return __A & __B;
>
> ?

Good question. Looks like it also works (for this particular case).
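
A minimal sketch of what that variant amounts to, assuming the mask type
is a plain unsigned 32-bit integer.  The mask32 typedef and the
kand32_plain name are hypothetical stand-ins, used so the snippet
compiles without any AVX512 header.

```c
#include <stdint.h>

/* Hypothetical stand-in for __mmask32, assumed to be a plain
   unsigned 32-bit integer type.  */
typedef uint32_t mask32;

/* AND of two 32-bit masks written with the C operator instead of a
   builtin; the compiler is then free to pick the instruction.  */
static inline mask32
kand32_plain (mask32 a, mask32 b)
{
  return a & b;
}
```

Whether this ends up as a mask-register instruction or a
general-purpose-register one is then up to the compiler; the sketch
makes no claim about that.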


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] Add AVX512 k-mask intrinsics
  2016-11-11 14:14 Andrew Senkevich
@ 2016-11-11 15:26 ` Marc Glisse
  2016-11-11 18:28   ` Andrew Senkevich
  0 siblings, 1 reply; 48+ messages in thread
From: Marc Glisse @ 2016-11-11 15:26 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches

On Fri, 11 Nov 2016, Andrew Senkevich wrote:

> +extern __inline __mmask32
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_kand_mask32 (__mmask32 __A, __mmask32 __B)
> +{
> +  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
> +}

(picking one random example)
Is a builtin really needed here? What would happen if you used

   return __A & __B;

?

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] Add AVX512 k-mask intrinsics
@ 2016-11-11 14:14 Andrew Senkevich
  2016-11-11 15:26 ` Marc Glisse
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Senkevich @ 2016-11-11 14:14 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 72026 bytes --]

Hi,

this patch adds several AVX512 intrinsics for k-mask instructions.
Also attached.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a87a17f..a3456f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,46 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+ * config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
+ * config/i386/avx512dqintrin.h: Ditto.
+ * config/i386/avx512fintrin.h: Ditto.
+ * config/i386/i386-builtin-types.def (UCHAR_FTYPE_UQI_UQI_PUCHAR,
+ UCHAR_FTYPE_UHI_UHI_PUCHAR, UCHAR_FTYPE_USI_USI_PUCHAR,
+ UCHAR_FTYPE_UDI_UDI_PUCHAR, UCHAR_FTYPE_UQI_UQI, UCHAR_FTYPE_UHI_UHI,
+ UCHAR_FTYPE_USI_USI, UCHAR_FTYPE_UDI_UDI, UQI_FTYPE_UQI_INT,
+ UHI_FTYPE_UHI_INT, USI_FTYPE_USI_INT, UDI_FTYPE_UDI_INT,
+ UQI_FTYPE_UQI, USI_FTYPE_USI, UDI_FTYPE_UDI, UQI_FTYPE_UQI_UQI): New
+ function types.
+ * config/i386/i386-builtin.def (__builtin_ia32_kortest_mask8_u8qi,
+ __builtin_ia32_kortest_mask16_u8hi,
+ __builtin_ia32_kortest_mask32_u8si,
+ __builtin_ia32_kortest_mask64_u8di,
+ __builtin_ia32_kortestz_mask8_u8qi,
+ __builtin_ia32_kortestz_mask16_u8hi,
+ __builtin_ia32_kortestz_mask32_u8si,
+ __builtin_ia32_kortestz_mask64_u8di,
+ __builtin_ia32_kortestc_mask8_u8qi,
+ __builtin_ia32_kortestc_mask16_u8hi,
+ __builtin_ia32_kortestc_mask32_u8si,
+ __builtin_ia32_kortestc_mask64_u8di,
+ __builtin_ia32_kshiftliqi, __builtin_ia32_kshiftlihi,
+ __builtin_ia32_kshiftlisi, __builtin_ia32_kshiftlidi,
+ __builtin_ia32_kshiftriqi, __builtin_ia32_kshiftrihi,
+ __builtin_ia32_kshiftrisi, __builtin_ia32_kshiftridi,
+ __builtin_ia32_knotqi, __builtin_ia32_knotsi, __builtin_ia32_knotdi,
+ __builtin_ia32_korqi, __builtin_ia32_korsi, __builtin_ia32_kordi,
+ __builtin_ia32_kxnorqi, __builtin_ia32_kxnorsi,
+ __builtin_ia32_kxnordi, __builtin_ia32_kxorqi, __builtin_ia32_kxorsi,
+ __builtin_ia32_kxordi, __builtin_ia32_kaddqi, __builtin_ia32_kaddhi,
+ __builtin_ia32_kaddsi, __builtin_ia32_kadddi, __builtin_ia32_kandqi,
+ __builtin_ia32_kandsi, __builtin_ia32_kanddi, __builtin_ia32_kandnqi,
+ __builtin_ia32_kandnsi, __builtin_ia32_kandndi, __builtin_ia32_kmov8,
+ __builtin_ia32_kmov32, __builtin_ia32_kmov64): New.
+ * config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
+ * config/i386/i386.md (define_insn "kmovb"): New.
+ (define_insn "kmovd"): Ditto.
+ (define_insn "kmovq"): Ditto.
+ (define_insn "kadd<mode>"): Ditto.
+
 2016-11-10  Vladimir Makarov  <vmakarov@redhat.com>

  * target.def (additional_allocno_class_p): New.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..dfd35bf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,55 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+ * gcc.target/i386/avx512bw-kaddd-1.c: New test.
+ * gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kandd-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kandnd-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kandnq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kandq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovd-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovd-2.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovd-3.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovd-4.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovq-2.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovq-3.c: Ditto.
+ * gcc.target/i386/avx512bw-kmovq-4.c: Ditto.
+ * gcc.target/i386/avx512bw-knotd-1.c: Ditto.
+ * gcc.target/i386/avx512bw-knotq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kord-1.c: Ditto.
+ * gcc.target/i386/avx512bw-korq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kshiftld-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kunpckdq-3.c: Ditto.
+ * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
+ * gcc.target/i386/avx512bw-kxnord-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kxord-1.c: Ditto.
+ * gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kandb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kandnb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kmovb-2.c: Ditto.
+ * gcc.target/i386/avx512dq-kmovb-3.c: Ditto.
+ * gcc.target/i386/avx512dq-kmovb-4.c: Ditto.
+ * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
+ * gcc.target/i386/avx512dq-knotb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-korb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kxnorb-1.c: Ditto.
+ * gcc.target/i386/avx512dq-kxorb-1.c: Ditto.
+ * gcc.target/i386/avx512f-kaddw-1.c: Ditto.
+ * gcc.target/i386/avx512f-kmovw-2.c: Ditto.
+ * gcc.target/i386/avx512f-kmovw-3.c: Ditto.
+ * gcc.target/i386/avx512f-kmovw-4.c: Ditto.
+ * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
+ * gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
+ * gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
+ * gcc.target/i386/avx512f-kunpckbw-3.c: Ditto.
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>

  * gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 8f03249..0829af3 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,238 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));

 typedef unsigned long long __mmask64;

+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask32_u8si ((__mmask32) __A,
+     (__mmask32) __B,
+     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask32_u8si ((__mmask32) __A,
+      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask32_u8si ((__mmask32) __A,
+      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask64_u8di ((__mmask64) __A,
+     (__mmask64) __B,
+     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask64_u8di ((__mmask64) __A,
+      (__mmask64) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask64_u8di ((__mmask64) __A,
+      (__mmask64) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*(__mmask32 *) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_qi (void)
@@ -138,6 +370,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
       (__mmask32) __B);
 }

+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -146,6 +386,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
       (__mmask64) __B);
 }

+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 1dbb6b0..87681f7 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,122 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */

+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask8_u8 (__mmask8 __A, __mmask8 __B, unsigned char* __C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask8_u8qi ((__mmask8) __A,
+    (__mmask8) __B,
+    (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask8_u8qi ((__mmask8) __A,
+     (__mmask8) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask8_u8qi ((__mmask8) __A,
+     (__mmask8) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..8787da8 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,62 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }

 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16 ) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +10044,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+     (__mmask16) __B);
 }

 extern __inline __mmask16
@@ -9998,6 +10055,31 @@ _mm512_kor (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }

+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask16_u8hi ((__mmask16) __A,
+     (__mmask16) __B,
+     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask16_u8hi ((__mmask16) __A,
+     (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask16_u8hi ((__mmask16) __A,
+     (__mmask16) __B);
+}
+
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kortestz (__mmask16 __A, __mmask16 __B)
@@ -10042,6 +10124,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }

+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..125fa94 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)

+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -527,7 +533,23 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)

 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -540,6 +562,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..5dae57d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,16 +1436,75 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)

 /* Mask arithmetic operations */
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortest_mask8_u8qi", IX86_BUILTIN_KORTEST8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortest_mask16_u8hi", IX86_BUILTIN_KORTEST16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortest_mask32_u8si", IX86_BUILTIN_KORTEST32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortest_mask64_u8di", IX86_BUILTIN_KORTEST64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI_PUCHAR)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestz_mask8_u8qi", IX86_BUILTIN_KORTESTZ8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestz_mask16_u8hi", IX86_BUILTIN_KORTESTZ16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestz_mask32_u8si", IX86_BUILTIN_KORTESTZ32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestz_mask64_u8di", IX86_BUILTIN_KORTESTZ64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestc_mask8_u8qi", IX86_BUILTIN_KORTESTC8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestc_mask16_u8hi", IX86_BUILTIN_KORTESTC16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestc_mask32_u8si", IX86_BUILTIN_KORTESTC32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestc_mask64_u8di", IX86_BUILTIN_KORTESTC64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftlqi3_1, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftlhi3_1, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftlsi3_1, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftldi3_1, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftrqi3_1, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftrhi3_1, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrsi3_1, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrdi3_1, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_one_cmplqi2, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_one_cmplhi2, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmplsi2, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmpldi2, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_iorqi3, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_iorhi3, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iorsi3, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iordi3, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_xorqi3, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xorsi3, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xordi3, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_andqi3, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_andsi3, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_anddi3, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)

 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..fc40b86 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34638,7 +34638,12 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+//    case USI_FTYPE_UQI:
+//    case USI_FTYPE_UHI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34772,6 +34777,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -34819,6 +34825,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a5650a1..800450e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2497,6 +2497,46 @@
    (set_attr "type" "mskmov")
    (set_attr "prefix" "vex")])

+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=k,k")
+ (unspec:QI
+  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=k,k")
+ (unspec:SI
+  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
+ (unspec:DI
+  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+  UNSPEC_KMOV))]
+  "!(MEM_P (operands[0]) && MEM_P (operands[1])) && TARGET_AVX512BW"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}";
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+

 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k, r,m")
@@ -8304,11 +8344,11 @@
    (set_attr "mode" "QI")])

 (define_insn "kandn<mode>"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
- (and:SWI12
-  (not:SWI12
-    (match_operand:SWI12 1 "register_operand" "r,0,k"))
-  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+ (and:SWI1248x
+  (not:SWI1248x
+    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_AVX512F"
 {
@@ -8319,10 +8359,50 @@
     case 1:
       return "#";
     case 2:
-      if (TARGET_AVX512DQ && <MODE>mode == QImode)
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+ return "kandnq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+ return "kandnd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
  return "kandnb\t{%2, %1, %0|%0, %1, %2}";
       else
  return "kandnw\t{%2, %1, %0|%0, %1, %2}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+ (plus:SWI1248x
+  (not:SWI1248x
+    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
+    case 1:
+      return "#";
+    case 2:
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+ return "kaddq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+ return "kaddd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
+ return "kaddb\t{%2, %1, %0|%0, %1, %2}";
+      else
+ return "kaddw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_unreachable ();
     }
@@ -9687,7 +9767,7 @@
 ;; shift pair, instead using moves and sign extension for counts greater
 ;; than 31.

-(define_insn "*<mshift><mode>3"
+(define_insn "<mshift><mode>3_1"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
  (any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
        (match_operand:QI 2 "immediate_operand" "i")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..0b38850
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..5b7b417
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..6b68ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..35f1c12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..a89b2d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+volatile __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask32 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..dcb65fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask32 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..fe5e1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern unsigned int m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask32_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..8a085d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu32_mask32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..51d547d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+volatile __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask64 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..9baf200
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask64 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..3a02d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern unsigned long long m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask64_u64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..1cc16ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned long long m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu64_mask64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..dd6b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..5b94358
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..163c46e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..77b1b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..85be9b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..cd5707e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..91b6313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..c10fa4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..ccf4b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..b9c0979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..ce03ab4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..d6366dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..a84d8ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..ff50610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..3832853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+volatile __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _load_mask8 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..8d06674
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  _store_mask8 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..2da4719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern unsigned int m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtmask8_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..d3f8c5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtu32_mask8 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..8bb9249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..22b727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..422d0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..f87cf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..ee21aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..63a1ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..9faf4ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2, k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..77c8ddc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+volatile __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _load_mask16 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..740ea9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  _store_mask16 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..127a4ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern unsigned int m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtmask16_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..d729e8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtu32_mask16 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..7a9de12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..641d307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}

Is it ok for trunk?
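
For anyone reviewing the tests without AVX-512 hardware at hand, here is a
portable sketch of the semantics I would expect from a few of the new
intrinsics. It is only an illustration: the ref_* helpers are hypothetical
names that are not part of the patch, and the behaviour is modelled on my
reading of the instruction descriptions (out-of-range shift counts give
zero, kunpackb puts its first operand in the high byte), not on the
builtins themselves.

```c
#include <stdint.h>

/* Hypothetical reference models, NOT part of the patch.  Plain integer
   code so it builds and runs on any host.  */

/* kshiftlb / kshiftrb: counts above 7 are assumed to yield zero.  */
static uint8_t
ref_kshiftli_mask8 (uint8_t a, unsigned int count)
{
  return count <= 7 ? (uint8_t) (a << count) : 0;
}

static uint8_t
ref_kshiftri_mask8 (uint8_t a, unsigned int count)
{
  return count <= 7 ? (uint8_t) (a >> count) : 0;
}

/* kxnorb: bitwise complement of the exclusive or.  */
static uint8_t
ref_kxnor_mask8 (uint8_t a, uint8_t b)
{
  return (uint8_t) ~(a ^ b);
}

/* kandnb: complement the first operand, then and with the second.  */
static uint8_t
ref_kandn_mask8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (~a & b);
}

/* kaddw: addition truncated to the mask width.  */
static uint16_t
ref_kadd_mask16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (a + b);
}

/* kunpckbw: low byte of A becomes bits 15:8, low byte of B bits 7:0.  */
static uint16_t
ref_kunpackb_mask16 (uint8_t a, uint8_t b)
{
  return (uint16_t) (((uint16_t) a << 8) | b);
}

/* kortestw: return 1 if (A | B) is all zeros; store 1 to *ALL_ONES if
   (A | B) is all ones, otherwise store 0.  */
static unsigned char
ref_kortest_mask16_u8 (uint16_t a, uint16_t b, unsigned char *all_ones)
{
  uint16_t r = a | b;
  *all_ones = (r == 0xFFFF);
  return r == 0;
}
```

With the inputs used in the new tests, ref_kshiftri_mask8 (1, 5) gives 0
and ref_kxnor_mask8 (1, 2) gives 0xfc under this model.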


--
WBR,
Andrew

[-- Attachment #2: add_k-mask_intrinsics.patch --]
[-- Type: application/octet-stream, Size: 72076 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a87a17f..a3456f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,46 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* config/i386/avx512bwintrin.h: Add new k-mask intrinsics.
+	* config/i386/avx512dqintrin.h: Ditto.
+	* config/i386/avx512fintrin.h: Ditto.
+	* config/i386/i386-builtin-types.def (UCHAR_FTYPE_UQI_UQI_PUCHAR,
+	UCHAR_FTYPE_UHI_UHI_PUCHAR, UCHAR_FTYPE_USI_USI_PUCHAR,
+	UCHAR_FTYPE_UDI_UDI_PUCHAR, UCHAR_FTYPE_UQI_UQI, UCHAR_FTYPE_UHI_UHI,
+	UCHAR_FTYPE_USI_USI, UCHAR_FTYPE_UDI_UDI, UQI_FTYPE_UQI_INT,
+	UHI_FTYPE_UHI_INT, USI_FTYPE_USI_INT, UDI_FTYPE_UDI_INT,
+	UQI_FTYPE_UQI, USI_FTYPE_USI, UDI_FTYPE_UDI, UQI_FTYPE_UQI_UQI): New
+	function types.
+	* config/i386/i386-builtin.def (__builtin_ia32_kortest_mask8_u8qi,
+	__builtin_ia32_kortest_mask16_u8hi,
+	__builtin_ia32_kortest_mask32_u8si,
+	__builtin_ia32_kortest_mask64_u8di,
+	__builtin_ia32_kortestz_mask8_u8qi,
+	__builtin_ia32_kortestz_mask16_u8hi,
+	__builtin_ia32_kortestz_mask32_u8si,
+	__builtin_ia32_kortestz_mask64_u8di,
+	__builtin_ia32_kortestc_mask8_u8qi,
+	__builtin_ia32_kortestc_mask16_u8hi,
+	__builtin_ia32_kortestc_mask32_u8si,
+	__builtin_ia32_kortestc_mask64_u8di,
+	__builtin_ia32_kshiftliqi, __builtin_ia32_kshiftlihi,
+	__builtin_ia32_kshiftlisi, __builtin_ia32_kshiftlidi,
+	__builtin_ia32_kshiftriqi, __builtin_ia32_kshiftrihi,
+	__builtin_ia32_kshiftrisi, __builtin_ia32_kshiftridi,
+	__builtin_ia32_knotqi, __builtin_ia32_knotsi, __builtin_ia32_knotdi,
+	__builtin_ia32_korqi, __builtin_ia32_korsi, __builtin_ia32_kordi,
+	__builtin_ia32_kxnorqi, __builtin_ia32_kxnorsi,
+	__builtin_ia32_kxnordi, __builtin_ia32_kxorqi, __builtin_ia32_kxorsi,
+	__builtin_ia32_kxordi, __builtin_ia32_kaddqi, __builtin_ia32_kaddhi,
+	__builtin_ia32_kaddsi, __builtin_ia32_kadddi, __builtin_ia32_kandqi,
+	__builtin_ia32_kandsi, __builtin_ia32_kanddi, __builtin_ia32_kandnqi,
+	__builtin_ia32_kandnsi, __builtin_ia32_kandndi, __builtin_ia32_kmov8,
+	__builtin_ia32_kmov32, __builtin_ia32_kmov64): New.
+	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
+	* config/i386/i386.md (define_insn "kmovb"): New.
+	(define_insn "kmovd"): Ditto.
+	(define_insn "kmovq"): Ditto.
+	(define_insn "kadd<mode>"): Ditto.
+
 2016-11-10  Vladimir Makarov  <vmakarov@redhat.com>
 
 	* target.def (additional_allocno_class_p): New.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..dfd35bf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,55 @@
+2016-11-11  Andrew Senkevich  <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx512bw-kaddd-1.c: New test.
+	* gcc.target/i386/avx512bw-kaddq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandnq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovd-4.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-2.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kmovq-4.c: Ditto.
+	* gcc.target/i386/avx512bw-knotd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-korq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftld-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckdq-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxord-1.c: Ditto.
+	* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kaddb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kandnb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-2.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-3.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-4.c: Ditto.
+	* gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
+	* gcc.target/i386/avx512dq-knotb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-korb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxnorb-1.c: Ditto.
+	* gcc.target/i386/avx512dq-kxorb-1.c: Ditto.
+	* gcc.target/i386/avx512f-kaddw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-2.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-3.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-4.c: Ditto.
+	* gcc.target/i386/avx512f-kmovw-5.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.
+	* gcc.target/i386/avx512f-kunpckbw-3.c: Ditto.
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>
 
 	* gfortran.dg/openmp-define-3.f90: Expect 201511 instead of
diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 8f03249..0829af3 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -40,6 +40,238 @@ typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 typedef unsigned long long __mmask64;
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask32_u8si ((__mmask32) __A,
+							     (__mmask32) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask32_u8si ((__mmask32) __A,
+							      (__mmask32) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask64_u8di ((__mmask64) __A,
+							     (__mmask64) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask64_u8di ((__mmask64) __A,
+							      (__mmask64) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask32_u32 (__mmask32 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask64_u64 (__mmask64 __A)
+{
+  return (unsigned long long) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask32 (unsigned int __A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu64_mask64 (unsigned long long __A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask32 (__mmask32 *__A)
+{
+  return (__mmask32) __builtin_ia32_kmov32 (*(__mmask32 *) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask64 (__mmask64 *__A)
+{
+  return (__mmask64) __builtin_ia32_kmov64 (*(__mmask64 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask32 (__mmask32 *__A, __mmask32 __B)
+{
+  *(__mmask32 *) __A = __builtin_ia32_kmov32 (__B);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask64 (__mmask64 *__A, __mmask64 __B)
+{
+  *(__mmask64 *) __A = __builtin_ia32_kmov64 (__B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftlidi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask32 (__mmask32 __A, int __B)
+{
+  return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask64 (__mmask64 __A, int __B)
+{
+  return (__mmask64) __builtin_ia32_kshiftridi ((__mmask64) __A, __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask32 (__mmask32 __A)
+{
+  return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask64 (__mmask64 __A)
+{
+  return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask32 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B);
+}
+
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask64 (__mmask64 __A, __mmask64 __B)
+{
+  return (__mmask64) __builtin_ia32_kandndi ((__mmask64) __A, (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_qi (void)
@@ -138,6 +370,14 @@ _mm512_kunpackw (__mmask32 __A, __mmask32 __B)
 					      (__mmask32) __B);
 }
 
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackw_mask32 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A,
+					      (__mmask32) __B);
+}
+
 extern __inline __mmask64
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
@@ -146,6 +386,14 @@ _mm512_kunpackd (__mmask64 __A, __mmask64 __B)
 					      (__mmask64) __B);
 }
 
+extern __inline __mmask64
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackd_mask64 (__mmask32 __A, __mmask32 __B)
+{
+  return (__mmask64) __builtin_ia32_kunpckdi ((__mmask64) __A,
+					      (__mmask64) __B);
+}
+
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_loadu_epi8 (__m512i __W, __mmask64 __U, void const *__P)
diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 1dbb6b0..87681f7 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -34,6 +34,122 @@
 #define __DISABLE_AVX512DQ__
 #endif /* __AVX512DQ__ */
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask8_u8 (__mmask8 __A, __mmask8 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask8_u8qi ((__mmask8) __A,
+							    (__mmask8) __B,
+							    (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask8_u8 (__mmask8 __A, __mmask8 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask8_u8qi ((__mmask8) __A,
+							     (__mmask8) __B);
+}
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask8_u32 (__mmask8 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask8 (unsigned int __A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask8 (__mmask8 *__A)
+{
+  return (__mmask8) __builtin_ia32_kmov8 (*(__mmask8 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask8 (__mmask8 *__A, __mmask8 __B)
+{
+  *(__mmask8 *) __A = __builtin_ia32_kmov8 (__B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftliqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask8 (__mmask8 __A, int __B)
+{
+  return (__mmask8) __builtin_ia32_kshiftriqi ((__mmask8) __A, __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_knot_mask8 (__mmask8 __A)
+{
+  return (__mmask8) __builtin_ia32_knotqi ((__mmask8) __A);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_korqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxnor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxnorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kxor_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kxorqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kaddqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kand_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandqi ((__mmask8) __A, (__mmask8) __B);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kandn_mask8 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask8) __builtin_ia32_kandnqi ((__mmask8) __A, (__mmask8) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_broadcast_f64x2 (__m128d __A)
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 2372c83..8787da8 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -9977,6 +9977,62 @@ _mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P)
 }
 
 /* Mask arithmetic operations */
+#define _kand_mask16 _mm512_kand
+#define _kandn_mask16 _mm512_kandn
+#define _knot_mask16 _mm512_knot
+#define _kor_mask16 _mm512_kor
+#define _kxnor_mask16 _mm512_kxnor
+#define _kxor_mask16 _mm512_kxor
+
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtmask16_u32 (__mmask16 __A)
+{
+  return (unsigned int) __builtin_ia32_kmov16 ((__mmask16) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_cvtu32_mask16 (unsigned int __A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 ((__mmask16) __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_load_mask16 (__mmask16 *__A)
+{
+  return (__mmask16) __builtin_ia32_kmov16 (*(__mmask16 *) __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_store_mask16 (__mmask16 *__A, __mmask16 __B)
+{
+  *(__mmask16 *) __A = __builtin_ia32_kmov16 (__B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftli_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftlihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kshiftri_mask16 (__mmask16 __A, int __B)
+{
+  return (__mmask16) __builtin_ia32_kshiftrihi ((__mmask16) __A, __B);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kadd_mask16 (__mmask16 __A, __mmask16 __B)
+{
+  return (__mmask16) __builtin_ia32_kaddhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kand (__mmask16 __A, __mmask16 __B)
@@ -9988,7 +10044,8 @@ extern __inline __mmask16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kandn (__mmask16 __A, __mmask16 __B)
 {
-  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B);
+  return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A,
+					     (__mmask16) __B);
 }
 
 extern __inline __mmask16
@@ -9998,6 +10055,31 @@ _mm512_kor (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortest_mask16_u8 (__mmask16 __A, __mmask16 __B, unsigned char *__C)
+{
+  return (unsigned char) __builtin_ia32_kortest_mask16_u8hi ((__mmask16) __A,
+							     (__mmask16) __B,
+							     (unsigned char *) __C);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestz_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestz_mask16_u8hi ((__mmask16) __A,
+							      (__mmask16) __B);
+}
+
+extern __inline unsigned char
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kortestc_mask16_u8 (__mmask16 __A, __mmask16 __B)
+{
+  return (unsigned char) __builtin_ia32_kortestc_mask16_u8hi ((__mmask16) __A,
+							      (__mmask16) __B);
+}
+
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_kortestz (__mmask16 __A, __mmask16 __B)
@@ -10042,6 +10124,13 @@ _mm512_kunpackb (__mmask16 __A, __mmask16 __B)
   return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_kunpackb_mask16 (__mmask8 __A, __mmask8 __B)
+{
+  return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..125fa94 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -139,6 +139,12 @@ DEF_POINTER_TYPE (PLONGLONG, LONGLONG)
 DEF_POINTER_TYPE (PULONGLONG, ULONGLONG)
 DEF_POINTER_TYPE (PUNSIGNED, UNSIGNED)
 
+DEF_POINTER_TYPE (PUQI, UQI)
+DEF_POINTER_TYPE (PUHI, UHI)
+DEF_POINTER_TYPE (PUSI, USI)
+DEF_POINTER_TYPE (PUDI, UDI)
+DEF_POINTER_TYPE (PUCHAR, UCHAR)
+
 DEF_POINTER_TYPE (PV2SI, V2SI)
 DEF_POINTER_TYPE (PV2DF, V2DF)
 DEF_POINTER_TYPE (PV2DI, V2DI)
@@ -527,7 +533,23 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
 # Instructions returning mask
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UQI, UQI)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UHI, UHI)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, USI, USI)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI, PUCHAR)
+DEF_FUNCTION_TYPE (UCHAR, UDI, UDI)
+
+DEF_FUNCTION_TYPE (UQI, UQI, INT)
+DEF_FUNCTION_TYPE (UHI, UHI, INT)
+DEF_FUNCTION_TYPE (USI, USI, INT)
+DEF_FUNCTION_TYPE (UDI, UDI, INT)
+DEF_FUNCTION_TYPE (UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI)
+DEF_FUNCTION_TYPE (USI, USI)
+DEF_FUNCTION_TYPE (UDI, UDI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
 DEF_FUNCTION_TYPE (USI, V32QI)
 DEF_FUNCTION_TYPE (UDI, V64QI)
@@ -540,6 +562,7 @@ DEF_FUNCTION_TYPE (UHI, V16SI)
 DEF_FUNCTION_TYPE (UQI, V2DI)
 DEF_FUNCTION_TYPE (UQI, V4DI)
 DEF_FUNCTION_TYPE (UQI, V8DI)
+DEF_FUNCTION_TYPE (UQI, UQI, UQI)
 DEF_FUNCTION_TYPE (UHI, UHI, UHI)
 DEF_FUNCTION_TYPE (USI, USI, USI)
 DEF_FUNCTION_TYPE (UDI, UDI, UDI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..5dae57d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1436,16 +1436,75 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_roundpd_vec_pack_sfix512, "__builtin_ia32_ceilpd_vec_pack_sfix512", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512, (enum rtx_code) ROUND_CEIL, (int) V16SI_FTYPE_V8DF_V8DF_ROUND)
 
 /* Mask arithmetic operations */
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortest_mask8_u8qi", IX86_BUILTIN_KORTEST8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortest_mask16_u8hi", IX86_BUILTIN_KORTEST16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortest_mask32_u8si", IX86_BUILTIN_KORTEST32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI_PUCHAR)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortest_mask64_u8di", IX86_BUILTIN_KORTEST64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI_PUCHAR)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestz_mask8_u8qi", IX86_BUILTIN_KORTESTZ8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestz_mask16_u8hi", IX86_BUILTIN_KORTESTZ16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestz_mask32_u8si", IX86_BUILTIN_KORTESTZ32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestz_mask64_u8di", IX86_BUILTIN_KORTESTZ64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_movqi, "__builtin_ia32_kortestc_mask8_u8qi", IX86_BUILTIN_KORTESTC8_U8, UNKNOWN, (int) UCHAR_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kortestc_mask16_u8hi", IX86_BUILTIN_KORTESTC16_U8, UNKNOWN, (int) UCHAR_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movsi, "__builtin_ia32_kortestc_mask32_u8si", IX86_BUILTIN_KORTESTC32_U8, UNKNOWN, (int) UCHAR_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_movdi, "__builtin_ia32_kortestc_mask64_u8di", IX86_BUILTIN_KORTESTC64_U8, UNKNOWN, (int) UCHAR_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftlqi3_1, "__builtin_ia32_kshiftliqi", IX86_BUILTIN_KSHIFTLI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftlhi3_1, "__builtin_ia32_kshiftlihi", IX86_BUILTIN_KSHIFTLI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftlsi3_1, "__builtin_ia32_kshiftlisi", IX86_BUILTIN_KSHIFTLI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftldi3_1, "__builtin_ia32_kshiftlidi", IX86_BUILTIN_KSHIFTLI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_shiftrqi3_1, "__builtin_ia32_kshiftriqi", IX86_BUILTIN_KSHIFTRI8, UNKNOWN, (int) UQI_FTYPE_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_shiftrhi3_1, "__builtin_ia32_kshiftrihi", IX86_BUILTIN_KSHIFTRI16, UNKNOWN, (int) UHI_FTYPE_UHI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrsi3_1, "__builtin_ia32_kshiftrisi", IX86_BUILTIN_KSHIFTRI32, UNKNOWN, (int) USI_FTYPE_USI_INT)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_shiftrdi3_1, "__builtin_ia32_kshiftridi", IX86_BUILTIN_KSHIFTRI64, UNKNOWN, (int) UDI_FTYPE_UDI_INT)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_one_cmplqi2, "__builtin_ia32_knotqi", IX86_BUILTIN_KNOT8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_one_cmplhi2, "__builtin_ia32_knothi", IX86_BUILTIN_KNOT16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmplsi2, "__builtin_ia32_knotsi", IX86_BUILTIN_KNOT32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_one_cmpldi2, "__builtin_ia32_knotdi", IX86_BUILTIN_KNOT64, UNKNOWN, (int) UDI_FTYPE_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_iorqi3, "__builtin_ia32_korqi", IX86_BUILTIN_KOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_iorhi3, "__builtin_ia32_korhi", IX86_BUILTIN_KOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iorsi3, "__builtin_ia32_korsi", IX86_BUILTIN_KOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_iordi3, "__builtin_ia32_kordi", IX86_BUILTIN_KOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kxnorqi, "__builtin_ia32_kxnorqi", IX86_BUILTIN_KXNOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnorsi, "__builtin_ia32_kxnorsi", IX86_BUILTIN_KXNOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kxnordi, "__builtin_ia32_kxnordi", IX86_BUILTIN_KXNOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_xorqi3, "__builtin_ia32_kxorqi", IX86_BUILTIN_KXOR8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xorsi3, "__builtin_ia32_kxorsi", IX86_BUILTIN_KXOR32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_xordi3, "__builtin_ia32_kxordi", IX86_BUILTIN_KXOR64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kaddqi, "__builtin_ia32_kaddqi", IX86_BUILTIN_KADD8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kaddhi, "__builtin_ia32_kaddhi", IX86_BUILTIN_KADD16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kaddsi, "__builtin_ia32_kaddsi", IX86_BUILTIN_KADD32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kadddi, "__builtin_ia32_kadddi", IX86_BUILTIN_KADD64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_andqi3, "__builtin_ia32_kandqi", IX86_BUILTIN_KAND8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_andsi3, "__builtin_ia32_kandsi", IX86_BUILTIN_KAND32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_anddi3, "__builtin_ia32_kanddi", IX86_BUILTIN_KAND64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kandnqi, "__builtin_ia32_kandnqi", IX86_BUILTIN_KANDN8, UNKNOWN, (int) UQI_FTYPE_UQI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kandnhi, "__builtin_ia32_kandnhi", IX86_BUILTIN_KANDN16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandnsi, "__builtin_ia32_kandnsi", IX86_BUILTIN_KANDN32, UNKNOWN, (int) USI_FTYPE_USI_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kandndi, "__builtin_ia32_kandndi", IX86_BUILTIN_KANDN64, UNKNOWN, (int) UDI_FTYPE_UDI_UDI)
+
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestchi, "__builtin_ia32_kortestchi", IX86_BUILTIN_KORTESTC16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kortestzhi, "__builtin_ia32_kortestzhi", IX86_BUILTIN_KORTESTZ16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kunpckhi, "__builtin_ia32_kunpckhi", IX86_BUILTIN_KUNPCKBW, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kxnorhi, "__builtin_ia32_kxnorhi", IX86_BUILTIN_KXNOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_xorhi3, "__builtin_ia32_kxorhi", IX86_BUILTIN_KXOR16, UNKNOWN, (int) UHI_FTYPE_UHI_UHI)
+
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_kmovb, "__builtin_ia32_kmov8", IX86_BUILTIN_KMOV8, UNKNOWN, (int) UQI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_kmovw, "__builtin_ia32_kmov16", IX86_BUILTIN_KMOV16, UNKNOWN, (int) UHI_FTYPE_UHI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovd, "__builtin_ia32_kmov32", IX86_BUILTIN_KMOV32, UNKNOWN, (int) USI_FTYPE_USI)
+BDESC (OPTION_MASK_ISA_AVX512BW, CODE_FOR_kmovq, "__builtin_ia32_kmov64", IX86_BUILTIN_KMOV64, UNKNOWN, (int) UDI_FTYPE_UDI)
 
 /* SHA */
 BDESC (OPTION_MASK_ISA_SSE2, CODE_FOR_sha1msg1, 0, IX86_BUILTIN_SHA1MSG1, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..fc40b86 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34638,7 +34638,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8HI:
     case V4DI_FTYPE_V4SI:
     case V4DI_FTYPE_V2DI:
+    case UQI_FTYPE_UQI:
     case UHI_FTYPE_UHI:
+    case USI_FTYPE_USI:
+    case UDI_FTYPE_UDI:
     case UHI_FTYPE_V16QI:
     case USI_FTYPE_V32QI:
     case UDI_FTYPE_V64QI:
@@ -34772,6 +34775,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UINT_FTYPE_UINT_UCHAR:
     case UINT16_FTYPE_UINT16_INT:
     case UINT8_FTYPE_UINT8_INT:
+    case UQI_FTYPE_UQI_UQI:
     case UHI_FTYPE_UHI_UHI:
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
@@ -34819,6 +34823,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DI_FTYPE_V8DI_INT:
     case QI_FTYPE_V4SF_INT:
     case QI_FTYPE_V2DF_INT:
+    case UQI_FTYPE_UQI_INT:
+    case UHI_FTYPE_UHI_INT:
+    case USI_FTYPE_USI_INT:
+    case UDI_FTYPE_UDI_INT:
       nargs = 2;
       nargs_constant = 1;
       break;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a5650a1..800450e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2497,6 +2497,46 @@
    (set_attr "type" "mskmov")
    (set_attr "prefix" "vex")])
 
+(define_insn "kmovb"
+  [(set (match_operand:QI 0 "register_operand" "=k,k")
+	(unspec:QI
+	  [(match_operand:QI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512DQ"
+  "@
+   kmovb\t{%k1, %0|%0, %k1}
+   kmovb\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "QI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovd"
+  [(set (match_operand:SI 0 "register_operand" "=k,k")
+	(unspec:SI
+	  [(match_operand:SI 1 "nonimmediate_operand" "r,km")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW"
+  "@
+   kmovd\t{%k1, %0|%0, %k1}
+   kmovd\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "SI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kmovq"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=k,k,km")
+	(unspec:DI
+	  [(match_operand:DI 1 "nonimmediate_operand" "r,km,k")]
+	  UNSPEC_KMOV))]
+  "TARGET_AVX512BW && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "@
+   kmovq\t{%k1, %0|%0, %k1}
+   kmovq\t{%1, %0|%0, %1}
+   kmovq\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "DI")
+   (set_attr "type" "mskmov")
+   (set_attr "prefix" "vex")])
+
 
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k, r,m")
@@ -8304,11 +8344,11 @@
    (set_attr "mode" "QI")])
 
 (define_insn "kandn<mode>"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
-	(and:SWI12
-	  (not:SWI12
-	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
-	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(and:SWI1248x
+	  (not:SWI1248x
+	    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_AVX512F"
 {
@@ -8319,10 +8359,50 @@
     case 1:
       return "#";
     case 2:
-      if (TARGET_AVX512DQ && <MODE>mode == QImode)
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kandnq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kandnd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
 	return "kandnb\t{%2, %1, %0|%0, %1, %2}";
       else
 	return "kandnw\t{%2, %1, %0|%0, %1, %2}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kadd<mode>"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r,&r,!k")
+	(plus:SWI1248x
+	  (not:SWI1248x
+	    (match_operand:SWI1248x 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI1248x 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "add\t{%k2, %k1, %k0|%k0, %k1, %k2}";
+    case 1:
+      return "#";
+    case 2:
+      if (TARGET_AVX512BW && <MODE>mode == DImode)
+	return "kaddq\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512BW && <MODE>mode == SImode)
+	return "kaddd\t{%2, %1, %0|%0, %1, %2}";
+      else if (TARGET_AVX512DQ && <MODE>mode == QImode)
+	return "kaddb\t{%2, %1, %0|%0, %1, %2}";
+      else
+	return "kaddw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_unreachable ();
     }
@@ -9687,7 +9767,7 @@
 ;; shift pair, instead using moves and sign extension for counts greater
 ;; than 31.
 
-(define_insn "*<mshift><mode>3"
+(define_insn "<mshift><mode>3_1"
   [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=k")
 	(any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
 				       (match_operand:QI 2 "immediate_operand" "i")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
new file mode 100644
index 0000000..0b38850
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
new file mode 100644
index 0000000..5b7b417
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kaddq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kaddq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
new file mode 100644
index 0000000..2a934f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
new file mode 100644
index 0000000..6b68ab3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
new file mode 100644
index 0000000..35f1c12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandnq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandnq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
new file mode 100644
index 0000000..a1aaed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kandq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kandq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
new file mode 100644
index 0000000..a89b2d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+volatile __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask32 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
new file mode 100644
index 0000000..dcb65fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask32 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
new file mode 100644
index 0000000..fe5e1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask32 m1;
+extern unsigned int m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask32_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
new file mode 100644
index 0000000..8a085d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovd-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask32 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu32_mask32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
new file mode 100644
index 0000000..51d547d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+volatile __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _load_mask64 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
new file mode 100644
index 0000000..9baf200
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  _store_mask64 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
new file mode 100644
index 0000000..3a02d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask64 m1;
+extern unsigned long long m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtmask64_u64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
new file mode 100644
index 0000000..1cc16ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kmovq-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned long long m1;
+extern __mmask64 m2;
+
+void
+avx512bw_test ()
+{
+  m2 = _cvtu64_mask64 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
new file mode 100644
index 0000000..dd6b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask32 (k1);
+  x = _mm512_mask_add_epi16 (x, k1, x, x);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
new file mode 100644
index 0000000..5b94358
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-knotq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "knotq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask64 (k1);
+  x = _mm512_mask_add_epi8 (x, k1, x, x);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
new file mode 100644
index 0000000..163c46e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
new file mode 100644
index 0000000..77b1b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-korq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "korq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
new file mode 100644
index 0000000..85be9b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftld-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftld\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
new file mode 100644
index 0000000..cd5707e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftlq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
new file mode 100644
index 0000000..91b6313
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrd\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask32 (k1, i);
+  x = _mm512_mask_add_epi16 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
new file mode 100644
index 0000000..c10fa4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kshiftrq-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2;
+  int i = 5;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask64 (k1, i);
+  x = _mm512_mask_add_epi8 (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
new file mode 100644
index 0000000..951260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckdq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask64 k3;
+  __mmask32 k1, k2;
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackd_mask64 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
new file mode 100644
index 0000000..c68ad8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kunpckwd\[ \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test () {
+  volatile __mmask32 k3;
+  __mmask16 k1, k2;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackw_mask32 (k1, k2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
new file mode 100644
index 0000000..ccf4b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
new file mode 100644
index 0000000..b9c0979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxnorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxnorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
new file mode 100644
index 0000000..ce03ab4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxord-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxord\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovd" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask32 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovd %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovd %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask32 (k1, k2);
+  x = _mm512_mask_add_epi16 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
new file mode 100644
index 0000000..d6366dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-kxorq-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "kxorq\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovq" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512bw_test ()
+{
+  __mmask64 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_si512 ();
+
+  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask64 (k1, k2);
+  x = _mm512_mask_add_epi8 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
new file mode 100644
index 0000000..a84d8ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kaddb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kaddb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
new file mode 100644
index 0000000..b5b5367
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512i x = _mm512_setzero_epi32();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kand_mask8 (k1, k2);
+  x = _mm512_mask_add_epi64 (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
new file mode 100644
index 0000000..ff50610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kandnb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kandnb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kandn_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
new file mode 100644
index 0000000..3832853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+volatile __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _load_mask8 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
new file mode 100644
index 0000000..8d06674
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  _store_mask8 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
new file mode 100644
index 0000000..2da4719
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask8 m1;
+extern unsigned int m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtmask8_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
new file mode 100644
index 0000000..d3f8c5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask8 m2;
+
+void
+avx512dq_test ()
+{
+  m2 = _cvtu32_mask8 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
new file mode 100644
index 0000000..8bb9249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-knotb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "knotb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (45) );
+
+  k2 = _knot_mask8 (k1);
+  x = _mm512_mask_add_pd (x, k1, x, x);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
new file mode 100644
index 0000000..22b727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-korb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "korb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
new file mode 100644
index 0000000..422d0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftlb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
new file mode 100644
index 0000000..f87cf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kshiftrb-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2;
+  int i = 5;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask8 (k1, i);
+  x = _mm512_mask_add_pd (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
new file mode 100644
index 0000000..ee21aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxnorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxnorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxnor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
new file mode 100644
index 0000000..63a1ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-kxorb-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512dq -O2" } */
+/* { dg-final { scan-assembler-times "kxorb\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovb" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512dq_test ()
+{
+  __mmask8 k1, k2, k3;
+  volatile __m512d x = _mm512_setzero_pd();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kxor_mask8 (k1, k2);
+  x = _mm512_mask_add_pd (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
new file mode 100644
index 0000000..9faf4ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kaddw-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kaddw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw" 2 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2, k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kadd_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
new file mode 100644
index 0000000..77c8ddc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+volatile __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _load_mask16 (&m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
new file mode 100644
index 0000000..740ea9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  _store_mask16 (&m2, m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
new file mode 100644
index 0000000..127a4ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+__mmask16 m1;
+extern unsigned int m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtmask16_u32 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
new file mode 100644
index 0000000..d729e8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "kmovw\[ \\t\]+\{*%k\[0-7\]" 1 } } */
+
+#include <immintrin.h>
+
+unsigned int m1;
+extern __mmask16 m2;
+
+void
+avx512f_test ()
+{
+  m2 = _cvtu32_mask16 (m1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
new file mode 100644
index 0000000..7a9de12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftlw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftlw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftli_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
new file mode 100644
index 0000000..641d307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kshiftrw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kshiftrw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2;
+  int i = 5;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+
+  k2 = _kshiftri_mask16 (k1, i);
+  x = _mm512_mask_add_ps (x, k2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
new file mode 100644
index 0000000..2061f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kunpckbw-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kunpckbw\[ \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test () {
+  __mmask8 k1, k2;
+  __mmask16 k3;
+  volatile __m512 x = _mm512_setzero_ps();
+
+  __asm__( "kmovb %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovb %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _kunpackb_mask16 (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}


end of thread, other threads:[~2017-01-26 12:20 UTC | newest]

Thread overview: 48+ messages
2016-11-11 14:34 [PATCH] Add AVX512 k-mask intrinsics Uros Bizjak
2016-11-11 17:39 ` Andrew Senkevich
2016-11-11 17:50   ` Uros Bizjak
2016-11-11 17:56     ` Uros Bizjak
2016-11-11 18:23       ` Andrew Senkevich
2016-11-11 19:14         ` Uros Bizjak
2016-12-02 17:45           ` Andrew Senkevich
2016-12-02 18:31             ` Uros Bizjak
2016-12-05 14:59               ` Andrew Senkevich
2016-12-05 17:19                 ` H.J. Lu
2016-12-14 19:33               ` Andrew Senkevich
2016-12-14 20:35                 ` Uros Bizjak
     [not found]                   ` <CAMXFM3vC-3bMgQaQ2bnjDU7oQMPdvhurzgOFftZHqzNXAw=WgA@mail.gmail.com>
2016-12-15 16:51                     ` Uros Bizjak
2016-12-15 19:04                       ` Andrew Senkevich
2016-12-16 12:45                         ` Uros Bizjak
2017-01-16 22:30                           ` Andrew Senkevich
2017-01-16 22:55                             ` Jakub Jelinek
2017-01-17 11:05                               ` Andrew Senkevich
2017-01-17 11:06                                 ` Uros Bizjak
2017-01-17 12:30                                 ` Kirill Yukhin
2017-01-17 13:03                                   ` Andrew Senkevich
2017-01-17 13:51                                     ` Jakub Jelinek
2017-01-18 12:48                                       ` Andrew Senkevich
2017-01-18 21:45                                         ` Uros Bizjak
2017-01-19 10:46                                         ` Kirill Yukhin
2017-01-19 16:45                                           ` Andrew Senkevich
2017-01-19 18:04                                             ` Kirill Yukhin
2017-01-20 13:41                                               ` Andrew Senkevich
2017-01-20 13:47                                                 ` Uros Bizjak
2017-01-20 17:26                                                   ` Kirill Yukhin
2017-01-20 20:07                                                     ` Andrew Senkevich
2017-01-21  8:25                                                       ` Richard Biener
2017-01-23 11:33                                                       ` Kirill Yukhin
2017-01-26  9:38                                                       ` Thomas Schwinge
2017-01-26 10:04                                                         ` Uros Bizjak
2017-01-26 10:51                                                         ` Kirill Yukhin
2017-01-26 10:54                                                           ` Jakub Jelinek
2017-01-26 10:55                                                             ` Uros Bizjak
2017-01-26 11:04                                                               ` Jakub Jelinek
2017-01-26 11:18                                                                 ` Uros Bizjak
2017-01-26 11:53                                                           ` Thomas Schwinge
2017-01-26 12:04                                                             ` Kirill Yukhin
2017-01-26 12:17                                                               ` Jakub Jelinek
2017-01-26 12:23                                                                 ` Kirill Yukhin
2017-01-17  8:12                             ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2016-11-11 14:14 Andrew Senkevich
2016-11-11 15:26 ` Marc Glisse
2016-11-11 18:28   ` Andrew Senkevich
