public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch][x86] GFNI enabling [2/4]
@ 2017-10-17 13:00 Koval, Julia
  2017-10-30 10:30 ` Kirill Yukhin
  0 siblings, 1 reply; 9+ messages in thread
From: Koval, Julia @ 2017-10-17 13:00 UTC (permalink / raw)
  To: GCC Patches; +Cc: Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 1619 bytes --]

Hi, this is the second patch of enabling GFNI ISASET. It adds GF2P8AFFINEINV instruction.
The instruction is described here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

gcc/
	* config.gcc: Add gfniintrin.h.
	* config/i386/gfniintrin.h: New.
	* config/i386/i386-builtin-types.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask, __builtin_ia32_vgf2p8affineinvqb_v32qi
	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask, __builtin_ia32_vgf2p8affineinvqb_v16qi,
	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): New builtins.
	* config/i386/i386-builtin.def (V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI,
	V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI, V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI,
	V64QI_FTYPE_V64QI_V64QI_INT): New types.
	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
	* config/i386/immintrin.h: Include gfniintrin.h.
	* config/i386/sse.md (vgf2p8affineinvqb_*) New pattern.

gcc/testsuite/
	* gcc.target/i386/avx-1.c: Handle new intrinsics.
	* gcc.target/i386/avx512-check.h: Check GFNI bit.
	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
	* gcc.target/i386/gfni-1.c: New.
	* gcc.target/i386/gfni-2.c: New.
	* gcc.target/i386/gfni-3.c: New.
	* gcc.target/i386/gfni-4.c: New.
	* gcc.target/i386/i386.exp: (check_effective_target_gfni): New.
	* gcc.target/i386/sse-13.c: Handle new intrinsics.
	* gcc.target/i386/sse-23.c: Handle new intrinsics.

Ok for trunk?

Thanks,
Julia

[-- Attachment #2: 0002-GF2P8AFFINEINVQB-instruction.patch --]
[-- Type: application/octet-stream, Size: 28930 bytes --]

From 8fb6e5f3bb98c9a6f52a27bc2e4e3c085736bd10 Mon Sep 17 00:00:00 2001
From: "julia.koval" <jkoval@gkticlel801.igk.intel.com>
Date: Mon, 20 Feb 2017 14:25:53 +0300
Subject: [PATCH 2/4] GF2P8AFFINEINVQB instruction

---
 gcc/config.gcc                                     |   4 +-
 gcc/config/i386/gfniintrin.h                       | 229 +++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def             |   6 +
 gcc/config/i386/i386-builtin.def                   |   7 +
 gcc/config/i386/i386.c                             |   8 +
 gcc/config/i386/immintrin.h                        |   2 +
 gcc/config/i386/sse.md                             |  23 +++
 gcc/testsuite/gcc.target/i386/avx-1.c              |  10 +
 gcc/testsuite/gcc.target/i386/avx512-check.h       |   3 +
 .../gcc.target/i386/avx512f-gf2p8affineinvqb-2.c   |  74 +++++++
 .../gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c  |  17 ++
 gcc/testsuite/gcc.target/i386/gfni-1.c             |  18 ++
 gcc/testsuite/gcc.target/i386/gfni-2.c             |  27 +++
 gcc/testsuite/gcc.target/i386/gfni-3.c             |  17 ++
 gcc/testsuite/gcc.target/i386/gfni-4.c             |  14 ++
 gcc/testsuite/gcc.target/i386/i386.exp             |  15 ++
 gcc/testsuite/gcc.target/i386/sse-13.c             |   8 +
 gcc/testsuite/gcc.target/i386/sse-23.c             |   8 +
 18 files changed, 488 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/i386/gfniintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-4.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2270239..25e50d7 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -378,7 +378,7 @@ i[34567]86-*-*)
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
 		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
 		       avx512vpopcntdqintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h sgxintrin.h"
+		       clzerointrin.h pkuintrin.h sgxintrin.h gfniintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -402,7 +402,7 @@ x86_64-*-*)
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
 		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
 		       avx512vpopcntdqintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h sgxintrin.h"
+		       clzerointrin.h pkuintrin.h sgxintrin.h gfniintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/gfniintrin.h b/gcc/config/i386/gfniintrin.h
new file mode 100644
index 0000000..a42c205
--- /dev/null
+++ b/gcc/config/i386/gfniintrin.h
@@ -0,0 +1,229 @@
+/* Copyright (C) 2014-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <gfniintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _GFNIINTRIN_H_INCLUDED
+#define _GFNIINTRIN_H_INCLUDED
+
+#ifndef __GFNI__
+#pragma GCC push_options
+#pragma GCC target("gfni")
+#define __DISABLE_GFNI__
+#endif /* __GFNI__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_gf2p8affineinv_epi64_epi8 (__m128i __A, __m128i __B, const int __C)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi ((__v16qi) __A,
+							   (__v16qi) __B,
+							    __C);
+}
+#else
+#define _mm_gf2p8affineinv_epi64_epi8(A, B, C)				   \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi((__v16qi)(__m128i)(A), \
+					   (__v16qi)(__m128i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNI__
+#undef __DISABLE_GFNI__
+#pragma GCC pop_options
+#endif /* __DISABLE_GFNI__ */
+
+#if !defined(__GFNI__) || !defined(__AVX__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx")
+#define __DISABLE_GFNIAVX__
+#endif /* __GFNIAVX__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_gf2p8affineinv_epi64_epi8 (__m256i __A, __m256i __B, const int __C)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi ((__v32qi) __A,
+							   (__v32qi) __B,
+							    __C);
+}
+#else
+#define _mm256_gf2p8affineinv_epi64_epi8(A, B, C)			   \
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi((__v32qi)(__m256i)(A), \
+						    (__v32qi)(__m256i)(B), \
+						    (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX__
+#undef __DISABLE_GFNIAVX__
+#pragma GCC pop_options
+#endif /* __GFNIAVX__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512VL__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512vl")
+#define __DISABLE_GFNIAVX512VL__
+#endif /* __GFNIAVX512VL__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_gf2p8affineinv_epi64_epi8 (__m128i __A, __mmask16 __B, __m128i __C,
+				    __m128i __D, const int __E)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask ((__v16qi) __C,
+								(__v16qi) __D,
+								 __E,
+								(__v16qi)__A,
+								 __B);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_gf2p8affineinv_epi64_epi8 (__mmask16 __A, __m128i __B, __m128i __C,
+				     const int __D)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask ((__v16qi) __B,
+						(__v16qi) __C, __D,
+						(__v16qi) _mm_setzero_si128 (),
+						 __A);
+}
+#else
+#define _mm_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) 		   \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(		   \
+			(__v16qi)(__m128i)(C), (__v16qi)(__m128i)(D),      \
+			(int)(E), (__v16qi)(__m128i)(A), (__mmask16)(B)))
+#define _mm_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D) \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(		   \
+			(__v16qi)(__m128i)(B), (__v16qi)(__m128i)(C),	   \
+			(int)(D), (__v16qi)(__m128i) _mm_setzero_si128 (), \
+			(__mmask16)(A)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512VL__
+#undef __DISABLE_GFNIAVX512VL__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512VL__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512VL__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512vl,avx512bw")
+#define __DISABLE_GFNIAVX512VLBW__
+#endif /* __GFNIAVX512VLBW__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_gf2p8affineinv_epi64_epi8 (__m256i __A, __mmask32 __B,
+				       __m256i __C, __m256i __D, const int __E)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask ((__v32qi) __C,
+								(__v32qi) __D,
+							 	 __E,
+								(__v32qi)__A,
+								 __B);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_gf2p8affineinv_epi64_epi8 (__mmask32 __A, __m256i __B,
+					__m256i __C, const int __D)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask ((__v32qi) __B,
+				      (__v32qi) __C, __D,
+				      (__v32qi) _mm256_setzero_si256 (), __A);
+}
+#else
+#define _mm256_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E)		\
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(		\
+	(__v32qi)(__m256i)(C), (__v32qi)(__m256i)(D), (int)(E),		\
+	(__v32qi)(__m256i)(A), (__mmask32)(B)))
+#define _mm256_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D)		\
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(		\
+	(__v32qi)(__m256i)(B), (__v32qi)(__m256i)(C), (int)(D),		\
+	(__v32qi)(__m256i) _mm256_setzero_si256 (), (__mmask32)(A)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512VLBW__
+#undef __DISABLE_GFNIAVX512VLBW__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512VLBW__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512F__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512f,avx512bw")
+#define __DISABLE_GFNIAVX512FBW__
+#endif /* __GFNIAVX512FBW__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_gf2p8affineinv_epi64_epi8 (__m512i __A, __mmask64 __B, __m512i __C,
+				       __m512i __D, const int __E)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask ((__v64qi) __C,
+								(__v64qi) __D,
+								 __E,
+								(__v64qi)__A,
+								 __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_gf2p8affineinv_epi64_epi8 (__mmask64 __A, __m512i __B,
+					__m512i __C, const int __D)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask ((__v64qi) __B,
+				(__v64qi) __C, __D,
+				(__v64qi) _mm512_setzero_si512 (), __A);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
+							   (__v64qi) __B, __C);
+}
+#else
+#define _mm512_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) 		\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
+	(__v64qi)(__m512i)(C), (__v64qi)(__m512i)(D), (int)(E),		\
+	(__v64qi)(__m512i)(A), (__mmask64)(B)))
+#define _mm512_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D)		\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
+	(__v64qi)(__m512i)(B), (__v64qi)(__m512i)(C), (int)(D),		\
+	(__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
+#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C)			\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi (			\
+	(__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512FBW__
+#undef __DISABLE_GFNIAVX512FBW__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512FBW__ */
+
+#endif /* _GFNIINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 8d584db..f4508f4 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1210,3 +1210,9 @@ DEF_FUNCTION_TYPE (BND, BND, BND)
 DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG)
 DEF_FUNCTION_TYPE (ULONG, VOID)
 DEF_FUNCTION_TYPE (PVOID, BND)
+
+#GFNI builtins
+DEF_FUNCTION_TYPE (V64QI, V64QI, V64QI, INT)
+DEF_FUNCTION_TYPE (V64QI, V64QI, V64QI, INT, V64QI, UDI)
+DEF_FUNCTION_TYPE (V32QI, V32QI, V32QI, INT, V32QI, USI)
+DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, INT, V16QI, UHI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 0d5d5b7..24d057d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2589,6 +2589,13 @@ BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv8di_mask, "__builtin_
 /* RDPID */
 BDESC (OPTION_MASK_ISA_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_BUILTIN_RDPID, UNKNOWN, (int) UNSIGNED_FTYPE_VOID)
 
+/* GFNI */
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
 BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eaa98cd..5a192e4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -33420,6 +33420,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UQI_FTYPE_V4SF_V4SF_INT:
     case UHI_FTYPE_V16SI_V16SI_INT:
     case UHI_FTYPE_V16SF_V16SF_INT:
+    case V64QI_FTYPE_V64QI_V64QI_INT:
       nargs = 3;
       nargs_constant = 1;
       break;
@@ -33647,6 +33648,13 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       mask_pos = 1;
       nargs_constant = 1;
       break;
+    case V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI:
+    case V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI:
+    case V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI:
+      nargs = 5;
+      mask_pos = 1;
+      nargs_constant = 2;
+      break;
 
     default:
       gcc_unreachable ();
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index b52f58e..3169f6f 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -90,6 +90,8 @@
 
 #include <xtestintrin.h>
 
+#include <gfniintrin.h>
+
 #ifndef __RDRND__
 #pragma GCC push_options
 #pragma GCC target("rdrnd")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0c26bd1..3199f19 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -157,6 +157,9 @@
   UNSPEC_VP4FNMADD
   UNSPEC_VP4DPWSSD
   UNSPEC_VP4DPWSSDS
+
+  ;; For GFNI support
+  UNSPEC_GF2P8AFFINEINV
 ])
 
 (define_c_enum "unspecv" [
@@ -325,6 +328,9 @@
 (define_mode_iterator VI1_AVX512
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI])
 
+(define_mode_iterator VI1_AVX512F
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
+
 (define_mode_iterator VI2_AVX2
   [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
 
@@ -20165,3 +20171,20 @@
     ])]
   "TARGET_SSE && TARGET_64BIT"
   "jmp\t%P1")
+
+(define_insn "vgf2p8affineinvqb_<mode><mask_name>"
+  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
+	(unspec:VI1_AVX512F [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
+			       (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
+			       (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
+			      UNSPEC_GF2P8AFFINEINV))]
+  "TARGET_GFNI"
+  "@
+   gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
+   vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}
+   vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
+  [(set_attr "isa" "noavx,avx,avx512bw")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 085ba81..67dea5b 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -603,6 +603,16 @@
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
+
+
 #include <wmmintrin.h>
 #include <immintrin.h>
 #include <mm3dnow.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 9693fa4..9390c1a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -75,6 +75,9 @@ main ()
 #ifdef AVX512VPOPCNTDQ
       && (ecx & bit_AVX512VPOPCNTDQ)
 #endif
+#ifdef GFNI
+      && (ecx & bit_GFNI)
+#endif
       && avx512f_os_support ())
     {
       DO_TEST ();
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c b/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
new file mode 100644
index 0000000..af4839f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mgfni -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target gfni } */
+
+#define AVX512F
+
+#define GFNI
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+
+#include "avx512f-mask-type.h"
+#include <x86intrin.h>
+
+static void
+CALC (unsigned char *r, unsigned char *s1, unsigned char *s2, unsigned char imm)
+{
+  for (int a = 0; a < SIZE/8; a++)
+    {
+      for (int val = 0; val < 8; val++)
+        {
+          unsigned char result = 0;
+          for (int bit = 0; bit < 8; bit++)
+          {
+            unsigned char temp = s1[a*8 + val] & s2[a*8 + bit];
+            unsigned char parity = __popcntd(temp);
+            if (parity % 2)
+              result |= (1 << (8 - bit - 1));
+          }
+          r[a*8 + val] = result ^ imm; 
+        }
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_b) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  char res_ref[SIZE];
+  unsigned char imm = 0;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = i %2 ; // gfni inverse of 1 and 0 are 1 and 0
+      src2.a[i] = 1;
+    }
+
+  for (i = 0; i < SIZE; i++)
+    {
+      res1.a[i] = DEFAULT_VALUE;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+    }
+
+  CALC (res_ref, src1.a, src2.a, imm);
+
+  res1.x = INTRINSIC (_gf2p8affineinv_epi64_epi8) (src1.x, src2.x, imm);
+  res2.x = INTRINSIC (_mask_gf2p8affineinv_epi64_epi8) (res2.x, mask, src1.x, src2.x, imm);
+  res3.x = INTRINSIC (_maskz_gf2p8affineinv_epi64_epi8) (mask, src1.x, src2.x, imm);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
new file mode 100644
index 0000000..fa54526
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl -mgfni" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target gfni } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-gf2p8affineinvqb-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-gf2p8affineinvqb-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/gfni-1.c b/gcc/testsuite/gcc.target/i386/gfni-1.c
new file mode 100644
index 0000000..5e22c9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx512bw -mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+volatile __m512i x1, x2;
+volatile __mmask64 m64;
+ 
+void extern
+avx512vl_test (void)
+{
+    x1 = _mm512_gf2p8affineinv_epi64_epi8(x1, x2, 3);
+    x1 = _mm512_mask_gf2p8affineinv_epi64_epi8(x1, m64, x2, x1, 3);
+    x1 = _mm512_maskz_gf2p8affineinv_epi64_epi8(m64, x1, x2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-2.c b/gcc/testsuite/gcc.target/i386/gfni-2.c
new file mode 100644
index 0000000..4d1f151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m256i x3, x4;
+volatile __m128i x5, x6;
+volatile __mmask32 m32;
+volatile __mmask16 m16;
+ 
+void extern
+avx512vl_test (void)
+{
+    x3 = _mm256_gf2p8affineinv_epi64_epi8(x3, x4, 3);
+    x3 = _mm256_mask_gf2p8affineinv_epi64_epi8(x3, m32, x4, x3, 3);
+    x3 = _mm256_maskz_gf2p8affineinv_epi64_epi8(m32, x3, x4, 3);
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+    x5 = _mm_mask_gf2p8affineinv_epi64_epi8(x5, m16, x6, x5, 3);
+    x5 = _mm_maskz_gf2p8affineinv_epi64_epi8(m16, x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-3.c b/gcc/testsuite/gcc.target/i386/gfni-3.c
new file mode 100644
index 0000000..de5f80b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m256i x3, x4;
+volatile __m128i x5, x6;
+ 
+void extern
+avx512vl_test (void)
+{
+    x3 = _mm256_gf2p8affineinv_epi64_epi8(x3, x4, 3);
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-4.c b/gcc/testsuite/gcc.target/i386/gfni-4.c
new file mode 100644
index 0000000..1532716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -O2" } */
+/* { dg-final { scan-assembler-times "gf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m128i x5, x6;
+ 
+void extern
+avx512vl_test (void)
+{
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
index eae2531..b2bdbfd 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -421,6 +421,21 @@ proc check_effective_target_avx512vpopcntdq { } {
     } "-mavx512vpopcntdq" ]
 }
 
+# Return 1 if gfni instructions can be compiled.
+proc check_effective_target_gfni { } {
+    return [check_no_compiler_messages gfni object {
+        typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+
+        __v16qi
+        _mm_gf2p8affineinv_epi64_epi8 (__v16qi __A, __v16qi __B, const int __C)
+        {
+            return (__v16qi) __builtin_ia32_vgf2p8affineinvqb_v16qi ((__v16qi) __A,
+								     (__v16qi) __B,
+								      0);
+        }
+    } "-mgfni" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index c5c43b1..3378c5d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -620,4 +620,12 @@
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index fc339a5..d2a301c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -619,6 +619,14 @@
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
 #pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid")
 
 #include <x86intrin.h>
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch][x86] GFNI enabling [2/4]
  2017-10-17 13:00 [patch][x86] GFNI enabling [2/4] Koval, Julia
@ 2017-10-30 10:30 ` Kirill Yukhin
  2017-10-30 19:03   ` Koval, Julia
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill Yukhin @ 2017-10-30 10:30 UTC (permalink / raw)
  To: Koval, Julia; +Cc: GCC Patches

On 17 Oct 12:58, Koval, Julia wrote:
> Hi, this is the second patch of enabling GFNI ISASET. It adds GF2P8AFFINEINV instruction.
> The instruction is described here:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> gcc/
> 	* config.gcc: Add gfniintrin.h.
> 	* config/i386/gfniintrin.h: New.
> 	* config/i386/i386-builtin-types.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
> 	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask, __builtin_ia32_vgf2p8affineinvqb_v32qi
> 	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask, __builtin_ia32_vgf2p8affineinvqb_v16qi,
> 	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): New builtins.
> 	* config/i386/i386-builtin.def (V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI,
> 	V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI, V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI,
> 	V64QI_FTYPE_V64QI_V64QI_INT): New types.
> 	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
> 	* config/i386/immintrin.h: Include gfniintrin.h.
> 	* config/i386/sse.md (vgf2p8affineinvqb_*) New pattern.
> 
> gcc/testsuite/
> 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> 	* gcc.target/i386/gfni-1.c: New.
> 	* gcc.target/i386/gfni-2.c: New.
> 	* gcc.target/i386/gfni-3.c: New.
> 	* gcc.target/i386/gfni-4.c: New.
> 	* gcc.target/i386/i386.exp: (check_effective_target_gfni): New.
> 	* gcc.target/i386/sse-13.c: Handle new intrinsics.
> 	* gcc.target/i386/sse-23.c: Handle new intrinsics.
> 
> Ok for trunk?
Few comments:
1. Why copyright in config/i386/gfniintrin.h starts from 2014?

2. I think few tests updates are missing: g++.dg/other/i386-2,3.c + gcc.target/i386/sse-12,14.c

--
Thanks, K
> 
> Thanks,
> Julia


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [patch][x86] GFNI enabling [2/4]
  2017-10-30 10:30 ` Kirill Yukhin
@ 2017-10-30 19:03   ` Koval, Julia
  2017-10-31  7:03     ` Kirill Yukhin
  2017-10-31 20:08     ` Jakub Jelinek
  0 siblings, 2 replies; 9+ messages in thread
From: Koval, Julia @ 2017-10-30 19:03 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]

Hi,
Fixed that.

gcc/
	* config.gcc: Add gfniintrin.h.
	* config/i386/gfniintrin.h: New.
	* config/i386/i386-builtin-types.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask, __builtin_ia32_vgf2p8affineinvqb_v32qi
	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask, __builtin_ia32_vgf2p8affineinvqb_v16qi,
	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): New builtins.
	* config/i386/i386-builtin.def (V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI,
	V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI, V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI,
	V64QI_FTYPE_V64QI_V64QI_INT): New types.
	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
	* config/i386/immintrin.h: Include gfniintrin.h.
	* config/i386/sse.md (vgf2p8affineinvqb_*) New pattern.

gcc/testsuite/
	* gcc.target/i386/avx-1.c: Handle new intrinsics.
	* gcc.target/i386/avx512-check.h: Check GFNI bit.
	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
	* gcc.target/i386/gfni-1.c: New.
	* gcc.target/i386/gfni-2.c: New.
	* gcc.target/i386/gfni-3.c: New.
	* gcc.target/i386/gfni-4.c: New.
	* gcc.target/i386/i386.exp: (check_effective_target_gfni): New. 
	* gcc.target/i386/sse-12.c: Handle new intrinsics. 
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto. 
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto. 
	* g++.dg/other/i386-3.C: Ditto.

> -----Original Message-----
> From: Kirill Yukhin [mailto:kirill.yukhin@gmail.com]
> Sent: Monday, October 30, 2017 11:27 AM
> To: Koval, Julia <julia.koval@intel.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [patch][x86] GFNI enabling [2/4]
> 
> On 17 Oct 12:58, Koval, Julia wrote:
> > Hi, this is the second patch of enabling GFNI ISASET. It adds GF2P8AFFINEINV
> instruction.
> > The instruction is described here:
> > https://software.intel.com/sites/default/files/managed/c5/15/architecture-
> instruction-set-extensions-programming-reference.pdf
> >
> > gcc/
> > 	* config.gcc: Add gfniintrin.h.
> > 	* config/i386/gfniintrin.h: New.
> > 	* config/i386/i386-builtin-types.def
> (__builtin_ia32_vgf2p8affineinvqb_v64qi,
> > 	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask,
> __builtin_ia32_vgf2p8affineinvqb_v32qi
> > 	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask,
> __builtin_ia32_vgf2p8affineinvqb_v16qi,
> > 	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): New builtins.
> > 	* config/i386/i386-builtin.def
> (V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI,
> > 	V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI,
> V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI,
> > 	V64QI_FTYPE_V64QI_V64QI_INT): New types.
> > 	* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
> > 	* config/i386/immintrin.h: Include gfniintrin.h.
> > 	* config/i386/sse.md (vgf2p8affineinvqb_*) New pattern.
> >
> > gcc/testsuite/
> > 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> > 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> > 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> > 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> > 	* gcc.target/i386/gfni-1.c: New.
> > 	* gcc.target/i386/gfni-2.c: New.
> > 	* gcc.target/i386/gfni-3.c: New.
> > 	* gcc.target/i386/gfni-4.c: New.
> > 	* gcc.target/i386/i386.exp: (check_effective_target_gfni): New.
> > 	* gcc.target/i386/sse-13.c: Handle new intrinsics.
> > 	* gcc.target/i386/sse-23.c: Handle new intrinsics.
> >
> > Ok for trunk?
> Few comments:
> 1. Why copyright in config/i386/gfniintrin.h starts from 2014?
> 
> 2. I think few tests updates are missing: g++.dg/other/i386-2,3.c +
> gcc.target/i386/sse-12,14.c
> 
> --
> Thanks, K
> >
> > Thanks,
> > Julia
> 


[-- Attachment #2: 0001-GF2P8AFFINEINVQB.PATCH --]
[-- Type: application/octet-stream, Size: 112494 bytes --]

From 32220dacdee798f89992f1ccfe530edda2cb6321 Mon Sep 17 00:00:00 2001
From: julia <jkoval@gkticlel801.igk.intel.com>
Date: Mon, 30 Oct 2017 14:43:02 +0300
Subject: [PATCH] GF2P8AFFINEINVQB

---
 gcc/config.gcc                                     |   6 +-
 gcc/config/i386/gfniintrin.h                       | 229 +++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def             |   6 +
 gcc/config/i386/i386-builtin.def                   |   7 +
 gcc/config/i386/i386.c                             |   8 +
 gcc/config/i386/immintrin.h                        |   2 +
 gcc/config/i386/sse.md                             |  23 +++
 gcc/testsuite/g++.dg/other/i386-2.C                |   6 +-
 gcc/testsuite/g++.dg/other/i386-3.C                |   6 +-
 gcc/testsuite/gcc.target/i386/avx-1.c              |  12 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h       |   3 +
 .../gcc.target/i386/avx512f-gf2p8affineinvqb-2.c   |  74 +++++++
 .../gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c  |  17 ++
 gcc/testsuite/gcc.target/i386/gfni-1.c             |  18 ++
 gcc/testsuite/gcc.target/i386/gfni-2.c             |  27 +++
 gcc/testsuite/gcc.target/i386/gfni-3.c             |  17 ++
 gcc/testsuite/gcc.target/i386/gfni-4.c             |  14 ++
 gcc/testsuite/gcc.target/i386/i386.exp             |  15 ++
 gcc/testsuite/gcc.target/i386/sse-12.c             |   4 +-
 gcc/testsuite/gcc.target/i386/sse-13.c             |  10 +-
 gcc/testsuite/gcc.target/i386/sse-14.c             |  11 +-
 gcc/testsuite/gcc.target/i386/sse-22.c             |   9 +-
 gcc/testsuite/gcc.target/i386/sse-23.c             |  10 +-
 23 files changed, 516 insertions(+), 18 deletions(-)
 create mode 100644 gcc/config/i386/gfniintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/gfni-4.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c3dab84..67e6286 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -331,125 +331,127 @@ arm*-*-*)
 	extra_options="${extra_options} arm/arm-tables.opt"
 	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c"
 	;;
 avr-*-*)
 	cpu_type=avr
 	c_target_objs="avr-c.o"
 	cxx_target_objs="avr-c.o"
 	;;
 bfin*-*)
 	cpu_type=bfin
 	;;
 crisv32-*)
 	cpu_type=cris
 	;;
 frv*)	cpu_type=frv
 	extra_options="${extra_options} g.opt"
 	;;
 ft32*)	cpu_type=ft32
 	target_has_targetm_common=no
 	;;
 moxie*)	cpu_type=moxie
 	target_has_targetm_common=no
 	;;
 fido-*-*)
 	cpu_type=m68k
 	extra_headers=math-68881.h
 	extra_options="${extra_options} m68k/m68k-tables.opt"
         ;;
 i[34567]86-*-*)
 	cpu_type=i386
 	c_target_objs="i386-c.o"
 	cxx_target_objs="i386-c.o"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
 		       ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h
 		       lzcntintrin.h bmiintrin.h bmi2intrin.h tbmintrin.h
 		       avx2intrin.h avx512fintrin.h fmaintrin.h f16cintrin.h
 		       rtmintrin.h xtestintrin.h rdseedintrin.h prfchwintrin.h
 		       adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h
 		       avx512cdintrin.h avx512erintrin.h avx512pfintrin.h
 		       shaintrin.h clflushoptintrin.h xsavecintrin.h
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
 		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
 		       avx512vpopcntdqintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h"
+		       clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h
+		       gfniintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	c_target_objs="i386-c.o"
 	cxx_target_objs="i386-c.o"
 	extra_options="${extra_options} fused-madd.opt"
 	extra_objs="x86-tune-sched.o x86-tune-sched-bd.o x86-tune-sched-atom.o x86-tune-sched-core.o"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
 		       ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h
 		       lzcntintrin.h bmiintrin.h tbmintrin.h bmi2intrin.h
 		       avx2intrin.h avx512fintrin.h fmaintrin.h f16cintrin.h
 		       rtmintrin.h xtestintrin.h rdseedintrin.h prfchwintrin.h
 		       adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h
 		       avx512cdintrin.h avx512erintrin.h avx512pfintrin.h
 		       shaintrin.h clflushoptintrin.h xsavecintrin.h
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
 		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
 		       avx512vpopcntdqintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h"
+		       clzerointrin.h pkuintrin.h sgxintrin.h cetintrin.h
+		       gfniintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
 	extra_options="${extra_options} g.opt fused-madd.opt"
 	;;
 hppa*-*-*)
 	cpu_type=pa
 	;;
 lm32*)
 	extra_options="${extra_options} g.opt"
 	;;
 m32r*-*-*)
         cpu_type=m32r
 	extra_options="${extra_options} g.opt"
         ;;
 m68k-*-*)
 	extra_headers=math-68881.h
 	extra_options="${extra_options} m68k/m68k-tables.opt"
 	;;
 microblaze*-*-*)
         cpu_type=microblaze
 	extra_options="${extra_options} g.opt"
         ;;
 mips*-*-*)
 	cpu_type=mips
 	extra_headers="loongson.h msa.h"
 	extra_objs="frame-header-opt.o"
 	extra_options="${extra_options} g.opt fused-madd.opt mips/mips-tables.opt"
 	;;
 nds32*)
 	cpu_type=nds32
 	extra_headers="nds32_intrinsic.h"
 	extra_objs="nds32-cost.o nds32-intrinsic.o nds32-isr.o nds32-md-auxiliary.o nds32-pipelines-auxiliary.o nds32-predicates.o nds32-memory-manipulation.o nds32-fp-as-gp.o"
 	;;
 nios2-*-*)
 	cpu_type=nios2
 	extra_options="${extra_options} g.opt"
 	;;
 nvptx-*-*)
 	cpu_type=nvptx
 	;;
 powerpc*-*-*spe*)
 	cpu_type=powerpcspe
 	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
 	case x$with_cpu in
 	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
 		;;
 	esac
 	extra_options="${extra_options} g.opt fused-madd.opt powerpcspe/powerpcspe-tables.opt"
diff --git a/gcc/config/i386/gfniintrin.h b/gcc/config/i386/gfniintrin.h
new file mode 100644
index 0000000..f4ca01c
--- /dev/null
+++ b/gcc/config/i386/gfniintrin.h
@@ -0,0 +1,229 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <gfniintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _GFNIINTRIN_H_INCLUDED
+#define _GFNIINTRIN_H_INCLUDED
+
+#ifndef __GFNI__
+#pragma GCC push_options
+#pragma GCC target("gfni")
+#define __DISABLE_GFNI__
+#endif /* __GFNI__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_gf2p8affineinv_epi64_epi8 (__m128i __A, __m128i __B, const int __C)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi ((__v16qi) __A,
+							   (__v16qi) __B,
+							    __C);
+}
+#else
+#define _mm_gf2p8affineinv_epi64_epi8(A, B, C)				   \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi((__v16qi)(__m128i)(A), \
+					   (__v16qi)(__m128i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNI__
+#undef __DISABLE_GFNI__
+#pragma GCC pop_options
+#endif /* __DISABLE_GFNI__ */
+
+#if !defined(__GFNI__) || !defined(__AVX__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx")
+#define __DISABLE_GFNIAVX__
+#endif /* __GFNIAVX__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_gf2p8affineinv_epi64_epi8 (__m256i __A, __m256i __B, const int __C)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi ((__v32qi) __A,
+							   (__v32qi) __B,
+							    __C);
+}
+#else
+#define _mm256_gf2p8affineinv_epi64_epi8(A, B, C)			   \
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi((__v32qi)(__m256i)(A), \
+						    (__v32qi)(__m256i)(B), \
+						    (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX__
+#undef __DISABLE_GFNIAVX__
+#pragma GCC pop_options
+#endif /* __GFNIAVX__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512VL__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512vl")
+#define __DISABLE_GFNIAVX512VL__
+#endif /* __GFNIAVX512VL__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_gf2p8affineinv_epi64_epi8 (__m128i __A, __mmask16 __B, __m128i __C,
+				    __m128i __D, const int __E)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask ((__v16qi) __C,
+								(__v16qi) __D,
+								 __E,
+								(__v16qi)__A,
+								 __B);
+}
+
+extern __inline __m128i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_gf2p8affineinv_epi64_epi8 (__mmask16 __A, __m128i __B, __m128i __C,
+				     const int __D)
+{
+  return (__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask ((__v16qi) __B,
+						(__v16qi) __C, __D,
+						(__v16qi) _mm_setzero_si128 (),
+						 __A);
+}
+#else
+#define _mm_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) 		   \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(		   \
+			(__v16qi)(__m128i)(C), (__v16qi)(__m128i)(D),      \
+			(int)(E), (__v16qi)(__m128i)(A), (__mmask16)(B)))
+#define _mm_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D) \
+  ((__m128i) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(		   \
+			(__v16qi)(__m128i)(B), (__v16qi)(__m128i)(C),	   \
+			(int)(D), (__v16qi)(__m128i) _mm_setzero_si128 (), \
+			(__mmask16)(A)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512VL__
+#undef __DISABLE_GFNIAVX512VL__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512VL__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512VL__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512vl,avx512bw")
+#define __DISABLE_GFNIAVX512VLBW__
+#endif /* __GFNIAVX512VLBW__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_gf2p8affineinv_epi64_epi8 (__m256i __A, __mmask32 __B,
+				       __m256i __C, __m256i __D, const int __E)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask ((__v32qi) __C,
+								(__v32qi) __D,
+							 	 __E,
+								(__v32qi)__A,
+								 __B);
+}
+
+extern __inline __m256i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_gf2p8affineinv_epi64_epi8 (__mmask32 __A, __m256i __B,
+					__m256i __C, const int __D)
+{
+  return (__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask ((__v32qi) __B,
+				      (__v32qi) __C, __D,
+				      (__v32qi) _mm256_setzero_si256 (), __A);
+}
+#else
+#define _mm256_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E)		\
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(		\
+	(__v32qi)(__m256i)(C), (__v32qi)(__m256i)(D), (int)(E),		\
+	(__v32qi)(__m256i)(A), (__mmask32)(B)))
+#define _mm256_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D)		\
+  ((__m256i) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(		\
+	(__v32qi)(__m256i)(B), (__v32qi)(__m256i)(C), (int)(D),		\
+	(__v32qi)(__m256i) _mm256_setzero_si256 (), (__mmask32)(A)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512VLBW__
+#undef __DISABLE_GFNIAVX512VLBW__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512VLBW__ */
+
+#if !defined(__GFNI__) || !defined(__AVX512F__) || !defined(__AVX512BW__)
+#pragma GCC push_options
+#pragma GCC target("gfni,avx512f,avx512bw")
+#define __DISABLE_GFNIAVX512FBW__
+#endif /* __GFNIAVX512FBW__ */
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_gf2p8affineinv_epi64_epi8 (__m512i __A, __mmask64 __B, __m512i __C,
+				       __m512i __D, const int __E)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask ((__v64qi) __C,
+								(__v64qi) __D,
+								 __E,
+								(__v64qi)__A,
+								 __B);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_gf2p8affineinv_epi64_epi8 (__mmask64 __A, __m512i __B,
+					__m512i __C, const int __D)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask ((__v64qi) __B,
+				(__v64qi) __C, __D,
+				(__v64qi) _mm512_setzero_si512 (), __A);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_gf2p8affineinv_epi64_epi8 (__m512i __A, __m512i __B, const int __C)
+{
+  return (__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi ((__v64qi) __A,
+							   (__v64qi) __B, __C);
+}
+#else
+#define _mm512_mask_gf2p8affineinv_epi64_epi8(A, B, C, D, E) 		\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
+	(__v64qi)(__m512i)(C), (__v64qi)(__m512i)(D), (int)(E),		\
+	(__v64qi)(__m512i)(A), (__mmask64)(B)))
+#define _mm512_maskz_gf2p8affineinv_epi64_epi8(A, B, C, D)		\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(		\
+	(__v64qi)(__m512i)(B), (__v64qi)(__m512i)(C), (int)(D),		\
+	(__v64qi)(__m512i) _mm512_setzero_si512 (), (__mmask64)(A)))
+#define _mm512_gf2p8affineinv_epi64_epi8(A, B, C)			\
+  ((__m512i) __builtin_ia32_vgf2p8affineinvqb_v64qi (			\
+	(__v64qi)(__m512i)(A), (__v64qi)(__m512i)(B), (int)(C)))
+#endif
+
+#ifdef __DISABLE_GFNIAVX512FBW__
+#undef __DISABLE_GFNIAVX512FBW__
+#pragma GCC pop_options
+#endif /* __GFNIAVX512FBW__ */
+
+#endif /* _GFNIINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 1c0c6b4..5b3b96e 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1165,50 +1165,56 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_INT_V2DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI_V2DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V32HI_FTYPE_V32HI_INT_V32HI_USI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V32HI_FTYPE_V32HI_V8HI_V32HI_USI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V4DI_FTYPE_V4DI_INT_V4DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V4DI_FTYPE_V4DI_V2DI_V4DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_INT_V4SI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI_V4SI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8DI_FTYPE_V8DI_INT_V8DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8DI_FTYPE_V8DI_V2DI_V8DI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_INT_V8HI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI_V8HI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8SI_FTYPE_V8SI_INT_V8SI_UQI, COUNT)
 DEF_FUNCTION_TYPE_ALIAS (V8SI_FTYPE_V8SI_V4SI_V8SI_UQI, COUNT)
 
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF_V2DF, SWAP)
 DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, SWAP)
 
 DEF_FUNCTION_TYPE_ALIAS (V8DI_FTYPE_V8DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V4DI_FTYPE_V4DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V8DI_FTYPE_V8DI_V8DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V4DI_FTYPE_V4DI_V4DI_INT_V4DI_USI, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI_INT_V2DI_UHI, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V4DI_FTYPE_V4DI_V4DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI_INT, CONVERT)
 DEF_FUNCTION_TYPE_ALIAS (V1DI_FTYPE_V1DI_V1DI_INT, CONVERT)
 
 DEF_FUNCTION_TYPE_ALIAS (V16QI_FTYPE_V16QI_V16QI, CMP)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, CMP)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, CMP)
 DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, CMP)
 
 DEF_FUNCTION_TYPE_ALIAS (V16QI_FTYPE_V16QI_V16QI, TF)
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF_V2DF, TF)
 DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF)
 DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF)
 DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF)
 
 # MPX builtins
 DEF_FUNCTION_TYPE (BND, PCVOID, ULONG)
 DEF_FUNCTION_TYPE (VOID, PCVOID, BND)
 DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID)
 DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID)
 DEF_FUNCTION_TYPE (BND, PCVOID)
 DEF_FUNCTION_TYPE (BND, BND, BND)
 DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG)
 DEF_FUNCTION_TYPE (ULONG, VOID)
 DEF_FUNCTION_TYPE (PVOID, BND)
+
+#GFNI builtins
+DEF_FUNCTION_TYPE (V64QI, V64QI, V64QI, INT)
+DEF_FUNCTION_TYPE (V64QI, V64QI, V64QI, INT, V64QI, UDI)
+DEF_FUNCTION_TYPE (V32QI, V32QI, V32QI, INT, V32QI, USI)
+DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, INT, V16QI, UHI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 5a58b94..76e5f0f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2542,100 +2542,107 @@ BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rcp28v8df_mask_round, "__buil
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rcp28v16sf_mask_round, "__builtin_ia32_rcp28ps_mask", IX86_BUILTIN_RCP28PS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrcp28v2df_round, "__builtin_ia32_rcp28sd_round", IX86_BUILTIN_RCP28SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrcp28v4sf_round, "__builtin_ia32_rcp28ss_round", IX86_BUILTIN_RCP28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rsqrt28v8df_mask_round, "__builtin_ia32_rsqrt28pd_mask", IX86_BUILTIN_RSQRT28PD, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rsqrt28v16sf_mask_round, "__builtin_ia32_rsqrt28ps_mask", IX86_BUILTIN_RSQRT28PS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v2df_round, "__builtin_ia32_rsqrt28sd_round", IX86_BUILTIN_RSQRT28SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__builtin_ia32_rsqrt28ss_round", IX86_BUILTIN_RSQRT28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 
 /* AVX512DQ.  */
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv2df_round, "__builtin_ia32_rangesd128_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv4sf_round, "__builtin_ia32_rangess128_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_cvtps2uqqv8di_mask_round, "__builtin_ia32_cvtps2uqq512_mask", IX86_BUILTIN_CVTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_floatv8div8sf2_mask_round, "__builtin_ia32_cvtqq2ps512_mask", IX86_BUILTIN_CVTQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufloatv8div8sf2_mask_round, "__builtin_ia32_cvtuqq2ps512_mask", IX86_BUILTIN_CVTUQQ2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DI_V8SF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_floatv8div8df2_mask_round, "__builtin_ia32_cvtqq2pd512_mask", IX86_BUILTIN_CVTQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufloatv8div8df2_mask_round, "__builtin_ia32_cvtuqq2pd512_mask", IX86_BUILTIN_CVTUQQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8DI_V8DF_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_fix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2qq512_mask", IX86_BUILTIN_CVTTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_truncv8sfv8di2_mask_round, "__builtin_ia32_cvttps2uqq512_mask", IX86_BUILTIN_CVTTPS2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_fix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2qq512_mask", IX86_BUILTIN_CVTTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_truncv8dfv8di2_mask_round, "__builtin_ia32_cvttpd2uqq512_mask", IX86_BUILTIN_CVTTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
 BDESC_END (ROUND_ARGS, ARGS2)
 
 /* AVX512_4FMAPS and AVX512_4VNNIW builtins with variable number of arguments. Defined in additional ix86_isa_flags2.  */
 BDESC_FIRST (args2, ARGS2,
        OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps_mask, "__builtin_ia32_4fmaddps_mask", IX86_BUILTIN_4FMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps, "__builtin_ia32_4fmaddps", IX86_BUILTIN_4FMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss, "__builtin_ia32_4fmaddss", IX86_BUILTIN_4FMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss_mask, "__builtin_ia32_4fmaddss_mask", IX86_BUILTIN_4FMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps_mask, "__builtin_ia32_4fnmaddps_mask", IX86_BUILTIN_4FNMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps, "__builtin_ia32_4fnmaddps", IX86_BUILTIN_4FNMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss, "__builtin_ia32_4fnmaddss", IX86_BUILTIN_4FNMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
 BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss_mask, "__builtin_ia32_4fnmaddss_mask", IX86_BUILTIN_4FNMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd, "__builtin_ia32_vp4dpwssd", IX86_BUILTIN_4DPWSSD, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
 BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd_mask, "__builtin_ia32_vp4dpwssd_mask", IX86_BUILTIN_4DPWSSD_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds, "__builtin_ia32_vp4dpwssds", IX86_BUILTIN_4DPWSSDS, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
 BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds_mask, "__builtin_ia32_vp4dpwssds_mask", IX86_BUILTIN_4DPWSSDS_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv16si, "__builtin_ia32_vpopcountd_v16si", IX86_BUILTIN_VPOPCOUNTDV16SI, UNKNOWN, (int) V16SI_FTYPE_V16SI)
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv16si_mask, "__builtin_ia32_vpopcountd_v16si_mask", IX86_BUILTIN_VPOPCOUNTDV16SI_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv8di, "__builtin_ia32_vpopcountq_v8di", IX86_BUILTIN_VPOPCOUNTQV8DI, UNKNOWN, (int) V8DI_FTYPE_V8DI)
 BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv8di_mask, "__builtin_ia32_vpopcountq_v8di_mask", IX86_BUILTIN_VPOPCOUNTQV8DI_MASK, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
 
 /* RDPID */
 BDESC (OPTION_MASK_ISA_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_BUILTIN_RDPID, UNKNOWN, (int) UNSIGNED_FTYPE_VOID)
 
+/* GFNI */
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
 BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
 BDESC_FIRST (mpx, MPX,
        OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndstx", IX86_BUILTIN_BNDSTX, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND_PCVOID)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndcl", IX86_BUILTIN_BNDCL, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndcu", IX86_BUILTIN_BNDCU, UNKNOWN, (int) VOID_FTYPE_PCVOID_BND)
 
 BDESC_END (MPX, MPX_CONST)
 
 /* Const builtins for MPX.  */
 BDESC_FIRST (mpx_const, MPX_CONST,
        OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndmk", IX86_BUILTIN_BNDMK, UNKNOWN, (int) BND_FTYPE_PCVOID_ULONG)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndldx", IX86_BUILTIN_BNDLDX, UNKNOWN, (int) BND_FTYPE_PCVOID_PCVOID)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_narrow_bounds", IX86_BUILTIN_BNDNARROW, UNKNOWN, (int) PVOID_FTYPE_PCVOID_BND_ULONG)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndint", IX86_BUILTIN_BNDINT, UNKNOWN, (int) BND_FTYPE_BND_BND)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_sizeof", IX86_BUILTIN_SIZEOF, UNKNOWN, (int) ULONG_FTYPE_VOID)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndlower", IX86_BUILTIN_BNDLOWER, UNKNOWN, (int) PVOID_FTYPE_BND)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndupper", IX86_BUILTIN_BNDUPPER, UNKNOWN, (int) PVOID_FTYPE_BND)
 BDESC (OPTION_MASK_ISA_MPX, (enum insn_code)0, "__builtin_ia32_bndret", IX86_BUILTIN_BNDRET, UNKNOWN, (int) BND_FTYPE_PCVOID)
 
 BDESC_END (MPX_CONST, MULTI_ARG)
 
 /* FMA4 and XOP.  */
 BDESC_FIRST (multi_arg, MULTI_ARG,
        OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmadd_v4sf, "__builtin_ia32_vfmaddss", IX86_BUILTIN_VFMADDSS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmadd_v2df, "__builtin_ia32_vfmaddsd", IX86_BUILTIN_VFMADDSD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v4sf, "__builtin_ia32_vfmaddss3", IX86_BUILTIN_VFMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v2df, "__builtin_ia32_vfmaddsd3", IX86_BUILTIN_VFMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v8sf, "__builtin_ia32_vfmaddps256", IX86_BUILTIN_VFMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4df, "__builtin_ia32_vfmaddpd256", IX86_BUILTIN_VFMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v8sf, "__builtin_ia32_vfmaddsubps256", IX86_BUILTIN_VFMADDSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4df, "__builtin_ia32_vfmaddsubpd256", IX86_BUILTIN_VFMADDSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
 
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2di,        "__builtin_ia32_vpcmov",      IX86_BUILTIN_VPCMOV,	 UNKNOWN,      (int)MULTI_ARG_3_DI)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2di,        "__builtin_ia32_vpcmov_v2di", IX86_BUILTIN_VPCMOV_V2DI, UNKNOWN,      (int)MULTI_ARG_3_DI)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4si,        "__builtin_ia32_vpcmov_v4si", IX86_BUILTIN_VPCMOV_V4SI, UNKNOWN,      (int)MULTI_ARG_3_SI)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v8hi,        "__builtin_ia32_vpcmov_v8hi", IX86_BUILTIN_VPCMOV_V8HI, UNKNOWN,      (int)MULTI_ARG_3_HI)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v16qi,       "__builtin_ia32_vpcmov_v16qi",IX86_BUILTIN_VPCMOV_V16QI,UNKNOWN,      (int)MULTI_ARG_3_QI)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2df,        "__builtin_ia32_vpcmov_v2df", IX86_BUILTIN_VPCMOV_V2DF, UNKNOWN,      (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4sf,        "__builtin_ia32_vpcmov_v4sf", IX86_BUILTIN_VPCMOV_V4SF, UNKNOWN,      (int)MULTI_ARG_3_SF)
 
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4di256,        "__builtin_ia32_vpcmov256",       IX86_BUILTIN_VPCMOV256,       UNKNOWN,      (int)MULTI_ARG_3_DI2)
 BDESC (OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4di256,        "__builtin_ia32_vpcmov_v4di256",  IX86_BUILTIN_VPCMOV_V4DI256,  UNKNOWN,      (int)MULTI_ARG_3_DI2)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2de0dd0..382635f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -33627,100 +33627,101 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V4DF_FTYPE_V4SF_V4DF_UQI:
     case V4DF_FTYPE_V4SI_V4DF_UQI:
     case V8SI_FTYPE_V8SI_V8SI_UQI:
     case V8SI_FTYPE_V8HI_V8SI_UQI:
     case V8SI_FTYPE_V16QI_V8SI_UQI:
     case V8DF_FTYPE_V8SI_V8DF_UQI:
     case V8DI_FTYPE_DI_V8DI_UQI:
     case V16SF_FTYPE_V8SF_V16SF_UHI:
     case V16SI_FTYPE_V8SI_V16SI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_UHI:
     case V8HI_FTYPE_V16QI_V8HI_UQI:
     case V16HI_FTYPE_V16QI_V16HI_UHI:
     case V32HI_FTYPE_V32HI_V32HI_USI:
     case V32HI_FTYPE_V32QI_V32HI_USI:
     case V8DI_FTYPE_V16QI_V8DI_UQI:
     case V8DI_FTYPE_V2DI_V8DI_UQI:
     case V8DI_FTYPE_V4DI_V8DI_UQI:
     case V8DI_FTYPE_V8DI_V8DI_UQI:
     case V8DI_FTYPE_V8HI_V8DI_UQI:
     case V8DI_FTYPE_V8SI_V8DI_UQI:
     case V8HI_FTYPE_V8DI_V8HI_UQI:
     case V8SI_FTYPE_V8DI_V8SI_UQI:
     case V4SI_FTYPE_V4SI_V4SI_V4SI:
       nargs = 3;
       break;
     case V32QI_FTYPE_V32QI_V32QI_INT:
     case V16HI_FTYPE_V16HI_V16HI_INT:
     case V16QI_FTYPE_V16QI_V16QI_INT:
     case V4DI_FTYPE_V4DI_V4DI_INT:
     case V8HI_FTYPE_V8HI_V8HI_INT:
     case V8SI_FTYPE_V8SI_V8SI_INT:
     case V8SI_FTYPE_V8SI_V4SI_INT:
     case V8SF_FTYPE_V8SF_V8SF_INT:
     case V8SF_FTYPE_V8SF_V4SF_INT:
     case V4SI_FTYPE_V4SI_V4SI_INT:
     case V4DF_FTYPE_V4DF_V4DF_INT:
     case V16SF_FTYPE_V16SF_V16SF_INT:
     case V16SF_FTYPE_V16SF_V4SF_INT:
     case V16SI_FTYPE_V16SI_V4SI_INT:
     case V4DF_FTYPE_V4DF_V2DF_INT:
     case V4SF_FTYPE_V4SF_V4SF_INT:
     case V2DI_FTYPE_V2DI_V2DI_INT:
     case V4DI_FTYPE_V4DI_V2DI_INT:
     case V2DF_FTYPE_V2DF_V2DF_INT:
     case UQI_FTYPE_V8DI_V8UDI_INT:
     case UQI_FTYPE_V8DF_V8DF_INT:
     case UQI_FTYPE_V2DF_V2DF_INT:
     case UQI_FTYPE_V4SF_V4SF_INT:
     case UHI_FTYPE_V16SI_V16SI_INT:
     case UHI_FTYPE_V16SF_V16SF_INT:
+    case V64QI_FTYPE_V64QI_V64QI_INT:
       nargs = 3;
       nargs_constant = 1;
       break;
     case V4DI_FTYPE_V4DI_V4DI_INT_CONVERT:
       nargs = 3;
       rmode = V4DImode;
       nargs_constant = 1;
       break;
     case V2DI_FTYPE_V2DI_V2DI_INT_CONVERT:
       nargs = 3;
       rmode = V2DImode;
       nargs_constant = 1;
       break;
     case V1DI_FTYPE_V1DI_V1DI_INT_CONVERT:
       nargs = 3;
       rmode = DImode;
       nargs_constant = 1;
       break;
     case V2DI_FTYPE_V2DI_UINT_UINT:
       nargs = 3;
       nargs_constant = 2;
       break;
     case V8DI_FTYPE_V8DI_V8DI_INT_CONVERT:
       nargs = 3;
       rmode = V8DImode;
       nargs_constant = 1;
       break;
     case V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UDI_CONVERT:
       nargs = 5;
       rmode = V8DImode;
       mask_pos = 2;
       nargs_constant = 1;
       break;
     case QI_FTYPE_V8DF_INT_UQI:
     case QI_FTYPE_V4DF_INT_UQI:
     case QI_FTYPE_V2DF_INT_UQI:
     case HI_FTYPE_V16SF_INT_UHI:
     case QI_FTYPE_V8SF_INT_UQI:
     case QI_FTYPE_V4SF_INT_UQI:
       nargs = 3;
       mask_pos = 1;
       nargs_constant = 1;
       break;
     case V4DI_FTYPE_V4DI_V4DI_INT_V4DI_USI_CONVERT:
       nargs = 5;
       rmode = V4DImode;
       mask_pos = 2;
       nargs_constant = 1;
       break;
     case V2DI_FTYPE_V2DI_V2DI_INT_V2DI_UHI_CONVERT:
@@ -33854,100 +33855,107 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8DI_INT_V8DI_UQI:
       nargs = 4;
       mask_pos = 2;
       nargs_constant = 1;
       break;
     case V16SF_FTYPE_V16SF_V4SF_INT_V16SF_UHI:
     case V16SI_FTYPE_V16SI_V4SI_INT_V16SI_UHI:
     case V8DF_FTYPE_V8DF_V8DF_INT_V8DF_UQI:
     case V8DI_FTYPE_V8DI_V8DI_INT_V8DI_UQI:
     case V16SF_FTYPE_V16SF_V16SF_INT_V16SF_UHI:
     case V16SI_FTYPE_V16SI_V16SI_INT_V16SI_UHI:
     case V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI:
     case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI:
     case V8DF_FTYPE_V8DF_V4DF_INT_V8DF_UQI:
     case V8DI_FTYPE_V8DI_V4DI_INT_V8DI_UQI:
     case V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI:
     case V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI:
     case V8DF_FTYPE_V8DF_V2DF_INT_V8DF_UQI:
     case V8DI_FTYPE_V8DI_V2DI_INT_V8DI_UQI:
     case V8SI_FTYPE_V8SI_V8SI_INT_V8SI_UQI:
     case V4DI_FTYPE_V4DI_V4DI_INT_V4DI_UQI:
     case V4SI_FTYPE_V4SI_V4SI_INT_V4SI_UQI:
     case V2DI_FTYPE_V2DI_V2DI_INT_V2DI_UQI:
     case V32HI_FTYPE_V64QI_V64QI_INT_V32HI_USI:
     case V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI:
     case V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI:
     case V16SF_FTYPE_V16SF_V8SF_INT_V16SF_UHI:
     case V16SI_FTYPE_V16SI_V8SI_INT_V16SI_UHI:
     case V8SF_FTYPE_V8SF_V4SF_INT_V8SF_UQI:
     case V8SI_FTYPE_V8SI_V4SI_INT_V8SI_UQI:
     case V4DI_FTYPE_V4DI_V2DI_INT_V4DI_UQI:
     case V4DF_FTYPE_V4DF_V2DF_INT_V4DF_UQI:
       nargs = 5;
       mask_pos = 2;
       nargs_constant = 1;
       break;
     case V8DI_FTYPE_V8DI_V8DI_V8DI_INT_UQI:
     case V16SI_FTYPE_V16SI_V16SI_V16SI_INT_UHI:
     case V2DF_FTYPE_V2DF_V2DF_V2DI_INT_UQI:
     case V4SF_FTYPE_V4SF_V4SF_V4SI_INT_UQI:
     case V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI:
     case V8SI_FTYPE_V8SI_V8SI_V8SI_INT_UQI:
     case V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI:
     case V4DI_FTYPE_V4DI_V4DI_V4DI_INT_UQI:
     case V4SI_FTYPE_V4SI_V4SI_V4SI_INT_UQI:
     case V2DI_FTYPE_V2DI_V2DI_V2DI_INT_UQI:
       nargs = 5;
       mask_pos = 1;
       nargs_constant = 1;
       break;
+    case V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI:
+    case V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI:
+    case V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI:
+      nargs = 5;
+      mask_pos = 1;
+      nargs_constant = 2;
+      break;
 
     default:
       gcc_unreachable ();
     }
 
   gcc_assert (nargs <= ARRAY_SIZE (args));
 
   if (comparison != UNKNOWN)
     {
       gcc_assert (nargs == 2);
       return ix86_expand_sse_compare (d, exp, target, swap);
     }
 
   if (rmode == VOIDmode || rmode == tmode)
     {
       if (optimize
 	  || target == 0
 	  || GET_MODE (target) != tmode
 	  || !insn_p->operand[0].predicate (target, tmode))
 	target = gen_reg_rtx (tmode);
       else if (memory_operand (target, tmode))
 	num_memory++;
       real_target = target;
     }
   else
     {
       real_target = gen_reg_rtx (tmode);
       target = lowpart_subreg (rmode, real_target, tmode);
     }
 
   for (i = 0; i < nargs; i++)
     {
       tree arg = CALL_EXPR_ARG (exp, i);
       rtx op = expand_normal (arg);
       machine_mode mode = insn_p->operand[i + 1].mode;
       bool match = insn_p->operand[i + 1].predicate (op, mode);
 
       if (second_arg_count && i == 1)
 	{
 	  /* SIMD shift insns take either an 8-bit immediate or
 	     register as count.  But builtin functions take int as
 	     count.  If count doesn't match, we put it in register.
 	     The instructions are using 64-bit count, if op is just
 	     32-bit, zero-extend it, as negative shift counts
 	     are undefined behavior and zero-extension is more
 	     efficient.  */
 	  if (!match)
 	    {
 	      if (SCALAR_INT_MODE_P (GET_MODE (op)))
 		op = convert_modes (mode, GET_MODE (op), op, 1);
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 696cd20..365d2db 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -45,100 +45,102 @@
 #include <avx512fintrin.h>
 
 #include <avx512erintrin.h>
 
 #include <avx512pfintrin.h>
 
 #include <avx512cdintrin.h>
 
 #include <avx512vlintrin.h>
 
 #include <avx512bwintrin.h>
 
 #include <avx512dqintrin.h>
 
 #include <avx512vlbwintrin.h>
 
 #include <avx512vldqintrin.h>
 
 #include <avx512ifmaintrin.h>
 
 #include <avx512ifmavlintrin.h>
 
 #include <avx512vbmiintrin.h>
 
 #include <avx512vbmivlintrin.h>
 
 #include <avx5124fmapsintrin.h>
 
 #include <avx5124vnniwintrin.h>
 
 #include <avx512vpopcntdqintrin.h>
 
 #include <shaintrin.h>
 
 #include <lzcntintrin.h>
 
 #include <bmiintrin.h>
 
 #include <bmi2intrin.h>
 
 #include <fmaintrin.h>
 
 #include <f16cintrin.h>
 
 #include <rtmintrin.h>
 
 #include <xtestintrin.h>
 
 #include <cetintrin.h>
 
+#include <gfniintrin.h>
+
 #ifndef __RDRND__
 #pragma GCC push_options
 #pragma GCC target("rdrnd")
 #define __DISABLE_RDRND__
 #endif /* __RDRND__ */
 extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdrand16_step (unsigned short *__P)
 {
   return __builtin_ia32_rdrand16_step (__P);
 }
 
 extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdrand32_step (unsigned int *__P)
 {
   return __builtin_ia32_rdrand32_step (__P);
 }
 #ifdef __DISABLE_RDRND__
 #undef __DISABLE_RDRND__
 #pragma GCC pop_options
 #endif /* __DISABLE_RDRND__ */
 
 #ifndef __RDPID__
 #pragma GCC push_options
 #pragma GCC target("rdpid")
 #define __DISABLE_RDPID__
 #endif /* __RDPID__ */
 extern __inline unsigned int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdpid_u32 (void)
 {
   return __builtin_ia32_rdpid ();
 }
 #ifdef __DISABLE_RDPID__
 #undef __DISABLE_RDPID__
 #pragma GCC pop_options
 #endif /* __DISABLE_RDPID__ */
 
 #ifdef  __x86_64__
 
 #ifndef __FSGSBASE__
 #pragma GCC push_options
 #pragma GCC target("fsgsbase")
 #define __DISABLE_FSGSBASE__
 #endif /* __FSGSBASE__ */
 extern __inline unsigned int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _readfsbase_u32 (void)
 {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5627515..24bd5bc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -108,100 +108,103 @@
   UNSPEC_MASKOP
   UNSPEC_KORTEST
   UNSPEC_KTEST
 
   ;; For embed. rounding feature
   UNSPEC_EMBEDDED_ROUNDING
 
   ;; For AVX512PF support
   UNSPEC_GATHER_PREFETCH
   UNSPEC_SCATTER_PREFETCH
 
   ;; For AVX512ER support
   UNSPEC_EXP2
   UNSPEC_RCP28
   UNSPEC_RSQRT28
 
   ;; For SHA support
   UNSPEC_SHA1MSG1
   UNSPEC_SHA1MSG2
   UNSPEC_SHA1NEXTE
   UNSPEC_SHA1RNDS4
   UNSPEC_SHA256MSG1
   UNSPEC_SHA256MSG2
   UNSPEC_SHA256RNDS2
 
   ;; For AVX512BW support
   UNSPEC_DBPSADBW
   UNSPEC_PMADDUBSW512
   UNSPEC_PMADDWD512
   UNSPEC_PSHUFHW
   UNSPEC_PSHUFLW
   UNSPEC_CVTINT2MASK
 
   ;; For AVX512DQ support
   UNSPEC_REDUCE
   UNSPEC_FPCLASS
   UNSPEC_RANGE
 
   ;; For AVX512IFMA support
   UNSPEC_VPMADD52LUQ
   UNSPEC_VPMADD52HUQ
 
   ;; For AVX512VBMI support
   UNSPEC_VPMULTISHIFT
 
   ;; For AVX5124FMAPS/AVX5124VNNIW support
   UNSPEC_VP4FMADD
   UNSPEC_VP4FNMADD
   UNSPEC_VP4DPWSSD
   UNSPEC_VP4DPWSSDS
+
+  ;; For GFNI support
+  UNSPEC_GF2P8AFFINEINV
 ])
 
 (define_c_enum "unspecv" [
   UNSPECV_LDMXCSR
   UNSPECV_STMXCSR
   UNSPECV_CLFLUSH
   UNSPECV_MONITOR
   UNSPECV_MWAIT
   UNSPECV_VZEROALL
   UNSPECV_VZEROUPPER
 ])
 
 ;; All vector modes including V?TImode, used in move patterns.
 (define_mode_iterator VMOVE
   [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
 ;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline.
 (define_mode_iterator V48_AVX512VL
   [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    V8DI  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
    V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline.
 (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")])
 
 ;; All vector modes
 (define_mode_iterator V
   [(V32QI "TARGET_AVX") V16QI
    (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
 ;; All 128bit vector modes
 (define_mode_iterator V_128
   [V16QI V8HI V4SI V2DI V4SF (V2DF "TARGET_SSE2")])
 
@@ -276,100 +279,103 @@
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
    (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
 
 ;; All QImode vector integer modes
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
 
 ;; All DImode vector integer modes
 (define_mode_iterator V_AVX
   [V16QI V8HI V4SI V2DI V4SF V2DF
    (V32QI "TARGET_AVX") (V16HI "TARGET_AVX")
    (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
    (V8SF "TARGET_AVX") (V4DF"TARGET_AVX")])
 
 (define_mode_iterator VI48_AVX
  [V4SI V2DI
   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")])
 
 (define_mode_iterator VI8
   [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI8_AVX512VL
   [V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI8_256_512
   [V8DI (V4DI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI1_AVX2
   [(V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI1_AVX512
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI])
 
+(define_mode_iterator VI1_AVX512F
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI])
+
 (define_mode_iterator VI2_AVX2
   [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512F
   [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI4_AVX
   [(V8SI "TARGET_AVX") V4SI])
 
 (define_mode_iterator VI4_AVX2
   [(V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI4_AVX512F
   [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI])
 
 (define_mode_iterator VI4_AVX512VL
   [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")])
 
 (define_mode_iterator VI48_AVX512F_AVX512VL
   [V4SI V8SI (V16SI "TARGET_AVX512F")
    (V2DI "TARGET_AVX512VL") (V4DI "TARGET_AVX512VL") (V8DI "TARGET_AVX512F")])
 
 (define_mode_iterator VI2_AVX512VL
   [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
 
 (define_mode_iterator VI8_AVX2_AVX512BW
   [(V8DI "TARGET_AVX512BW") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI8_AVX2
   [(V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI8_AVX2_AVX512F
   [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
 
 (define_mode_iterator VI4_128_8_256
   [V4SI V4DI])
 
 ;; All V8D* modes
 (define_mode_iterator V8FI
   [V8DF V8DI])
 
 ;; All V16S* modes
 (define_mode_iterator V16FI
   [V16SF V16SI])
 
 ;; ??? We should probably use TImode instead.
 (define_mode_iterator VIMAX_AVX2_AVX512BW
   [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
 
 ;; Suppose TARGET_AVX512BW as baseline
@@ -19927,50 +19933,67 @@
 	  (match_operand:HI 5 "register_operand" "Yk")))]
   "TARGET_AVX5124VNNIW"
   "vp4dpwssds\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
    [(set_attr ("type") ("ssemuladd"))
     (set_attr ("prefix") ("evex"))
     (set_attr ("mode") ("TI"))])
 
 (define_insn "vpopcount<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(popcount:VI48_512
           (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512VPOPCNTDQ"
   "vpopcnt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}")
 
 ;; Save multiple registers out-of-line.
 (define_insn "save_multiple<mode>"
   [(match_parallel 0 "save_multiple"
     [(use (match_operand:P 1 "symbol_operand"))])]
   "TARGET_SSE && TARGET_64BIT"
   "call\t%P1")
 
 ;; Restore multiple registers out-of-line.
 (define_insn "restore_multiple<mode>"
   [(match_parallel 0 "restore_multiple"
     [(use (match_operand:P 1 "symbol_operand"))])]
   "TARGET_SSE && TARGET_64BIT"
   "call\t%P1")
 
 ;; Restore multiple registers out-of-line and return.
 (define_insn "restore_multiple_and_return<mode>"
   [(match_parallel 0 "restore_multiple"
     [(return)
      (use (match_operand:P 1 "symbol_operand"))
      (set (reg:DI SP_REG) (reg:DI R10_REG))
     ])]
   "TARGET_SSE && TARGET_64BIT"
   "jmp\t%P1")
 
 ;; Restore multiple registers out-of-line when hard frame pointer is used,
 ;; perform the leave operation prior to returning (from the function).
 (define_insn "restore_multiple_leave_return<mode>"
   [(match_parallel 0 "restore_multiple"
     [(return)
      (use (match_operand:P 1 "symbol_operand"))
      (set (reg:DI SP_REG) (plus:DI (reg:DI BP_REG) (const_int 8)))
      (set (reg:DI BP_REG) (mem:DI (reg:DI BP_REG)))
      (clobber (mem:BLK (scratch)))
     ])]
   "TARGET_SSE && TARGET_64BIT"
   "jmp\t%P1")
+
+(define_insn "vgf2p8affineinvqb_<mode><mask_name>"
+  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
+	(unspec:VI1_AVX512F [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
+			       (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
+			       (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
+			      UNSPEC_GF2P8AFFINEINV))]
+  "TARGET_GFNI"
+  "@
+   gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
+   vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}
+   vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
+  [(set_attr "isa" "noavx,avx,avx512bw")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 63c5f73..7e35e68 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,14 +1,14 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni" } */
 
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
-   avx5124vnniwintrin.h, avx512vpopcntdqintrin.h and mm_malloc.h.h are usable
-   with -O -pedantic-errors.  */
+   avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h
+   and mm_malloc.h.h are usable with -O -pedantic-errors.  */
 
 #include <x86intrin.h>
 
 int dummy;
 
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 16a96ef..7e44d47 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,10 +1,10 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
    popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
-   avx5124vnniwintrin.h, avx512vpopcntdqintrin.h and mm_malloc.h are
-   usable with -O -fkeep-inline-functions.  */
+   avx5124vnniwintrin.h, avx512vpopcntdqintrin.h gfniintrin.h and
+   mm_malloc.h are usable with -O -fkeep-inline-functions.  */
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index d03625b..4623826 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,52 +1,52 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
    defined as inline functions in {,x,e,p,t,s,w,g,a,b}mmintrin.h and
    mm3dnow.h that reference the proper builtin functions.  Defining away
    "extern" and "__inline" results in all of them being compiled as proper
    functions.  */
 
 #define extern
 #define __inline
 
 /* Following intrinsics require immediate arguments. */
 
 /* ammintrin.h */
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
 /* immintrin.h */
 #define __builtin_ia32_blendpd256(X, Y, M) __builtin_ia32_blendpd256(X, Y, 1)
 #define __builtin_ia32_blendps256(X, Y, M) __builtin_ia32_blendps256(X, Y, 1)
 #define __builtin_ia32_dpps256(X, Y, M) __builtin_ia32_dpps256(X, Y, 1)
 #define __builtin_ia32_shufpd256(X, Y, M) __builtin_ia32_shufpd256(X, Y, 1)
 #define __builtin_ia32_shufps256(X, Y, M) __builtin_ia32_shufps256(X, Y, 1)
 #define __builtin_ia32_cmpsd(X, Y, O) __builtin_ia32_cmpsd(X, Y, 1)
 #define __builtin_ia32_cmpss(X, Y, O) __builtin_ia32_cmpss(X, Y, 1)
 #define __builtin_ia32_cmppd(X, Y, O) __builtin_ia32_cmppd(X, Y, 1)
 #define __builtin_ia32_cmpps(X, Y, O) __builtin_ia32_cmpps(X, Y, 1)
 #define __builtin_ia32_cmppd256(X, Y, O) __builtin_ia32_cmppd256(X, Y, 1)
 #define __builtin_ia32_cmpps256(X, Y, O) __builtin_ia32_cmpps256(X, Y, 1)
 #define __builtin_ia32_vextractf128_pd256(X, N) __builtin_ia32_vextractf128_pd256(X, 1)
 #define __builtin_ia32_vextractf128_ps256(X, N) __builtin_ia32_vextractf128_ps256(X, 1)
 #define __builtin_ia32_vextractf128_si256(X, N) __builtin_ia32_vextractf128_si256(X, 1)
 #define __builtin_ia32_vpermilpd(X, N) __builtin_ia32_vpermilpd(X, 1)
 #define __builtin_ia32_vpermilpd256(X, N) __builtin_ia32_vpermilpd256(X, 1)
 #define __builtin_ia32_vpermilps(X, N) __builtin_ia32_vpermilps(X, 1)
 #define __builtin_ia32_vpermilps256(X, N) __builtin_ia32_vpermilps256(X, 1)
 #define __builtin_ia32_vpermil2pd(X, Y, C, I) __builtin_ia32_vpermil2pd(X, Y, C, 1)
 #define __builtin_ia32_vpermil2pd256(X, Y, C, I) __builtin_ia32_vpermil2pd256(X, Y, C, 1)
 #define __builtin_ia32_vpermil2ps(X, Y, C, I) __builtin_ia32_vpermil2ps(X, Y, C, 1)
 #define __builtin_ia32_vpermil2ps256(X, Y, C, I) __builtin_ia32_vpermil2ps256(X, Y, C, 1)
 #define __builtin_ia32_vperm2f128_pd256(X, Y, C) __builtin_ia32_vperm2f128_pd256(X, Y, 1)
 #define __builtin_ia32_vperm2f128_ps256(X, Y, C) __builtin_ia32_vperm2f128_ps256(X, Y, 1)
 #define __builtin_ia32_vperm2f128_si256(X, Y, C) __builtin_ia32_vperm2f128_si256(X, Y, 1)
 #define __builtin_ia32_vinsertf128_pd256(X, Y, C) __builtin_ia32_vinsertf128_pd256(X, Y, 1)
 #define __builtin_ia32_vinsertf128_ps256(X, Y, C) __builtin_ia32_vinsertf128_ps256(X, Y, 1)
 #define __builtin_ia32_vinsertf128_si256(X, Y, C) __builtin_ia32_vinsertf128_si256(X, Y, 1)
 #define __builtin_ia32_roundpd256(V, M) __builtin_ia32_roundpd256(V, 1)
 #define __builtin_ia32_roundps256(V, M) __builtin_ia32_roundps256(V, 1)
@@ -556,53 +556,63 @@
 #define __builtin_ia32_cmppd128_mask(A, B, E, D) __builtin_ia32_cmppd128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd256_mask(A, B, E, D) __builtin_ia32_cmpd256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd128_mask(A, B, E, D) __builtin_ia32_cmpd128_mask(A, B, 1, D)
 #define __builtin_ia32_alignq256_mask(A, B, F, D, E) __builtin_ia32_alignq256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq128_mask(A, B, F, D, E) __builtin_ia32_alignq128_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd256_mask(A, B, F, D, E) __builtin_ia32_alignd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd128_mask(A, B, F, D, E) __builtin_ia32_alignd128_mask(A, B, 1, D, E)
 
 /* avx512vlbwintrin.h */
 #define __builtin_ia32_ucmpw256_mask(A, B, E, D) __builtin_ia32_ucmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpw128_mask(A, B, E, D) __builtin_ia32_ucmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb256_mask(A, B, E, D) __builtin_ia32_ucmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb128_mask(A, B, E, D) __builtin_ia32_ucmpb128_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi256_mask(A, E, C, D) __builtin_ia32_psrlwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrlwi128_mask(A, E, C, D) __builtin_ia32_psrlwi128_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi256_mask(A, E, C, D) __builtin_ia32_psrawi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi128_mask(A, E, C, D) __builtin_ia32_psrawi128_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi256_mask(A, E, C, D) __builtin_ia32_psllwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi128_mask(A, E, C, D) __builtin_ia32_psllwi128_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw256_mask(A, E, C, D) __builtin_ia32_pshuflw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw128_mask(A, E, C, D) __builtin_ia32_pshuflw128_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw256_mask(A, E, C, D) __builtin_ia32_pshufhw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw128_mask(A, E, C, D) __builtin_ia32_pshufhw128_mask(A, 1, C, D)
 #define __builtin_ia32_palignr256_mask(A, B, F, D, E) __builtin_ia32_palignr256_mask(A, B, 8, D, E)
 #define __builtin_ia32_palignr128_mask(A, B, F, D, E) __builtin_ia32_palignr128_mask(A, B, 8, D, E)
 #define __builtin_ia32_dbpsadbw256_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw256_mask(A, B, 1, D, E)
 #define __builtin_ia32_dbpsadbw128_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw128_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpw256_mask(A, B, E, D) __builtin_ia32_cmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpw128_mask(A, B, E, D) __builtin_ia32_cmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb256_mask(A, B, E, D) __builtin_ia32_cmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb128_mask(A, B, E, D) __builtin_ia32_cmpb128_mask(A, B, 1, D)
 
 /* avx512vldqintrin.h */
 #define __builtin_ia32_reduceps256_mask(A, E, C, D) __builtin_ia32_reduceps256_mask(A, 1, C, D)
 #define __builtin_ia32_reduceps128_mask(A, E, C, D) __builtin_ia32_reduceps128_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd256_mask(A, E, C, D) __builtin_ia32_reducepd256_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd128_mask(A, E, C, D) __builtin_ia32_reducepd128_mask(A, 1, C, D)
 #define __builtin_ia32_rangeps256_mask(A, B, F, D, E) __builtin_ia32_rangeps256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangeps128_mask(A, B, F, D, E) __builtin_ia32_rangeps128_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd256_mask(A, B, F, D, E) __builtin_ia32_rangepd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd128_mask(A, B, F, D, E) __builtin_ia32_rangepd128_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x2_256_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x2_256_mask(A, B, F, D, E) __builtin_ia32_insertf64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_fpclassps256_mask(A, D, C) __builtin_ia32_fpclassps256_mask(A, 1, C)
 #define __builtin_ia32_fpclassps128_mask(A, D, C) __builtin_ia32_fpclassps128_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd256_mask(A, D, C) __builtin_ia32_fpclasspd256_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd128_mask(A, D, C) __builtin_ia32_fpclasspd128_mask(A, 1, C)
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
+
+
 #include <wmmintrin.h>
 #include <immintrin.h>
 #include <mm3dnow.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 9693fa4..9390c1a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -28,64 +28,67 @@ do_test (void)
 static int
 check_osxsave (void)
 {
   unsigned int eax, ebx, ecx, edx;
 
   __cpuid (1, eax, ebx, ecx, edx);
   return (ecx & bit_OSXSAVE) != 0;
 }
 
 int
 main ()
 {
   unsigned int eax, ebx, ecx, edx;
 
   if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
     return 0;
 
   /* Run AVX512 test only if host has ISA support.  */
   if (check_osxsave ()
       && (ebx & bit_AVX512F)
 #ifdef AVX512VL
       && (ebx & bit_AVX512VL)
 #endif
 #ifdef AVX512ER
       && (ebx & bit_AVX512ER)
 #endif
 #ifdef AVX512CD
       && (ebx & bit_AVX512CD)
 #endif
 #ifdef AVX512DQ
       && (ebx & bit_AVX512DQ)
 #endif
 #ifdef AVX512BW
       && (ebx & bit_AVX512BW)
 #endif
 #ifdef AVX512IFMA
       && (ebx & bit_AVX512IFMA)
 #endif
 #ifdef AVX512VBMI
       && (ecx & bit_AVX512VBMI)
 #endif
 #ifdef AVX5124FMAPS
       && (edx & bit_AVX5124FMAPS)
 #endif
 #ifdef AVX5124VNNIW
       && (edx & bit_AVX5124VNNIW)
 #endif
 #ifdef AVX512VPOPCNTDQ
       && (ecx & bit_AVX512VPOPCNTDQ)
 #endif
+#ifdef GFNI
+      && (ecx & bit_GFNI)
+#endif
       && avx512f_os_support ())
     {
       DO_TEST ();
 #ifdef DEBUG
       printf ("PASSED\n");
 #endif
       return 0;
     }
  
 #ifdef DEBUG
   printf ("SKIPPED\n");
 #endif
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c b/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
new file mode 100644
index 0000000..af4839f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gf2p8affineinvqb-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mgfni -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target gfni } */
+
+#define AVX512F
+
+#define GFNI
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+
+#include "avx512f-mask-type.h"
+#include <x86intrin.h>
+
+static void
+CALC (unsigned char *r, unsigned char *s1, unsigned char *s2, unsigned char imm)
+{
+  for (int a = 0; a < SIZE/8; a++)
+    {
+      for (int val = 0; val < 8; val++)
+        {
+          unsigned char result = 0;
+          for (int bit = 0; bit < 8; bit++)
+          {
+            unsigned char temp = s1[a*8 + val] & s2[a*8 + bit];
+            unsigned char parity = __popcntd(temp);
+            if (parity % 2)
+              result |= (1 << (8 - bit - 1));
+          }
+          r[a*8 + val] = result ^ imm; 
+        }
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_b) res1, res2, res3, src1, src2;
+  MASK_TYPE mask = MASK_VALUE;
+  char res_ref[SIZE];
+  unsigned char imm = 0;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = i %2 ; // gfni inverse of 1 and 0 are 1 and 0
+      src2.a[i] = 1;
+    }
+
+  for (i = 0; i < SIZE; i++)
+    {
+      res1.a[i] = DEFAULT_VALUE;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+    }
+
+  CALC (res_ref, src1.a, src2.a, imm);
+
+  res1.x = INTRINSIC (_gf2p8affineinv_epi64_epi8) (src1.x, src2.x, imm);
+  res2.x = INTRINSIC (_mask_gf2p8affineinv_epi64_epi8) (res2.x, mask, src1.x, src2.x, imm);
+  res3.x = INTRINSIC (_maskz_gf2p8affineinv_epi64_epi8) (mask, src1.x, src2.x, imm);
+
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_b) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
new file mode 100644
index 0000000..fa54526
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl -mgfni" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target gfni } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-gf2p8affineinvqb-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-gf2p8affineinvqb-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/gfni-1.c b/gcc/testsuite/gcc.target/i386/gfni-1.c
new file mode 100644
index 0000000..5e22c9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx512bw -mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%zmm\[0-9\]+\[^\\n\\r]*%zmm\[0-9\]+\[^\\n\\r\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+volatile __m512i x1, x2;
+volatile __mmask64 m64;
+ 
+void extern
+avx512vl_test (void)
+{
+    x1 = _mm512_gf2p8affineinv_epi64_epi8(x1, x2, 3);
+    x1 = _mm512_mask_gf2p8affineinv_epi64_epi8(x1, m64, x2, x1, 3);
+    x1 = _mm512_maskz_gf2p8affineinv_epi64_epi8(m64, x1, x2, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-2.c b/gcc/testsuite/gcc.target/i386/gfni-2.c
new file mode 100644
index 0000000..4d1f151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\\n\\r]*%ymm\[0-9\]+\[^\\n\\r\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\\n\\r]*%xmm\[0-9\]+\[^\\n\\r\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m256i x3, x4;
+volatile __m128i x5, x6;
+volatile __mmask32 m32;
+volatile __mmask16 m16;
+ 
+void extern
+avx512vl_test (void)
+{
+    x3 = _mm256_gf2p8affineinv_epi64_epi8(x3, x4, 3);
+    x3 = _mm256_mask_gf2p8affineinv_epi64_epi8(x3, m32, x4, x3, 3);
+    x3 = _mm256_maskz_gf2p8affineinv_epi64_epi8(m32, x3, x4, 3);
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+    x5 = _mm_mask_gf2p8affineinv_epi64_epi8(x5, m16, x6, x5, 3);
+    x5 = _mm_maskz_gf2p8affineinv_epi64_epi8(m16, x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-3.c b/gcc/testsuite/gcc.target/i386/gfni-3.c
new file mode 100644
index 0000000..de5f80b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -mavx -O2" } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m256i x3, x4;
+volatile __m128i x5, x6;
+ 
+void extern
+avx512vl_test (void)
+{
+    x3 = _mm256_gf2p8affineinv_epi64_epi8(x3, x4, 3);
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/gfni-4.c b/gcc/testsuite/gcc.target/i386/gfni-4.c
new file mode 100644
index 0000000..1532716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mgfni -O2" } */
+/* { dg-final { scan-assembler-times "gf2p8affineinvqb\[ \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <x86intrin.h>
+
+int *p;
+volatile __m128i x5, x6;
+ 
+void extern
+avx512vl_test (void)
+{
+    x5 = _mm_gf2p8affineinv_epi64_epi8(x5, x6, 3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
index eae2531..b2bdbfd 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -374,83 +374,98 @@ proc check_effective_target_avx5124fmaps { } {
 
 	__v16sf
 	_mm512_mask_4fmadd_ps (__v16sf __DEST, __v16sf __A, __v16sf __B, __v16sf __C,
 			       __v16sf __D, __v16sf __E, __v4sf *__F)
 	{
 	    return (__v16sf) __builtin_ia32_4fmaddps_mask ((__v16sf) __A,
 							  (__v16sf) __B,
 							  (__v16sf) __C,
 							  (__v16sf) __D,
 							  (__v16sf) __E,
 							  (const __v4sf *) __F,
 							  (__v16sf) __DEST,
 							  0xffff);
 	}
     } "-mavx5124fmaps" ]
 }
 
 # Return 1 if avx512_4vnniw instructions can be compiled.
 proc check_effective_target_avx5124vnniw { } {
     return [check_no_compiler_messages avx5124vnniw object {
 	typedef int __v16si __attribute__ ((__vector_size__ (64)));
 	typedef int __v4si __attribute__ ((__vector_size__ (16)));
 
 	__v16si
 	_mm512_4dpwssd_epi32 (__v16si __A, __v16si __B, __v16si __C,
 			      __v16si __D, __v16si __E, __v4si *__F)
 	{
 	    return (__v16si) __builtin_ia32_vp4dpwssd ((__v16si) __B,
 						       (__v16si) __C,
 						       (__v16si) __D,
 						       (__v16si) __E,
 						       (__v16si) __A,
 						       (const __v4si *) __F);
 	}
     } "-mavx5124vnniw" ]
 }
 
 # Return 1 if avx512_vpopcntdq instructions can be compiled.
 proc check_effective_target_avx512vpopcntdq { } {
     return [check_no_compiler_messages avx512vpopcntdq object {
         typedef int __v16si __attribute__ ((__vector_size__ (64)));
 
         __v16si
         _mm512_popcnt_epi32 (__v16si __A)
         {
             return (__v16si) __builtin_ia32_vpopcountd_v16si ((__v16si) __A);
         }
     } "-mavx512vpopcntdq" ]
 }
 
+# Return 1 if gfni instructions can be compiled.
+proc check_effective_target_gfni { } {
+    return [check_no_compiler_messages gfni object {
+        typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+
+        __v16qi
+        _mm_gf2p8affineinv_epi64_epi8 (__v16qi __A, __v16qi __B, const int __C)
+        {
+            return (__v16qi) __builtin_ia32_vgf2p8affineinvqb_v16qi ((__v16qi) __A,
+								     (__v16qi) __B,
+								      0);
+        }
+    } "-mgfni" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
     set DEFAULT_CFLAGS " -ansi -pedantic-errors"
 }
 
 # Initialize `dg'.
 dg-init
 clearcap-init
 
 global runtests
 # Special case compilation of vect-args.c so we don't have to
 # replicate it 16 times.
 if [runtest_file_p $runtests $srcdir/$subdir/vect-args.c] {
   foreach type { "" -mmmx -m3dnow -msse -msse2 -mavx -mavx2 -mavx512f } {
     foreach level { "" -O } {
       set flags "$type $level"
       verbose -log "Testing vect-args, $flags" 1
       dg-test $srcdir/$subdir/vect-args.c $flags ""
     }
   }
 }
 
 # Everything else.
 set tests [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]]
 set tests [prune $tests $srcdir/$subdir/vect-args.c]
 
 # Main loop.
 dg-runtest $tests "" $DEFAULT_CFLAGS
 
 # All done.
 clearcap-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index b98b8b6..82f5d3c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -1,10 +1,10 @@
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
-   popcntintrin.h and mm_malloc.h are usable
+   popcntintrin.h gfniintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni" } */
 
 #include <x86intrin.h>
 
 int dummy;
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 7ab2223..c35ec9a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,52 +1,52 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
    mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h,
    tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h and mm_malloc.h 
    that reference the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being
    compiled as proper functions.  */
 
 #define extern
 #define __inline
 
 /* Following intrinsics require immediate arguments. */
 
 /* ammintrin.h */
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
 /* immintrin.h */
 #define __builtin_ia32_blendpd256(X, Y, M) __builtin_ia32_blendpd256(X, Y, 1)
 #define __builtin_ia32_blendps256(X, Y, M) __builtin_ia32_blendps256(X, Y, 1)
 #define __builtin_ia32_dpps256(X, Y, M) __builtin_ia32_dpps256(X, Y, 1)
 #define __builtin_ia32_shufpd256(X, Y, M) __builtin_ia32_shufpd256(X, Y, 1)
 #define __builtin_ia32_shufps256(X, Y, M) __builtin_ia32_shufps256(X, Y, 1)
 #define __builtin_ia32_cmpsd(X, Y, O) __builtin_ia32_cmpsd(X, Y, 1)
 #define __builtin_ia32_cmpss(X, Y, O) __builtin_ia32_cmpss(X, Y, 1)
 #define __builtin_ia32_cmppd(X, Y, O) __builtin_ia32_cmppd(X, Y, 1)
 #define __builtin_ia32_cmpps(X, Y, O) __builtin_ia32_cmpps(X, Y, 1)
 #define __builtin_ia32_cmppd256(X, Y, O) __builtin_ia32_cmppd256(X, Y, 1)
 #define __builtin_ia32_cmpps256(X, Y, O) __builtin_ia32_cmpps256(X, Y, 1)
 #define __builtin_ia32_vextractf128_pd256(X, N) __builtin_ia32_vextractf128_pd256(X, 1)
 #define __builtin_ia32_vextractf128_ps256(X, N) __builtin_ia32_vextractf128_ps256(X, 1)
 #define __builtin_ia32_vextractf128_si256(X, N) __builtin_ia32_vextractf128_si256(X, 1)
 #define __builtin_ia32_vpermilpd(X, N) __builtin_ia32_vpermilpd(X, 1)
 #define __builtin_ia32_vpermilpd256(X, N) __builtin_ia32_vpermilpd256(X, 1)
 #define __builtin_ia32_vpermilps(X, N) __builtin_ia32_vpermilps(X, 1)
 #define __builtin_ia32_vpermilps256(X, N) __builtin_ia32_vpermilps256(X, 1)
 #define __builtin_ia32_vpermil2pd(X, Y, C, I) __builtin_ia32_vpermil2pd(X, Y, C, 1)
 #define __builtin_ia32_vpermil2pd256(X, Y, C, I) __builtin_ia32_vpermil2pd256(X, Y, C, 1)
 #define __builtin_ia32_vpermil2ps(X, Y, C, I) __builtin_ia32_vpermil2ps(X, Y, C, 1)
 #define __builtin_ia32_vpermil2ps256(X, Y, C, I) __builtin_ia32_vpermil2ps256(X, Y, C, 1)
 #define __builtin_ia32_vperm2f128_pd256(X, Y, C) __builtin_ia32_vperm2f128_pd256(X, Y, 1)
 #define __builtin_ia32_vperm2f128_ps256(X, Y, C) __builtin_ia32_vperm2f128_ps256(X, Y, 1)
 #define __builtin_ia32_vperm2f128_si256(X, Y, C) __builtin_ia32_vperm2f128_si256(X, Y, 1)
 #define __builtin_ia32_vinsertf128_pd256(X, Y, C) __builtin_ia32_vinsertf128_pd256(X, Y, 1)
 #define __builtin_ia32_vinsertf128_ps256(X, Y, C) __builtin_ia32_vinsertf128_ps256(X, Y, 1)
@@ -573,51 +573,59 @@
 #define __builtin_ia32_cmppd128_mask(A, B, E, D) __builtin_ia32_cmppd128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd256_mask(A, B, E, D) __builtin_ia32_cmpd256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd128_mask(A, B, E, D) __builtin_ia32_cmpd128_mask(A, B, 1, D)
 #define __builtin_ia32_alignq256_mask(A, B, F, D, E) __builtin_ia32_alignq256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq128_mask(A, B, F, D, E) __builtin_ia32_alignq128_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd256_mask(A, B, F, D, E) __builtin_ia32_alignd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd128_mask(A, B, F, D, E) __builtin_ia32_alignd128_mask(A, B, 1, D, E)
 
 /* avx512vlbwintrin.h */
 #define __builtin_ia32_ucmpw256_mask(A, B, E, D) __builtin_ia32_ucmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpw128_mask(A, B, E, D) __builtin_ia32_ucmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb256_mask(A, B, E, D) __builtin_ia32_ucmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb128_mask(A, B, E, D) __builtin_ia32_ucmpb128_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi256_mask(A, E, C, D) __builtin_ia32_psrlwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrlwi128_mask(A, E, C, D) __builtin_ia32_psrlwi128_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi256_mask(A, E, C, D) __builtin_ia32_psrawi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi128_mask(A, E, C, D) __builtin_ia32_psrawi128_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi256_mask(A, E, C, D) __builtin_ia32_psllwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi128_mask(A, E, C, D) __builtin_ia32_psllwi128_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw256_mask(A, E, C, D) __builtin_ia32_pshuflw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw128_mask(A, E, C, D) __builtin_ia32_pshuflw128_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw256_mask(A, E, C, D) __builtin_ia32_pshufhw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw128_mask(A, E, C, D) __builtin_ia32_pshufhw128_mask(A, 1, C, D)
 #define __builtin_ia32_palignr256_mask(A, B, F, D, E) __builtin_ia32_palignr256_mask(A, B, 8, D, E)
 #define __builtin_ia32_palignr128_mask(A, B, F, D, E) __builtin_ia32_palignr128_mask(A, B, 8, D, E)
 #define __builtin_ia32_dbpsadbw256_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw256_mask(A, B, 1, D, E)
 #define __builtin_ia32_dbpsadbw128_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw128_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpw256_mask(A, B, E, D) __builtin_ia32_cmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpw128_mask(A, B, E, D) __builtin_ia32_cmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb256_mask(A, B, E, D) __builtin_ia32_cmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb128_mask(A, B, E, D) __builtin_ia32_cmpb128_mask(A, B, 1, D)
 
 /* avx512vldqintrin.h */
 #define __builtin_ia32_reduceps256_mask(A, E, C, D) __builtin_ia32_reduceps256_mask(A, 1, C, D)
 #define __builtin_ia32_reduceps128_mask(A, E, C, D) __builtin_ia32_reduceps128_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd256_mask(A, E, C, D) __builtin_ia32_reducepd256_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd128_mask(A, E, C, D) __builtin_ia32_reducepd128_mask(A, 1, C, D)
 #define __builtin_ia32_rangeps256_mask(A, B, F, D, E) __builtin_ia32_rangeps256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangeps128_mask(A, B, F, D, E) __builtin_ia32_rangeps128_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd256_mask(A, B, F, D, E) __builtin_ia32_rangepd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd128_mask(A, B, F, D, E) __builtin_ia32_rangepd128_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x2_256_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x2_256_mask(A, B, F, D, E) __builtin_ia32_insertf64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_fpclassps256_mask(A, D, C) __builtin_ia32_fpclassps256_mask(A, 1, C)
 #define __builtin_ia32_fpclassps128_mask(A, D, C) __builtin_ia32_fpclassps128_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd256_mask(A, D, C) __builtin_ia32_fpclasspd256_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd128_mask(A, D, C) __builtin_ia32_fpclasspd128_mask(A, 1, C)
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index c2a19b3..388026f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,61 +1,61 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
    defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h,
    fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, 
-   lwpintrin.h, fmaintrin.h and mm_malloc.h that reference the proper 
-   builtin functions.
+   lwpintrin.h, fmaintrin.h gfniintrin.h and mm_malloc.h that reference
+   the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being compiled
    as proper functions.  */
 
 #define extern
 #define __inline
 
 #include <x86intrin.h>
 
 #define _CONCAT(x,y) x ## y
 
 #define test_0(func, type, imm)						\
   type _CONCAT(_,func) (int const I)					\
   { return func (imm); }
 
 #define test_1(func, type, op1_type, imm)				\
   type _CONCAT(_,func) (op1_type A, int const I)			\
   { return func (A, imm); }
 
 #define test_1x(func, type, op1_type, imm1, imm2)			\
   type _CONCAT(_,func) (op1_type A, int const I, int const L)		\
   { return func (A, imm1, imm2); }
 
 #define test_1y(func, type, op1_type, imm1, imm2, imm3)			\
   type _CONCAT(_,func) (op1_type A, int const I, int const L, int const R)\
   { return func (A, imm1, imm2, imm3); }
 
 #define test_2(func, type, op1_type, op2_type, imm)			\
   type _CONCAT(_,func) (op1_type A, op2_type B, int const I)		\
   { return func (A, B, imm); }
 
 #define test_2x(func, type, op1_type, op2_type, imm1, imm2)		\
   type _CONCAT(_,func) (op1_type A, op2_type B, int const I, int const L) \
   { return func (A, B, imm1, imm2); }
 
 #define test_2y(func, type, op1_type, op2_type, imm1, imm2, imm3)	 \
   type _CONCAT(_,func) (op1_type A, op2_type B, int const I, int const L,\
 			int const R)					 \
   { return func (A, B, imm1, imm2, imm3); }
 
 #define test_2vx(func, op1_type, op2_type, imm1, imm2)     \
   void _CONCAT(_,func) (op1_type A, op2_type B, int const I, int const L) \
   { func (A, B, imm1, imm2); }
 
 #define test_3(func, type, op1_type, op2_type, op3_type, imm)		\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, int const I)			\
   { return func (A, B, C, imm); }
 
 #define test_3x(func, type, op1_type, op2_type, op3_type, imm1, imm2)		\
@@ -637,50 +637,55 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* tmmintrin.h */
 test_2 (_mm_alignr_epi8, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_alignr_pi8, __m64, __m64, __m64, 1)
 
 /* emmintrin.h */
 test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
 test_1 (_mm_bsrli_si128, __m128i, __m128i, 1)
 test_1 (_mm_bslli_si128, __m128i, __m128i, 1)
 test_1 (_mm_srli_si128, __m128i, __m128i, 1)
 test_1 (_mm_slli_si128, __m128i, __m128i, 1)
 test_1 (_mm_extract_epi16, int, __m128i, 1)
 test_2 (_mm_insert_epi16, __m128i, __m128i, int, 1)
 test_1 (_mm_shufflehi_epi16, __m128i, __m128i, 1)
 test_1 (_mm_shufflelo_epi16, __m128i, __m128i, 1)
 test_1 (_mm_shuffle_epi32, __m128i, __m128i, 1)
 
 /* xmmintrin.h */
 test_2 (_mm_shuffle_ps, __m128, __m128, __m128, 1)
 test_1 (_mm_extract_pi16, int, __m64, 1)
 test_1 (_m_pextrw, int, __m64, 1)
 test_2 (_mm_insert_pi16, __m64, __m64, int, 1)
 test_2 (_m_pinsrw, __m64, __m64, int, 1)
 test_1 (_mm_shuffle_pi16, __m64, __m64, 1)
 test_1 (_m_pshufw, __m64, __m64, 1)
 test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
 
 /* xopintrin.h */
 test_1 ( _mm_roti_epi8, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi16, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
 test_3 (_mm_permute2_pd, __m128d, __m128d, __m128d, __m128d, 1)
 test_3 (_mm256_permute2_pd, __m256d, __m256d, __m256d, __m256d, 1)
 test_3 (_mm_permute2_ps, __m128, __m128, __m128, __m128, 1)
 test_3 (_mm256_permute2_ps, __m256, __m256, __m256, __m256, 1)
 
 /* lwpintrin.h */
 test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
 test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
 #ifdef __x86_64__
 test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
 test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
 #endif
 
 /* tbmintrin.h */
 test_1 ( __bextri_u32, unsigned int, unsigned int, 1)
 #ifdef __x86_64__
 test_1 ( __bextri_u64, unsigned long long, unsigned long long, 1)
 #endif
+
+/* gfniintrin.h */
+test_2 (_mm_gf2p8affineinv_epi64_epi8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm256_gf2p8affineinv_epi64_epi8, __m256i, __m256i, __m256i, 1)
+test_2 (_mm512_gf2p8affineinv_epi64_epi8, __m512i, __m512i, __m512i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index cd8945b..3e64e29 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -54,101 +54,101 @@
   { func (A, B, imm1, imm2); }
 
 #define test_3(func, type, op1_type, op2_type, op3_type, imm)		\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, int const I)			\
   { return func (A, B, C, imm); }
 
 #define test_3x(func, type, op1_type, op2_type, op3_type, imm1, imm2)		\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, int const I, int const L)			\
   { return func (A, B, C, imm1, imm2); }
 
 #define test_3y(func, type, op1_type, op2_type, op3_type, imm1, imm2, imm3)	\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, int const I, int const L, int const R)	\
   { return func (A, B, C, imm1, imm2, imm3); }
 
 #define test_3v(func, op1_type, op2_type, op3_type, imm)		\
   int _CONCAT(_,func) (op1_type A, op2_type B,				\
 		       op3_type C, int const I)				\
   { func (A, B, C, imm); }
 
 #define test_3vx(func, op1_type, op2_type, op3_type, imm1, imm2)   \
   void _CONCAT(_,func) (op1_type A, op2_type B,             	   \
 		       op3_type C, int const I, int const L)       \
   { func (A, B, C, imm1, imm2); }
 
 #define test_4(func, type, op1_type, op2_type, op3_type, op4_type, imm)	\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, op4_type D, int const I)		\
   { return func (A, B, C, D, imm); }
 
 #define test_4x(func, type, op1_type, op2_type, op3_type, op4_type, imm1, imm2)	\
   type _CONCAT(_,func) (op1_type A, op2_type B,				\
 			op3_type C, op4_type D, int const I, int const L)		\
   { return func (A, B, C, D, imm1, imm2); }
 
 #define test_4y(func, type, op1_type, op2_type, op3_type, op4_type, imm1, imm2, imm3)	\
   type _CONCAT(_,func) (op1_type A, op2_type B,	op3_type C,		\
 			op4_type D, int const I, int const L, int const R)		\
   { return func (A, B, C, D, imm1, imm2, imm3); }
 
 
 #define test_4v(func, op1_type, op2_type, op3_type, op4_type, imm)	\
   int _CONCAT(_,func) (op1_type A, op2_type B,				\
 		       op3_type C, op4_type D, int const I)		\
   { func (A, B, C, D, imm); }
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
    are defined as macros for non-optimized compilations. */
 
 /* mmintrin.h (MMX).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("mmx")
 #endif
 #include <mmintrin.h>
 
 /* mm3dnow.h (3DNOW).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("3dnow")
 #endif
 #include <mm3dnow.h>
 
 /* xmmintrin.h (SSE).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("sse")
 #endif
 #include <xmmintrin.h>
 test_2 (_mm_shuffle_ps, __m128, __m128, __m128, 1)
 test_1 (_mm_extract_pi16, int, __m64, 1)
 test_1 (_m_pextrw, int, __m64, 1)
 test_2 (_mm_insert_pi16, __m64, __m64, int, 1)
 test_2 (_m_pinsrw, __m64, __m64, int, 1)
 test_1 (_mm_shuffle_pi16, __m64, __m64, 1)
 test_1 (_m_pshufw, __m64, __m64, 1)
 test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
 
 /* emmintrin.h (SSE2).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("sse2")
 #endif
 #include <emmintrin.h>
 test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
 test_1 (_mm_bsrli_si128, __m128i, __m128i, 1)
 test_1 (_mm_bslli_si128, __m128i, __m128i, 1)
 test_1 (_mm_srli_si128, __m128i, __m128i, 1)
 test_1 (_mm_slli_si128, __m128i, __m128i, 1)
 test_1 (_mm_extract_epi16, int, __m128i, 1)
 test_2 (_mm_insert_epi16, __m128i, __m128i, int, 1)
 test_1 (_mm_shufflehi_epi16, __m128i, __m128i, 1)
 test_1 (_mm_shufflelo_epi16, __m128i, __m128i, 1)
 test_1 (_mm_shuffle_epi32, __m128i, __m128i, 1)
 
 /* pmmintrin.h (SSE3).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("sse3")
@@ -171,101 +171,101 @@ test_2 (_mm_alignr_pi8, __m64, __m64, __m64, 1)
 test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
 test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
 
 /* Note, nmmintrin.h includes smmintrin.h, and smmintrin.h
    checks for the #ifdef.  So just set the option to SSE4.2.  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("sse4.2")
 #endif
 #include <nmmintrin.h>
 /* smmintrin.h (SSE4.2).  */
 test_1 (_mm_round_pd, __m128d, __m128d, 9)
 test_1 (_mm_round_ps, __m128, __m128, 9)
 test_2 (_mm_round_sd, __m128d, __m128d, __m128d, 9)
 test_2 (_mm_round_ss, __m128, __m128, __m128, 9)
 
 test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)
 test_2 (_mm_blend_pd, __m128d, __m128d, __m128d, 1)
 test_2 (_mm_dp_ps, __m128, __m128, __m128, 1)
 test_2 (_mm_dp_pd, __m128d, __m128d, __m128d, 1)
 test_2 (_mm_insert_ps, __m128, __m128, __m128, 1)
 test_1 (_mm_extract_ps, int, __m128, 1)
 test_2 (_mm_insert_epi8, __m128i, __m128i, int, 1)
 test_2 (_mm_insert_epi32, __m128i, __m128i, int, 1)
 #ifdef __x86_64__
 test_2 (_mm_insert_epi64, __m128i, __m128i, long long, 1)
 #endif
 test_1 (_mm_extract_epi8, int, __m128i, 1)
 test_1 (_mm_extract_epi32, int, __m128i, 1)
 #ifdef __x86_64__
 test_1 (_mm_extract_epi64, long long, __m128i, 1)
 #endif
 test_2 (_mm_mpsadbw_epu8, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_cmpistrm, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_cmpistri, int, __m128i, __m128i, 1)
 test_4 (_mm_cmpestrm, __m128i, __m128i, int, __m128i, int, 1)
 test_4 (_mm_cmpestri, int, __m128i, int, __m128i, int, 1)
 test_2 (_mm_cmpistra, int, __m128i, __m128i, 1)
 test_2 (_mm_cmpistrc, int, __m128i, __m128i, 1)
 test_2 (_mm_cmpistro, int, __m128i, __m128i, 1)
 test_2 (_mm_cmpistrs, int, __m128i, __m128i, 1)
 test_2 (_mm_cmpistrz, int, __m128i, __m128i, 1)
 test_4 (_mm_cmpestra, int, __m128i, int, __m128i, int, 1)
 test_4 (_mm_cmpestrc, int, __m128i, int, __m128i, int, 1)
 test_4 (_mm_cmpestro, int, __m128i, int, __m128i, int, 1)
 test_4 (_mm_cmpestrs, int, __m128i, int, __m128i, int, 1)
 test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx5124fmaps,avx5124vnniw,avx512vpopcntdq")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
 test_1 (_mm_cvtps_ph, __m128i, __m128, 1)
 test_1 (_mm256_cvtps_ph, __m128i, __m256, 1)
 
 /* avxintrin.h */
 test_2 (_mm256_blend_pd, __m256d, __m256d, __m256d, 1)
 test_2 (_mm256_blend_ps, __m256, __m256, __m256, 1)
 test_2 (_mm256_dp_ps, __m256, __m256, __m256, 1)
 test_2 (_mm256_shuffle_pd, __m256d, __m256d, __m256d, 1)
 test_2 (_mm256_shuffle_ps, __m256, __m256, __m256, 1)
 test_2 (_mm_cmp_sd, __m128d, __m128d, __m128d, 1)
 test_2 (_mm_cmp_ss, __m128, __m128, __m128, 1)
 test_2 (_mm_cmp_pd, __m128d, __m128d, __m128d, 1)
 test_2 (_mm_cmp_ps, __m128, __m128, __m128, 1)
 test_2 (_mm256_cmp_pd, __m256d, __m256d, __m256d, 1)
 test_2 (_mm256_cmp_ps, __m256, __m256, __m256, 1)
 test_1 (_mm256_extractf128_pd, __m128d, __m256d, 1)
 test_1 (_mm256_extractf128_ps, __m128, __m256, 1)
 test_1 (_mm256_extractf128_si256, __m128i, __m256i, 1)
 test_1 (_mm256_extract_epi8, int, __m256i, 20)
 test_1 (_mm256_extract_epi16, int, __m256i, 13)
 test_1 (_mm256_extract_epi32, int, __m256i, 6)
 #ifdef __x86_64__
 test_1 (_mm256_extract_epi64, long long, __m256i, 2)
 #endif
 test_1 (_mm_permute_pd, __m128d, __m128d, 1)
 test_1 (_mm256_permute_pd, __m256d, __m256d, 1)
 test_1 (_mm_permute_ps, __m128, __m128, 1)
 test_1 (_mm256_permute_ps, __m256, __m256, 1)
 test_2 (_mm256_permute2f128_pd, __m256d, __m256d, __m256d, 1)
 test_2 (_mm256_permute2f128_ps, __m256, __m256, __m256, 1)
 test_2 (_mm256_permute2f128_si256, __m256i, __m256i, __m256i, 1)
 test_2 (_mm256_insertf128_pd, __m256d, __m256d, __m128d, 1)
 test_2 (_mm256_insertf128_ps, __m256, __m256, __m128, 1)
 test_2 (_mm256_insertf128_si256, __m256i, __m256i, __m128i, 1)
 test_2 (_mm256_insert_epi8, __m256i, __m256i, int, 30)
 test_2 (_mm256_insert_epi16, __m256i, __m256i, int, 7)
 test_2 (_mm256_insert_epi32, __m256i, __m256i, int, 3)
 #ifdef __x86_64__
 test_2 (_mm256_insert_epi64, __m256i, __m256i, long long, 1)
 #endif
 test_1 (_mm256_round_pd, __m256d, __m256d, 9)
 test_1 (_mm256_round_ps, __m256, __m256, 9)
 
 /* avx2intrin.h */
 test_2 ( _mm256_mpsadbw_epu8, __m256i, __m256i, __m256i, 1)
 test_2 ( _mm256_alignr_epi8, __m256i, __m256i, __m256i, 1)
 test_2 ( _mm256_blend_epi16, __m256i, __m256i, __m256i, 1)
@@ -648,92 +648,97 @@ test_4x (_mm512_maskz_fixupimm_round_ps, __m512, __mmask16, __m512, __m512, __m5
 test_4x (_mm_mask_fixupimm_round_sd, __m128d, __m128d, __mmask8, __m128d, __m128i, 1, 8)
 test_4x (_mm_mask_fixupimm_round_ss, __m128, __m128, __mmask8, __m128, __m128i, 1, 8)
 test_4x (_mm_maskz_fixupimm_round_sd, __m128d, __mmask8, __m128d, __m128d, __m128i, 1, 8)
 test_4x (_mm_maskz_fixupimm_round_ss, __m128, __mmask8, __m128, __m128, __m128i, 1, 8)
 
 /* avx512pfintrin.h */
 test_2vx (_mm512_prefetch_i32gather_ps, __m512i, void const *, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i32scatter_ps, void const *, __m512i, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i64gather_ps, __m512i, void const *, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i64scatter_ps, void const *, __m512i, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i32gather_pd, __m256i, void const *, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i32scatter_pd, void const *, __m256i, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i64gather_pd, __m512i, long long *, 1, _MM_HINT_T0)
 test_2vx (_mm512_prefetch_i64scatter_pd, void const *, __m512i, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i32scatter_ps, void const *, __mmask16, __m512i, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i32gather_pd, __m256i, __mmask8, void const *, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i32scatter_pd, void const *, __mmask8, __m256i, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i64gather_pd, __m512i, __mmask8, long long *, 1, _MM_HINT_T0)
 test_3vx (_mm512_mask_prefetch_i64scatter_pd, void const *, __mmask8, __m512i, 1, _MM_HINT_T0)
 
 /* avx512erintrin.h */
 test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 8)
 test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 8)
 test_1 (_mm512_rcp28_round_pd, __m512d, __m512d, 8)
 test_1 (_mm512_rcp28_round_ps, __m512, __m512, 8)
 test_1 (_mm512_rsqrt28_round_pd, __m512d, __m512d, 8)
 test_1 (_mm512_rsqrt28_round_ps, __m512, __m512, 8)
 test_2 (_mm512_maskz_exp2a23_round_pd, __m512d, __mmask8, __m512d, 8)
 test_2 (_mm512_maskz_exp2a23_round_ps, __m512, __mmask16, __m512, 8)
 test_2 (_mm512_maskz_rcp28_round_pd, __m512d, __mmask8, __m512d, 8)
 test_2 (_mm512_maskz_rcp28_round_ps, __m512, __mmask16, __m512, 8)
 test_2 (_mm512_maskz_rsqrt28_round_pd, __m512d, __mmask8, __m512d, 8)
 test_2 (_mm512_maskz_rsqrt28_round_ps, __m512, __mmask16, __m512, 8)
 test_3 (_mm512_mask_exp2a23_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
 test_3 (_mm512_mask_exp2a23_round_ps, __m512, __m512, __mmask16, __m512, 8)
 test_3 (_mm512_mask_rcp28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
 test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
 test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 test_2 (_mm_rcp28_round_sd, __m128d, __m128d, __m128d, 8)
 test_2 (_mm_rcp28_round_ss, __m128, __m128, __m128, 8)
 test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8)
 test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
 
+/* gfniintrin.h */
+test_2 (_mm_gf2p8affineinv_epi64_epi8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm256_gf2p8affineinv_epi64_epi8, __m256i, __m256i, __m256i, 1)
+test_2 (_mm512_gf2p8affineinv_epi64_epi8, __m512i, __m512i, __m512i, 1)
+
 /* wmmintrin.h (AES/PCLMUL).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("aes,pclmul")
 #endif
 #include <wmmintrin.h>
 test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
 test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
 
 /* popcnintrin.h (POPCNT).  */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("popcnt")
 #endif
 #include <popcntintrin.h>
 
 /* x86intrin.h (FMA4/XOP/LWP/BMI/BMI2/TBM/LZCNT/FMA). */
 #ifdef DIFFERENT_PRAGMAS
 #pragma GCC target ("fma4,xop,lwp,bmi,bmi2,tbm,lzcnt,fma,rdseed,prfchw,adx,fxsr,xsaveopt,xsavec,xsaves,clflushopt,clwb,pku,sgx,rdpid")
 #endif
 #include <x86intrin.h>
 /* xopintrin.h */
 test_1 ( _mm_roti_epi8, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi16, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
 test_3 (_mm_permute2_pd, __m128d, __m128d, __m128d, __m128d, 1)
 test_3 (_mm256_permute2_pd, __m256d, __m256d, __m256d, __m256d, 1)
 test_3 (_mm_permute2_ps, __m128, __m128, __m128, __m128, 1)
 test_3 (_mm256_permute2_ps, __m256, __m256, __m256, __m256, 1)
 
 /* lwpintrin.h */
 test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
 test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
 #ifdef __x86_64__
 test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
 test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
 #endif
 
 /* tbmintrin.h */
 test_1 ( __bextri_u32, unsigned int, unsigned int, 1)
 #ifdef __x86_64__
 test_1 ( __bextri_u64, unsigned long long, unsigned long long, 1)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 3a90e54..911258f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -572,53 +572,61 @@
 #define __builtin_ia32_cmppd128_mask(A, B, E, D) __builtin_ia32_cmppd128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd256_mask(A, B, E, D) __builtin_ia32_cmpd256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpd128_mask(A, B, E, D) __builtin_ia32_cmpd128_mask(A, B, 1, D)
 #define __builtin_ia32_alignq256_mask(A, B, F, D, E) __builtin_ia32_alignq256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq128_mask(A, B, F, D, E) __builtin_ia32_alignq128_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd256_mask(A, B, F, D, E) __builtin_ia32_alignd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignd128_mask(A, B, F, D, E) __builtin_ia32_alignd128_mask(A, B, 1, D, E)
 
 /* avx512vlbwintrin.h */
 #define __builtin_ia32_ucmpw256_mask(A, B, E, D) __builtin_ia32_ucmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpw128_mask(A, B, E, D) __builtin_ia32_ucmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb256_mask(A, B, E, D) __builtin_ia32_ucmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpb128_mask(A, B, E, D) __builtin_ia32_ucmpb128_mask(A, B, 1, D)
 #define __builtin_ia32_psrlwi256_mask(A, E, C, D) __builtin_ia32_psrlwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrlwi128_mask(A, E, C, D) __builtin_ia32_psrlwi128_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi256_mask(A, E, C, D) __builtin_ia32_psrawi256_mask(A, 1, C, D)
 #define __builtin_ia32_psrawi128_mask(A, E, C, D) __builtin_ia32_psrawi128_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi256_mask(A, E, C, D) __builtin_ia32_psllwi256_mask(A, 1, C, D)
 #define __builtin_ia32_psllwi128_mask(A, E, C, D) __builtin_ia32_psllwi128_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw256_mask(A, E, C, D) __builtin_ia32_pshuflw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshuflw128_mask(A, E, C, D) __builtin_ia32_pshuflw128_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw256_mask(A, E, C, D) __builtin_ia32_pshufhw256_mask(A, 1, C, D)
 #define __builtin_ia32_pshufhw128_mask(A, E, C, D) __builtin_ia32_pshufhw128_mask(A, 1, C, D)
 #define __builtin_ia32_palignr256_mask(A, B, F, D, E) __builtin_ia32_palignr256_mask(A, B, 8, D, E)
 #define __builtin_ia32_palignr128_mask(A, B, F, D, E) __builtin_ia32_palignr128_mask(A, B, 8, D, E)
 #define __builtin_ia32_dbpsadbw256_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw256_mask(A, B, 1, D, E)
 #define __builtin_ia32_dbpsadbw128_mask(A, B, F, D, E) __builtin_ia32_dbpsadbw128_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpw256_mask(A, B, E, D) __builtin_ia32_cmpw256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpw128_mask(A, B, E, D) __builtin_ia32_cmpw128_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb256_mask(A, B, E, D) __builtin_ia32_cmpb256_mask(A, B, 1, D)
 #define __builtin_ia32_cmpb128_mask(A, B, E, D) __builtin_ia32_cmpb128_mask(A, B, 1, D)
 
 /* avx512vldqintrin.h */
 #define __builtin_ia32_reduceps256_mask(A, E, C, D) __builtin_ia32_reduceps256_mask(A, 1, C, D)
 #define __builtin_ia32_reduceps128_mask(A, E, C, D) __builtin_ia32_reduceps128_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd256_mask(A, E, C, D) __builtin_ia32_reducepd256_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd128_mask(A, E, C, D) __builtin_ia32_reducepd128_mask(A, 1, C, D)
 #define __builtin_ia32_rangeps256_mask(A, B, F, D, E) __builtin_ia32_rangeps256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangeps128_mask(A, B, F, D, E) __builtin_ia32_rangeps128_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd256_mask(A, B, F, D, E) __builtin_ia32_rangepd256_mask(A, B, 1, D, E)
 #define __builtin_ia32_rangepd128_mask(A, B, F, D, E) __builtin_ia32_rangepd128_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x2_256_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x2_256_mask(A, B, F, D, E) __builtin_ia32_insertf64x2_256_mask(A, B, 1, D, E)
 #define __builtin_ia32_fpclassps256_mask(A, D, C) __builtin_ia32_fpclassps256_mask(A, 1, C)
 #define __builtin_ia32_fpclassps128_mask(A, D, C) __builtin_ia32_fpclassps128_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd256_mask(A, D, C) __builtin_ia32_fpclasspd256_mask(A, 1, C)
 #define __builtin_ia32_fpclasspd128_mask(A, D, C) __builtin_ia32_fpclasspd128_mask(A, 1, C)
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid")
+/* gfniintrin.h */
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v16qi(A, B, 1) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v32qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, C) __builtin_ia32_vgf2p8affineinvqb_v64qi(A, B, 1)
+#define __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v16qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v32qi_mask(A, B, 1, D, E) 
+#define __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, C, D, E) __builtin_ia32_vgf2p8affineinvqb_v64qi_mask(A, B, 1, D, E) 
+
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni")
 
 #include <x86intrin.h>
-- 
2.5.5


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch][x86] GFNI enabling [2/4]
  2017-10-30 19:03   ` Koval, Julia
@ 2017-10-31  7:03     ` Kirill Yukhin
  2017-10-31 20:08     ` Jakub Jelinek
  1 sibling, 0 replies; 9+ messages in thread
From: Kirill Yukhin @ 2017-10-31  7:03 UTC (permalink / raw)
  To: Koval, Julia; +Cc: GCC Patches

Hello Julia!
On 30 Oct 19:02, Koval, Julia wrote:
> Hi,
> Fixed that.
Your patch is OK for trunk. I've comitted it w/ minor re-indentation in
gcc/ChangeLog entry.

--
Thanks, K
> > >
> > > Ok for trunk?
> > Few comments:
> > 1. Why copyright in config/i386/gfniintrin.h starts from 2014?
> > 
> > 2. I think few tests updates are missing: g++.dg/other/i386-2,3.c +
> > gcc.target/i386/sse-12,14.c
> > 
> > --
> > Thanks, K
> > >
> > > Thanks,
> > > Julia
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch][x86] GFNI enabling [2/4]
  2017-10-30 19:03   ` Koval, Julia
  2017-10-31  7:03     ` Kirill Yukhin
@ 2017-10-31 20:08     ` Jakub Jelinek
  2017-11-02 11:57       ` Koval, Julia
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2017-10-31 20:08 UTC (permalink / raw)
  To: Koval, Julia; +Cc: Kirill Yukhin, GCC Patches

On Mon, Oct 30, 2017 at 07:02:23PM +0000, Koval, Julia wrote:
> gcc/testsuite/
> 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> 	* gcc.target/i386/gfni-1.c: New.
> 	* gcc.target/i386/gfni-2.c: New.
> 	* gcc.target/i386/gfni-3.c: New.
> 	* gcc.target/i386/gfni-4.c: New.

The gfni-4.c testcase ICEs on i686-linux (e.g. try
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-msse,-m32/-mno-sse,-m64\} i386.exp=gfni*'
to see it).

I must say I'm confused by the CPUIDs, the https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
lists GFNI; 2x AVX+GFNI; 2x AVX512VL+GFNI; AVX512F+GFNI CPUIDs for the
instructions, but i386-builtins.def has:
BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_
BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_
BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN
BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_
and the gfniintrin.h requires just gfni for the first insn,
and then some combinations of gfni,avx, or gfni,avx512vl, or
gfni,avx512vl,avx512bw, or gfni,avx512f,avx512bw.

So, what is right, the paper, i386-builtins.def or gfniintrin.h?

Obviously even if the GF2P8AFFINEINVQB instruction doesn't list SSE as
required CPUID, we can't really emit it without at least SSE because
then the operands can't be emitted.  So, at least in GCC we should
require both GFNI and SSE for the first instruction.

Which leads to another issue, as ix86_expand_builtin documents,
we treat the BDESC ISAs OPTION_MASK_ISA_ISA1 | OPTION_MASK_ISA_ISA2
as either ISA1 or ISA2, not ISA1 and ISA2.  The exceptions are
MMX, AVX512VL and 64BIT is also special.
So, shall GFNI be added to that set?  Do we have other ISAs that
should be handled the same?  I guess maybe OPTION_MASK_ISA_AES, but
that is handled weirdly.

	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [patch][x86] GFNI enabling [2/4]
  2017-10-31 20:08     ` Jakub Jelinek
@ 2017-11-02 11:57       ` Koval, Julia
  2017-11-03  8:27         ` Koval, Julia
  2017-11-03 17:42         ` Koval, Julia
  0 siblings, 2 replies; 9+ messages in thread
From: Koval, Julia @ 2017-11-02 11:57 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Kirill Yukhin, GCC Patches

The documentation is right, I was wrong not adding SSE/AVX flags in these builtin declaratuin.

> The exceptions are
> MMX, AVX512VL and 64BIT is also special.
> So, shall GFNI be added to that set?  
Turns out only GFNI and VAES(haven't sent those yet, they are from the same Icelake pdf) are like this, others rely on AVX512VL/BW. But what do you think about adding AVX/SSE flags to this special set instead? Looks like they more probably will be used as a flags, on which new instructions may depend in the future, than GFNI/VAES flags.

-Julia

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Tuesday, October 31, 2017 8:28 PM
> To: Koval, Julia <julia.koval@intel.com>
> Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; GCC Patches <gcc-
> patches@gcc.gnu.org>
> Subject: Re: [patch][x86] GFNI enabling [2/4]
> 
> On Mon, Oct 30, 2017 at 07:02:23PM +0000, Koval, Julia wrote:
> > gcc/testsuite/
> > 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> > 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> > 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> > 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> > 	* gcc.target/i386/gfni-1.c: New.
> > 	* gcc.target/i386/gfni-2.c: New.
> > 	* gcc.target/i386/gfni-3.c: New.
> > 	* gcc.target/i386/gfni-4.c: New.
> 
> The gfni-4.c testcase ICEs on i686-linux (e.g. try
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-msse,-m32/-
> mno-sse,-m64\} i386.exp=gfni*'
> to see it).
> 
> I must say I'm confused by the CPUIDs, the
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-
> instruction-set-extensions-programming-reference.pdf
> lists GFNI; 2x AVX+GFNI; 2x AVX512VL+GFNI; AVX512F+GFNI CPUIDs for the
> instructions, but i386-builtins.def has:
> BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi,
> "__builtin_ia32_vgf2p8affineinvqb_v64qi",
> IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN
> BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> CODE_FOR_vgf2p8affineinvqb_v64qi_mask,
> "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_
> BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi,
> "__builtin_ia32_vgf2p8affineinvqb_v32qi",
> IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN
> BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> CODE_FOR_vgf2p8affineinvqb_v32qi_mask,
> "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_
> BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi,
> "__builtin_ia32_vgf2p8affineinvqb_v16qi",
> IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN
> BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> CODE_FOR_vgf2p8affineinvqb_v16qi_mask,
> "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_
> and the gfniintrin.h requires just gfni for the first insn,
> and then some combinations of gfni,avx, or gfni,avx512vl, or
> gfni,avx512vl,avx512bw, or gfni,avx512f,avx512bw.
> 
> So, what is right, the paper, i386-builtins.def or gfniintrin.h?
> 
> Obviously even if the GF2P8AFFINEINVQB instruction doesn't list SSE as
> required CPUID, we can't really emit it without at least SSE because
> then the operands can't be emitted.  So, at least in GCC we should
> require both GFNI and SSE for the first instruction.
> 
> Which leads to another issue, as ix86_expand_builtin documents,
> we treat the BDESC ISAs OPTION_MASK_ISA_ISA1 | OPTION_MASK_ISA_ISA2
> as either ISA1 or ISA2, not ISA1 and ISA2.  The exceptions are
> MMX, AVX512VL and 64BIT is also special.
> So, shall GFNI be added to that set?  Do we have other ISAs that
> should be handled the same?  I guess maybe OPTION_MASK_ISA_AES, but
> that is handled weirdly.
> 
> 	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [patch][x86] GFNI enabling [2/4]
  2017-11-02 11:57       ` Koval, Julia
@ 2017-11-03  8:27         ` Koval, Julia
  2017-11-03 17:42         ` Koval, Julia
  1 sibling, 0 replies; 9+ messages in thread
From: Koval, Julia @ 2017-11-03  8:27 UTC (permalink / raw)
  To: 'Jakub Jelinek'; +Cc: 'Kirill Yukhin', 'GCC Patches'

> But what do you think about adding AVX/SSE flags to this special set instead?
Ok, was wrong, it is impossible to add SSE, because it is used in normal "or" way. Then I'll add GFNI/VAES instead.

There is also another problem there: GFNI belongs to isa_flags2, while AVX512VL/AVX/SSE belong to isa_flags, so we can't keep them in the same field. There are candidates, which can be moved from isa_flags to isa_flags2 instead of GFNI, because there are no dependencies on other flags, but it is only a short term solution.

> -----Original Message-----
> From: Koval, Julia
> Sent: Thursday, November 02, 2017 12:57 PM
> To: Jakub Jelinek <jakub@redhat.com>
> Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; GCC Patches <gcc-
> patches@gcc.gnu.org>
> Subject: RE: [patch][x86] GFNI enabling [2/4]
> 
> The documentation is right, I was wrong not adding SSE/AVX flags in these
> builtin declaratuin.
> 
> > The exceptions are
> > MMX, AVX512VL and 64BIT is also special.
> > So, shall GFNI be added to that set?
> Turns out only GFNI and VAES(haven't sent those yet, they are from the same
> Icelake pdf) are like this, others rely on AVX512VL/BW. But what do you think
> about adding AVX/SSE flags to this special set instead? Looks like they more
> probably will be used as a flags, on which new instructions may depend in the
> future, than GFNI/VAES flags.
> 
> -Julia
> 
> > -----Original Message-----
> > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> > owner@gcc.gnu.org] On Behalf Of Jakub Jelinek
> > Sent: Tuesday, October 31, 2017 8:28 PM
> > To: Koval, Julia <julia.koval@intel.com>
> > Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; GCC Patches <gcc-
> > patches@gcc.gnu.org>
> > Subject: Re: [patch][x86] GFNI enabling [2/4]
> >
> > On Mon, Oct 30, 2017 at 07:02:23PM +0000, Koval, Julia wrote:
> > > gcc/testsuite/
> > > 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> > > 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> > > 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> > > 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> > > 	* gcc.target/i386/gfni-1.c: New.
> > > 	* gcc.target/i386/gfni-2.c: New.
> > > 	* gcc.target/i386/gfni-3.c: New.
> > > 	* gcc.target/i386/gfni-4.c: New.
> >
> > The gfni-4.c testcase ICEs on i686-linux (e.g. try
> > make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-msse,-m32/-
> > mno-sse,-m64\} i386.exp=gfni*'
> > to see it).
> >
> > I must say I'm confused by the CPUIDs, the
> > https://software.intel.com/sites/default/files/managed/c5/15/architecture-
> > instruction-set-extensions-programming-reference.pdf
> > lists GFNI; 2x AVX+GFNI; 2x AVX512VL+GFNI; AVX512F+GFNI CPUIDs for the
> > instructions, but i386-builtins.def has:
> > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi,
> > "__builtin_ia32_vgf2p8affineinvqb_v64qi",
> > IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN
> > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > CODE_FOR_vgf2p8affineinvqb_v64qi_mask,
> > "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_
> > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi,
> > "__builtin_ia32_vgf2p8affineinvqb_v32qi",
> > IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN
> > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > CODE_FOR_vgf2p8affineinvqb_v32qi_mask,
> > "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_
> > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi,
> > "__builtin_ia32_vgf2p8affineinvqb_v16qi",
> > IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN
> > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > CODE_FOR_vgf2p8affineinvqb_v16qi_mask,
> > "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_
> > and the gfniintrin.h requires just gfni for the first insn,
> > and then some combinations of gfni,avx, or gfni,avx512vl, or
> > gfni,avx512vl,avx512bw, or gfni,avx512f,avx512bw.
> >
> > So, what is right, the paper, i386-builtins.def or gfniintrin.h?
> >
> > Obviously even if the GF2P8AFFINEINVQB instruction doesn't list SSE as
> > required CPUID, we can't really emit it without at least SSE because
> > then the operands can't be emitted.  So, at least in GCC we should
> > require both GFNI and SSE for the first instruction.
> >
> > Which leads to another issue, as ix86_expand_builtin documents,
> > we treat the BDESC ISAs OPTION_MASK_ISA_ISA1 | OPTION_MASK_ISA_ISA2
> > as either ISA1 or ISA2, not ISA1 and ISA2.  The exceptions are
> > MMX, AVX512VL and 64BIT is also special.
> > So, shall GFNI be added to that set?  Do we have other ISAs that
> > should be handled the same?  I guess maybe OPTION_MASK_ISA_AES, but
> > that is handled weirdly.
> >
> > 	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [patch][x86] GFNI enabling [2/4]
  2017-11-02 11:57       ` Koval, Julia
  2017-11-03  8:27         ` Koval, Julia
@ 2017-11-03 17:42         ` Koval, Julia
  2017-11-07 20:06           ` Kirill Yukhin
  1 sibling, 1 reply; 9+ messages in thread
From: Koval, Julia @ 2017-11-03 17:42 UTC (permalink / raw)
  To: 'Jakub Jelinek'; +Cc: 'Kirill Yukhin', 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 6096 bytes --]

Here is the solution I propose:

gcc/
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET): Remove MPX from flag.
	(ix86_handle_option): Move MPX to isa_flags2 and GFNI to isa_flags.
	* config/i386/i386-c.c (ix86_target_macros_internal): Ditto.
	* config/i386/i386.opt: Ditto.
	* config/i386/i386.c (ix86_target_string): Ditto.
	(ix86_option_override_internal): Ditto.
	(ix86_init_mpx_builtins): Move MPX to args2.
	(ix86_expand_builtin): Special handling for OPTION_MASK_ISA_GFNI.
	* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask,
	__builtin_ia32_vgf2p8affineinvqb_v32qi,
	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask,
	__builtin_ia32_vgf2p8affineinvqb_v16qi,
	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): Move to ARGS array.

> -----Original Message-----
> From: Koval, Julia
> Sent: Friday, November 03, 2017 9:27 AM
> To: 'Jakub Jelinek' <jakub@redhat.com>
> Cc: 'Kirill Yukhin' <kirill.yukhin@gmail.com>; 'GCC Patches' <gcc-
> patches@gcc.gnu.org>
> Subject: RE: [patch][x86] GFNI enabling [2/4]
> 
> > But what do you think about adding AVX/SSE flags to this special set instead?
> Ok, was wrong, it is impossible to add SSE, because it is used in normal "or" way.
> Then I'll add GFNI/VAES instead.
> 
> There is also another problem there: GFNI belongs to isa_flags2, while
> AVX512VL/AVX/SSE belong to isa_flags, so we can't keep them in the same field.
> There are candidates, which can be moved from isa_flags to isa_flags2 instead
> of GFNI, because there are no dependencies on other flags, but it is only a short
> term solution.
> 
> > -----Original Message-----
> > From: Koval, Julia
> > Sent: Thursday, November 02, 2017 12:57 PM
> > To: Jakub Jelinek <jakub@redhat.com>
> > Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; GCC Patches <gcc-
> > patches@gcc.gnu.org>
> > Subject: RE: [patch][x86] GFNI enabling [2/4]
> >
> > The documentation is right, I was wrong not adding SSE/AVX flags in these
> > builtin declaratuin.
> >
> > > The exceptions are
> > > MMX, AVX512VL and 64BIT is also special.
> > > So, shall GFNI be added to that set?
> > Turns out only GFNI and VAES(haven't sent those yet, they are from the same
> > Icelake pdf) are like this, others rely on AVX512VL/BW. But what do you think
> > about adding AVX/SSE flags to this special set instead? Looks like they more
> > probably will be used as a flags, on which new instructions may depend in the
> > future, than GFNI/VAES flags.
> >
> > -Julia
> >
> > > -----Original Message-----
> > > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> > > owner@gcc.gnu.org] On Behalf Of Jakub Jelinek
> > > Sent: Tuesday, October 31, 2017 8:28 PM
> > > To: Koval, Julia <julia.koval@intel.com>
> > > Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; GCC Patches <gcc-
> > > patches@gcc.gnu.org>
> > > Subject: Re: [patch][x86] GFNI enabling [2/4]
> > >
> > > On Mon, Oct 30, 2017 at 07:02:23PM +0000, Koval, Julia wrote:
> > > > gcc/testsuite/
> > > > 	* gcc.target/i386/avx-1.c: Handle new intrinsics.
> > > > 	* gcc.target/i386/avx512-check.h: Check GFNI bit.
> > > > 	* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
> > > > 	* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
> > > > 	* gcc.target/i386/gfni-1.c: New.
> > > > 	* gcc.target/i386/gfni-2.c: New.
> > > > 	* gcc.target/i386/gfni-3.c: New.
> > > > 	* gcc.target/i386/gfni-4.c: New.
> > >
> > > The gfni-4.c testcase ICEs on i686-linux (e.g. try
> > > make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32/-msse,-m32/-
> > > mno-sse,-m64\} i386.exp=gfni*'
> > > to see it).
> > >
> > > I must say I'm confused by the CPUIDs, the
> > > https://software.intel.com/sites/default/files/managed/c5/15/architecture-
> > > instruction-set-extensions-programming-reference.pdf
> > > lists GFNI; 2x AVX+GFNI; 2x AVX512VL+GFNI; AVX512F+GFNI CPUIDs for the
> > > instructions, but i386-builtins.def has:
> > > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi,
> > > "__builtin_ia32_vgf2p8affineinvqb_v64qi",
> > > IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN
> > > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > > CODE_FOR_vgf2p8affineinvqb_v64qi_mask,
> > > "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_
> > > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi,
> > > "__builtin_ia32_vgf2p8affineinvqb_v32qi",
> > > IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN
> > > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > > CODE_FOR_vgf2p8affineinvqb_v32qi_mask,
> > > "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_
> > > BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi,
> > > "__builtin_ia32_vgf2p8affineinvqb_v16qi",
> > > IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN
> > > BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW,
> > > CODE_FOR_vgf2p8affineinvqb_v16qi_mask,
> > > "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_
> > > and the gfniintrin.h requires just gfni for the first insn,
> > > and then some combinations of gfni,avx, or gfni,avx512vl, or
> > > gfni,avx512vl,avx512bw, or gfni,avx512f,avx512bw.
> > >
> > > So, what is right, the paper, i386-builtins.def or gfniintrin.h?
> > >
> > > Obviously even if the GF2P8AFFINEINVQB instruction doesn't list SSE as
> > > required CPUID, we can't really emit it without at least SSE because
> > > then the operands can't be emitted.  So, at least in GCC we should
> > > require both GFNI and SSE for the first instruction.
> > >
> > > Which leads to another issue, as ix86_expand_builtin documents,
> > > we treat the BDESC ISAs OPTION_MASK_ISA_ISA1 | OPTION_MASK_ISA_ISA2
> > > as either ISA1 or ISA2, not ISA1 and ISA2.  The exceptions are
> > > MMX, AVX512VL and 64BIT is also special.
> > > So, shall GFNI be added to that set?  Do we have other ISAs that
> > > should be handled the same?  I guess maybe OPTION_MASK_ISA_AES, but
> > > that is handled weirdly.
> > >
> > > 	Jakub

[-- Attachment #2: 0001-fix-gfni.patch --]
[-- Type: application/octet-stream, Size: 11200 bytes --]

From 02000c896ca17ad0cfa301ca993a9ae78defd527 Mon Sep 17 00:00:00 2001
From: julia <jkoval@gkticlel801.igk.intel.com>
Date: Fri, 3 Nov 2017 17:50:14 +0300
Subject: [PATCH] fix gfni

---
 gcc/common/config/i386/i386-common.c | 15 +++++++++------
 gcc/config/i386/i386-builtin.def     | 16 ++++++++--------
 gcc/config/i386/i386-c.c             |  4 ++--
 gcc/config/i386/i386.c               | 20 ++++++++++----------
 gcc/config/i386/i386.opt             |  4 ++--
 5 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index ada918e..acad248 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -242,8 +242,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA_MMX_UNSET \
-   | OPTION_MASK_ISA_SSE_UNSET \
-   | OPTION_MASK_ISA_MPX)
+   | OPTION_MASK_ISA_SSE_UNSET)
 
 /* Implement TARGET_HANDLE_OPTION.  */
 
@@ -265,8 +264,12 @@ ix86_handle_option (struct gcc_options *opts,
 	     general registers are allowed.  */
 	  opts->x_ix86_isa_flags
 	    &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+	  opts->x_ix86_isa_flags2
+	    &= ~OPTION_MASK_ISA_MPX;
 	  opts->x_ix86_isa_flags_explicit
 	    |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+	  opts->x_ix86_isa_flags2_explicit
+	    |= OPTION_MASK_ISA_MPX;
 
 	  opts->x_target_flags &= ~MASK_80387;
 	}
@@ -493,13 +496,13 @@ ix86_handle_option (struct gcc_options *opts,
     case OPT_mgfni:
       if (value)
 	{
-	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_GFNI_SET;
-	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_GFNI_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_GFNI_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_GFNI_SET;
 	}
       else
 	{
-	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_GFNI_UNSET;
-	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_GFNI_UNSET;
+	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_GFNI_UNSET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_GFNI_UNSET;
 	}
       return true;
 
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 76e5f0f..3cf5eae 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2394,6 +2394,14 @@ BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv32qi3_mask, "__builtin_ia32_vpermi2varqi256_mask", IX86_BUILTIN_VPERMI2VARQI256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_vpermi2varv16qi3_mask, "__builtin_ia32_vpermi2varqi128_mask", IX86_BUILTIN_VPERMI2VARQI128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI_UHI)
 
+/* GFNI */
+BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
+BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_SSE, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
+
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
 
@@ -2588,14 +2596,6 @@ BDESC (OPTION_MASK_ISA_AVX512VPOPCNTDQ, CODE_FOR_vpopcountv8di_mask, "__builtin_
 
 /* RDPID */
 BDESC (OPTION_MASK_ISA_RDPID, CODE_FOR_rdpid, "__builtin_ia32_rdpid", IX86_BUILTIN_RDPID, UNKNOWN, (int) UNSIGNED_FTYPE_VOID)
-
-/* GFNI */
-BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v64qi, "__builtin_ia32_vgf2p8affineinvqb_v64qi", IX86_BUILTIN_VGF2P8AFFINEINVQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v64qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v64qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB512MASK, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI)
-BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v32qi, "__builtin_ia32_vgf2p8affineinvqb_v32qi", IX86_BUILTIN_VGF2P8AFFINEINVQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v32qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v32qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB256MASK, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI)
-BDESC (OPTION_MASK_ISA_GFNI, CODE_FOR_vgf2p8affineinvqb_v16qi, "__builtin_ia32_vgf2p8affineinvqb_v16qi", IX86_BUILTIN_VGF2P8AFFINEINVQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
-BDESC (OPTION_MASK_ISA_GFNI | OPTION_MASK_ISA_AVX512BW, CODE_FOR_vgf2p8affineinvqb_v16qi_mask, "__builtin_ia32_vgf2p8affineinvqb_v16qi_mask", IX86_BUILTIN_VGF2P8AFFINEINVQB128MASK, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI)
 BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 7f88bef..18042cd 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -447,7 +447,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__XSAVEC__");
   if (isa_flag & OPTION_MASK_ISA_XSAVES)
     def_or_undef (parse_in, "__XSAVES__");
-  if (isa_flag & OPTION_MASK_ISA_MPX)
+  if (isa_flag2 & OPTION_MASK_ISA_MPX)
     def_or_undef (parse_in, "__MPX__");
   if (isa_flag & OPTION_MASK_ISA_CLWB)
     def_or_undef (parse_in, "__CLWB__");
@@ -457,7 +457,7 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__PKU__");
   if (isa_flag2 & OPTION_MASK_ISA_RDPID)
     def_or_undef (parse_in, "__RDPID__");
-  if (isa_flag2 & OPTION_MASK_ISA_GFNI)
+  if (isa_flag & OPTION_MASK_ISA_GFNI)
     def_or_undef (parse_in, "__GFNI__");
   if (isa_flag2 & OPTION_MASK_ISA_IBT)
     {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2967872..4b9dc05 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2741,7 +2741,7 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
      ISAs come first.  Target string will be displayed in the same order.  */
   static struct ix86_target_opts isa2_opts[] =
   {
-    { "-mgfni",		OPTION_MASK_ISA_GFNI },
+    { "-mmpx",		OPTION_MASK_ISA_MPX },
     { "-mrdpid",	OPTION_MASK_ISA_RDPID },
     { "-msgx",		OPTION_MASK_ISA_SGX },
     { "-mavx5124vnniw", OPTION_MASK_ISA_AVX5124VNNIW },
@@ -2752,6 +2752,7 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
   };
   static struct ix86_target_opts isa_opts[] =
   {
+    { "-mgfni",		OPTION_MASK_ISA_GFNI },
     { "-mavx512vbmi",	OPTION_MASK_ISA_AVX512VBMI },
     { "-mavx512ifma",	OPTION_MASK_ISA_AVX512IFMA },
     { "-mavx512vl",	OPTION_MASK_ISA_AVX512VL },
@@ -2809,7 +2810,6 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
     { "-mlwp",		OPTION_MASK_ISA_LWP },
     { "-mhle",		OPTION_MASK_ISA_HLE },
     { "-mfxsr",		OPTION_MASK_ISA_FXSR },
-    { "-mmpx",		OPTION_MASK_ISA_MPX },
     { "-mclwb",		OPTION_MASK_ISA_CLWB }
   };
 
@@ -4077,8 +4077,8 @@ ix86_option_override_internal (bool main_args_p,
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512VL))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VL;
         if (processor_alias_table[i].flags & PTA_MPX
-            && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_MPX))
-          opts->x_ix86_isa_flags |= OPTION_MASK_ISA_MPX;
+            && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_MPX))
+          opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_MPX;
 	if (processor_alias_table[i].flags & PTA_AVX512VBMI
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512VBMI))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI;
@@ -4121,10 +4121,10 @@ ix86_option_override_internal (bool main_args_p,
 	break;
       }
 
-  if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
+  if (TARGET_X32 && (opts->x_ix86_isa_flags2 & OPTION_MASK_ISA_MPX))
     error ("Intel MPX does not support x32");
 
-  if (TARGET_X32 && (ix86_isa_flags & OPTION_MASK_ISA_MPX))
+  if (TARGET_X32 && (ix86_isa_flags2 & OPTION_MASK_ISA_MPX))
     error ("Intel MPX does not support x32");
 
   if (i == pta_size)
@@ -30734,7 +30734,7 @@ ix86_init_mpx_builtins ()
 	continue;
 
       ftype = (enum ix86_builtin_func_type) d->flag;
-      decl = def_builtin (d->mask, d->name, ftype, d->code);
+      decl = def_builtin2 (d->mask, d->name, ftype, d->code);
 
       /* With no leaf and nothrow flags for MPX builtins
 	 abnormal edges may follow its call when setjmp
@@ -30767,7 +30767,7 @@ ix86_init_mpx_builtins ()
 	continue;
 
       ftype = (enum ix86_builtin_func_type) d->flag;
-      decl = def_builtin_const (d->mask, d->name, ftype, d->code);
+      decl = def_builtin_const2 (d->mask, d->name, ftype, d->code);
 
       if (decl)
 	{
@@ -35122,10 +35122,10 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
      at all, -m64 is a whole TU option.  */
   if (((ix86_builtins_isa[fcode].isa
 	& ~(OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_MMX
-	    | OPTION_MASK_ISA_64BIT))
+	    | OPTION_MASK_ISA_64BIT | OPTION_MASK_ISA_GFNI))
        && !(ix86_builtins_isa[fcode].isa
 	    & ~(OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_MMX
-		| OPTION_MASK_ISA_64BIT)
+		| OPTION_MASK_ISA_64BIT | OPTION_MASK_ISA_GFNI)
 	    & ix86_isa_flags))
       || ((ix86_builtins_isa[fcode].isa & OPTION_MASK_ISA_AVX512VL)
 	  && !(ix86_isa_flags & OPTION_MASK_ISA_AVX512VL))
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7c9dd47..b1bcb39 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -754,7 +754,7 @@ Target Report Mask(ISA_RDPID) Var(ix86_isa_flags2) Save
 Support RDPID built-in functions and code generation.
 
 mgfni
-Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save
+Target Report Mask(ISA_GFNI) Var(ix86_isa_flags) Save
 Support GFNI built-in functions and code generation.
 
 mbmi
@@ -903,7 +903,7 @@ Target Report Mask(ISA_RTM) Var(ix86_isa_flags) Save
 Support RTM built-in functions and code generation.
 
 mmpx
-Target Report Mask(ISA_MPX) Var(ix86_isa_flags) Save
+Target Report Mask(ISA_MPX) Var(ix86_isa_flags2) Save
 Support MPX code generation.
 
 mmwaitx
-- 
2.5.5


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch][x86] GFNI enabling [2/4]
  2017-11-03 17:42         ` Koval, Julia
@ 2017-11-07 20:06           ` Kirill Yukhin
  0 siblings, 0 replies; 9+ messages in thread
From: Kirill Yukhin @ 2017-11-07 20:06 UTC (permalink / raw)
  To: Koval, Julia; +Cc: 'Jakub Jelinek', 'GCC Patches'

Hello Julia!
On 03 Nov 17:42, Koval, Julia wrote:
> Here is the solution I propose:
> 
> gcc/
> 	* common/config/i386/i386-common.c
> 	(OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET): Remove MPX from flag.
> 	(ix86_handle_option): Move MPX to isa_flags2 and GFNI to isa_flags.
> 	* config/i386/i386-c.c (ix86_target_macros_internal): Ditto.
> 	* config/i386/i386.opt: Ditto.
> 	* config/i386/i386.c (ix86_target_string): Ditto.
> 	(ix86_option_override_internal): Ditto.
> 	(ix86_init_mpx_builtins): Move MPX to args2.
> 	(ix86_expand_builtin): Special handling for OPTION_MASK_ISA_GFNI.
> 	* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineinvqb_v64qi,
> 	__builtin_ia32_vgf2p8affineinvqb_v64qi_mask,
> 	__builtin_ia32_vgf2p8affineinvqb_v32qi,
> 	__builtin_ia32_vgf2p8affineinvqb_v32qi_mask,
> 	__builtin_ia32_vgf2p8affineinvqb_v16qi,
> 	__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): Move to ARGS array.
Patch is OK for main trunk. I've comitted it.

--
Thanks, K

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-07 19:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-17 13:00 [patch][x86] GFNI enabling [2/4] Koval, Julia
2017-10-30 10:30 ` Kirill Yukhin
2017-10-30 19:03   ` Koval, Julia
2017-10-31  7:03     ` Kirill Yukhin
2017-10-31 20:08     ` Jakub Jelinek
2017-11-02 11:57       ` Koval, Julia
2017-11-03  8:27         ` Koval, Julia
2017-11-03 17:42         ` Koval, Julia
2017-11-07 20:06           ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).