public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
@ 2016-11-10 16:27 Andrew Senkevich
  2016-11-10 16:36 ` Jakub Jelinek
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-10 16:27 UTC (permalink / raw)
  To: gcc-patches, Vladimir Makarov, Kirill Yukhin

Hi,

this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires additional patch for register allocator from Vladimir
Makarov to be committed before.

gcc/
        * common/config/i386/i386-common.c
        (OPTION_MASK_ISA_AVX5124FMAPS_SET,
        OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
        OPTION_MASK_ISA_AVX5124VNNIW_SET,
        OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
        (ix86_handle_option): Handle OPT_mavx5124fmaps,
        OPT_mavx5124vnniw.
        * config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
        * config/i386/avx5124fmapsintrin.h: New file.
        * config/i386/avx5124vnniwintrin.h: Ditto.
        * config/i386/constraints.md (h): New constraint.
        * config/i386/cpuid.h: (bit_AVX5124VNNIW,
        bit_AVX5124FMAPS): New.
        * config/i386/driver-i386.c (host_detect_local_cpu):
        Detect avx5124fmaps, avx5124vnniw.
        * config/i386/i386-builtin-types.def: Add types
        V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
        V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
        V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
        V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
        V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
        V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
        * config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
        __builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
        __builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
        __builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
        __builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
        __builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
        __builtin_ia32_vp4dpwssds_mask): New.
        * config/i386/i386-c.c (ix86_target_macros_internal):
        Define __AVX5124FMAPS__, __AVX5124VNNIW__.
        * config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256),
        VECTOR_MODE (INT, SI, 64)): New modes.
        * config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
        -mavx5124vnniw.
        (PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
        (ix86_option_override_internal): Handle new options.
        (ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
        avx5124vnniw.
        (ix86_expand_builtin): Handle new builtins.
        (ix86_additional_allocno_class_p): New.
        * config/i386/i386.h (TARGET_AVX5124FMAPS,
        TARGET_AVX5124FMAPS_P,
        TARGET_AVX5124VNNIW,
        TARGET_AVX5124VNNIW_P): Define.
        (reg_class): Add MOD4_SSE_REGS.
        (MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
        * config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
        * config/i386/immintrin.h: Include avx5124fmapsintrin.h,
        avx5124vnniwintrin.h.
        * config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
        UNSPEC_VP4FNMADD,
        UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
        (define_mode_iterator IMOD4): New.
        (define_mode_attr imod4_narrow): Ditto.
        (define_insn "mov<mode>"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddps"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddss"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
        (define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
        (define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
        (define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
        * init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
        * genmodes.c (mode_size_inline): Extend return type.
        * machmode.h (mode_size, mode_base_align): Extend type.
gcc/testsuite/
        * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
        * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
        * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
        * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
        * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
        * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
        * gcc.target/i386/avx5124fmaps-check.h: Ditto.
        * gcc.target/i386/avx5124vnniw-check.h: Ditto.
        * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
        * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
        * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
        * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
        * gcc.target/i386/avx512f-helper.h: Add avx5124fmaps-check.h,
        avx5124vnniw-check.h.
        * gcc.target/i386/i386.exp (check_effective_target_avx5124fmaps,
        check_effective_target_avx5124vnniw): New.
        * gcc.target/i386/m128-check.h (ESP_FLOAT, ESP_DOUBLE):
        Set under ifndef.
        * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
        * gcc.target/i386/sse-13.c: Ditto.

diff --git a/gcc/common/config/i386/i386-common.c
b/gcc/common/config/i386/i386-common.c
index d201154..deec4d3 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -76,6 +76,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA_AVX512VBMI_SET \
   (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
+#define OPTION_MASK_ISA_AVX5124FMAPS_SET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_SET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -179,6 +181,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
 #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
+#define OPTION_MASK_ISA_AVX5124FMAPS_UNSET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_UNSET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
@@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
  {
   opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
+
+  //turn off additional isa flags
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+          opts->x_ix86_isa_flags2_explicit |=
OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+          opts->x_ix86_isa_flags2_explicit |=
OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+
  }
       return true;

@@ -441,6 +452,36 @@ ix86_handle_option (struct gcc_options *opts,
  }
       return true;

+    case OPT_mavx5124fmaps:
+      if (value)
+ {
+  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+ }
+      else
+ {
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+ }
+      return true;
+
+    case OPT_mavx5124vnniw:
+      if (value)
+ {
+  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+ }
+      else
+ {
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+ }
+      return true;
+
     case OPT_mavx512dq:
       if (value)
  {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3e0be22..20413fb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -373,8 +373,8 @@ i[34567]86-*-*)
        xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
        avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
        avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-       clzerointrin.h pkuintrin.h"
+       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
  ;;
 x86_64-*-*)
  cpu_type=i386
@@ -395,8 +395,8 @@ x86_64-*-*)
        xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
        avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
        avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-       clzerointrin.h pkuintrin.h"
+       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
  ;;
 ia64-*-*)
  extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx5124fmapsintrin.h
b/gcc/config/i386/avx5124fmapsintrin.h
new file mode 100644
index 0000000..6113ee9
--- /dev/null
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -0,0 +1,216 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124fmapsintrin.h> directly; include
<x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124FMAPSINTRIN_H_INCLUDED
+#define _AVX5124FMAPSINTRIN_H_INCLUDED
+
+#ifndef __AVX5124FMAPS__
+#pragma GCC push_options
+#pragma GCC target("avx5124fmaps")
+#define __DISABLE_AVX5124FMAPS__
+#endif /* __AVX5124FMAPS__ */
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+  __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps ((__v16sf) __B,
+   (__v16sf) __C,
+   (__v16sf) __D,
+   (__v16sf) __E,
+   (__v16sf) __A,
+   (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+       __m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+ (__v16sf) __C,
+ (__v16sf) __D,
+ (__v16sf) __E,
+ (__v16sf) __A,
+ (const __v4sf *) __F,
+ (__v16sf) __A,
+ (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fmadd_ps (__mmask16 __U,
+ __m512 __A, __m512 __B, __m512 __C,
+ __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+ (__v16sf) __C,
+ (__v16sf) __D,
+ (__v16sf) __E,
+ (__v16sf) __A,
+ (const __v4sf *) __F,
+ (__v16sf) _mm512_setzero_ps (),
+ (__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+       __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss ((__v4sf) __B,
+   (__v4sf) __C,
+   (__v4sf) __D,
+   (__v4sf) __E,
+   (__v4sf) __A,
+   (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+    __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+ (__v4sf) __C,
+ (__v4sf) __D,
+ (__v4sf) __E,
+ (__v4sf) __A,
+ (const __v4sf *) __F,
+ (__v4sf) __A,
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+ (__v4sf) __C,
+ (__v4sf) __D,
+ (__v4sf) __E,
+ (__v4sf) __A,
+ (const __v4sf *) __F,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fnmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+   __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps ((__v16sf) __B,
+    (__v16sf) __C,
+    (__v16sf) __D,
+    (__v16sf) __E,
+    (__v16sf) __A,
+    (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+ __m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+ (__v16sf) __C,
+ (__v16sf) __D,
+ (__v16sf) __E,
+ (__v16sf) __A,
+ (const __v4sf *) __F,
+ (__v16sf) __A,
+ (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fnmadd_ps (__mmask16 __U,
+ __m512 __A, __m512 __B, __m512 __C,
+ __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+ (__v16sf) __C,
+ (__v16sf) __D,
+ (__v16sf) __E,
+ (__v16sf) __A,
+ (const __v4sf *) __F,
+ (__v16sf) _mm512_setzero_ps (),
+ (__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fnmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+ __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss ((__v4sf) __B,
+    (__v4sf) __C,
+    (__v4sf) __D,
+    (__v4sf) __E,
+    (__v4sf) __A,
+    (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fnmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+ (__v4sf) __C,
+ (__v4sf) __D,
+ (__v4sf) __E,
+ (__v4sf) __A,
+ (const __v4sf *) __F,
+ (__v4sf) __A,
+ (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fnmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+      __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+ (__v4sf) __C,
+ (__v4sf) __D,
+ (__v4sf) __E,
+ (__v4sf) __A,
+ (const __v4sf *) __F,
+ (__v4sf) _mm_setzero_ps (),
+ (__mmask8) __U);
+}
+
+#ifdef __DISABLE_AVX5124FMAPS__
+#undef __DISABLE_AVX5124FMAPS__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124FMAPS__ */
+
+#endif /* _AVX5124FMAPSINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx5124vnniwintrin.h
b/gcc/config/i386/avx5124vnniwintrin.h
new file mode 100644
index 0000000..392c6a5
--- /dev/null
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -0,0 +1,132 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124vnniwintrin.h> directly; include
<x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124VNNIWINTRIN_H_INCLUDED
+#define _AVX5124VNNIWINTRIN_H_INCLUDED
+
+#ifndef __AVX5124VNNIW__
+#pragma GCC push_options
+#pragma GCC target("avx5124vnniw")
+#define __DISABLE_AVX5124VNNIW__
+#endif /* __AVX5124VNNIW__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssd_epi32 (__m512i __A, __m512i __B, __m512i __C,
+      __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+     (__v16si) __C,
+     (__v16si) __D,
+     (__v16si) __E,
+     (__v16si) __A,
+     (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssd_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+   __m512i __C, __m512i __D, __m512i __E,
+   __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+  (__v16si) __C,
+  (__v16si) __D,
+  (__v16si) __E,
+  (__v16si) __A,
+  (const __v4si *) __F,
+  (__v16si) __A,
+  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+    __m512i __C, __m512i __D, __m512i __E,
+    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+  (__v16si) __C,
+  (__v16si) __D,
+  (__v16si) __E,
+  (__v16si) __A,
+  (const __v4si *) __F,
+  (__v16si) _mm512_setzero_ps (),
+  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssds_epi32 (__m512i __A, __m512i __B, __m512i __C,
+       __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds ((__v16si) __B,
+      (__v16si) __C,
+      (__v16si) __D,
+      (__v16si) __E,
+      (__v16si) __A,
+      (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssds_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+    __m512i __C, __m512i __D, __m512i __E,
+    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+   (__v16si) __C,
+   (__v16si) __D,
+   (__v16si) __E,
+   (__v16si) __A,
+   (const __v4si *) __F,
+   (__v16si) __A,
+   (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+     __m512i __C, __m512i __D, __m512i __E,
+     __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+   (__v16si) __C,
+   (__v16si) __D,
+   (__v16si) __E,
+   (__v16si) __A,
+   (const __v4si *) __F,
+   (__v16si) _mm512_setzero_ps (),
+   (__mmask16) __U);
+}
+
+#ifdef __DISABLE_AVX5124VNNIW__
+#undef __DISABLE_AVX5124VNNIW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124VNNIW__ */
+
+#endif /* _AVX5124VNNIWINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index d610336..ebeb437 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@

 ;;; Unused letters:
 ;;;           H
-;;;           h j               z
+;;;             j               z

 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -94,6 +94,9 @@
 (define_register_constraint "v" "TARGET_SSE ? ALL_SSE_REGS : NO_REGS"
  "Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).")

+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+
 (define_register_constraint "w" "TARGET_MPX ? BND_REGS : NO_REGS"
  "@internal Any bound register.")

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 2a946bf..abe7c62 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -60,6 +60,8 @@
 #define bit_MWAITX      (1 << 29)

 /* %edx */
+#define bit_AVX5124VNNIW (1 << 2)
+#define bit_AVX5124FMAPS (1 << 3)
 #define bit_MMXEXT (1 << 22)
 #define bit_LM (1 << 29)
 #define bit_3DNOWP (1 << 30)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index e026482..f0d0e8f 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_mwaitx = 0, has_clzero = 0, has_pku = 0;
+  unsigned int has_avx5124fmaps = 0, has_avx5124vnniw = 0;

   bool arch;

@@ -501,6 +502,8 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
       has_prefetchwt1 = ecx & bit_PREFETCHWT1;
       has_avx512vbmi = ecx & bit_AVX512VBMI;
       has_pku = ecx & bit_OSPKE;
+      has_avx5124vnniw = edx & bit_AVX5124VNNIW;
+      has_avx5124fmaps = edx & bit_AVX5124FMAPS;
     }

   if (max_level >= 13)
@@ -1021,6 +1024,8 @@ const char *host_detect_local_cpu (int argc,
const char **argv)
       const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl";
       const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : "
-mno-avx512ifma";
       const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : "
-mno-avx512vbmi";
+      const char *avx5124vnniw = has_avx5124vnniw ? " -mavx5124vnniw"
: " -mno-avx5124vnniw";
+      const char *avx5124fmaps = has_avx5124fmaps ? " -mavx5124fmaps"
: " -mno-avx5124fmaps";
       const char *clwb = has_clwb ? " -mclwb" : " -mno-clwb";
       const char *mwaitx  = has_mwaitx  ? " -mmwaitx"  : " -mno-mwaitx";
       const char *clzero  = has_clzero  ? " -mclzero"  : " -mno-clzero";
@@ -1033,8 +1038,8 @@ const char *host_detect_local_cpu (int argc,
const char **argv)
  fxsr, xsave, xsaveopt, avx512f, avx512er,
  avx512cd, avx512pf, prefetchwt1, clflushopt,
  xsavec, xsaves, avx512dq, avx512bw, avx512vl,
- avx512ifma, avx512vbmi, clwb, mwaitx,
- clzero, pku, NULL);
+ avx512ifma, avx512vbmi, avx5124fmaps, avx5124vnniw,
+ clwb, mwaitx, clzero, pku, NULL);
     }

 done:
diff --git a/gcc/config/i386/i386-builtin-types.def
b/gcc/config/i386/i386-builtin-types.def
index b34cfda..4a38c12 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -526,6 +526,15 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)

+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF,
V16SF, UHI)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF, V4SF, UQI)
+
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI,
V16SI, UHI)
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
+
+
 # Instructions returning mask
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..3cf18f0 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2482,7 +2482,24 @@ BDESC (OPTION_MASK_ISA_AVX512DQ,
CODE_FOR_ufix_truncv8dfv8di2_mask_round, "__bui
 BDESC (OPTION_MASK_ISA_AVX512DQ,
CODE_FOR_avx512dq_rangepv16sf_mask_round,
"__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN,
(int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ,
CODE_FOR_avx512dq_rangepv8df_mask_round,
"__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN,
(int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)

-BDESC_END (ROUND_ARGS, MPX)
+BDESC_END (ROUND_ARGS, ARGS2)
+
+/* AVX-5124FMA/NNI builtins with variable number of arguments.
Defined in additional ix86_isa_flags2.  */
+BDESC_FIRST (args2, ARGS2,
+       OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fmaddps_mask, "__builtin_ia32_4fmaddps_mask",
IX86_BUILTIN_4FMAPS_MASK, UNKNOWN, (int)
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fmaddps, "__builtin_ia32_4fmaddps",
IX86_BUILTIN_4FMAPS, UNKNOWN, (int)
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fmaddss, "__builtin_ia32_4fmaddss",
IX86_BUILTIN_4FMASS, UNKNOWN, (int)
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fmaddss_mask, "__builtin_ia32_4fmaddss_mask",
IX86_BUILTIN_4FMASS_MASK, UNKNOWN, (int)
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fnmaddps_mask,
"__builtin_ia32_4fnmaddps_mask", IX86_BUILTIN_4FNMAPS_MASK, UNKNOWN,
(int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fnmaddps, "__builtin_ia32_4fnmaddps",
IX86_BUILTIN_4FNMAPS, UNKNOWN, (int)
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fnmaddss, "__builtin_ia32_4fnmaddss",
IX86_BUILTIN_4FNMASS, UNKNOWN, (int)
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS,
CODE_FOR_avx5124fmaddps_4fnmaddss_mask,
"__builtin_ia32_4fnmaddss_mask", IX86_BUILTIN_4FNMASS_MASK, UNKNOWN,
(int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd,
"__builtin_ia32_vp4dpwssd", IX86_BUILTIN_4DPWSSD, UNKNOWN, (int)
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW,
CODE_FOR_avx5124vnniw_vp4dpwssd_mask, "__builtin_ia32_vp4dpwssd_mask",
IX86_BUILTIN_4DPWSSD_MASK, UNKNOWN, (int)
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW,
CODE_FOR_avx5124vnniw_vp4dpwssds, "__builtin_ia32_vp4dpwssds",
IX86_BUILTIN_4DPWSSDS, UNKNOWN, (int)
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW,
CODE_FOR_avx5124vnniw_vp4dpwssds_mask,
"__builtin_ia32_vp4dpwssds_mask", IX86_BUILTIN_4DPWSSDS_MASK, UNKNOWN,
(int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+
+BDESC_END (ARGS2, MPX)

 /* Builtins for MPX.  */
 BDESC_FIRST (mpx, MPX,
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 9bb80c0..9599e11 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -28,7 +28,7 @@ along with GCC; see the file COPYING3.  If not see

 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, HOST_WIDE_INT, enum processor_type, enum
processor_type, enum fpmath_unit,
    void (*def_or_undef) (cpp_reader *, const char *));


@@ -36,6 +36,7 @@ static void ix86_target_macros_internal
    macros.  */
 static void
 ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
+     HOST_WIDE_INT isa_flag2,
      enum processor_type arch,
      enum processor_type tune,
      enum fpmath_unit fpmath,
@@ -376,6 +377,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
     def_or_undef (parse_in, "__AVX512IFMA__");
+  if (isa_flag & OPTION_MASK_ISA_AVX5124VNNIW)
+    def_or_undef (parse_in, "__AVX5124VNNIW__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124FMAPS)
+    def_or_undef (parse_in, "__AVX5124FMAPS__");
   if (isa_flag & OPTION_MASK_ISA_FMA)
     def_or_undef (parse_in, "__FMA__");
   if (isa_flag & OPTION_MASK_ISA_RTM)
@@ -462,6 +467,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   HOST_WIDE_INT prev_isa;
   HOST_WIDE_INT cur_isa;
   HOST_WIDE_INT diff_isa;
+  HOST_WIDE_INT prev_isa2;
+  HOST_WIDE_INT cur_isa2;
+  HOST_WIDE_INT diff_isa2;
   enum processor_type prev_arch;
   enum processor_type prev_tune;
   enum processor_type cur_arch;
@@ -494,6 +502,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   prev_isa  = prev_opt->x_ix86_isa_flags;
   cur_isa   = cur_opt->x_ix86_isa_flags;
   diff_isa  = (prev_isa ^ cur_isa);
+  prev_isa2  = prev_opt->x_ix86_isa_flags2;
+  cur_isa2   = cur_opt->x_ix86_isa_flags2;
+  diff_isa2  = (prev_isa2 ^ cur_isa2);
   prev_arch = (enum processor_type) prev_opt->arch;
   prev_tune = (enum processor_type) prev_opt->tune;
   cur_arch  = (enum processor_type) cur_opt->arch;
@@ -509,6 +520,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)

   /* Undef all of the macros for that are no longer current.  */
   ix86_target_macros_internal (prev_isa & diff_isa,
+       prev_isa2 & diff_isa2,
        prev_arch,
        prev_tune,
        (enum fpmath_unit) prev_opt->x_ix86_fpmath,
@@ -523,6 +535,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)

   /* Define all of the macros for new options that were just turned on.  */
   ix86_target_macros_internal (cur_isa & diff_isa,
+       cur_isa2 & diff_isa2,
        cur_arch,
        cur_tune,
        (enum fpmath_unit) cur_opt->x_ix86_fpmath,
@@ -583,6 +596,7 @@ ix86_target_macros (void)
   cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__");

   ix86_target_macros_internal (ix86_isa_flags,
+       ix86_isa_flags2,
        ix86_arch,
        ix86_tune,
        ix86_fpmath,
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index d524313..22b9713 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
 VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
 VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
 VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */
+VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
@@ -91,6 +92,7 @@ VECTOR_MODE (INT, QI, 2);     /*                   V2QI */
 VECTOR_MODE (INT, QI, 12);    /*                  V12QI */
 VECTOR_MODE (INT, QI, 14);    /*                  V14QI */
 VECTOR_MODE (INT, HI, 6);     /*                   V6HI */
+VECTOR_MODE (INT, SI, 64);    /*  V64SI */

 POINTER_BOUNDS_MODE (BND32, 8);
 POINTER_BOUNDS_MODE (BND64, 16);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..0dad131 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2579,7 +2579,7 @@ static int ix86_function_regparm (const_tree, const_tree);
 static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
  rtx, rtx, int);
-static void ix86_add_new_builtins (HOST_WIDE_INT);
+static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
 static tree ix86_canonical_va_list_type (tree);
 static void predict_jump (int);
 static unsigned int split_stack_prologue_scratch_regno (void);
@@ -2592,8 +2592,9 @@ enum ix86_function_specific_strings
   IX86_FUNCTION_SPECIFIC_MAX
 };

-static char *ix86_target_string (HOST_WIDE_INT, int, int, const char *,
- const char *, enum fpmath_unit, bool);
+static char *ix86_target_string (HOST_WIDE_INT, HOST_WIDE_INT, int, int,
+ const char *, const char *, enum fpmath_unit,
+ bool);
 static void ix86_function_specific_save (struct cl_target_option *,
  struct gcc_options *opts);
 static void ix86_function_specific_restore (struct gcc_options *opts,
@@ -4188,8 +4189,8 @@ ix86_using_red_zone (void)
    responsible for freeing the string.  */

 static char *
-ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
-    const char *arch, const char *tune,
+ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, int flags,
+    int ix86_flags, const char *arch, const char *tune,
     enum fpmath_unit fpmath, bool add_nl_p)
 {
   struct ix86_target_opts
@@ -4257,7 +4258,12 @@ ix86_target_string (HOST_WIDE_INT isa, int
flags, int ix86_flags,
     { "-mclzero", OPTION_MASK_ISA_CLZERO  },
     { "-mpku", OPTION_MASK_ISA_PKU  },
   };
-
+//additional structure for isa flags
+  static struct ix86_target_opts isa_opts2[] =
+  {
+    { "-mavx5124vnniw", OPTION_MASK_ISA_AVX5124VNNIW },
+    { "-mavx5124fmaps", OPTION_MASK_ISA_AVX5124FMAPS },
+  };
   /* Flag options.  */
   static struct ix86_target_opts flag_opts[] =
   {
@@ -4298,8 +4304,8 @@ ix86_target_string (HOST_WIDE_INT isa, int
flags, int ix86_flags,
     { "-mgeneral-regs-only", OPTION_MASK_GENERAL_REGS_ONLY },
   };

-  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts)
-   + ARRAY_SIZE (ix86_flag_opts) + 6][2];
+  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa_opts2)
+   + ARRAY_SIZE (flag_opts) + ARRAY_SIZE (ix86_flag_opts) + 6][2];

   char isa_other[40];
   char target_other[40];
@@ -4361,6 +4367,17 @@ ix86_target_string (HOST_WIDE_INT isa, int
flags, int ix86_flags,
        isa);
     }

+  /* Pick out the options in isa2 options.  */
+  for (i = 0; i < ARRAY_SIZE (isa_opts2); i++)
+    {
+      if ((isa2 & isa_opts2[i].mask) != 0)
+ {
+  opts[num++][0] = isa_opts2[i].option;
+  isa &= ~ isa_opts2[i].mask;
+ }
+    }
+
+
   /* Add flag options.  */
   for (i = 0; i < ARRAY_SIZE (flag_opts); i++)
     {
@@ -4486,9 +4503,9 @@ ix86_profile_before_prologue (void)
 void ATTRIBUTE_UNUSED
 ix86_debug_options (void)
 {
-  char *opts = ix86_target_string (ix86_isa_flags, target_flags,
-   ix86_target_flags,
-   ix86_arch_string, ix86_tune_string,
+  char *opts = ix86_target_string (ix86_isa_flags, ix86_isa_flags2,
+   target_flags, ix86_target_flags,
+   ix86_arch_string,ix86_tune_string,
    ix86_fpmath, true);

   if (opts)
@@ -4844,6 +4861,8 @@ ix86_option_override_internal (bool main_args_p,
 #define PTA_CLZERO (HOST_WIDE_INT_1 << 57)
 #define PTA_NO_80387 (HOST_WIDE_INT_1 << 58)
 #define PTA_PKU (HOST_WIDE_INT_1 << 59)
+#define PTA_AVX5124VNNIW (HOST_WIDE_INT_1 << 60)
+#define PTA_AVX5124FMAPS (HOST_WIDE_INT_1 << 61)

 #define PTA_CORE2 \
   (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
@@ -5499,6 +5518,14 @@ ix86_option_override_internal (bool main_args_p,
  if (processor_alias_table[i].flags & PTA_AVX512IFMA
     && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA))
   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA;
+
+ if (processor_alias_table[i].flags & PTA_AVX5124VNNIW
+    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124VNNIW))
+  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW;
+ if (processor_alias_table[i].flags & PTA_AVX5124FMAPS
+    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124FMAPS))
+  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS;
+
  if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE))
   x86_prefetch_sse = true;
  if (processor_alias_table[i].flags & PTA_MWAITX
@@ -6298,6 +6325,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
+  ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
   ptr->x_ix86_arch_string = opts->x_ix86_arch_string;
   ptr->x_ix86_tune_string = opts->x_ix86_tune_string;
@@ -6354,6 +6382,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
+  opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
   opts->x_ix86_arch_string = ptr->x_ix86_arch_string;
   opts->x_ix86_tune_string = ptr->x_ix86_tune_string;
@@ -6459,9 +6488,9 @@ ix86_function_specific_print (FILE *file, int indent,
       struct cl_target_option *ptr)
 {
   char *target_string
-    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_target_flags,
-  ptr->x_ix86_target_flags, NULL, NULL,
-  ptr->x_ix86_fpmath, false);
+    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_ix86_isa_flags2,
+  ptr->x_target_flags, ptr->x_ix86_target_flags,
+  NULL, NULL, ptr->x_ix86_fpmath, false);

   gcc_assert (ptr->arch < PROCESSOR_max);
   fprintf (file, "%*sarch = %d (%s)\n",
@@ -6538,6 +6567,8 @@ ix86_valid_target_attribute_inner_p (tree args,
char *p_strings[],
     IX86_ATTR_ISA ("avx512dq", OPT_mavx512dq),
     IX86_ATTR_ISA ("avx512bw", OPT_mavx512bw),
     IX86_ATTR_ISA ("avx512vl", OPT_mavx512vl),
+    IX86_ATTR_ISA ("avx5124fmaps", OPT_mavx5124fmaps),
+    IX86_ATTR_ISA ("avx5124vnniw", OPT_mavx5124vnniw),
     IX86_ATTR_ISA ("mmx", OPT_mmmx),
     IX86_ATTR_ISA ("pclmul", OPT_mpclmul),
     IX86_ATTR_ISA ("popcnt", OPT_mpopcnt),
@@ -6796,6 +6827,7 @@ ix86_valid_target_attribute_tree (tree args,
      The string options are attribute options, and will be undone
      when we copy the save structure.  */
   if (opts->x_ix86_isa_flags != def->x_ix86_isa_flags
+      || opts->x_ix86_isa_flags2 != def->x_ix86_isa_flags2
       || opts->x_target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
@@ -6814,7 +6846,7 @@ ix86_valid_target_attribute_tree (tree args,
      | OPTION_MASK_ABI_64
      | OPTION_MASK_ABI_X32
      | OPTION_MASK_CODE16);
-
+  opts->x_ix86_isa_flags &= 0;
  }
       else if (!orig_arch_specified)
  opts->x_ix86_arch_string = NULL;
@@ -6848,7 +6880,7 @@ ix86_valid_target_attribute_tree (tree args,
  }

       /* Add any builtin functions with the new isa if any.  */
-      ix86_add_new_builtins (opts->x_ix86_isa_flags);
+      ix86_add_new_builtins (opts->x_ix86_isa_flags, opts->x_ix86_isa_flags2);

       /* Save the current options unless we are validating options for
  #pragma.  */
@@ -6953,8 +6985,10 @@ ix86_can_inline_p (tree caller, tree callee)
       /* Callee's isa options should a subset of the caller's, i.e. a
SSE4 function
  can inline a SSE2 function but a SSE2 function can't inline a SSE4
  function.  */
-      if ((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-  != callee_opts->x_ix86_isa_flags)
+      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+  != callee_opts->x_ix86_isa_flags) &
+  ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+  != callee_opts->x_ix86_isa_flags2))
  ret = false;

       /* See if we have the same non-isa options.  */
@@ -12078,6 +12112,15 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
       && df_regs_ever_live_p (regno)));
 }

+/* Return true if register class CL should be an additional allocno
+   class.  */
+
+static bool
+ix86_additional_allocno_class_p (reg_class_t cl)
+{
+  return cl == MOD4_SSE_REGS;
+}
+
 /* Return TRUE if we need to save REGNO.  */

 static bool
@@ -30836,6 +30879,7 @@ struct builtin_isa {
   const char *name; /* function name */
   enum ix86_builtin_func_type tcode; /* type to use in the declaration */
   HOST_WIDE_INT isa; /* isa_flags this builtin is defined for */
+  HOST_WIDE_INT isa2; /* additional isa_flags this builtin is defined for */
   bool const_p; /* true if the declaration is constant */
   bool leaf_p; /* true if the declaration has leaf attribute */
   bool nothrow_p; /* true if the declaration has nothrow attribute */
@@ -30846,6 +30890,7 @@ static struct builtin_isa
ix86_builtins_isa[(int) IX86_BUILTIN_MAX];

 /* Bits that can still enable any inclusion of a builtin.  */
 static HOST_WIDE_INT deferred_isa_values = 0;
+static HOST_WIDE_INT deferred_isa_values2 = 0;

 /* Add an ix86 target builtin function with CODE, NAME and TYPE.  Save the MASK
    of which isa_flags to use in the ix86_builtins_isa array.  Stores the
@@ -30927,19 +30972,74 @@ def_builtin_const (HOST_WIDE_INT mask, const
char *name,

   return decl;
 }
+//def_builting for additional isa flags
+static inline tree
+def_builtin2 (HOST_WIDE_INT mask, const char *name,
+     enum ix86_builtin_func_type tcode,
+     enum ix86_builtins code)
+{
+  tree decl = NULL_TREE;
+
+  ix86_builtins_isa[(int) code].isa2 = mask;
+
+  if (mask == 0
+      || (mask & ix86_isa_flags2) != 0
+      || (lang_hooks.builtin_function
+  == lang_hooks.builtin_function_ext_scope))
+
+    {
+      tree type = ix86_get_builtin_func_type (tcode);
+      decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+   NULL, NULL_TREE);
+  ix86_builtins[(int) code] = decl;
+  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+    }
+  else
+    {
+      /* Just a MASK where set_and_not_built_p == true can potentially
+ include a builtin.  */
+      deferred_isa_values2 |= mask;
+      ix86_builtins[(int) code] = NULL_TREE;
+      ix86_builtins_isa[(int) code].tcode = tcode;
+      ix86_builtins_isa[(int) code].name = name;
+      ix86_builtins_isa[(int) code].leaf_p = false;
+      ix86_builtins_isa[(int) code].nothrow_p = false;
+      ix86_builtins_isa[(int) code].const_p = false;
+      ix86_builtins_isa[(int) code].set_and_not_built_p = true;
+    }
+
+  return decl;
+}
+
+/* Like def_builtin, but also marks the function decl "const".  */
+
+static inline tree
+def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
+   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+{
+  tree decl = def_builtin2 (mask, name, tcode, code);
+  if (decl)
+    TREE_READONLY (decl) = 1;
+  else
+    ix86_builtins_isa[(int) code].const_p = true;
+
+  return decl;
+}

 /* Add any new builtin functions for a given ISA that may not have been
    declared.  This saves a bit of space compared to adding all of the
    declarations to the tree, even if we didn't use them.  */

 static void
-ix86_add_new_builtins (HOST_WIDE_INT isa)
+ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if ((isa & deferred_isa_values) == 0)
+  if (((isa & deferred_isa_values) == 0)
+      &&((isa2 & deferred_isa_values2) == 0))
     return;

   /* Bits in ISA value can be removed from potential isa values.  */
   deferred_isa_values &= ~isa;
+  deferred_isa_values2 &= ~isa2;

   int i;
   tree saved_current_target_pragma = current_target_pragma;
@@ -30947,7 +31047,7 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)

   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
     {
-      if ((ix86_builtins_isa[i].isa & isa) != 0
+      if ((((ix86_builtins_isa[i].isa & isa) != 0) ||
((ix86_builtins_isa[i].isa2 & isa2) != 0))
   && ix86_builtins_isa[i].set_and_not_built_p)
  {
   tree decl, type;
@@ -31185,8 +31285,10 @@ BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS_FIRST,
        IX86_BUILTIN__BDESC_SPECIAL_ARGS_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_ROUND_ARGS_FIRST,
        IX86_BUILTIN__BDESC_ARGS_LAST, 1);
-BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS2_FIRST,
        IX86_BUILTIN__BDESC_ROUND_ARGS_LAST, 1);
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+       IX86_BUILTIN__BDESC_ARGS2_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_CONST_FIRST,
        IX86_BUILTIN__BDESC_MPX_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MULTI_ARG_FIRST,
@@ -31237,6 +31339,18 @@ ix86_init_mmx_sse_builtins (void)
  IX86_BUILTIN__BDESC_ARGS_FIRST,
  ARRAY_SIZE (bdesc_args) - 1);

+  /* Add all builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_args2;
+       i < ARRAY_SIZE (bdesc_args2);
+       i++, d++)
+    {
+      if (d->name == 0)
+ continue;
+
+      ftype = (enum ix86_builtin_func_type) d->flag;
+      def_builtin_const2 (d->mask, d->name, ftype, d->code);
+    }
+
   /* Add all builtins with rounding.  */
   for (i = 0, d = bdesc_round_args;
        i < ARRAY_SIZE (bdesc_round_args);
@@ -36428,10 +36542,13 @@ ix86_expand_builtin (tree exp, rtx target,
rtx subtarget,
      current ISA based on the command line switches.  With function specific
      options, we need to check in the context of the function making the call
      whether it is supported.  */
-  if (ix86_builtins_isa[fcode].isa
+  if ((ix86_builtins_isa[fcode].isa
       && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+ && (ix86_builtins_isa[fcode].isa2
+      && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
     {
-      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, 0,
+      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
+       ix86_builtins_isa[fcode].isa2, 0, 0,
        NULL, NULL, (enum fpmath_unit) 0,
        false);
       if (!opts)
@@ -38091,6 +38208,246 @@ rdseed_step:
  }
     }

+  if (fcode >= IX86_BUILTIN__BDESC_ARGS2_FIRST
+      && fcode <= IX86_BUILTIN__BDESC_ARGS2_LAST)
+    {
+      i = fcode - IX86_BUILTIN__BDESC_ARGS2_FIRST;
+      rtx (*fcn) (rtx, rtx, rtx, rtx);
+      rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
+      rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
+      rtx (*msk_mov) (rtx, rtx, rtx, rtx);
+      int masked = 1;
+      machine_mode mode, wide_mode, nar_mode;
+
+      nar_mode  = V4SFmode;
+      mode      = V16SFmode;
+      wide_mode = V64SFmode;
+      msk_mov   = gen_avx512f_loadv16sf_mask;
+      fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
+      fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
+
+      switch (fcode)
+ {
+ case IX86_BUILTIN_4FMAPS:
+  fcn = gen_avx5124fmaddps_4fmaddps;
+  masked = 0;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4DPWSSD:
+  nar_mode  = V4SImode;
+  mode      = V16SImode;
+  wide_mode = V64SImode;
+  fcn = gen_avx5124vnniw_vp4dpwssd;
+  masked = 0;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4DPWSSDS:
+  nar_mode  = V4SImode;
+  mode      = V16SImode;
+  wide_mode = V64SImode;
+  fcn = gen_avx5124vnniw_vp4dpwssds;
+  masked = 0;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4FNMAPS:
+  fcn = gen_avx5124fmaddps_4fnmaddps;
+  masked = 0;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4FNMAPS_MASK:
+  fcn_mask  = gen_avx5124fmaddps_4fnmaddps_mask;
+  fcn_maskz = gen_avx5124fmaddps_4fnmaddps_maskz;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4DPWSSD_MASK:
+  nar_mode  = V4SImode;
+  mode      = V16SImode;
+  wide_mode = V64SImode;
+  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
+  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
+  msk_mov   = gen_avx512f_loadv16si_mask;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4DPWSSDS_MASK:
+  nar_mode  = V4SImode;
+  mode      = V16SImode;
+  wide_mode = V64SImode;
+  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
+  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
+  msk_mov   = gen_avx512f_loadv16si_mask;
+  goto v4fma_expand;
+
+ case IX86_BUILTIN_4FMAPS_MASK:
+  {
+    tree args[4];
+    rtx ops[4];
+    rtx wide_reg;
+    rtx accum;
+    rtx addr;
+    rtx mem;
+
+v4fma_expand:
+    wide_reg = gen_reg_rtx (wide_mode);
+    for (i = 0; i < 4; i++)
+      {
+        args[i] = CALL_EXPR_ARG (exp, i);
+ ops[i] = expand_normal (args[i]);
+
+ emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
+  ops[i]);
+      }
+
+    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+    accum = force_reg (mode, accum);
+
+    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+    addr = force_reg (Pmode, addr);
+
+    mem = gen_rtx_MEM (nar_mode, addr);
+
+    target = gen_reg_rtx (mode);
+
+    emit_move_insn (target, accum);
+
+    if (! masked)
+      emit_insn (fcn (target, accum, wide_reg, mem));
+    else
+      {
+        rtx merge, mask;
+ merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+ mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+ if (CONST_INT_P (mask))
+  mask = fixup_modeless_constant (mask, HImode);
+
+ mask = force_reg (HImode, mask);
+
+ if (GET_MODE (mask) != HImode)
+  mask = gen_rtx_SUBREG (HImode, mask, 0);
+
+ /* If merge is 0 then we're about to emit z-masked variant.  */
+ if (const0_operand (merge, mode))
+  emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+ /* If merge is the same as accum then emit merge-masked variant.  */
+ else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+  {
+    merge = force_reg (mode, merge);
+    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+  }
+        /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+ else
+  {
+    rtx tmp = target;
+    emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+    target = force_reg (mode, merge);
+    emit_insn (msk_mov (target, tmp, target, mask));
+  }
+      }
+      return target;
+    }
+
+ case IX86_BUILTIN_4FNMASS:
+  fcn = gen_avx5124fmaddps_4fnmaddss;
+  masked = 0;
+  goto s4fma_expand;
+
+ case IX86_BUILTIN_4FMASS:
+  fcn = gen_avx5124fmaddps_4fmaddss;
+  masked = 0;
+  goto s4fma_expand;
+
+ case IX86_BUILTIN_4FNMASS_MASK:
+  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
+  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
+  msk_mov   = gen_avx512vl_loadv4sf_mask;
+  goto s4fma_expand;
+
+ case IX86_BUILTIN_4FMASS_MASK:
+  {
+    tree args[4];
+    rtx ops[4];
+    rtx wide_reg;
+    rtx accum;
+    rtx addr;
+    rtx mem;
+
+    fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
+    fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
+    msk_mov   = gen_avx512vl_loadv4sf_mask;
+
+s4fma_expand:
+    mode = V4SFmode;
+    wide_reg = gen_reg_rtx (V64SFmode);
+    for (i = 0; i < 4; i++)
+      {
+ rtx tmp;
+ args[i] = CALL_EXPR_ARG (exp, i);
+ ops[i] = expand_normal (args[i]);
+
+ tmp = gen_reg_rtx (SFmode);
+ emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+
+ emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
+  gen_rtx_SUBREG (V16SFmode, tmp, 0));
+      }
+
+    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+    accum = force_reg (V4SFmode, accum);
+
+    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+    addr = force_reg (Pmode, addr);
+
+    mem = gen_rtx_MEM (V4SFmode, addr);
+
+    target = gen_reg_rtx (V4SFmode);
+
+    emit_move_insn (target, accum);
+
+    if (! masked)
+      emit_insn (fcn (target, accum, wide_reg, mem));
+    else
+      {
+ rtx merge, mask;
+ merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+ mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+ if (CONST_INT_P (mask))
+   mask = fixup_modeless_constant (mask, QImode);
+
+ mask = force_reg (QImode, mask);
+
+ if (GET_MODE (mask) != QImode)
+   mask = gen_rtx_SUBREG (QImode, mask, 0);
+
+ /* If merge is 0 then we're about to emit z-masked variant.  */
+ if (const0_operand (merge, mode))
+   emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+ /* If merge is the same as accum then emit merge-masked variant.  */
+ else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+   {
+     merge = force_reg (mode, merge);
+     emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+   }
+ /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+ else
+   {
+     rtx tmp = target;
+     emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+     target = force_reg (mode, merge);
+     emit_insn (msk_mov (target, tmp, target, mask));
+   }
+ }
+      return target;
+    }
+  default:
+    return ix86_expand_args_builtin (bdesc_args2 + i, exp, target);
+  }
+    }
+
   if (fcode >= IX86_BUILTIN__BDESC_COMI_FIRST
       && fcode <= IX86_BUILTIN__BDESC_COMI_LAST)
     {
@@ -38151,7 +38508,8 @@ static tree ix86_get_builtin (enum ix86_builtins code)

   opts = TREE_TARGET_OPTION (target_tree);

-  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+  if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+ && (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
     return ix86_builtin_decl (code, true);
   else
     return NULL_TREE;
@@ -39735,6 +40093,18 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode)
       || VALID_AVX512F_SCALAR_MODE (mode)))
  return true;

+      /* For AVX-5124FMAPS allow V64SFmode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+  && MOD4_SSE_REGNO_P (regno)
+  && mode == V64SFmode)
+ return true;
+
+      /* For AVX-5124VNNIW allow V64SImode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+  && MOD4_SSE_REGNO_P (regno)
+  && mode == V64SImode)
+ return true;
+
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
@@ -51134,6 +51504,9 @@ ix86_run_selftests (void)
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1

+#undef TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+#define TARGET_ADDITIONAL_ALLOCNO_CLASS_P ix86_additional_allocno_class_p
+
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..801b68a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -81,6 +81,10 @@ see the files COPYING3 and COPYING.RUNTIME
respectively.  If not, see
 #define TARGET_AVX512VBMI_P(x) TARGET_ISA_AVX512VBMI_P(x)
 #define TARGET_AVX512IFMA TARGET_ISA_AVX512IFMA
 #define TARGET_AVX512IFMA_P(x) TARGET_ISA_AVX512IFMA_P(x)
+#define TARGET_AVX5124FMAPS TARGET_ISA_AVX5124FMAPS
+#define TARGET_AVX5124FMAPS_P(x) TARGET_ISA_AVX5124FMAPS_P(x)
+#define TARGET_AVX5124VNNIW TARGET_ISA_AVX5124VNNIW
+#define TARGET_AVX5124VNNIW_P(x) TARGET_ISA_AVX5124VNNIW_P(x)
 #define TARGET_FMA TARGET_ISA_FMA
 #define TARGET_FMA_P(x) TARGET_ISA_FMA_P(x)
 #define TARGET_SSE4A TARGET_ISA_SSE4A
@@ -1089,7 +1093,8 @@ extern const char *host_detect_local_cpu (int
argc, const char **argv);
 #define HARD_REGNO_NREGS(REGNO, MODE) \
   (STACK_REGNO_P (REGNO) || SSE_REGNO_P (REGNO) || MMX_REGNO_P (REGNO) \
    || MASK_REGNO_P (REGNO) || BND_REGNO_P (REGNO) \
-   ? (COMPLEX_MODE_P (MODE) ? 2 : 1) \
+   ? (COMPLEX_MODE_P (MODE) ? 2 : \
+      (((MODE == V64SFmode) || (MODE == V64SImode)) ? 4 : 1)) \
    : ((MODE) == XFmode \
       ? (TARGET_64BIT ? 2 : 3) \
       : ((MODE) == XCmode \
@@ -1365,6 +1370,7 @@ enum reg_class
   FLOAT_INT_SSE_REGS,
   MASK_EVEX_REGS,
   MASK_REGS,
+  MOD4_SSE_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };

@@ -1425,6 +1431,7 @@ enum reg_class
    "FLOAT_INT_SSE_REGS", \
    "MASK_EVEX_REGS", \
    "MASK_REGS", \
+   "MOD4_SSE_REGS" \
    "ALL_REGS" }

 /* Define which registers fit in which classes.  This is an initializer
@@ -1465,11 +1472,14 @@ enum reg_class
 {   0x11ffff,    0x1fe0,    0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,   0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,   0x1f },       /* FLOAT_INT_SSE_REGS */        \
-       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */            \
        { 0x0,       0x0, 0x1fe0 },       /* MASK_REGS */                 \
-{ 0xffffffff,0xffffffff,0x1ffff }                                        \
+{ 0x1fe00000,0xffffe000,   0x1f },       /* MOD4_SSE_REGS */ \
+{ 0xffffffff,0xffffffff,0x1ffff } \
 }

+/* { 0x02200000,0x22222000,   0x02 },*/       /* MOD4_SSE_REGS */
+
 /* The same information, inverted:
    Return the class number of the smallest class containing
    reg number REGNO.  This could be a conditional expression
@@ -1533,6 +1543,16 @@ enum reg_class
 #define BND_REG_P(X) (REG_P (X) && BND_REGNO_P (REGNO (X)))
 #define BND_REGNO_P(N) IN_RANGE ((N), FIRST_BND_REG, LAST_BND_REG)

+#define MOD4_SSE_REG_P(X) (REG_P (X) && MOD4_SSE_REGNO_P (REGNO (X)))
+#define MOD4_SSE_REGNO_P(N) ((N) == XMM0_REG  \
+     || (N) == XMM4_REG  \
+     || (N) == XMM8_REG  \
+     || (N) == XMM12_REG \
+     || (N) == XMM16_REG \
+     || (N) == XMM20_REG \
+     || (N) == XMM24_REG \
+     || (N) == XMM28_REG)
+
 /* First floating point reg */
 #define FIRST_FLOAT_REG FIRST_STACK_REG
 #define STACK_TOP_P(X) (REG_P (X) && REGNO (X) == FIRST_FLOAT_REG)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..68e650b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -25,11 +25,17 @@ config/i386/i386-opts.h
 Variable
 HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT |
TARGET_SUBTARGET_ISA_DEFAULT

+Variable
+HOST_WIDE_INT ix86_isa_flags2 = 0
+
 ; A mask of ix86_isa_flags that includes bit X if X was set or cleared
 ; on the command line.
 Variable
 HOST_WIDE_INT ix86_isa_flags_explicit

+Variable
+HOST_WIDE_INT ix86_isa_flags2_explicit
+
 ; Additional target flags
 Variable
 int ix86_target_flags
@@ -74,6 +80,10 @@ unsigned char branch_cost

 ;; which flags were passed by the user
 TargetSave
+HOST_WIDE_INT x_ix86_isa_flags2_explicit
+
+;; which flags were passed by the user
+TargetSave
 HOST_WIDE_INT x_ix86_isa_flags_explicit

 ;; whether -mtune was not specified
@@ -687,6 +697,14 @@ mavx512vbmi
 Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX512VBMI built-in functions and code generation.

+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124VNNIW built-in functions and code generation.
+
 mfma
 Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA
built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 9333111..3fd3c9c 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -68,6 +68,10 @@

 #include <avx512vbmivlintrin.h>

+#include <avx5124fmapsintrin.h>
+
+#include <avx5124vnniwintrin.h>
+
 #include <shaintrin.h>

 #include <lzcntintrin.h>
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14fcd67..78bf2a4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,6 +146,12 @@

   ;; For AVX512VBMI support
   UNSPEC_VPMULTISHIFT
+
+  ;; For AVX5124FMAPS/AVX5124VNNIW support
+  UNSPEC_VP4FMADD
+  UNSPEC_VP4FNMADD
+  UNSPEC_VP4DPWSSD
+  UNSPEC_VP4DPWSSDS
 ])

 (define_c_enum "unspecv" [
@@ -19397,3 +19403,274 @@
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_mode_iterator IMOD4
+  [(V64SF "TARGET_AVX5124FMAPS") (V64SI "TARGET_AVX5124VNNIW")])
+
+(define_mode_attr imod4_narrow
+  [(V64SF "V16SF") (V64SI "V16SI")])
+
+(define_insn "mov<mode>"
+  [(set (match_operand:IMOD4 0 "nonimmediate_operand")
+ (match_operand:IMOD4 1 "general_operand"))]
+  "TARGET_AVX512F"
+  "#")
+
+(define_split
+  [(set (match_operand:IMOD4 0 "register_operand")
+ (match_operand:IMOD4 1 "nonimmediate_operand"))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (subreg:<imod4_narrow> (match_dup 0) 0)
+ (subreg:<imod4_narrow> (match_dup 1) 0))
+   (set (subreg:<imod4_narrow> (match_dup 0) 64)
+ (subreg:<imod4_narrow> (match_dup 1) 64))
+   (set (subreg:<imod4_narrow> (match_dup 0) 128)
+ (subreg:<imod4_narrow> (match_dup 1) 128))
+   (set (subreg:<imod4_narrow> (match_dup 0) 192)
+ (subreg:<imod4_narrow> (match_dup 1) 192))])
+
+(define_insn "avx5124fmaddps_4fmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (unspec:V16SF
+  [(match_operand:V16SF 1 "register_operand" "0")
+   (match_operand:V64SF 2 "register_operand" "h")
+   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (vec_merge:V16SF
+  (unspec:V16SF
+     [(match_operand:V64SF 1 "register_operand" "h")
+      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+  (match_operand:V16SF 3 "register_operand" "0")
+  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (vec_merge:V16SF
+  (unspec:V16SF
+    [(match_operand:V16SF 1 "register_operand" "0")
+     (match_operand:V64SF 2 "register_operand" "h")
+     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+  (match_operand:V16SF 4 "const0_operand" "C")
+  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (unspec:V4SF
+  [(match_operand:V4SF 1 "register_operand" "0")
+   (match_operand:V64SF 2 "register_operand" "h")
+   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (vec_merge:V4SF
+  (unspec:V4SF
+    [(match_operand:V64SF 1 "register_operand" "h")
+     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+  (match_operand:V4SF 3 "register_operand" "0")
+  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (vec_merge:V4SF
+  (unspec:V4SF
+    [(match_operand:V4SF 1 "register_operand" "0")
+     (match_operand:V64SF 2 "register_operand" "h")
+     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+  (match_operand:V4SF 4 "const0_operand" "C")
+  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (unspec:V16SF
+  [(match_operand:V16SF 1 "register_operand" "0")
+   (match_operand:V64SF 2 "register_operand" "h")
+   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (vec_merge:V16SF
+  (unspec:V16SF
+     [(match_operand:V64SF 1 "register_operand" "h")
+      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+  (match_operand:V16SF 3 "register_operand" "0")
+  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+ (vec_merge:V16SF
+  (unspec:V16SF
+    [(match_operand:V16SF 1 "register_operand" "0")
+     (match_operand:V64SF 2 "register_operand" "h")
+     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+  (match_operand:V16SF 4 "const0_operand" "C")
+  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (unspec:V4SF
+  [(match_operand:V4SF 1 "register_operand" "0")
+   (match_operand:V64SF 2 "register_operand" "h")
+   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (vec_merge:V4SF
+  (unspec:V4SF
+    [(match_operand:V64SF 1 "register_operand" "h")
+     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+  (match_operand:V4SF 3 "register_operand" "0")
+  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+ (vec_merge:V4SF
+  (unspec:V4SF
+    [(match_operand:V4SF 1 "register_operand" "0")
+     (match_operand:V64SF 2 "register_operand" "h")
+     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+  (match_operand:V4SF 4 "const0_operand" "C")
+  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (unspec:V16SI
+  [(match_operand:V16SI 1 "register_operand" "0")
+   (match_operand:V64SI 2 "register_operand" "h")
+   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (vec_merge:V16SI
+  (unspec:V16SI
+     [(match_operand:V64SI 1 "register_operand" "h")
+      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+  (match_operand:V16SI 3 "register_operand" "0")
+  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (vec_merge:V16SI
+  (unspec:V16SI
+    [(match_operand:V16SI 1 "register_operand" "0")
+     (match_operand:V64SI 2 "register_operand" "h")
+     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+  (match_operand:V16SI 4 "const0_operand" "C")
+  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (unspec:V16SI
+  [(match_operand:V16SI 1 "register_operand" "0")
+   (match_operand:V64SI 2 "register_operand" "h")
+   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (vec_merge:V16SI
+  (unspec:V16SI
+     [(match_operand:V64SI 1 "register_operand" "h")
+      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+  (match_operand:V16SI 3 "register_operand" "0")
+  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+ (vec_merge:V16SI
+  (unspec:V16SI
+    [(match_operand:V16SI 1 "register_operand" "0")
+     (match_operand:V64SI 2 "register_operand" "h")
+     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+  (match_operand:V16SI 4 "const0_operand" "C")
+  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 92ca055..42ab5f0 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -973,10 +973,10 @@ inline __attribute__((__always_inline__))\n\
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+unsigned short\n\
 mode_size_inline (machine_mode mode)\n\
 {\n\
-  extern %sunsigned char mode_size[NUM_MACHINE_MODES];\n\
+  extern %sunsigned short mode_size[NUM_MACHINE_MODES];\n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
     {\n", adj_bytesize ? "" : "const ");
@@ -1301,7 +1301,7 @@ emit_mode_size (void)
   int c;
   struct mode_data *m;

-  print_maybe_const_decl ("%sunsigned char", "mode_size",
+  print_maybe_const_decl ("%sunsigned short", "mode_size",
   "NUM_MACHINE_MODES", bytesize);

   for_all_modes (c, m)
@@ -1492,7 +1492,7 @@ emit_mode_base_align (void)
   int c;
   struct mode_data *m;

-  print_maybe_const_decl ("%sunsigned char",
+  print_maybe_const_decl ("%sunsigned short",
   "mode_base_align", "NUM_MACHINE_MODES",
   alignment);

diff --git a/gcc/init-regs.c b/gcc/init-regs.c
index 3fbaee1..2ee4bd4 100644
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
@@ -104,6 +104,7 @@ initialize_uninitialized_regs (void)
   bitmap_set_bit (already_genned, regno);

   start_sequence ();
+  emit_clobber (reg);
   emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
   move_insn = get_insns ();
   end_sequence ();
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 3dcadd8..d924e83 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -179,7 +179,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];

 /* Get the size in bytes and bits of an object of mode MODE.  */

-extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
+extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 #if GCC_VERSION >= 4001
 #define GET_MODE_SIZE(MODE) \
   ((unsigned short) (__builtin_constant_p (MODE) \
@@ -330,7 +330,7 @@ extern machine_mode get_best_mode (int, int,

 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */

-extern CONST_MODE_BASE_ALIGN unsigned char mode_base_align[NUM_MACHINE_MODES];
+extern CONST_MODE_BASE_ALIGN unsigned short mode_base_align[NUM_MACHINE_MODES];

 extern unsigned get_mode_alignment (machine_mode);

diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
new file mode 100644
index 0000000..1035f25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
new file mode 100644
index 0000000..f977b65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+ + (double)src1[i] * (double)mult[0]
+ + (double)src2[i] * (double)mult[1]
+ + (double)src3[i] * (double)mult[2]
+ + (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1,
res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fmadd_ps)       (      src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fmadd_ps)  (src5.x, mask, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fmadd_ps) (mask, src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
new file mode 100644
index 0000000..2f1a558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
new file mode 100644
index 0000000..45bd7da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fnmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fnmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fnmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
new file mode 100644
index 0000000..3c75fcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+ - (double)src1[i] * (double)mult[0]
+ - (double)src2[i] * (double)mult[1]
+ - (double)src3[i] * (double)mult[2]
+ - (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1,
res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fnmadd_ps)       (      src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fnmadd_ps)  (src5.x, mask, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fnmadd_ps) (mask, src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
new file mode 100644
index 0000000..1755afb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fnmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fnmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fnmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
new file mode 100644
index 0000000..eba93cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124fmaps_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124fmaps_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+ return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124FMAPS) ==
bit_AVX5124FMAPS))
+ {
+  do_test ();
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+  return 0;
+ }
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
new file mode 100644
index 0000000..a706cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124vnniw_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124vnniw_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+ return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124VNNIW) ==
bit_AVX5124VNNIW))
+ {
+  do_test ();
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+  return 0;
+ }
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
new file mode 100644
index 0000000..a234fdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssd_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssd_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssd_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
new file mode 100644
index 0000000..a0a6825
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssd_epi32)       (      src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssd_epi32)  (src5.x, mask, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssd_epi32) (mask, src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
new file mode 100644
index 0000000..d1bed37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[
\\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 }
} */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssds_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssds_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssds_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
new file mode 100644
index 0000000..e1e5536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define DEFAULT_VALUE 0x7ffffffe
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      long long int tmp;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+ dst[i] = 0x7fffffff;
+      else
+ dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+ dst[i] = 0x7fffffff;
+      else
+ dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+ dst[i] = 0x7fffffff;
+      else
+ dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+ dst[i] = 0x7fffffff;
+      else
+ dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssds_epi32)     (      src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssds_epi32)  (src5.x, mask, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssds_epi32) (mask, src5.x, src1.x,
src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 5923085..6aca0d6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -22,6 +22,10 @@
 #include "avx512ifma-check.h"
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 #include "avx512vbmi-check.h"
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+#include "avx5124fmaps-check.h"
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+#include "avx5124vnniw-check.h"
 #elif defined (AVX512VL)
 #include "avx512vl-check.h"
 #endif
@@ -33,7 +37,9 @@
 /* Value to be written into destination.
    We have one value for all types so it must be small enough
    to fit into signed char.  */
+#ifndef DEFAULT_VALUE
 #define DEFAULT_VALUE 117
+#endif

 #define MAKE_MASK_MERGE(NAME, TYPE)      \
 static void      \
@@ -132,6 +138,12 @@ avx512ifma_test (void) { test_512 (); }
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 void
 avx512vbmi_test (void) { test_512 (); }
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+void
+avx5124fmaps_test (void) { test_512 (); }
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+void
+avx5124vnniw_test (void) { test_512 (); }
 #elif defined (AVX512VL)
 void
 avx512vl_test (void) { test_256 (); test_128 (); }
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp
b/gcc/testsuite/gcc.target/i386/i386.exp
index 877d224..4057240 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -366,6 +366,48 @@ proc check_effective_target_avx512vbmi { } {
     } "-mavx512vbmi" ]
 }

+# Return 1 if avx512_4fmaps instructions can be compiled.
+proc check_effective_target_avx5124fmaps { } {
+    return [check_no_compiler_messages avx5124fmaps object {
+ typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+ typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+ __v16sf
+ _mm512_mask_4fmadd_ps (__v16sf __DEST, __v16sf __A, __v16sf __B, __v16sf __C,
+       __v16sf __D, __v16sf __E, __v4sf *__F)
+ {
+    return (__v16sf) __builtin_ia32_4fmaddps_mask ((__v16sf) __A,
+  (__v16sf) __B,
+  (__v16sf) __C,
+  (__v16sf) __D,
+  (__v16sf) __E,
+  (const __v4sf *) __F,
+  (__v16sf) __DEST,
+  0xffff);
+ }
+    } "-mavx5124fmaps" ]
+}
+
+# Return 1 if avx512_4vnniw instructions can be compiled.
+proc check_effective_target_avx5124vnniw { } {
+    return [check_no_compiler_messages avx5124vnniw object {
+ typedef int __v16si __attribute__ ((__vector_size__ (64)));
+ typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+ __v16si
+ _mm512_4dpwssd_epi32 (__v16si __A, __v16si __B, __v16si __C,
+      __v16si __D, __v16si __E, __v4si *__F)
+ {
+    return (__v16si) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+       (__v16si) __C,
+       (__v16si) __D,
+       (__v16si) __E,
+       (__v16si) __A,
+       (const __v4si *) __F);
+ }
+    } "-mavx5124vnniw" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h
b/gcc/testsuite/gcc.target/i386/m128-check.h
index abb792b..48b2332 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -108,8 +108,12 @@ CHECK_EXP (union128d, double, "%f")

 CHECK_EXP (union128, float, "%f")

+#ifndef ESP_FLOAT
 #define ESP_FLOAT 0.000001
+#endif
+#ifndef ESP_DOUBLE
 #define ESP_DOUBLE 0.000001
+#endif
 #define CHECK_ARRAY(ARRAY, TYPE, FMT)                   \
 static int                                              \
 __attribute__((noinline, unused))                       \
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c
b/gcc/testsuite/gcc.target/i386/sse-12.c
index f0f5457..94990af 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a
-m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm
-mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm
-mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er
-mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves
-mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi
-mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a
-m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm
-mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm
-mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er
-mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves
-mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi
-mavx512ifma -mavx5124fmaps -mclwb -mmwaitx -mclzero -mpku" } */

 #include <x86intrin.h>

diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c
b/gcc/testsuite/gcc.target/i386/sse-13.c
index 80d8c20..4e4ed11 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8
-msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt
-mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma
-mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er
-mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves
-mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi
-mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8
-msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt
-mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma
-mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er
-mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves
-mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi
-mavx512ifma -mavx5124fmaps -mclwb -mmwaitx -mclzero -mpku" } */
 /* { dg-add-options bind_pic_locally } */

 #include <mm_malloc.h>


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 16:27 [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions Andrew Senkevich
@ 2016-11-10 16:36 ` Jakub Jelinek
  2016-11-10 17:18   ` Andrew Senkevich
  2016-11-10 17:14 ` Vladimir N Makarov
  2016-11-11 11:30 ` Jakub Jelinek
  2 siblings, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-10 16:36 UTC (permalink / raw)
  To: Andrew Senkevich, Uros Bizjak
  Cc: gcc-patches, Vladimir Makarov, Kirill Yukhin

On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> Hi,
> 
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
> 
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
tabs), can you repost as attachment or configure your MUA not to do this?

Just a couple of random nits follow:

>         * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.

This mentions an option that doesn't exist, is that s/dd// ?

>         * gcc.target/i386/sse-13.c: Ditto.

> @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
>   {
>    opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>    opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> +
> +  //turn off additional isa flags

Comments start with capital letter, end with ., there should be space
between // and T, better use /* ... */ style comment to match other
comments in the file.

> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +          opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +          opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +
>   }

The formatting looks very weird.

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 16:27 [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions Andrew Senkevich
  2016-11-10 16:36 ` Jakub Jelinek
@ 2016-11-10 17:14 ` Vladimir N Makarov
  2016-11-10 17:19   ` Andrew Senkevich
  2016-11-11 11:30 ` Jakub Jelinek
  2 siblings, 1 reply; 29+ messages in thread
From: Vladimir N Makarov @ 2016-11-10 17:14 UTC (permalink / raw)
  To: Andrew Senkevich, gcc-patches, Kirill Yukhin



On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
> Hi,
>
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.
>
>
I've just committed the necessary patch.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 16:36 ` Jakub Jelinek
@ 2016-11-10 17:18   ` Andrew Senkevich
  2016-11-11 11:16     ` Uros Bizjak
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-10 17:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches, Vladimir Makarov, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

2016-11-10 19:36 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
> tabs), can you repost as attachment or configure your MUA not to do this?
>
> Just a couple of random nits follow:
>
>>         * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>
> This mentions an option that doesn't exist, is that s/dd// ?

Yes.
Attached fixed version.


--
WBR,
Andrew

[-- Attachment #2: new_avx512_instructions.patch --]
[-- Type: application/octet-stream, Size: 98570 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9e93f79..93f5f35 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,84 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* common/config/i386/i386-common.c
+	(OPTION_MASK_ISA_AVX5124FMAPS_SET,
+	OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
+	OPTION_MASK_ISA_AVX5124VNNIW_SET,
+	OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
+	(ix86_handle_option): Handle OPT_mavx5124fmaps,
+	OPT_mavx5124vnniw.
+	* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
+	* config/i386/avx5124fmapsintrin.h: New file.
+	* config/i386/avx5124vnniwintrin.h: Ditto.
+	* config/i386/constraints.md (h): New constraint.
+	* config/i386/cpuid.h: (bit_AVX5124VNNIW,
+	bit_AVX5124FMAPS): New.
+	* config/i386/driver-i386.c (host_detect_local_cpu):
+	Detect avx5124fmaps, avx5124vnniw.
+	* config/i386/i386-builtin-types.def: Add types
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
+	* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
+	__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
+	__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
+	__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
+	__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
+	__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
+	__builtin_ia32_vp4dpwssds_mask): New.
+	* config/i386/i386-c.c (ix86_target_macros_internal):
+	Define __AVX5124FMAPS__, __AVX5124VNNIW__.
+	* config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256),
+	VECTOR_MODE (INT, SI, 64)): New modes.
+	* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
+	-mavx5124vnniw.
+	(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
+	(ix86_option_override_internal): Handle new options.
+	(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
+	avx5124vnniw.
+	(ix86_expand_builtin): Handle new builtins.
+	(ix86_additional_allocno_class_p): New.
+	* config/i386/i386.h (TARGET_AVX5124FMAPS,
+	TARGET_AVX5124FMAPS_P,
+	TARGET_AVX5124VNNIW,
+	TARGET_AVX5124VNNIW_P): Define.
+	(reg_class): Add MOD4_SSE_REGS.
+	(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
+	* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
+	* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
+	avx5124vnniwintrin.h.
+	* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
+	UNSPEC_VP4FNMADD,
+	UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
+	(define_mode_iterator IMOD4): New.
+	(define_mode_attr imod4_narrow): Ditto.
+	(define_insn "mov<mode>"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
+	* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
+	* genmodes.c (mode_size_inline): Extend return type.
+	* machmode.h (mode_size, mode_base_align): Extend type.
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>
 
 	* omp-low.c (lower_omp_target): Fix up argument to is_reference.
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index d201154..98224f5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -76,6 +76,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA_AVX512VBMI_SET \
   (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
+#define OPTION_MASK_ISA_AVX5124FMAPS_SET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_SET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -179,6 +181,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
 #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
+#define OPTION_MASK_ISA_AVX5124FMAPS_UNSET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_UNSET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
@@ -399,6 +403,12 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
+
+	  /* Turn off additional isa flags.  */
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
 	}
       return true;
 
@@ -441,6 +451,36 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx5124fmaps:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	}
+      return true;
+
+    case OPT_mavx5124vnniw:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	}
+      return true;
+
     case OPT_mavx512dq:
       if (value)
 	{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3e0be22..20413fb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -373,8 +373,8 @@ i[34567]86-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -395,8 +395,8 @@ x86_64-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
new file mode 100644
index 0000000..6113ee9
--- /dev/null
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -0,0 +1,216 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124fmapsintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124FMAPSINTRIN_H_INCLUDED
+#define _AVX5124FMAPSINTRIN_H_INCLUDED
+
+#ifndef __AVX5124FMAPS__
+#pragma GCC push_options
+#pragma GCC target("avx5124fmaps")
+#define __DISABLE_AVX5124FMAPS__
+#endif /* __AVX5124FMAPS__ */
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		  __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps ((__v16sf) __B,
+					   (__v16sf) __C,
+					   (__v16sf) __D,
+					   (__v16sf) __E,
+					   (__v16sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+		       __m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) __A,
+						(__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fmadd_ps (__mmask16 __U,
+			__m512 __A, __m512 __B, __m512 __C,
+			__m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) _mm512_setzero_ps (),
+						(__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+	       __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss ((__v4sf) __B,
+					   (__v4sf) __C,
+					   (__v4sf) __D,
+					   (__v4sf) __E,
+					   (__v4sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		    __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) __A,
+						(__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fnmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		   __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps ((__v16sf) __B,
+					    (__v16sf) __C,
+					    (__v16sf) __D,
+					    (__v16sf) __E,
+					    (__v16sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			__m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) __A,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fnmadd_ps (__mmask16 __U,
+			 __m512 __A, __m512 __B, __m512 __C,
+			 __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) _mm512_setzero_ps (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fnmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+		__m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss ((__v4sf) __B,
+					    (__v4sf) __C,
+					    (__v4sf) __D,
+					    (__v4sf) __E,
+					    (__v4sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fnmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) __A,
+						 (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fnmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		      __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) _mm_setzero_ps (),
+						 (__mmask8) __U);
+}
+
+#ifdef __DISABLE_AVX5124FMAPS__
+#undef __DISABLE_AVX5124FMAPS__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124FMAPS__ */
+
+#endif /* _AVX5124FMAPSINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
new file mode 100644
index 0000000..392c6a5
--- /dev/null
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -0,0 +1,132 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124vnniwintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124VNNIWINTRIN_H_INCLUDED
+#define _AVX5124VNNIWINTRIN_H_INCLUDED
+
+#ifndef __AVX5124VNNIW__
+#pragma GCC push_options
+#pragma GCC target("avx5124vnniw")
+#define __DISABLE_AVX5124VNNIW__
+#endif /* __AVX5124VNNIW__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssd_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		      __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+					     (__v16si) __C,
+					     (__v16si) __D,
+					     (__v16si) __E,
+					     (__v16si) __A,
+					     (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssd_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			   __m512i __C, __m512i __D, __m512i __E,
+			   __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) __A,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) _mm512_setzero_ps (),
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssds_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		       __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds ((__v16si) __B,
+					      (__v16si) __C,
+					      (__v16si) __D,
+					      (__v16si) __E,
+					      (__v16si) __A,
+					      (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssds_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) __A,
+						   (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			     __m512i __C, __m512i __D, __m512i __E,
+			     __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) _mm512_setzero_ps (),
+						   (__mmask16) __U);
+}
+
+#ifdef __DISABLE_AVX5124VNNIW__
+#undef __DISABLE_AVX5124VNNIW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124VNNIW__ */
+
+#endif /* _AVX5124VNNIWINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index d610336..ebeb437 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;           H
-;;;           h j               z
+;;;             j               z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -94,6 +94,9 @@
 (define_register_constraint "v" "TARGET_SSE ? ALL_SSE_REGS : NO_REGS"
  "Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).")
 
+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+
 (define_register_constraint "w" "TARGET_MPX ? BND_REGS : NO_REGS"
  "@internal Any bound register.")
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 2a946bf..abe7c62 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -60,6 +60,8 @@
 #define bit_MWAITX      (1 << 29)
 
 /* %edx */
+#define bit_AVX5124VNNIW (1 << 2)
+#define bit_AVX5124FMAPS (1 << 3)
 #define bit_MMXEXT	(1 << 22)
 #define bit_LM		(1 << 29)
 #define bit_3DNOWP	(1 << 30)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index e026482..f0d0e8f 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_mwaitx = 0, has_clzero = 0, has_pku = 0;
+  unsigned int has_avx5124fmaps = 0, has_avx5124vnniw = 0;
 
   bool arch;
 
@@ -501,6 +502,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       has_prefetchwt1 = ecx & bit_PREFETCHWT1;
       has_avx512vbmi = ecx & bit_AVX512VBMI;
       has_pku = ecx & bit_OSPKE;
+      has_avx5124vnniw = edx & bit_AVX5124VNNIW;
+      has_avx5124fmaps = edx & bit_AVX5124FMAPS;
     }
 
   if (max_level >= 13)
@@ -1021,6 +1024,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl";
       const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : " -mno-avx512ifma";
       const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : " -mno-avx512vbmi";
+      const char *avx5124vnniw = has_avx5124vnniw ? " -mavx5124vnniw" : " -mno-avx5124vnniw";
+      const char *avx5124fmaps = has_avx5124fmaps ? " -mavx5124fmaps" : " -mno-avx5124fmaps";
       const char *clwb = has_clwb ? " -mclwb" : " -mno-clwb";
       const char *mwaitx  = has_mwaitx  ? " -mmwaitx"  : " -mno-mwaitx"; 
       const char *clzero  = has_clzero  ? " -mclzero"  : " -mno-clzero";
@@ -1033,8 +1038,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 			fxsr, xsave, xsaveopt, avx512f, avx512er,
 			avx512cd, avx512pf, prefetchwt1, clflushopt,
 			xsavec, xsaves, avx512dq, avx512bw, avx512vl,
-			avx512ifma, avx512vbmi, clwb, mwaitx,
-			clzero, pku, NULL);
+			avx512ifma, avx512vbmi, avx5124fmaps, avx5124vnniw,
+			clwb, mwaitx, clzero, pku, NULL);
     }
 
 done:
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..4a38c12 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -526,6 +526,15 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF, V16SF, UHI)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF, V4SF, UQI)
+
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI, V16SI, UHI)
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
+
+
 # Instructions returning mask
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..3cf18f0 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2482,7 +2482,24 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_truncv8dfv8di2_mask_round, "__bui
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
-BDESC_END (ROUND_ARGS, MPX)
+BDESC_END (ROUND_ARGS, ARGS2)
+
+/* AVX-5124FMA/NNI builtins with variable number of arguments. Defined in additional ix86_isa_flags2.  */
+BDESC_FIRST (args2, ARGS2,
+       OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps_mask, "__builtin_ia32_4fmaddps_mask", IX86_BUILTIN_4FMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps, "__builtin_ia32_4fmaddps", IX86_BUILTIN_4FMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss, "__builtin_ia32_4fmaddss", IX86_BUILTIN_4FMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss_mask, "__builtin_ia32_4fmaddss_mask", IX86_BUILTIN_4FMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps_mask, "__builtin_ia32_4fnmaddps_mask", IX86_BUILTIN_4FNMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps, "__builtin_ia32_4fnmaddps", IX86_BUILTIN_4FNMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss, "__builtin_ia32_4fnmaddss", IX86_BUILTIN_4FNMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss_mask, "__builtin_ia32_4fnmaddss_mask", IX86_BUILTIN_4FNMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd, "__builtin_ia32_vp4dpwssd", IX86_BUILTIN_4DPWSSD, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd_mask, "__builtin_ia32_vp4dpwssd_mask", IX86_BUILTIN_4DPWSSD_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds, "__builtin_ia32_vp4dpwssds", IX86_BUILTIN_4DPWSSDS, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds_mask, "__builtin_ia32_vp4dpwssds_mask", IX86_BUILTIN_4DPWSSDS_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+
+BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
 BDESC_FIRST (mpx, MPX,
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 9bb80c0..9599e11 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -28,7 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
    void (*def_or_undef) (cpp_reader *, const char *));
 
 \f
@@ -36,6 +36,7 @@ static void ix86_target_macros_internal
    macros.  */
 static void
 ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
+			     HOST_WIDE_INT isa_flag2,
 			     enum processor_type arch,
 			     enum processor_type tune,
 			     enum fpmath_unit fpmath,
@@ -376,6 +377,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
     def_or_undef (parse_in, "__AVX512IFMA__");
+  if (isa_flag & OPTION_MASK_ISA_AVX5124VNNIW)
+    def_or_undef (parse_in, "__AVX5124VNNIW__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124FMAPS)
+    def_or_undef (parse_in, "__AVX5124FMAPS__");
   if (isa_flag & OPTION_MASK_ISA_FMA)
     def_or_undef (parse_in, "__FMA__");
   if (isa_flag & OPTION_MASK_ISA_RTM)
@@ -462,6 +467,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   HOST_WIDE_INT prev_isa;
   HOST_WIDE_INT cur_isa;
   HOST_WIDE_INT diff_isa;
+  HOST_WIDE_INT prev_isa2;
+  HOST_WIDE_INT cur_isa2;
+  HOST_WIDE_INT diff_isa2;
   enum processor_type prev_arch;
   enum processor_type prev_tune;
   enum processor_type cur_arch;
@@ -494,6 +502,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   prev_isa  = prev_opt->x_ix86_isa_flags;
   cur_isa   = cur_opt->x_ix86_isa_flags;
   diff_isa  = (prev_isa ^ cur_isa);
+  prev_isa2  = prev_opt->x_ix86_isa_flags2;
+  cur_isa2   = cur_opt->x_ix86_isa_flags2;
+  diff_isa2  = (prev_isa2 ^ cur_isa2);
   prev_arch = (enum processor_type) prev_opt->arch;
   prev_tune = (enum processor_type) prev_opt->tune;
   cur_arch  = (enum processor_type) cur_opt->arch;
@@ -509,6 +520,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Undef all of the macros for that are no longer current.  */
   ix86_target_macros_internal (prev_isa & diff_isa,
+			       prev_isa2 & diff_isa2,
 			       prev_arch,
 			       prev_tune,
 			       (enum fpmath_unit) prev_opt->x_ix86_fpmath,
@@ -523,6 +535,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Define all of the macros for new options that were just turned on.  */
   ix86_target_macros_internal (cur_isa & diff_isa,
+			       cur_isa2 & diff_isa2,
 			       cur_arch,
 			       cur_tune,
 			       (enum fpmath_unit) cur_opt->x_ix86_fpmath,
@@ -583,6 +596,7 @@ ix86_target_macros (void)
   cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__");
 
   ix86_target_macros_internal (ix86_isa_flags,
+			       ix86_isa_flags2,
 			       ix86_arch,
 			       ix86_tune,
 			       ix86_fpmath,
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index d524313..22b9713 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
 VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
 VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
 VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */
+VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
@@ -91,6 +92,7 @@ VECTOR_MODE (INT, QI, 2);     /*                   V2QI */
 VECTOR_MODE (INT, QI, 12);    /*                  V12QI */
 VECTOR_MODE (INT, QI, 14);    /*                  V14QI */
 VECTOR_MODE (INT, HI, 6);     /*                   V6HI */
+VECTOR_MODE (INT, SI, 64);    /* 		  V64SI */
 
 POINTER_BOUNDS_MODE (BND32, 8);
 POINTER_BOUNDS_MODE (BND64, 16);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..0dad131 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2579,7 +2579,7 @@ static int ix86_function_regparm (const_tree, const_tree);
 static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
-static void ix86_add_new_builtins (HOST_WIDE_INT);
+static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
 static tree ix86_canonical_va_list_type (tree);
 static void predict_jump (int);
 static unsigned int split_stack_prologue_scratch_regno (void);
@@ -2592,8 +2592,9 @@ enum ix86_function_specific_strings
   IX86_FUNCTION_SPECIFIC_MAX
 };
 
-static char *ix86_target_string (HOST_WIDE_INT, int, int, const char *,
-				 const char *, enum fpmath_unit, bool);
+static char *ix86_target_string (HOST_WIDE_INT, HOST_WIDE_INT, int, int,
+				 const char *, const char *, enum fpmath_unit,
+				 bool);
 static void ix86_function_specific_save (struct cl_target_option *,
 					 struct gcc_options *opts);
 static void ix86_function_specific_restore (struct gcc_options *opts,
@@ -4188,8 +4189,8 @@ ix86_using_red_zone (void)
    responsible for freeing the string.  */
 
 static char *
-ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
-		    const char *arch, const char *tune,
+ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, int flags,
+		    int ix86_flags, const char *arch, const char *tune,
 		    enum fpmath_unit fpmath, bool add_nl_p)
 {
   struct ix86_target_opts
@@ -4257,7 +4258,12 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mclzero",	OPTION_MASK_ISA_CLZERO  },
     { "-mpku",		OPTION_MASK_ISA_PKU  },
   };
-
+//additional structure for isa flags
+  static struct ix86_target_opts isa_opts2[] =
+  {
+    { "-mavx5124vnniw", OPTION_MASK_ISA_AVX5124VNNIW },
+    { "-mavx5124fmaps", OPTION_MASK_ISA_AVX5124FMAPS },
+  };
   /* Flag options.  */
   static struct ix86_target_opts flag_opts[] =
   {
@@ -4298,8 +4304,8 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mgeneral-regs-only",		OPTION_MASK_GENERAL_REGS_ONLY },
   };
 
-  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts)
-		   + ARRAY_SIZE (ix86_flag_opts) + 6][2];
+  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa_opts2)
+		   + ARRAY_SIZE (flag_opts) + ARRAY_SIZE (ix86_flag_opts) + 6][2];
 
   char isa_other[40];
   char target_other[40];
@@ -4361,6 +4367,17 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
 	       isa);
     }
 
+  /* Pick out the options in isa2 options.  */
+  for (i = 0; i < ARRAY_SIZE (isa_opts2); i++)
+    {
+      if ((isa2 & isa_opts2[i].mask) != 0)
+	{
+	  opts[num++][0] = isa_opts2[i].option;
+	  isa &= ~ isa_opts2[i].mask;
+	}
+    }
+
+
   /* Add flag options.  */
   for (i = 0; i < ARRAY_SIZE (flag_opts); i++)
     {
@@ -4486,9 +4503,9 @@ ix86_profile_before_prologue (void)
 void ATTRIBUTE_UNUSED
 ix86_debug_options (void)
 {
-  char *opts = ix86_target_string (ix86_isa_flags, target_flags,
-				   ix86_target_flags,
-				   ix86_arch_string, ix86_tune_string,
+  char *opts = ix86_target_string (ix86_isa_flags, ix86_isa_flags2,
+				   target_flags, ix86_target_flags,
+				   ix86_arch_string,ix86_tune_string,
 				   ix86_fpmath, true);
 
   if (opts)
@@ -4844,6 +4861,8 @@ ix86_option_override_internal (bool main_args_p,
 #define PTA_CLZERO		(HOST_WIDE_INT_1 << 57)
 #define PTA_NO_80387		(HOST_WIDE_INT_1 << 58)
 #define PTA_PKU		(HOST_WIDE_INT_1 << 59)
+#define PTA_AVX5124VNNIW	(HOST_WIDE_INT_1 << 60)
+#define PTA_AVX5124FMAPS	(HOST_WIDE_INT_1 << 61)
 
 #define PTA_CORE2 \
   (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
@@ -5499,6 +5518,14 @@ ix86_option_override_internal (bool main_args_p,
 	if (processor_alias_table[i].flags & PTA_AVX512IFMA
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA;
+
+	if (processor_alias_table[i].flags & PTA_AVX5124VNNIW
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124VNNIW))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW;
+	if (processor_alias_table[i].flags & PTA_AVX5124FMAPS
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124FMAPS))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS;
+
 	if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE))
 	  x86_prefetch_sse = true;
 	if (processor_alias_table[i].flags & PTA_MWAITX
@@ -6298,6 +6325,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
+  ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
   ptr->x_ix86_arch_string = opts->x_ix86_arch_string;
   ptr->x_ix86_tune_string = opts->x_ix86_tune_string;
@@ -6354,6 +6382,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
+  opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
   opts->x_ix86_arch_string = ptr->x_ix86_arch_string;
   opts->x_ix86_tune_string = ptr->x_ix86_tune_string;
@@ -6459,9 +6488,9 @@ ix86_function_specific_print (FILE *file, int indent,
 			      struct cl_target_option *ptr)
 {
   char *target_string
-    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_target_flags,
-			  ptr->x_ix86_target_flags, NULL, NULL,
-			  ptr->x_ix86_fpmath, false);
+    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_ix86_isa_flags2,
+			  ptr->x_target_flags, ptr->x_ix86_target_flags,
+			  NULL, NULL, ptr->x_ix86_fpmath, false);
 
   gcc_assert (ptr->arch < PROCESSOR_max);
   fprintf (file, "%*sarch = %d (%s)\n",
@@ -6538,6 +6567,8 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[],
     IX86_ATTR_ISA ("avx512dq",	OPT_mavx512dq),
     IX86_ATTR_ISA ("avx512bw",	OPT_mavx512bw),
     IX86_ATTR_ISA ("avx512vl",	OPT_mavx512vl),
+    IX86_ATTR_ISA ("avx5124fmaps",	OPT_mavx5124fmaps),
+    IX86_ATTR_ISA ("avx5124vnniw",	OPT_mavx5124vnniw),
     IX86_ATTR_ISA ("mmx",	OPT_mmmx),
     IX86_ATTR_ISA ("pclmul",	OPT_mpclmul),
     IX86_ATTR_ISA ("popcnt",	OPT_mpopcnt),
@@ -6796,6 +6827,7 @@ ix86_valid_target_attribute_tree (tree args,
      The string options are attribute options, and will be undone
      when we copy the save structure.  */
   if (opts->x_ix86_isa_flags != def->x_ix86_isa_flags
+      || opts->x_ix86_isa_flags2 != def->x_ix86_isa_flags2
       || opts->x_target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
@@ -6814,7 +6846,7 @@ ix86_valid_target_attribute_tree (tree args,
 				     | OPTION_MASK_ABI_64
 				     | OPTION_MASK_ABI_X32
 				     | OPTION_MASK_CODE16);
-
+	  opts->x_ix86_isa_flags &= 0;
 	}
       else if (!orig_arch_specified)
 	opts->x_ix86_arch_string = NULL;
@@ -6848,7 +6880,7 @@ ix86_valid_target_attribute_tree (tree args,
 	}
 
       /* Add any builtin functions with the new isa if any.  */
-      ix86_add_new_builtins (opts->x_ix86_isa_flags);
+      ix86_add_new_builtins (opts->x_ix86_isa_flags, opts->x_ix86_isa_flags2);
 
       /* Save the current options unless we are validating options for
 	 #pragma.  */
@@ -6953,8 +6985,10 @@ ix86_can_inline_p (tree caller, tree callee)
       /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
 	 can inline a SSE2 function but a SSE2 function can't inline a SSE4
 	 function.  */
-      if ((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-	  != callee_opts->x_ix86_isa_flags)
+      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+	  != callee_opts->x_ix86_isa_flags) &
+	  ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+	  != callee_opts->x_ix86_isa_flags2))
 	ret = false;
 
       /* See if we have the same non-isa options.  */
@@ -12078,6 +12112,15 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
+/* Return true if register class CL should be an additional allocno
+   class.  */
+
+static bool
+ix86_additional_allocno_class_p (reg_class_t cl)
+{
+  return cl == MOD4_SSE_REGS;
+}
+
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
@@ -30836,6 +30879,7 @@ struct builtin_isa {
   const char *name;		/* function name */
   enum ix86_builtin_func_type tcode; /* type to use in the declaration */
   HOST_WIDE_INT isa;		/* isa_flags this builtin is defined for */
+  HOST_WIDE_INT isa2;		/* additional isa_flags this builtin is defined for */
   bool const_p;			/* true if the declaration is constant */
   bool leaf_p;			/* true if the declaration has leaf attribute */
   bool nothrow_p;		/* true if the declaration has nothrow attribute */
@@ -30846,6 +30890,7 @@ static struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
 
 /* Bits that can still enable any inclusion of a builtin.  */
 static HOST_WIDE_INT deferred_isa_values = 0;
+static HOST_WIDE_INT deferred_isa_values2 = 0;
 
 /* Add an ix86 target builtin function with CODE, NAME and TYPE.  Save the MASK
    of which isa_flags to use in the ix86_builtins_isa array.  Stores the
@@ -30927,19 +30972,74 @@ def_builtin_const (HOST_WIDE_INT mask, const char *name,
 
   return decl;
 }
+//def_builting for additional isa flags
+static inline tree
+def_builtin2 (HOST_WIDE_INT mask, const char *name,
+	     enum ix86_builtin_func_type tcode,
+	     enum ix86_builtins code)
+{
+  tree decl = NULL_TREE;
+
+  ix86_builtins_isa[(int) code].isa2 = mask;
+
+  if (mask == 0
+      || (mask & ix86_isa_flags2) != 0
+      || (lang_hooks.builtin_function
+	  == lang_hooks.builtin_function_ext_scope))
+
+    {
+      tree type = ix86_get_builtin_func_type (tcode);
+      decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+				   NULL, NULL_TREE);
+	  ix86_builtins[(int) code] = decl;
+	  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+    }
+  else
+    {
+      /* Just a MASK where set_and_not_built_p == true can potentially
+	 include a builtin.  */
+      deferred_isa_values2 |= mask;
+      ix86_builtins[(int) code] = NULL_TREE;
+      ix86_builtins_isa[(int) code].tcode = tcode;
+      ix86_builtins_isa[(int) code].name = name;
+      ix86_builtins_isa[(int) code].leaf_p = false;
+      ix86_builtins_isa[(int) code].nothrow_p = false;
+      ix86_builtins_isa[(int) code].const_p = false;
+      ix86_builtins_isa[(int) code].set_and_not_built_p = true;
+    }
+
+  return decl;
+}
+
+/* Like def_builtin, but also marks the function decl "const".  */
+
+static inline tree
+def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
+		   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+{
+  tree decl = def_builtin2 (mask, name, tcode, code);
+  if (decl)
+    TREE_READONLY (decl) = 1;
+  else
+    ix86_builtins_isa[(int) code].const_p = true;
+
+  return decl;
+}
 
 /* Add any new builtin functions for a given ISA that may not have been
    declared.  This saves a bit of space compared to adding all of the
    declarations to the tree, even if we didn't use them.  */
 
 static void
-ix86_add_new_builtins (HOST_WIDE_INT isa)
+ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if ((isa & deferred_isa_values) == 0)
+  if (((isa & deferred_isa_values) == 0)
+      &&((isa2 & deferred_isa_values2) == 0))
     return;
 
   /* Bits in ISA value can be removed from potential isa values.  */
   deferred_isa_values &= ~isa;
+  deferred_isa_values2 &= ~isa2;
 
   int i;
   tree saved_current_target_pragma = current_target_pragma;
@@ -30947,7 +31047,7 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)
 
   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
     {
-      if ((ix86_builtins_isa[i].isa & isa) != 0
+      if ((((ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0))
 	  && ix86_builtins_isa[i].set_and_not_built_p)
 	{
 	  tree decl, type;
@@ -31185,8 +31285,10 @@ BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_SPECIAL_ARGS_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_ROUND_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_ARGS_LAST, 1);
-BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS2_FIRST,
 	       IX86_BUILTIN__BDESC_ROUND_ARGS_LAST, 1);
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+	       IX86_BUILTIN__BDESC_ARGS2_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_CONST_FIRST,
 	       IX86_BUILTIN__BDESC_MPX_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MULTI_ARG_FIRST,
@@ -31237,6 +31339,18 @@ ix86_init_mmx_sse_builtins (void)
 		 IX86_BUILTIN__BDESC_ARGS_FIRST,
 		 ARRAY_SIZE (bdesc_args) - 1);
 
+  /* Add all builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_args2;
+       i < ARRAY_SIZE (bdesc_args2);
+       i++, d++)
+    {
+      if (d->name == 0)
+	continue;
+
+      ftype = (enum ix86_builtin_func_type) d->flag;
+      def_builtin_const2 (d->mask, d->name, ftype, d->code);
+    }
+
   /* Add all builtins with rounding.  */
   for (i = 0, d = bdesc_round_args;
        i < ARRAY_SIZE (bdesc_round_args);
@@ -36428,10 +36542,13 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
      current ISA based on the command line switches.  With function specific
      options, we need to check in the context of the function making the call
      whether it is supported.  */
-  if (ix86_builtins_isa[fcode].isa
+  if ((ix86_builtins_isa[fcode].isa
       && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+	&& (ix86_builtins_isa[fcode].isa2
+      && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
     {
-      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, 0,
+      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
+				       ix86_builtins_isa[fcode].isa2, 0, 0,
 				       NULL, NULL, (enum fpmath_unit) 0,
 				       false);
       if (!opts)
@@ -38091,6 +38208,246 @@ rdseed_step:
 	}
     }
 
+  if (fcode >= IX86_BUILTIN__BDESC_ARGS2_FIRST
+      && fcode <= IX86_BUILTIN__BDESC_ARGS2_LAST)
+    {
+      i = fcode - IX86_BUILTIN__BDESC_ARGS2_FIRST;
+      rtx (*fcn) (rtx, rtx, rtx, rtx);
+      rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
+      rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
+      rtx (*msk_mov) (rtx, rtx, rtx, rtx);
+      int masked = 1;
+      machine_mode mode, wide_mode, nar_mode;
+
+      nar_mode  = V4SFmode;
+      mode      = V16SFmode;
+      wide_mode = V64SFmode;
+      msk_mov   = gen_avx512f_loadv16sf_mask;
+      fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
+      fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
+
+      switch (fcode)
+	{
+	case IX86_BUILTIN_4FMAPS:
+	  fcn = gen_avx5124fmaddps_4fmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssd;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssds;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS:
+	  fcn = gen_avx5124fmaddps_4fnmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS_MASK:
+	  fcn_mask  = gen_avx5124fmaddps_4fnmaddps_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddps_maskz;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FMAPS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+v4fma_expand:
+	    wide_reg = gen_reg_rtx (wide_mode);
+	    for (i = 0; i < 4; i++)
+	      {
+	        args[i] = CALL_EXPR_ARG (exp, i);
+		ops[i] = expand_normal (args[i]);
+
+		emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
+				  ops[i]);
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (mode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (nar_mode, addr);
+
+	    target = gen_reg_rtx (mode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+	        rtx merge, mask;
+		merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		if (CONST_INT_P (mask))
+		  mask = fixup_modeless_constant (mask, HImode);
+
+		mask = force_reg (HImode, mask);
+
+		if (GET_MODE (mask) != HImode)
+		  mask = gen_rtx_SUBREG (HImode, mask, 0);
+
+		/* If merge is 0 then we're about to emit z-masked variant.  */
+		if (const0_operand (merge, mode))
+		  emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		/* If merge is the same as accum then emit merge-masked variant.  */
+		else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		  {
+		    merge = force_reg (mode, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		  }
+	        /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		else
+		  {
+		    rtx tmp = target;
+		    emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		    target = force_reg (mode, merge);
+		    emit_insn (msk_mov (target, tmp, target, mask));
+		  }
+	      }
+	      return target;
+	    }
+
+	case IX86_BUILTIN_4FNMASS:
+	  fcn = gen_avx5124fmaddps_4fnmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS:
+	  fcn = gen_avx5124fmaddps_4fmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FNMASS_MASK:
+	  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
+	  msk_mov   = gen_avx512vl_loadv4sf_mask;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+	    fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
+	    fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
+	    msk_mov   = gen_avx512vl_loadv4sf_mask;
+
+s4fma_expand:
+	    mode = V4SFmode;
+	    wide_reg = gen_reg_rtx (V64SFmode);
+	    for (i = 0; i < 4; i++)
+	      {
+		 rtx tmp;
+		 args[i] = CALL_EXPR_ARG (exp, i);
+		 ops[i] = expand_normal (args[i]);
+
+		 tmp = gen_reg_rtx (SFmode);
+		 emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+
+		 emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
+				  gen_rtx_SUBREG (V16SFmode, tmp, 0));
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (V4SFmode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (V4SFmode, addr);
+
+	    target = gen_reg_rtx (V4SFmode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+		 rtx merge, mask;
+		 merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		 mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		 if (CONST_INT_P (mask))
+		   mask = fixup_modeless_constant (mask, QImode);
+
+		 mask = force_reg (QImode, mask);
+
+		 if (GET_MODE (mask) != QImode)
+		   mask = gen_rtx_SUBREG (QImode, mask, 0);
+
+		 /* If merge is 0 then we're about to emit z-masked variant.  */
+		 if (const0_operand (merge, mode))
+		   emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		 /* If merge is the same as accum then emit merge-masked variant.  */
+		 else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		   {
+		     merge = force_reg (mode, merge);
+		     emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		   }
+		 /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		 else
+		   {
+		     rtx tmp = target;
+		     emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		     target = force_reg (mode, merge);
+		     emit_insn (msk_mov (target, tmp, target, mask));
+		   }
+		}
+	      return target;
+	    }
+	  default:
+	    return ix86_expand_args_builtin (bdesc_args2 + i, exp, target);
+	  }
+    }
+
   if (fcode >= IX86_BUILTIN__BDESC_COMI_FIRST
       && fcode <= IX86_BUILTIN__BDESC_COMI_LAST)
     {
@@ -38151,7 +38508,8 @@ static tree ix86_get_builtin (enum ix86_builtins code)
 
   opts = TREE_TARGET_OPTION (target_tree);
 
-  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+  if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+	&& (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
     return ix86_builtin_decl (code, true);
   else
     return NULL_TREE;
@@ -39735,6 +40093,18 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
+      /* For AVX-5124FMAPS allow V64SFmode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SFmode)
+	return true;
+
+      /* For AVX-5124VNNIW allow V64SImode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SImode)
+	return true;
+
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
@@ -51134,6 +51504,9 @@ ix86_run_selftests (void)
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
 
+#undef TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+#define TARGET_ADDITIONAL_ALLOCNO_CLASS_P ix86_additional_allocno_class_p
+
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..801b68a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -81,6 +81,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_AVX512VBMI_P(x)	TARGET_ISA_AVX512VBMI_P(x)
 #define TARGET_AVX512IFMA	TARGET_ISA_AVX512IFMA
 #define TARGET_AVX512IFMA_P(x)	TARGET_ISA_AVX512IFMA_P(x)
+#define TARGET_AVX5124FMAPS	TARGET_ISA_AVX5124FMAPS
+#define TARGET_AVX5124FMAPS_P(x) TARGET_ISA_AVX5124FMAPS_P(x)
+#define TARGET_AVX5124VNNIW	TARGET_ISA_AVX5124VNNIW
+#define TARGET_AVX5124VNNIW_P(x) TARGET_ISA_AVX5124VNNIW_P(x)
 #define TARGET_FMA	TARGET_ISA_FMA
 #define TARGET_FMA_P(x)	TARGET_ISA_FMA_P(x)
 #define TARGET_SSE4A	TARGET_ISA_SSE4A
@@ -1089,7 +1093,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define HARD_REGNO_NREGS(REGNO, MODE)					\
   (STACK_REGNO_P (REGNO) || SSE_REGNO_P (REGNO) || MMX_REGNO_P (REGNO)	\
    || MASK_REGNO_P (REGNO) || BND_REGNO_P (REGNO)			\
-   ? (COMPLEX_MODE_P (MODE) ? 2 : 1)					\
+   ? (COMPLEX_MODE_P (MODE) ? 2 :					\
+      (((MODE == V64SFmode) || (MODE == V64SImode)) ? 4 : 1))		\
    : ((MODE) == XFmode							\
       ? (TARGET_64BIT ? 2 : 3)						\
       : ((MODE) == XCmode						\
@@ -1365,6 +1370,7 @@ enum reg_class
   FLOAT_INT_SSE_REGS,
   MASK_EVEX_REGS,
   MASK_REGS,
+  MOD4_SSE_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1425,6 +1431,7 @@ enum reg_class
    "FLOAT_INT_SSE_REGS",		\
    "MASK_EVEX_REGS",			\
    "MASK_REGS",				\
+   "MOD4_SSE_REGS"			\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1465,11 +1472,14 @@ enum reg_class
 {   0x11ffff,    0x1fe0,    0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,   0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,   0x1f },       /* FLOAT_INT_SSE_REGS */        \
-       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */            \
        { 0x0,       0x0, 0x1fe0 },       /* MASK_REGS */                 \
-{ 0xffffffff,0xffffffff,0x1ffff }                                        \
+{ 0x1fe00000,0xffffe000,   0x1f },       /* MOD4_SSE_REGS */		 \
+{ 0xffffffff,0xffffffff,0x1ffff }		\
 }
 
+/* { 0x02200000,0x22222000,   0x02 },*/       /* MOD4_SSE_REGS */
+
 /* The same information, inverted:
    Return the class number of the smallest class containing
    reg number REGNO.  This could be a conditional expression
@@ -1533,6 +1543,16 @@ enum reg_class
 #define BND_REG_P(X) (REG_P (X) && BND_REGNO_P (REGNO (X)))
 #define BND_REGNO_P(N) IN_RANGE ((N), FIRST_BND_REG, LAST_BND_REG)
 
+#define MOD4_SSE_REG_P(X) (REG_P (X) && MOD4_SSE_REGNO_P (REGNO (X)))
+#define MOD4_SSE_REGNO_P(N) ((N) == XMM0_REG  \
+			     || (N) == XMM4_REG  \
+			     || (N) == XMM8_REG  \
+			     || (N) == XMM12_REG \
+			     || (N) == XMM16_REG \
+			     || (N) == XMM20_REG \
+			     || (N) == XMM24_REG \
+			     || (N) == XMM28_REG)
+
 /* First floating point reg */
 #define FIRST_FLOAT_REG FIRST_STACK_REG
 #define STACK_TOP_P(X) (REG_P (X) && REGNO (X) == FIRST_FLOAT_REG)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..68e650b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -25,11 +25,17 @@ config/i386/i386-opts.h
 Variable
 HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT | TARGET_SUBTARGET_ISA_DEFAULT
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2 = 0
+
 ; A mask of ix86_isa_flags that includes bit X if X was set or cleared
 ; on the command line.
 Variable
 HOST_WIDE_INT ix86_isa_flags_explicit
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2_explicit
+
 ; Additional target flags
 Variable
 int ix86_target_flags
@@ -74,6 +80,10 @@ unsigned char branch_cost
 
 ;; which flags were passed by the user
 TargetSave
+HOST_WIDE_INT x_ix86_isa_flags2_explicit
+
+;; which flags were passed by the user
+TargetSave
 HOST_WIDE_INT x_ix86_isa_flags_explicit
 
 ;; whether -mtune was not specified
@@ -687,6 +697,14 @@ mavx512vbmi
 Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512VBMI built-in functions and code generation.
 
+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX5124VNNIW built-in functions and code generation.
+
 mfma
 Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 9333111..3fd3c9c 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -68,6 +68,10 @@
 
 #include <avx512vbmivlintrin.h>
 
+#include <avx5124fmapsintrin.h>
+
+#include <avx5124vnniwintrin.h>
+
 #include <shaintrin.h>
 
 #include <lzcntintrin.h>
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14fcd67..78bf2a4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,6 +146,12 @@
 
   ;; For AVX512VBMI support
   UNSPEC_VPMULTISHIFT
+
+  ;; For AVX5124FMAPS/AVX5124VNNIW support
+  UNSPEC_VP4FMADD
+  UNSPEC_VP4FNMADD
+  UNSPEC_VP4DPWSSD
+  UNSPEC_VP4DPWSSDS
 ])
 
 (define_c_enum "unspecv" [
@@ -19397,3 +19403,274 @@
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_mode_iterator IMOD4
+  [(V64SF "TARGET_AVX5124FMAPS") (V64SI "TARGET_AVX5124VNNIW")])
+
+(define_mode_attr imod4_narrow
+  [(V64SF "V16SF") (V64SI "V16SI")])
+
+(define_insn "mov<mode>"
+  [(set (match_operand:IMOD4 0 "nonimmediate_operand")
+	(match_operand:IMOD4 1 "general_operand"))]
+  "TARGET_AVX512F"
+  "#")
+
+(define_split
+  [(set (match_operand:IMOD4 0 "register_operand")
+	(match_operand:IMOD4 1 "nonimmediate_operand"))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (subreg:<imod4_narrow> (match_dup 0) 0)
+	(subreg:<imod4_narrow> (match_dup 1) 0))
+   (set (subreg:<imod4_narrow> (match_dup 0) 64)
+	(subreg:<imod4_narrow> (match_dup 1) 64))
+   (set (subreg:<imod4_narrow> (match_dup 0) 128)
+	(subreg:<imod4_narrow> (match_dup 1) 128))
+   (set (subreg:<imod4_narrow> (match_dup 0) 192)
+	(subreg:<imod4_narrow> (match_dup 1) 192))])
+
+(define_insn "avx5124fmaddps_4fmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "h")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "h")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "h")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "h")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "h")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "h")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "h")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "h")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "h")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "h")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "h")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "h")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "h")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "h")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "h")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "h")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "h")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "h")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 92ca055..42ab5f0 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -973,10 +973,10 @@ inline __attribute__((__always_inline__))\n\
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+unsigned short\n\
 mode_size_inline (machine_mode mode)\n\
 {\n\
-  extern %sunsigned char mode_size[NUM_MACHINE_MODES];\n\
+  extern %sunsigned short mode_size[NUM_MACHINE_MODES];\n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
     {\n", adj_bytesize ? "" : "const ");
@@ -1301,7 +1301,7 @@ emit_mode_size (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char", "mode_size",
+  print_maybe_const_decl ("%sunsigned short", "mode_size",
 			  "NUM_MACHINE_MODES", bytesize);
 
   for_all_modes (c, m)
@@ -1492,7 +1492,7 @@ emit_mode_base_align (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char",
+  print_maybe_const_decl ("%sunsigned short",
 			  "mode_base_align", "NUM_MACHINE_MODES",
 			  alignment);
 
diff --git a/gcc/init-regs.c b/gcc/init-regs.c
index 3fbaee1..2ee4bd4 100644
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
@@ -104,6 +104,7 @@ initialize_uninitialized_regs (void)
 		  bitmap_set_bit (already_genned, regno);
 
 		  start_sequence ();
+		  emit_clobber (reg);
 		  emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
 		  move_insn = get_insns ();
 		  end_sequence ();
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 3dcadd8..d924e83 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -179,7 +179,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
 
 /* Get the size in bytes and bits of an object of mode MODE.  */
 
-extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
+extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 #if GCC_VERSION >= 4001
 #define GET_MODE_SIZE(MODE) \
   ((unsigned short) (__builtin_constant_p (MODE) \
@@ -330,7 +330,7 @@ extern machine_mode get_best_mode (int, int,
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
 
-extern CONST_MODE_BASE_ALIGN unsigned char mode_base_align[NUM_MACHINE_MODES];
+extern CONST_MODE_BASE_ALIGN unsigned short mode_base_align[NUM_MACHINE_MODES];
 
 extern unsigned get_mode_alignment (machine_mode);
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 03dcd5b..a61d4e5 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,27 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmaps-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
+	* gcc.target/i386/avx512f-helper.h: Add avx5124fmaps-check.h,
+	avx5124vnniw-check.h.
+	* gcc.target/i386/i386.exp (check_effective_target_avx5124fmaps,
+	check_effective_target_avx5124vnniw): New.
+	* gcc.target/i386/m128-check.h (ESP_FLOAT, ESP_DOUBLE):
+	Set under ifndef.
+	* gcc.target/i386/sse-12.c: Add -mavx5124fmaps.
+	* gcc.target/i386/sse-13.c: Ditto.	    
+
 2016-11-10  Jakub Jelinek  <jakub@redhat.com>
 
 	* gfortran.dg/gomp/pr77516.f90: Add dg-warning.
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
new file mode 100644
index 0000000..1035f25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
new file mode 100644
index 0000000..f977b65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	+ (double)src1[i] * (double)mult[0]
+	+ (double)src2[i] * (double)mult[1]
+	+ (double)src3[i] * (double)mult[2]
+	+ (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
new file mode 100644
index 0000000..2f1a558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
new file mode 100644
index 0000000..45bd7da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fnmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fnmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fnmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
new file mode 100644
index 0000000..3c75fcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	- (double)src1[i] * (double)mult[0]
+	- (double)src2[i] * (double)mult[1]
+	- (double)src3[i] * (double)mult[2]
+	- (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fnmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fnmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fnmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
new file mode 100644
index 0000000..1755afb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fnmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fnmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fnmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
new file mode 100644
index 0000000..eba93cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124fmaps_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124fmaps_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124FMAPS) == bit_AVX5124FMAPS))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
new file mode 100644
index 0000000..a706cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124vnniw_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124vnniw_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124VNNIW) == bit_AVX5124VNNIW))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
new file mode 100644
index 0000000..a234fdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssd_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssd_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssd_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
new file mode 100644
index 0000000..a0a6825
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssd_epi32)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssd_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssd_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
new file mode 100644
index 0000000..d1bed37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssds_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssds_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssds_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
new file mode 100644
index 0000000..e1e5536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define DEFAULT_VALUE 0x7ffffffe
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      long long int tmp;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssds_epi32)	     (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssds_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssds_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 5923085..6aca0d6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -22,6 +22,10 @@
 #include "avx512ifma-check.h"
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 #include "avx512vbmi-check.h"
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+#include "avx5124fmaps-check.h"
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+#include "avx5124vnniw-check.h"
 #elif defined (AVX512VL)
 #include "avx512vl-check.h"
 #endif
@@ -33,7 +37,9 @@
 /* Value to be written into destination.
    We have one value for all types so it must be small enough
    to fit into signed char.  */
+#ifndef DEFAULT_VALUE
 #define DEFAULT_VALUE 117
+#endif
 
 #define MAKE_MASK_MERGE(NAME, TYPE)				      \
 static void							      \
@@ -132,6 +138,12 @@ avx512ifma_test (void) { test_512 (); }
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 void
 avx512vbmi_test (void) { test_512 (); }
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+void
+avx5124fmaps_test (void) { test_512 (); }
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+void
+avx5124vnniw_test (void) { test_512 (); }
 #elif defined (AVX512VL)
 void
 avx512vl_test (void) { test_256 (); test_128 (); }
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
index 877d224..4057240 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -366,6 +366,48 @@ proc check_effective_target_avx512vbmi { } {
     } "-mavx512vbmi" ]
 }
 
+# Return 1 if avx512_4fmaps instructions can be compiled.
+proc check_effective_target_avx5124fmaps { } {
+    return [check_no_compiler_messages avx5124fmaps object {
+	typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+	typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+	__v16sf
+	_mm512_mask_4fmadd_ps (__v16sf __DEST, __v16sf __A, __v16sf __B, __v16sf __C,
+			       __v16sf __D, __v16sf __E, __v4sf *__F)
+	{
+	    return (__v16sf) __builtin_ia32_4fmaddps_mask ((__v16sf) __A,
+							  (__v16sf) __B,
+							  (__v16sf) __C,
+							  (__v16sf) __D,
+							  (__v16sf) __E,
+							  (const __v4sf *) __F,
+							  (__v16sf) __DEST,
+							  0xffff);
+	}
+    } "-mavx5124fmaps" ]
+}
+
+# Return 1 if avx512_4vnniw instructions can be compiled.
+proc check_effective_target_avx5124vnniw { } {
+    return [check_no_compiler_messages avx5124vnniw object {
+	typedef int __v16si __attribute__ ((__vector_size__ (64)));
+	typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+	__v16si
+	_mm512_4dpwssd_epi32 (__v16si __A, __v16si __B, __v16si __C,
+			      __v16si __D, __v16si __E, __v4si *__F)
+	{
+	    return (__v16si) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+						       (__v16si) __C,
+						       (__v16si) __D,
+						       (__v16si) __E,
+						       (__v16si) __A,
+						       (const __v4si *) __F);
+	}
+    } "-mavx5124vnniw" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h b/gcc/testsuite/gcc.target/i386/m128-check.h
index abb792b..48b2332 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -108,8 +108,12 @@ CHECK_EXP (union128d, double, "%f")
 
 CHECK_EXP (union128, float, "%f")
 
+#ifndef ESP_FLOAT
 #define ESP_FLOAT 0.000001
+#endif
+#ifndef ESP_DOUBLE
 #define ESP_DOUBLE 0.000001
+#endif
 #define CHECK_ARRAY(ARRAY, TYPE, FMT)                   \
 static int                                              \
 __attribute__((noinline, unused))                       \
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index f0f5457..94990af 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mavx5124fmaps -mclwb -mmwaitx -mclzero -mpku" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 80d8c20..4e4ed11 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mavx5124fmaps -mclwb -mmwaitx -mclzero -mpku" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 17:14 ` Vladimir N Makarov
@ 2016-11-10 17:19   ` Andrew Senkevich
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-10 17:19 UTC (permalink / raw)
  To: Vladimir N Makarov; +Cc: gcc-patches, Kirill Yukhin

2016-11-10 20:14 GMT+03:00 Vladimir N Makarov <vmakarov@redhat.com>:
>
>
> On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
>>
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>>
>>
> I've just committed the necessary patch.

Thanks, Vladimir.


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 17:18   ` Andrew Senkevich
@ 2016-11-11 11:16     ` Uros Bizjak
  2016-11-14 18:28       ` Andrew Senkevich
  2016-11-15 12:55       ` Andrew Senkevich
  0 siblings, 2 replies; 29+ messages in thread
From: Uros Bizjak @ 2016-11-11 11:16 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

On Thu, Nov 10, 2016 at 6:18 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-11-10 19:36 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
>> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>>> Hi,
>>>
>>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>>
>>> It requires additional patch for register allocator from Vladimir
>>> Makarov to be committed before.
>>
>> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
>> tabs), can you repost as attachment or configure your MUA not to do this?
>>
>> Just a couple of random nits follow:
>>
>>>         * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>>
>> This mentions an option that doesn't exist, is that s/dd// ?
>
> Yes.
> Attached fixed version.

A couple of questions and comments below.

You are introducing flag2 ABI option flags. There are no tests for
corresponding __target__ attribute, please add some tests, similar to
gcc.target/i386/funcspec-?.c. These can be in a follow-up patch.

Please add new option to g++.dg/other/i386-{2,3}.C tests. These are
like gcc.target/i386/sse-{22,23}.c for c++.

Also, I guess we want to support these new options with
__builtin_cpu_supports. Please add this functionality in a follow-up
patch.

+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+
No, we are extremely low on a single-letter constraints. We will use
these for possible future new register sets. Use Yv or something
similar instead.

+//additional structure for isa flags

Please use c comments throughout the patch.

@@ -1465,11 +1472,14 @@ enum reg_class
 {   0x11ffff,    0x1fe0,    0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,   0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,   0x1f },       /* FLOAT_INT_SSE_REGS */        \
-       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */            \
        { 0x0,       0x0, 0x1fe0 },       /* MASK_REGS */                 \
-{ 0xffffffff,0xffffffff,0x1ffff }                                        \
+{ 0x1fe00000,0xffffe000,   0x1f },       /* MOD4_SSE_REGS */         \
+{ 0xffffffff,0xffffffff,0x1ffff }        \
 }

+/* { 0x02200000,0x22222000,   0x02 },*/       /* MOD4_SSE_REGS */
+

Please remove commented out code. Also, please fix whitespace at the new entry.

+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124VNNIW built-in functions and code generation.

Too much "and"s in the description.

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.

The x86 part of the patch is OK with the above changes and additional
target attribute test for flags2 ISA features..

Uros.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-10 16:27 [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions Andrew Senkevich
  2016-11-10 16:36 ` Jakub Jelinek
  2016-11-10 17:14 ` Vladimir N Makarov
@ 2016-11-11 11:30 ` Jakub Jelinek
  2016-11-14 18:29   ` Andrew Senkevich
  2 siblings, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-11 11:30 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: gcc-patches, Vladimir Makarov, Kirill Yukhin

Hi!

I've noticed preexisting:

On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:

> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
>  VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
>  VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
>  VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */

The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
that?  None of them create V*HF mode, but they do create V*TF mode.

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-11 11:16     ` Uros Bizjak
@ 2016-11-14 18:28       ` Andrew Senkevich
  2016-11-15 10:04         ` Uros Bizjak
  2016-11-15 12:55       ` Andrew Senkevich
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-14 18:28 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 275 bytes --]

2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> The x86 part of the patch is OK with the above changes and additional
> target attribute test for flags2 ISA features..

Fixed according your comments, I will followup with additional tests soon.


--
WBR,
Andrew

[-- Attachment #2: new_avx512_instructions_14.11.patch --]
[-- Type: application/octet-stream, Size: 105739 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9e93f79..93f5f35 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,84 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* common/config/i386/i386-common.c
+	(OPTION_MASK_ISA_AVX5124FMAPS_SET,
+	OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
+	OPTION_MASK_ISA_AVX5124VNNIW_SET,
+	OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
+	(ix86_handle_option): Handle OPT_mavx5124fmaps,
+	OPT_mavx5124vnniw.
+	* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
+	* config/i386/avx5124fmapsintrin.h: New file.
+	* config/i386/avx5124vnniwintrin.h: Ditto.
+	* config/i386/constraints.md (h): New constraint.
+	* config/i386/cpuid.h: (bit_AVX5124VNNIW,
+	bit_AVX5124FMAPS): New.
+	* config/i386/driver-i386.c (host_detect_local_cpu):
+	Detect avx5124fmaps, avx5124vnniw.
+	* config/i386/i386-builtin-types.def: Add types
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
+	* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
+	__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
+	__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
+	__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
+	__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
+	__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
+	__builtin_ia32_vp4dpwssds_mask): New.
+	* config/i386/i386-c.c (ix86_target_macros_internal):
+	Define __AVX5124FMAPS__, __AVX5124VNNIW__.
+	* config/i386/i386-modes.def: Fixed comment typos, added new
+	modes (VECTOR_MODES (FLOAT, 256), VECTOR_MODE (INT, SI, 64)).
+	* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
+	-mavx5124vnniw.
+	(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
+	(ix86_option_override_internal): Handle new options.
+	(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
+	avx5124vnniw.
+	(ix86_expand_builtin): Handle new builtins.
+	(ix86_additional_allocno_class_p): New.
+	* config/i386/i386.h (TARGET_AVX5124FMAPS,
+	TARGET_AVX5124FMAPS_P,
+	TARGET_AVX5124VNNIW,
+	TARGET_AVX5124VNNIW_P): Define.
+	(reg_class): Add MOD4_SSE_REGS.
+	(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
+	* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
+	* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
+	avx5124vnniwintrin.h.
+	* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
+	UNSPEC_VP4FNMADD,
+	UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
+	(define_mode_iterator IMOD4): New.
+	(define_mode_attr imod4_narrow): Ditto.
+	(define_insn "mov<mode>"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
+	* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
+	* genmodes.c (mode_size_inline): Extend return type.
+	* machmode.h (mode_size, mode_base_align): Extend type.
+
 2016-11-14  Uros Bizjak  <ubizjak@gmail.com>
 
 	* config/i386/i386.md (*andndi3_doubleword): Merge operand constraints.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..819e836 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,31 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmaps-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
+	* gcc.target/i386/avx512f-helper.h: Add avx5124fmaps-check.h,
+	avx5124vnniw-check.h.
+	* gcc.target/i386/i386.exp (check_effective_target_avx5124fmaps,
+	check_effective_target_avx5124vnniw): New.
+	* gcc.target/i386/m128-check.h (ESP_FLOAT, ESP_DOUBLE):
+	Set under ifndef.
+	* gcc.target/i386/sse-12.c: Add -mavx5124fmaps, -mavx5124vnniw.
+	* gcc.target/i386/sse-13.c: Ditto.
+	* g++.dg/other/i386-2.C: Ditto.
+	* g++.dg/other/i386-3.C: Ditto.
+	* gcc.target/i386/sse-22.c: Ditto.
+	* gcc.target/i386/sse-23.c: Ditto.
+
 2016-11-14  Janus Weil  <janus@gcc.gnu.org>
 
         PR fortran/78300
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index d201154..98224f5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -76,6 +76,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA_AVX512VBMI_SET \
   (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
+#define OPTION_MASK_ISA_AVX5124FMAPS_SET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_SET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -179,6 +181,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
 #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
+#define OPTION_MASK_ISA_AVX5124FMAPS_UNSET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_UNSET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
@@ -399,6 +403,12 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
+
+	  /* Turn off additional isa flags.  */
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
 	}
       return true;
 
@@ -441,6 +451,36 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx5124fmaps:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	}
+      return true;
+
+    case OPT_mavx5124vnniw:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	}
+      return true;
+
     case OPT_mavx512dq:
       if (value)
 	{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3e0be22..20413fb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -373,8 +373,8 @@ i[34567]86-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -395,8 +395,8 @@ x86_64-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
new file mode 100644
index 0000000..6113ee9
--- /dev/null
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -0,0 +1,216 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124fmapsintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124FMAPSINTRIN_H_INCLUDED
+#define _AVX5124FMAPSINTRIN_H_INCLUDED
+
+#ifndef __AVX5124FMAPS__
+#pragma GCC push_options
+#pragma GCC target("avx5124fmaps")
+#define __DISABLE_AVX5124FMAPS__
+#endif /* __AVX5124FMAPS__ */
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		  __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps ((__v16sf) __B,
+					   (__v16sf) __C,
+					   (__v16sf) __D,
+					   (__v16sf) __E,
+					   (__v16sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+		       __m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) __A,
+						(__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fmadd_ps (__mmask16 __U,
+			__m512 __A, __m512 __B, __m512 __C,
+			__m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) _mm512_setzero_ps (),
+						(__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+	       __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss ((__v4sf) __B,
+					   (__v4sf) __C,
+					   (__v4sf) __D,
+					   (__v4sf) __E,
+					   (__v4sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		    __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) __A,
+						(__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fnmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		   __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps ((__v16sf) __B,
+					    (__v16sf) __C,
+					    (__v16sf) __D,
+					    (__v16sf) __E,
+					    (__v16sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			__m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) __A,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fnmadd_ps (__mmask16 __U,
+			 __m512 __A, __m512 __B, __m512 __C,
+			 __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) _mm512_setzero_ps (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fnmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+		__m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss ((__v4sf) __B,
+					    (__v4sf) __C,
+					    (__v4sf) __D,
+					    (__v4sf) __E,
+					    (__v4sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fnmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) __A,
+						 (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fnmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		      __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) _mm_setzero_ps (),
+						 (__mmask8) __U);
+}
+
+#ifdef __DISABLE_AVX5124FMAPS__
+#undef __DISABLE_AVX5124FMAPS__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124FMAPS__ */
+
+#endif /* _AVX5124FMAPSINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
new file mode 100644
index 0000000..392c6a5
--- /dev/null
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -0,0 +1,132 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124vnniwintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124VNNIWINTRIN_H_INCLUDED
+#define _AVX5124VNNIWINTRIN_H_INCLUDED
+
+#ifndef __AVX5124VNNIW__
+#pragma GCC push_options
+#pragma GCC target("avx5124vnniw")
+#define __DISABLE_AVX5124VNNIW__
+#endif /* __AVX5124VNNIW__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssd_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		      __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+					     (__v16si) __C,
+					     (__v16si) __D,
+					     (__v16si) __E,
+					     (__v16si) __A,
+					     (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssd_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			   __m512i __C, __m512i __D, __m512i __E,
+			   __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) __A,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) _mm512_setzero_ps (),
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssds_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		       __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds ((__v16si) __B,
+					      (__v16si) __C,
+					      (__v16si) __D,
+					      (__v16si) __E,
+					      (__v16si) __A,
+					      (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssds_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) __A,
+						   (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			     __m512i __C, __m512i __D, __m512i __E,
+			     __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) _mm512_setzero_ps (),
+						   (__mmask16) __U);
+}
+
+#ifdef __DISABLE_AVX5124VNNIW__
+#undef __DISABLE_AVX5124VNNIW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124VNNIW__ */
+
+#endif /* _AVX5124VNNIWINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index d610336..b734ce4 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -112,6 +112,7 @@
 ;;  f	x87 register when 80387 floating point arithmetic is enabled
 ;;  r	SSE regs not requiring REX prefix when prefixes avoidance is enabled
 ;;	and all SSE regs otherwise
+;;  h   EVEX encodable SSE register with number factor of four
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -160,6 +161,9 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yh" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "@internal Any EVEX encodable SSE register, which has number factor of four.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 2a946bf..abe7c62 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -60,6 +60,8 @@
 #define bit_MWAITX      (1 << 29)
 
 /* %edx */
+#define bit_AVX5124VNNIW (1 << 2)
+#define bit_AVX5124FMAPS (1 << 3)
 #define bit_MMXEXT	(1 << 22)
 #define bit_LM		(1 << 29)
 #define bit_3DNOWP	(1 << 30)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index e026482..f0d0e8f 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_mwaitx = 0, has_clzero = 0, has_pku = 0;
+  unsigned int has_avx5124fmaps = 0, has_avx5124vnniw = 0;
 
   bool arch;
 
@@ -501,6 +502,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       has_prefetchwt1 = ecx & bit_PREFETCHWT1;
       has_avx512vbmi = ecx & bit_AVX512VBMI;
       has_pku = ecx & bit_OSPKE;
+      has_avx5124vnniw = edx & bit_AVX5124VNNIW;
+      has_avx5124fmaps = edx & bit_AVX5124FMAPS;
     }
 
   if (max_level >= 13)
@@ -1021,6 +1024,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl";
       const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : " -mno-avx512ifma";
       const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : " -mno-avx512vbmi";
+      const char *avx5124vnniw = has_avx5124vnniw ? " -mavx5124vnniw" : " -mno-avx5124vnniw";
+      const char *avx5124fmaps = has_avx5124fmaps ? " -mavx5124fmaps" : " -mno-avx5124fmaps";
       const char *clwb = has_clwb ? " -mclwb" : " -mno-clwb";
       const char *mwaitx  = has_mwaitx  ? " -mmwaitx"  : " -mno-mwaitx"; 
       const char *clzero  = has_clzero  ? " -mclzero"  : " -mno-clzero";
@@ -1033,8 +1038,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 			fxsr, xsave, xsaveopt, avx512f, avx512er,
 			avx512cd, avx512pf, prefetchwt1, clflushopt,
 			xsavec, xsaves, avx512dq, avx512bw, avx512vl,
-			avx512ifma, avx512vbmi, clwb, mwaitx,
-			clzero, pku, NULL);
+			avx512ifma, avx512vbmi, avx5124fmaps, avx5124vnniw,
+			clwb, mwaitx, clzero, pku, NULL);
     }
 
 done:
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..4a38c12 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -526,6 +526,15 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF, V16SF, UHI)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF, V4SF, UQI)
+
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI, V16SI, UHI)
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
+
+
 # Instructions returning mask
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..b23b70c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2482,7 +2482,24 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_truncv8dfv8di2_mask_round, "__bui
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
-BDESC_END (ROUND_ARGS, MPX)
+BDESC_END (ROUND_ARGS, ARGS2)
+
+/* AVX512_4FMAPS and AVX512_4VNNIW builtins with variable number of arguments. Defined in additional ix86_isa_flags2.  */
+BDESC_FIRST (args2, ARGS2,
+       OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps_mask, "__builtin_ia32_4fmaddps_mask", IX86_BUILTIN_4FMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps, "__builtin_ia32_4fmaddps", IX86_BUILTIN_4FMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss, "__builtin_ia32_4fmaddss", IX86_BUILTIN_4FMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss_mask, "__builtin_ia32_4fmaddss_mask", IX86_BUILTIN_4FMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps_mask, "__builtin_ia32_4fnmaddps_mask", IX86_BUILTIN_4FNMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps, "__builtin_ia32_4fnmaddps", IX86_BUILTIN_4FNMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss, "__builtin_ia32_4fnmaddss", IX86_BUILTIN_4FNMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss_mask, "__builtin_ia32_4fnmaddss_mask", IX86_BUILTIN_4FNMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd, "__builtin_ia32_vp4dpwssd", IX86_BUILTIN_4DPWSSD, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd_mask, "__builtin_ia32_vp4dpwssd_mask", IX86_BUILTIN_4DPWSSD_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds, "__builtin_ia32_vp4dpwssds", IX86_BUILTIN_4DPWSSDS, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds_mask, "__builtin_ia32_vp4dpwssds_mask", IX86_BUILTIN_4DPWSSDS_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+
+BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
 BDESC_FIRST (mpx, MPX,
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 9bb80c0..6e56c83 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -28,14 +28,14 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
    void (*def_or_undef) (cpp_reader *, const char *));
 
-\f
 /* Internal function to either define or undef the appropriate system
    macros.  */
 static void
 ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
+			     HOST_WIDE_INT isa_flag2,
 			     enum processor_type arch,
 			     enum processor_type tune,
 			     enum fpmath_unit fpmath,
@@ -376,6 +376,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
     def_or_undef (parse_in, "__AVX512IFMA__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124VNNIW)
+    def_or_undef (parse_in, "__AVX5124VNNIW__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124FMAPS)
+    def_or_undef (parse_in, "__AVX5124FMAPS__");
   if (isa_flag & OPTION_MASK_ISA_FMA)
     def_or_undef (parse_in, "__FMA__");
   if (isa_flag & OPTION_MASK_ISA_RTM)
@@ -462,6 +466,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   HOST_WIDE_INT prev_isa;
   HOST_WIDE_INT cur_isa;
   HOST_WIDE_INT diff_isa;
+  HOST_WIDE_INT prev_isa2;
+  HOST_WIDE_INT cur_isa2;
+  HOST_WIDE_INT diff_isa2;
   enum processor_type prev_arch;
   enum processor_type prev_tune;
   enum processor_type cur_arch;
@@ -494,6 +501,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   prev_isa  = prev_opt->x_ix86_isa_flags;
   cur_isa   = cur_opt->x_ix86_isa_flags;
   diff_isa  = (prev_isa ^ cur_isa);
+  prev_isa2  = prev_opt->x_ix86_isa_flags2;
+  cur_isa2   = cur_opt->x_ix86_isa_flags2;
+  diff_isa2  = (prev_isa2 ^ cur_isa2);
   prev_arch = (enum processor_type) prev_opt->arch;
   prev_tune = (enum processor_type) prev_opt->tune;
   cur_arch  = (enum processor_type) cur_opt->arch;
@@ -509,6 +519,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Undef all of the macros for that are no longer current.  */
   ix86_target_macros_internal (prev_isa & diff_isa,
+			       prev_isa2 & diff_isa2,
 			       prev_arch,
 			       prev_tune,
 			       (enum fpmath_unit) prev_opt->x_ix86_fpmath,
@@ -523,6 +534,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Define all of the macros for new options that were just turned on.  */
   ix86_target_macros_internal (cur_isa & diff_isa,
+			       cur_isa2 & diff_isa2,
 			       cur_arch,
 			       cur_tune,
 			       (enum fpmath_unit) cur_opt->x_ix86_fpmath,
@@ -583,6 +595,7 @@ ix86_target_macros (void)
   cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__");
 
   ix86_target_macros_internal (ix86_isa_flags,
+			       ix86_isa_flags2,
 			       ix86_arch,
 			       ix86_tune,
 			       ix86_fpmath,
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index d524313..1899d06 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -79,11 +79,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
 VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
 VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
 VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
-VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
-VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
-VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
-VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */
+VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
+VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
+VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
+VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
+VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
@@ -91,6 +92,7 @@ VECTOR_MODE (INT, QI, 2);     /*                   V2QI */
 VECTOR_MODE (INT, QI, 12);    /*                  V12QI */
 VECTOR_MODE (INT, QI, 14);    /*                  V14QI */
 VECTOR_MODE (INT, HI, 6);     /*                   V6HI */
+VECTOR_MODE (INT, SI, 64);    /* 		  V64SI */
 
 POINTER_BOUNDS_MODE (BND32, 8);
 POINTER_BOUNDS_MODE (BND64, 16);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..1da1abc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2579,7 +2579,7 @@ static int ix86_function_regparm (const_tree, const_tree);
 static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
-static void ix86_add_new_builtins (HOST_WIDE_INT);
+static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
 static tree ix86_canonical_va_list_type (tree);
 static void predict_jump (int);
 static unsigned int split_stack_prologue_scratch_regno (void);
@@ -2592,8 +2592,9 @@ enum ix86_function_specific_strings
   IX86_FUNCTION_SPECIFIC_MAX
 };
 
-static char *ix86_target_string (HOST_WIDE_INT, int, int, const char *,
-				 const char *, enum fpmath_unit, bool);
+static char *ix86_target_string (HOST_WIDE_INT, HOST_WIDE_INT, int, int,
+				 const char *, const char *, enum fpmath_unit,
+				 bool);
 static void ix86_function_specific_save (struct cl_target_option *,
 					 struct gcc_options *opts);
 static void ix86_function_specific_restore (struct gcc_options *opts,
@@ -4188,8 +4189,8 @@ ix86_using_red_zone (void)
    responsible for freeing the string.  */
 
 static char *
-ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
-		    const char *arch, const char *tune,
+ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, int flags,
+		    int ix86_flags, const char *arch, const char *tune,
 		    enum fpmath_unit fpmath, bool add_nl_p)
 {
   struct ix86_target_opts
@@ -4257,7 +4258,12 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mclzero",	OPTION_MASK_ISA_CLZERO  },
     { "-mpku",		OPTION_MASK_ISA_PKU  },
   };
-
+  /* Additional structure for isa flags.  */
+  static struct ix86_target_opts isa_opts2[] =
+  {
+    { "-mavx5124vnniw", OPTION_MASK_ISA_AVX5124VNNIW },
+    { "-mavx5124fmaps", OPTION_MASK_ISA_AVX5124FMAPS },
+  };
   /* Flag options.  */
   static struct ix86_target_opts flag_opts[] =
   {
@@ -4298,8 +4304,8 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mgeneral-regs-only",		OPTION_MASK_GENERAL_REGS_ONLY },
   };
 
-  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts)
-		   + ARRAY_SIZE (ix86_flag_opts) + 6][2];
+  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa_opts2)
+		   + ARRAY_SIZE (flag_opts) + ARRAY_SIZE (ix86_flag_opts) + 6][2];
 
   char isa_other[40];
   char target_other[40];
@@ -4361,6 +4367,16 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
 	       isa);
     }
 
+  /* Pick out the options in isa2 options.  */
+  for (i = 0; i < ARRAY_SIZE (isa_opts2); i++)
+    {
+      if ((isa2 & isa_opts2[i].mask) != 0)
+	{
+	  opts[num++][0] = isa_opts2[i].option;
+	  isa &= ~ isa_opts2[i].mask;
+	}
+    }
+
   /* Add flag options.  */
   for (i = 0; i < ARRAY_SIZE (flag_opts); i++)
     {
@@ -4486,9 +4502,9 @@ ix86_profile_before_prologue (void)
 void ATTRIBUTE_UNUSED
 ix86_debug_options (void)
 {
-  char *opts = ix86_target_string (ix86_isa_flags, target_flags,
-				   ix86_target_flags,
-				   ix86_arch_string, ix86_tune_string,
+  char *opts = ix86_target_string (ix86_isa_flags, ix86_isa_flags2,
+				   target_flags, ix86_target_flags,
+				   ix86_arch_string,ix86_tune_string,
 				   ix86_fpmath, true);
 
   if (opts)
@@ -4844,6 +4860,8 @@ ix86_option_override_internal (bool main_args_p,
 #define PTA_CLZERO		(HOST_WIDE_INT_1 << 57)
 #define PTA_NO_80387		(HOST_WIDE_INT_1 << 58)
 #define PTA_PKU		(HOST_WIDE_INT_1 << 59)
+#define PTA_AVX5124VNNIW	(HOST_WIDE_INT_1 << 60)
+#define PTA_AVX5124FMAPS	(HOST_WIDE_INT_1 << 61)
 
 #define PTA_CORE2 \
   (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
@@ -5499,6 +5517,14 @@ ix86_option_override_internal (bool main_args_p,
 	if (processor_alias_table[i].flags & PTA_AVX512IFMA
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA;
+
+	if (processor_alias_table[i].flags & PTA_AVX5124VNNIW
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124VNNIW))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW;
+	if (processor_alias_table[i].flags & PTA_AVX5124FMAPS
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124FMAPS))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS;
+
 	if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE))
 	  x86_prefetch_sse = true;
 	if (processor_alias_table[i].flags & PTA_MWAITX
@@ -6298,6 +6324,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
+  ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
   ptr->x_ix86_arch_string = opts->x_ix86_arch_string;
   ptr->x_ix86_tune_string = opts->x_ix86_tune_string;
@@ -6354,6 +6381,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
+  opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
   opts->x_ix86_arch_string = ptr->x_ix86_arch_string;
   opts->x_ix86_tune_string = ptr->x_ix86_tune_string;
@@ -6459,9 +6487,9 @@ ix86_function_specific_print (FILE *file, int indent,
 			      struct cl_target_option *ptr)
 {
   char *target_string
-    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_target_flags,
-			  ptr->x_ix86_target_flags, NULL, NULL,
-			  ptr->x_ix86_fpmath, false);
+    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_ix86_isa_flags2,
+			  ptr->x_target_flags, ptr->x_ix86_target_flags,
+			  NULL, NULL, ptr->x_ix86_fpmath, false);
 
   gcc_assert (ptr->arch < PROCESSOR_max);
   fprintf (file, "%*sarch = %d (%s)\n",
@@ -6538,6 +6566,8 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[],
     IX86_ATTR_ISA ("avx512dq",	OPT_mavx512dq),
     IX86_ATTR_ISA ("avx512bw",	OPT_mavx512bw),
     IX86_ATTR_ISA ("avx512vl",	OPT_mavx512vl),
+    IX86_ATTR_ISA ("avx5124fmaps",	OPT_mavx5124fmaps),
+    IX86_ATTR_ISA ("avx5124vnniw",	OPT_mavx5124vnniw),
     IX86_ATTR_ISA ("mmx",	OPT_mmmx),
     IX86_ATTR_ISA ("pclmul",	OPT_mpclmul),
     IX86_ATTR_ISA ("popcnt",	OPT_mpopcnt),
@@ -6796,6 +6826,7 @@ ix86_valid_target_attribute_tree (tree args,
      The string options are attribute options, and will be undone
      when we copy the save structure.  */
   if (opts->x_ix86_isa_flags != def->x_ix86_isa_flags
+      || opts->x_ix86_isa_flags2 != def->x_ix86_isa_flags2
       || opts->x_target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
@@ -6814,7 +6845,7 @@ ix86_valid_target_attribute_tree (tree args,
 				     | OPTION_MASK_ABI_64
 				     | OPTION_MASK_ABI_X32
 				     | OPTION_MASK_CODE16);
-
+	  opts->x_ix86_isa_flags &= 0;
 	}
       else if (!orig_arch_specified)
 	opts->x_ix86_arch_string = NULL;
@@ -6848,7 +6879,7 @@ ix86_valid_target_attribute_tree (tree args,
 	}
 
       /* Add any builtin functions with the new isa if any.  */
-      ix86_add_new_builtins (opts->x_ix86_isa_flags);
+      ix86_add_new_builtins (opts->x_ix86_isa_flags, opts->x_ix86_isa_flags2);
 
       /* Save the current options unless we are validating options for
 	 #pragma.  */
@@ -6953,8 +6984,10 @@ ix86_can_inline_p (tree caller, tree callee)
       /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
 	 can inline a SSE2 function but a SSE2 function can't inline a SSE4
 	 function.  */
-      if ((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-	  != callee_opts->x_ix86_isa_flags)
+      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+	  != callee_opts->x_ix86_isa_flags) &
+	  ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+	  != callee_opts->x_ix86_isa_flags2))
 	ret = false;
 
       /* See if we have the same non-isa options.  */
@@ -12078,6 +12111,15 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
+/* Return true if register class CL should be an additional allocno
+   class.  */
+
+static bool
+ix86_additional_allocno_class_p (reg_class_t cl)
+{
+  return cl == MOD4_SSE_REGS;
+}
+
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
@@ -30836,6 +30878,7 @@ struct builtin_isa {
   const char *name;		/* function name */
   enum ix86_builtin_func_type tcode; /* type to use in the declaration */
   HOST_WIDE_INT isa;		/* isa_flags this builtin is defined for */
+  HOST_WIDE_INT isa2;		/* additional isa_flags this builtin is defined for */
   bool const_p;			/* true if the declaration is constant */
   bool leaf_p;			/* true if the declaration has leaf attribute */
   bool nothrow_p;		/* true if the declaration has nothrow attribute */
@@ -30846,6 +30889,7 @@ static struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
 
 /* Bits that can still enable any inclusion of a builtin.  */
 static HOST_WIDE_INT deferred_isa_values = 0;
+static HOST_WIDE_INT deferred_isa_values2 = 0;
 
 /* Add an ix86 target builtin function with CODE, NAME and TYPE.  Save the MASK
    of which isa_flags to use in the ix86_builtins_isa array.  Stores the
@@ -30928,18 +30972,75 @@ def_builtin_const (HOST_WIDE_INT mask, const char *name,
   return decl;
 }
 
+/* Like def_builtin, but for additional isa2 flags.  */
+
+static inline tree
+def_builtin2 (HOST_WIDE_INT mask, const char *name,
+	     enum ix86_builtin_func_type tcode,
+	     enum ix86_builtins code)
+{
+  tree decl = NULL_TREE;
+
+  ix86_builtins_isa[(int) code].isa2 = mask;
+
+  if (mask == 0
+      || (mask & ix86_isa_flags2) != 0
+      || (lang_hooks.builtin_function
+	  == lang_hooks.builtin_function_ext_scope))
+
+    {
+      tree type = ix86_get_builtin_func_type (tcode);
+      decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+				   NULL, NULL_TREE);
+	  ix86_builtins[(int) code] = decl;
+	  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+    }
+  else
+    {
+      /* Just a MASK where set_and_not_built_p == true can potentially
+	 include a builtin.  */
+      deferred_isa_values2 |= mask;
+      ix86_builtins[(int) code] = NULL_TREE;
+      ix86_builtins_isa[(int) code].tcode = tcode;
+      ix86_builtins_isa[(int) code].name = name;
+      ix86_builtins_isa[(int) code].leaf_p = false;
+      ix86_builtins_isa[(int) code].nothrow_p = false;
+      ix86_builtins_isa[(int) code].const_p = false;
+      ix86_builtins_isa[(int) code].set_and_not_built_p = true;
+    }
+
+  return decl;
+}
+
+/* Like def_builtin, but also marks the function decl "const".  */
+
+static inline tree
+def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
+		   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+{
+  tree decl = def_builtin2 (mask, name, tcode, code);
+  if (decl)
+    TREE_READONLY (decl) = 1;
+  else
+    ix86_builtins_isa[(int) code].const_p = true;
+
+  return decl;
+}
+
 /* Add any new builtin functions for a given ISA that may not have been
    declared.  This saves a bit of space compared to adding all of the
    declarations to the tree, even if we didn't use them.  */
 
 static void
-ix86_add_new_builtins (HOST_WIDE_INT isa)
+ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if ((isa & deferred_isa_values) == 0)
+  if (((isa & deferred_isa_values) == 0)
+      && ((isa2 & deferred_isa_values2) == 0))
     return;
 
   /* Bits in ISA value can be removed from potential isa values.  */
   deferred_isa_values &= ~isa;
+  deferred_isa_values2 &= ~isa2;
 
   int i;
   tree saved_current_target_pragma = current_target_pragma;
@@ -30947,7 +31048,7 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)
 
   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
     {
-      if ((ix86_builtins_isa[i].isa & isa) != 0
+      if ((((ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0))
 	  && ix86_builtins_isa[i].set_and_not_built_p)
 	{
 	  tree decl, type;
@@ -31185,8 +31286,10 @@ BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_SPECIAL_ARGS_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_ROUND_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_ARGS_LAST, 1);
-BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS2_FIRST,
 	       IX86_BUILTIN__BDESC_ROUND_ARGS_LAST, 1);
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+	       IX86_BUILTIN__BDESC_ARGS2_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_CONST_FIRST,
 	       IX86_BUILTIN__BDESC_MPX_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MULTI_ARG_FIRST,
@@ -31237,6 +31340,18 @@ ix86_init_mmx_sse_builtins (void)
 		 IX86_BUILTIN__BDESC_ARGS_FIRST,
 		 ARRAY_SIZE (bdesc_args) - 1);
 
+  /* Add all builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_args2;
+       i < ARRAY_SIZE (bdesc_args2);
+       i++, d++)
+    {
+      if (d->name == 0)
+	continue;
+
+      ftype = (enum ix86_builtin_func_type) d->flag;
+      def_builtin_const2 (d->mask, d->name, ftype, d->code);
+    }
+
   /* Add all builtins with rounding.  */
   for (i = 0, d = bdesc_round_args;
        i < ARRAY_SIZE (bdesc_round_args);
@@ -36428,10 +36543,13 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
      current ISA based on the command line switches.  With function specific
      options, we need to check in the context of the function making the call
      whether it is supported.  */
-  if (ix86_builtins_isa[fcode].isa
-      && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+  if ((ix86_builtins_isa[fcode].isa
+       && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+      && (ix86_builtins_isa[fcode].isa2
+	  && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
     {
-      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, 0,
+      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
+				       ix86_builtins_isa[fcode].isa2, 0, 0,
 				       NULL, NULL, (enum fpmath_unit) 0,
 				       false);
       if (!opts)
@@ -38091,6 +38209,246 @@ rdseed_step:
 	}
     }
 
+  if (fcode >= IX86_BUILTIN__BDESC_ARGS2_FIRST
+      && fcode <= IX86_BUILTIN__BDESC_ARGS2_LAST)
+    {
+      i = fcode - IX86_BUILTIN__BDESC_ARGS2_FIRST;
+      rtx (*fcn) (rtx, rtx, rtx, rtx);
+      rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
+      rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
+      rtx (*msk_mov) (rtx, rtx, rtx, rtx);
+      int masked = 1;
+      machine_mode mode, wide_mode, nar_mode;
+
+      nar_mode  = V4SFmode;
+      mode      = V16SFmode;
+      wide_mode = V64SFmode;
+      msk_mov   = gen_avx512f_loadv16sf_mask;
+      fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
+      fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
+
+      switch (fcode)
+	{
+	case IX86_BUILTIN_4FMAPS:
+	  fcn = gen_avx5124fmaddps_4fmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssd;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssds;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS:
+	  fcn = gen_avx5124fmaddps_4fnmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS_MASK:
+	  fcn_mask  = gen_avx5124fmaddps_4fnmaddps_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddps_maskz;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FMAPS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+v4fma_expand:
+	    wide_reg = gen_reg_rtx (wide_mode);
+	    for (i = 0; i < 4; i++)
+	      {
+	        args[i] = CALL_EXPR_ARG (exp, i);
+		ops[i] = expand_normal (args[i]);
+
+		emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
+				  ops[i]);
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (mode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (nar_mode, addr);
+
+	    target = gen_reg_rtx (mode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+	        rtx merge, mask;
+		merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		if (CONST_INT_P (mask))
+		  mask = fixup_modeless_constant (mask, HImode);
+
+		mask = force_reg (HImode, mask);
+
+		if (GET_MODE (mask) != HImode)
+		  mask = gen_rtx_SUBREG (HImode, mask, 0);
+
+		/* If merge is 0 then we're about to emit z-masked variant.  */
+		if (const0_operand (merge, mode))
+		  emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		/* If merge is the same as accum then emit merge-masked variant.  */
+		else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		  {
+		    merge = force_reg (mode, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		  }
+	        /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		else
+		  {
+		    rtx tmp = target;
+		    emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		    target = force_reg (mode, merge);
+		    emit_insn (msk_mov (target, tmp, target, mask));
+		  }
+	      }
+	      return target;
+	    }
+
+	case IX86_BUILTIN_4FNMASS:
+	  fcn = gen_avx5124fmaddps_4fnmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS:
+	  fcn = gen_avx5124fmaddps_4fmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FNMASS_MASK:
+	  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
+	  msk_mov   = gen_avx512vl_loadv4sf_mask;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+	    fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
+	    fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
+	    msk_mov   = gen_avx512vl_loadv4sf_mask;
+
+s4fma_expand:
+	    mode = V4SFmode;
+	    wide_reg = gen_reg_rtx (V64SFmode);
+	    for (i = 0; i < 4; i++)
+	      {
+		 rtx tmp;
+		 args[i] = CALL_EXPR_ARG (exp, i);
+		 ops[i] = expand_normal (args[i]);
+
+		 tmp = gen_reg_rtx (SFmode);
+		 emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+
+		 emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
+				  gen_rtx_SUBREG (V16SFmode, tmp, 0));
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (V4SFmode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (V4SFmode, addr);
+
+	    target = gen_reg_rtx (V4SFmode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+		 rtx merge, mask;
+		 merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		 mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		 if (CONST_INT_P (mask))
+		   mask = fixup_modeless_constant (mask, QImode);
+
+		 mask = force_reg (QImode, mask);
+
+		 if (GET_MODE (mask) != QImode)
+		   mask = gen_rtx_SUBREG (QImode, mask, 0);
+
+		 /* If merge is 0 then we're about to emit z-masked variant.  */
+		 if (const0_operand (merge, mode))
+		   emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		 /* If merge is the same as accum then emit merge-masked variant.  */
+		 else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		   {
+		     merge = force_reg (mode, merge);
+		     emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		   }
+		 /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		 else
+		   {
+		     rtx tmp = target;
+		     emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		     target = force_reg (mode, merge);
+		     emit_insn (msk_mov (target, tmp, target, mask));
+		   }
+		}
+	      return target;
+	    }
+	  default:
+	    return ix86_expand_args_builtin (bdesc_args2 + i, exp, target);
+	  }
+    }
+
   if (fcode >= IX86_BUILTIN__BDESC_COMI_FIRST
       && fcode <= IX86_BUILTIN__BDESC_COMI_LAST)
     {
@@ -38151,7 +38509,8 @@ static tree ix86_get_builtin (enum ix86_builtins code)
 
   opts = TREE_TARGET_OPTION (target_tree);
 
-  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+  if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+	&& (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
     return ix86_builtin_decl (code, true);
   else
     return NULL_TREE;
@@ -39735,6 +40094,18 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
+      /* For AVX-5124FMAPS allow V64SFmode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SFmode)
+	return true;
+
+      /* For AVX-5124VNNIW allow V64SImode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SImode)
+	return true;
+
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
@@ -51134,6 +51505,9 @@ ix86_run_selftests (void)
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
 
+#undef TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+#define TARGET_ADDITIONAL_ALLOCNO_CLASS_P ix86_additional_allocno_class_p
+
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..10533eb 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -81,6 +81,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_AVX512VBMI_P(x)	TARGET_ISA_AVX512VBMI_P(x)
 #define TARGET_AVX512IFMA	TARGET_ISA_AVX512IFMA
 #define TARGET_AVX512IFMA_P(x)	TARGET_ISA_AVX512IFMA_P(x)
+#define TARGET_AVX5124FMAPS	TARGET_ISA_AVX5124FMAPS
+#define TARGET_AVX5124FMAPS_P(x) TARGET_ISA_AVX5124FMAPS_P(x)
+#define TARGET_AVX5124VNNIW	TARGET_ISA_AVX5124VNNIW
+#define TARGET_AVX5124VNNIW_P(x) TARGET_ISA_AVX5124VNNIW_P(x)
 #define TARGET_FMA	TARGET_ISA_FMA
 #define TARGET_FMA_P(x)	TARGET_ISA_FMA_P(x)
 #define TARGET_SSE4A	TARGET_ISA_SSE4A
@@ -1089,7 +1093,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define HARD_REGNO_NREGS(REGNO, MODE)					\
   (STACK_REGNO_P (REGNO) || SSE_REGNO_P (REGNO) || MMX_REGNO_P (REGNO)	\
    || MASK_REGNO_P (REGNO) || BND_REGNO_P (REGNO)			\
-   ? (COMPLEX_MODE_P (MODE) ? 2 : 1)					\
+   ? (COMPLEX_MODE_P (MODE) ? 2 :					\
+      (((MODE == V64SFmode) || (MODE == V64SImode)) ? 4 : 1))		\
    : ((MODE) == XFmode							\
       ? (TARGET_64BIT ? 2 : 3)						\
       : ((MODE) == XCmode						\
@@ -1365,6 +1370,7 @@ enum reg_class
   FLOAT_INT_SSE_REGS,
   MASK_EVEX_REGS,
   MASK_REGS,
+  MOD4_SSE_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1425,6 +1431,7 @@ enum reg_class
    "FLOAT_INT_SSE_REGS",		\
    "MASK_EVEX_REGS",			\
    "MASK_REGS",				\
+   "MOD4_SSE_REGS"			\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1465,9 +1472,10 @@ enum reg_class
 {   0x11ffff,    0x1fe0,    0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,   0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,   0x1f },       /* FLOAT_INT_SSE_REGS */        \
-       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */            \
        { 0x0,       0x0, 0x1fe0 },       /* MASK_REGS */                 \
-{ 0xffffffff,0xffffffff,0x1ffff }                                        \
+{ 0x1fe00000,0xffffe000,   0x1f },       /* MOD4_SSE_REGS */		 \
+{ 0xffffffff,0xffffffff,0x1ffff }		\
 }
 
 /* The same information, inverted:
@@ -1533,6 +1541,16 @@ enum reg_class
 #define BND_REG_P(X) (REG_P (X) && BND_REGNO_P (REGNO (X)))
 #define BND_REGNO_P(N) IN_RANGE ((N), FIRST_BND_REG, LAST_BND_REG)
 
+#define MOD4_SSE_REG_P(X) (REG_P (X) && MOD4_SSE_REGNO_P (REGNO (X)))
+#define MOD4_SSE_REGNO_P(N) ((N) == XMM0_REG  \
+			     || (N) == XMM4_REG  \
+			     || (N) == XMM8_REG  \
+			     || (N) == XMM12_REG \
+			     || (N) == XMM16_REG \
+			     || (N) == XMM20_REG \
+			     || (N) == XMM24_REG \
+			     || (N) == XMM28_REG)
+
 /* First floating point reg */
 #define FIRST_FLOAT_REG FIRST_STACK_REG
 #define STACK_TOP_P(X) (REG_P (X) && REGNO (X) == FIRST_FLOAT_REG)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..390412a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -25,11 +25,17 @@ config/i386/i386-opts.h
 Variable
 HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT | TARGET_SUBTARGET_ISA_DEFAULT
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2 = 0
+
 ; A mask of ix86_isa_flags that includes bit X if X was set or cleared
 ; on the command line.
 Variable
 HOST_WIDE_INT ix86_isa_flags_explicit
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2_explicit
+
 ; Additional target flags
 Variable
 int ix86_target_flags
@@ -74,6 +80,10 @@ unsigned char branch_cost
 
 ;; which flags were passed by the user
 TargetSave
+HOST_WIDE_INT x_ix86_isa_flags2_explicit
+
+;; which flags were passed by the user
+TargetSave
 HOST_WIDE_INT x_ix86_isa_flags_explicit
 
 ;; whether -mtune was not specified
@@ -687,6 +697,14 @@ mavx512vbmi
 Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512VBMI built-in functions and code generation.
 
+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX5124VNNIW built-in functions and code generation.
+
 mfma
 Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 9333111..3fd3c9c 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -68,6 +68,10 @@
 
 #include <avx512vbmivlintrin.h>
 
+#include <avx5124fmapsintrin.h>
+
+#include <avx5124vnniwintrin.h>
+
 #include <shaintrin.h>
 
 #include <lzcntintrin.h>
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14fcd67..81fcc1d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,6 +146,12 @@
 
   ;; For AVX512VBMI support
   UNSPEC_VPMULTISHIFT
+
+  ;; For AVX5124FMAPS/AVX5124VNNIW support
+  UNSPEC_VP4FMADD
+  UNSPEC_VP4FNMADD
+  UNSPEC_VP4DPWSSD
+  UNSPEC_VP4DPWSSDS
 ])
 
 (define_c_enum "unspecv" [
@@ -19397,3 +19403,274 @@
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_mode_iterator IMOD4
+  [(V64SF "TARGET_AVX5124FMAPS") (V64SI "TARGET_AVX5124VNNIW")])
+
+(define_mode_attr imod4_narrow
+  [(V64SF "V16SF") (V64SI "V16SI")])
+
+(define_insn "mov<mode>"
+  [(set (match_operand:IMOD4 0 "nonimmediate_operand")
+	(match_operand:IMOD4 1 "general_operand"))]
+  "TARGET_AVX512F"
+  "#")
+
+(define_split
+  [(set (match_operand:IMOD4 0 "register_operand")
+	(match_operand:IMOD4 1 "nonimmediate_operand"))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (subreg:<imod4_narrow> (match_dup 0) 0)
+	(subreg:<imod4_narrow> (match_dup 1) 0))
+   (set (subreg:<imod4_narrow> (match_dup 0) 64)
+	(subreg:<imod4_narrow> (match_dup 1) 64))
+   (set (subreg:<imod4_narrow> (match_dup 0) 128)
+	(subreg:<imod4_narrow> (match_dup 1) 128))
+   (set (subreg:<imod4_narrow> (match_dup 0) 192)
+	(subreg:<imod4_narrow> (match_dup 1) 192))])
+
+(define_insn "avx5124fmaddps_4fmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "Yh")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "Yh")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "Yh")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "Yh")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "Yh")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "Yh")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "Yh")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "Yh")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "Yh")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "Yh")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 92ca055..42ab5f0 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -973,10 +973,10 @@ inline __attribute__((__always_inline__))\n\
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+unsigned short\n\
 mode_size_inline (machine_mode mode)\n\
 {\n\
-  extern %sunsigned char mode_size[NUM_MACHINE_MODES];\n\
+  extern %sunsigned short mode_size[NUM_MACHINE_MODES];\n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
     {\n", adj_bytesize ? "" : "const ");
@@ -1301,7 +1301,7 @@ emit_mode_size (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char", "mode_size",
+  print_maybe_const_decl ("%sunsigned short", "mode_size",
 			  "NUM_MACHINE_MODES", bytesize);
 
   for_all_modes (c, m)
@@ -1492,7 +1492,7 @@ emit_mode_base_align (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char",
+  print_maybe_const_decl ("%sunsigned short",
 			  "mode_base_align", "NUM_MACHINE_MODES",
 			  alignment);
 
diff --git a/gcc/init-regs.c b/gcc/init-regs.c
index 3fbaee1..2ee4bd4 100644
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
@@ -104,6 +104,7 @@ initialize_uninitialized_regs (void)
 		  bitmap_set_bit (already_genned, regno);
 
 		  start_sequence ();
+		  emit_clobber (reg);
 		  emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
 		  move_insn = get_insns ();
 		  end_sequence ();
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 3dcadd8..d924e83 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -179,7 +179,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
 
 /* Get the size in bytes and bits of an object of mode MODE.  */
 
-extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
+extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 #if GCC_VERSION >= 4001
 #define GET_MODE_SIZE(MODE) \
   ((unsigned short) (__builtin_constant_p (MODE) \
@@ -330,7 +330,7 @@ extern machine_mode get_best_mode (int, int,
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
 
-extern CONST_MODE_BASE_ALIGN unsigned char mode_base_align[NUM_MACHINE_MODES];
+extern CONST_MODE_BASE_ALIGN unsigned short mode_base_align[NUM_MACHINE_MODES];
 
 extern unsigned get_mode_alignment (machine_mode);
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index b6b3559..701051d 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,9 +1,10 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
-   popcntintrin.h, fmaintrin.h, pkuintrin.h and mm_malloc.h.h are usable with
+   popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
+   avx5124vnniwintrin.h and mm_malloc.h.h are usable with
    -O -pedantic-errors.  */
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 994ed28..cd8f217 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,9 +1,10 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
-   popcntintrin.h, fmaintrin.h, pkuintrin.h and mm_malloc.h are usable with
+   popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
+   avx5124vnniwintrin.h and mm_malloc.h are usable with
    -O -fkeep-inline-functions.  */
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
new file mode 100644
index 0000000..1035f25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
new file mode 100644
index 0000000..f977b65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	+ (double)src1[i] * (double)mult[0]
+	+ (double)src2[i] * (double)mult[1]
+	+ (double)src3[i] * (double)mult[2]
+	+ (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
new file mode 100644
index 0000000..2f1a558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
new file mode 100644
index 0000000..45bd7da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fnmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fnmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fnmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
new file mode 100644
index 0000000..3c75fcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	- (double)src1[i] * (double)mult[0]
+	- (double)src2[i] * (double)mult[1]
+	- (double)src3[i] * (double)mult[2]
+	- (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fnmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fnmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fnmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
new file mode 100644
index 0000000..1755afb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fnmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fnmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fnmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
new file mode 100644
index 0000000..eba93cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124fmaps_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124fmaps_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124FMAPS) == bit_AVX5124FMAPS))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
new file mode 100644
index 0000000..a706cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124vnniw_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124vnniw_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FNMA test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124VNNIW) == bit_AVX5124VNNIW))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
new file mode 100644
index 0000000..a234fdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssd_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssd_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssd_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
new file mode 100644
index 0000000..a0a6825
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssd_epi32)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssd_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssd_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
new file mode 100644
index 0000000..d1bed37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssds_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssds_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssds_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
new file mode 100644
index 0000000..e1e5536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define DEFAULT_VALUE 0x7ffffffe
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      long long int tmp;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssds_epi32)	     (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssds_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssds_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 5923085..6aca0d6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -22,6 +22,10 @@
 #include "avx512ifma-check.h"
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 #include "avx512vbmi-check.h"
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+#include "avx5124fmaps-check.h"
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+#include "avx5124vnniw-check.h"
 #elif defined (AVX512VL)
 #include "avx512vl-check.h"
 #endif
@@ -33,7 +37,9 @@
 /* Value to be written into destination.
    We have one value for all types so it must be small enough
    to fit into signed char.  */
+#ifndef DEFAULT_VALUE
 #define DEFAULT_VALUE 117
+#endif
 
 #define MAKE_MASK_MERGE(NAME, TYPE)				      \
 static void							      \
@@ -132,6 +138,12 @@ avx512ifma_test (void) { test_512 (); }
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 void
 avx512vbmi_test (void) { test_512 (); }
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+void
+avx5124fmaps_test (void) { test_512 (); }
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+void
+avx5124vnniw_test (void) { test_512 (); }
 #elif defined (AVX512VL)
 void
 avx512vl_test (void) { test_256 (); test_128 (); }
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
index 877d224..4057240 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -366,6 +366,48 @@ proc check_effective_target_avx512vbmi { } {
     } "-mavx512vbmi" ]
 }
 
+# Return 1 if avx512_4fmaps instructions can be compiled.
+proc check_effective_target_avx5124fmaps { } {
+    return [check_no_compiler_messages avx5124fmaps object {
+	typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+	typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+	__v16sf
+	_mm512_mask_4fmadd_ps (__v16sf __DEST, __v16sf __A, __v16sf __B, __v16sf __C,
+			       __v16sf __D, __v16sf __E, __v4sf *__F)
+	{
+	    return (__v16sf) __builtin_ia32_4fmaddps_mask ((__v16sf) __A,
+							  (__v16sf) __B,
+							  (__v16sf) __C,
+							  (__v16sf) __D,
+							  (__v16sf) __E,
+							  (const __v4sf *) __F,
+							  (__v16sf) __DEST,
+							  0xffff);
+	}
+    } "-mavx5124fmaps" ]
+}
+
+# Return 1 if avx512_4vnniw instructions can be compiled.
+proc check_effective_target_avx5124vnniw { } {
+    return [check_no_compiler_messages avx5124vnniw object {
+	typedef int __v16si __attribute__ ((__vector_size__ (64)));
+	typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+	__v16si
+	_mm512_4dpwssd_epi32 (__v16si __A, __v16si __B, __v16si __C,
+			      __v16si __D, __v16si __E, __v4si *__F)
+	{
+	    return (__v16si) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+						       (__v16si) __C,
+						       (__v16si) __D,
+						       (__v16si) __E,
+						       (__v16si) __A,
+						       (const __v4si *) __F);
+	}
+    } "-mavx5124vnniw" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h b/gcc/testsuite/gcc.target/i386/m128-check.h
index abb792b..48b2332 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -108,8 +108,12 @@ CHECK_EXP (union128d, double, "%f")
 
 CHECK_EXP (union128, float, "%f")
 
+#ifndef ESP_FLOAT
 #define ESP_FLOAT 0.000001
+#endif
+#ifndef ESP_DOUBLE
 #define ESP_DOUBLE 0.000001
+#endif
 #define CHECK_ARRAY(ARRAY, TYPE, FMT)                   \
 static int                                              \
 __attribute__((noinline, unused))                       \
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index f0f5457..3e8417b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 80d8c20..67f3b93 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 9242493..256d933 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -8,7 +8,8 @@
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
    mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h,
-   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h and mm_malloc.h 
+   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h,
+   avx5124fmapsintrin.h, avx5124vnniwintrin.h and mm_malloc.h 
    that reference the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being
@@ -100,7 +101,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 4635fb0..61f1b00 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -7,7 +7,8 @@
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
    mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h,
-   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h and mm_malloc.h 
+   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h,
+   avx5124fmapsintrin.h, avx5124vnniwintrin.h and mm_malloc.h 
    that reference the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being
@@ -594,6 +595,6 @@
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,clwb,mwaitx,clzero,pku")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,clwb,mwaitx,clzero,pku")
 
 #include <x86intrin.h>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-11 11:30 ` Jakub Jelinek
@ 2016-11-14 18:29   ` Andrew Senkevich
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-14 18:29 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Vladimir Makarov, Kirill Yukhin

2016-11-11 14:29 GMT+03:00 Jakub Jelinek <jakub@redhat.com>:
> Hi!
>
> I've noticed preexisting:
>
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>
>> --- a/gcc/config/i386/i386-modes.def
>> +++ b/gcc/config/i386/i386-modes.def
>> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
>>  VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
>>  VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
>>  VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */
>
> The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
> that?  None of them create V*HF mode, but they do create V*TF mode.

I have fixed it in new patch.


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-14 18:28       ` Andrew Senkevich
@ 2016-11-15 10:04         ` Uros Bizjak
  0 siblings, 0 replies; 29+ messages in thread
From: Uros Bizjak @ 2016-11-15 10:04 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

On Mon, Nov 14, 2016 at 7:28 PM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> The x86 part of the patch is OK with the above changes and additional
>> target attribute test for flags2 ISA features..
>
> Fixed according your comments, I will followup with additional tests soon.

OK.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-11 11:16     ` Uros Bizjak
  2016-11-14 18:28       ` Andrew Senkevich
@ 2016-11-15 12:55       ` Andrew Senkevich
  2016-11-15 14:56         ` Jeff Law
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-15 12:55 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin, Jeff Law

2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> --- a/gcc/init-regs.c
> +++ b/gcc/init-regs.c
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
>
> These are middle-end changes, you will need a separate review for these.

Who could review these changes?


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-15 12:55       ` Andrew Senkevich
@ 2016-11-15 14:56         ` Jeff Law
  2016-11-15 16:31           ` Andrew Senkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Law @ 2016-11-15 14:56 UTC (permalink / raw)
  To: Andrew Senkevich, Uros Bizjak
  Cc: Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> --- a/gcc/genmodes.c
>> +++ b/gcc/genmodes.c
>> --- a/gcc/init-regs.c
>> +++ b/gcc/init-regs.c
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>>
>> These are middle-end changes, you will need a separate review for these.
>
> Who could review these changes?
I can.  I likely dropped the message because it looked x86 specific, so 
if you could resend it'd be appreciated.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-15 14:56         ` Jeff Law
@ 2016-11-15 16:31           ` Andrew Senkevich
  2016-11-16 16:21             ` Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-15 16:31 UTC (permalink / raw)
  To: Jeff Law
  Cc: Uros Bizjak, Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

2016-11-15 17:56 GMT+03:00 Jeff Law <law@redhat.com>:
> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>
>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>
>>> --- a/gcc/genmodes.c
>>> +++ b/gcc/genmodes.c
>>> --- a/gcc/init-regs.c
>>> +++ b/gcc/init-regs.c
>>> --- a/gcc/machmode.h
>>> +++ b/gcc/machmode.h
>>>
>>> These are middle-end changes, you will need a separate review for these.
>>
>>
>> Who could review these changes?
>
> I can.  I likely dropped the message because it looked x86 specific, so if
> you could resend it'd be appreciated.

Attached (diff with previous only in fixed comments typos).


--
WBR,
Andrew

[-- Attachment #2: new_avx512_instructions_15.11.patch --]
[-- Type: application/octet-stream, Size: 105828 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9e93f79..93f5f35 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,84 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* common/config/i386/i386-common.c
+	(OPTION_MASK_ISA_AVX5124FMAPS_SET,
+	OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
+	OPTION_MASK_ISA_AVX5124VNNIW_SET,
+	OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
+	(ix86_handle_option): Handle OPT_mavx5124fmaps,
+	OPT_mavx5124vnniw.
+	* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
+	* config/i386/avx5124fmapsintrin.h: New file.
+	* config/i386/avx5124vnniwintrin.h: Ditto.
+	* config/i386/constraints.md (h): New constraint.
+	* config/i386/cpuid.h: (bit_AVX5124VNNIW,
+	bit_AVX5124FMAPS): New.
+	* config/i386/driver-i386.c (host_detect_local_cpu):
+	Detect avx5124fmaps, avx5124vnniw.
+	* config/i386/i386-builtin-types.def: Add types
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
+	V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
+	V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
+	V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
+	* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
+	__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
+	__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
+	__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
+	__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
+	__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
+	__builtin_ia32_vp4dpwssds_mask): New.
+	* config/i386/i386-c.c (ix86_target_macros_internal):
+	Define __AVX5124FMAPS__, __AVX5124VNNIW__.
+	* config/i386/i386-modes.def: Fixed comment typos, added new
+	modes (VECTOR_MODES (FLOAT, 256), VECTOR_MODE (INT, SI, 64)).
+	* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
+	-mavx5124vnniw.
+	(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
+	(ix86_option_override_internal): Handle new options.
+	(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
+	avx5124vnniw.
+	(ix86_expand_builtin): Handle new builtins.
+	(ix86_additional_allocno_class_p): New.
+	* config/i386/i386.h (TARGET_AVX5124FMAPS,
+	TARGET_AVX5124FMAPS_P,
+	TARGET_AVX5124VNNIW,
+	TARGET_AVX5124VNNIW_P): Define.
+	(reg_class): Add MOD4_SSE_REGS.
+	(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
+	* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
+	* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
+	avx5124vnniwintrin.h.
+	* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
+	UNSPEC_VP4FNMADD,
+	UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
+	(define_mode_iterator IMOD4): New.
+	(define_mode_attr imod4_narrow): Ditto.
+	(define_insn "mov<mode>"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
+	(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
+	(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
+	* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
+	* genmodes.c (mode_size_inline): Extend return type.
+	* machmode.h (mode_size, mode_base_align): Extend type.
+
 2016-11-15  Richard Sandiford  <richard.sandiford@arm.com>
             Alan Hayward  <alan.hayward@arm.com>
             David Sherwood  <david.sherwood@arm.com>
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d522e24..819e836 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,31 @@
+2016-11-10  Kirill Yukhin  <kirill.yukhin@gmail.com>
+	    Andrew Senkevich <andrew.senkevich@intel.com>
+
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
+	* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
+	* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
+	* gcc.target/i386/avx5124fmaps-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-check.h: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
+	* gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
+	* gcc.target/i386/avx512f-helper.h: Add avx5124fmaps-check.h,
+	avx5124vnniw-check.h.
+	* gcc.target/i386/i386.exp (check_effective_target_avx5124fmaps,
+	check_effective_target_avx5124vnniw): New.
+	* gcc.target/i386/m128-check.h (ESP_FLOAT, ESP_DOUBLE):
+	Set under ifndef.
+	* gcc.target/i386/sse-12.c: Add -mavx5124fmaps, -mavx5124vnniw.
+	* gcc.target/i386/sse-13.c: Ditto.
+	* g++.dg/other/i386-2.C: Ditto.
+	* g++.dg/other/i386-3.C: Ditto.
+	* gcc.target/i386/sse-22.c: Ditto.
+	* gcc.target/i386/sse-23.c: Ditto.
+
 2016-11-14  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
         * gcc.target/powerpc/vec-set-int.c: New test.
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index d201154..98224f5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -76,6 +76,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA_AVX512VBMI_SET \
   (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
+#define OPTION_MASK_ISA_AVX5124FMAPS_SET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_SET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -179,6 +181,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL
 #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
+#define OPTION_MASK_ISA_AVX5124FMAPS_UNSET OPTION_MASK_ISA_AVX5124FMAPS
+#define OPTION_MASK_ISA_AVX5124VNNIW_UNSET OPTION_MASK_ISA_AVX5124VNNIW
 #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
@@ -399,6 +403,12 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
+
+	  /* Turn off additional isa flags.  */
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
 	}
       return true;
 
@@ -441,6 +451,36 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx5124fmaps:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
+	}
+      return true;
+
+    case OPT_mavx5124vnniw:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+	}
+      return true;
+
     case OPT_mavx512dq:
       if (value)
 	{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3e0be22..20413fb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -373,8 +373,8 @@ i[34567]86-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -395,8 +395,8 @@ x86_64-*-*)
 		       xsavesintrin.h avx512dqintrin.h avx512bwintrin.h
 		       avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
 		       avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
-		       avx512vbmivlintrin.h clwbintrin.h mwaitxintrin.h
-		       clzerointrin.h pkuintrin.h"
+		       avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
+		       clwbintrin.h mwaitxintrin.h clzerointrin.h pkuintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
new file mode 100644
index 0000000..6113ee9
--- /dev/null
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -0,0 +1,216 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124fmapsintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124FMAPSINTRIN_H_INCLUDED
+#define _AVX5124FMAPSINTRIN_H_INCLUDED
+
+#ifndef __AVX5124FMAPS__
+#pragma GCC push_options
+#pragma GCC target("avx5124fmaps")
+#define __DISABLE_AVX5124FMAPS__
+#endif /* __AVX5124FMAPS__ */
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		  __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps ((__v16sf) __B,
+					   (__v16sf) __C,
+					   (__v16sf) __D,
+					   (__v16sf) __E,
+					   (__v16sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+		       __m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) __A,
+						(__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fmadd_ps (__mmask16 __U,
+			__m512 __A, __m512 __B, __m512 __C,
+			__m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fmaddps_mask ((__v16sf) __B,
+						(__v16sf) __C,
+						(__v16sf) __D,
+						(__v16sf) __E,
+						(__v16sf) __A,
+						(const __v4sf *) __F,
+						(__v16sf) _mm512_setzero_ps (),
+						(__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+	       __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss ((__v4sf) __B,
+					   (__v4sf) __C,
+					   (__v4sf) __D,
+					   (__v4sf) __E,
+					   (__v4sf) __A,
+					   (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		    __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) __A,
+						(__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
+						(__v4sf) __C,
+						(__v4sf) __D,
+						(__v4sf) __E,
+						(__v4sf) __A,
+						(const __v4sf *) __F,
+						(__v4sf) _mm_setzero_ps (),
+						(__mmask8) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4fnmadd_ps (__m512 __A, __m512 __B, __m512 __C,
+		   __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps ((__v16sf) __B,
+					    (__v16sf) __C,
+					    (__v16sf) __D,
+					    (__v16sf) __E,
+					    (__v16sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B,
+			__m512 __C, __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) __A,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4fnmadd_ps (__mmask16 __U,
+			 __m512 __A, __m512 __B, __m512 __C,
+			 __m512 __D, __m512 __E, __m128 *__F)
+{
+  return (__m512) __builtin_ia32_4fnmaddps_mask ((__v16sf) __B,
+						 (__v16sf) __C,
+						 (__v16sf) __D,
+						 (__v16sf) __E,
+						 (__v16sf) __A,
+						 (const __v4sf *) __F,
+						 (__v16sf) _mm512_setzero_ps (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_4fnmadd_ss (__m128 __A, __m128 __B, __m128 __C,
+		__m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss ((__v4sf) __B,
+					    (__v4sf) __C,
+					    (__v4sf) __D,
+					    (__v4sf) __E,
+					    (__v4sf) __A,
+					    (const __v4sf *) __F);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_4fnmadd_ss (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C,
+		     __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) __A,
+						 (__mmask8) __U);
+}
+
+extern __inline __m128
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_4fnmadd_ss (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C,
+		      __m128 __D, __m128 __E, __m128 *__F)
+{
+  return (__m128) __builtin_ia32_4fnmaddss_mask ((__v4sf) __B,
+						 (__v4sf) __C,
+						 (__v4sf) __D,
+						 (__v4sf) __E,
+						 (__v4sf) __A,
+						 (const __v4sf *) __F,
+						 (__v4sf) _mm_setzero_ps (),
+						 (__mmask8) __U);
+}
+
+#ifdef __DISABLE_AVX5124FMAPS__
+#undef __DISABLE_AVX5124FMAPS__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124FMAPS__ */
+
+#endif /* _AVX5124FMAPSINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
new file mode 100644
index 0000000..392c6a5
--- /dev/null
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -0,0 +1,132 @@
+/* Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <avx5124vnniwintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _AVX5124VNNIWINTRIN_H_INCLUDED
+#define _AVX5124VNNIWINTRIN_H_INCLUDED
+
+#ifndef __AVX5124VNNIW__
+#pragma GCC push_options
+#pragma GCC target("avx5124vnniw")
+#define __DISABLE_AVX5124VNNIW__
+#endif /* __AVX5124VNNIW__ */
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssd_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		      __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+					     (__v16si) __C,
+					     (__v16si) __D,
+					     (__v16si) __E,
+					     (__v16si) __A,
+					     (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssd_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			   __m512i __C, __m512i __D, __m512i __E,
+			   __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) __A,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssd_mask ((__v16si) __B,
+						  (__v16si) __C,
+						  (__v16si) __D,
+						  (__v16si) __E,
+						  (__v16si) __A,
+						  (const __v4si *) __F,
+						  (__v16si) _mm512_setzero_ps (),
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_4dpwssds_epi32 (__m512i __A, __m512i __B, __m512i __C,
+		       __m512i __D, __m512i __E, __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds ((__v16si) __B,
+					      (__v16si) __C,
+					      (__v16si) __D,
+					      (__v16si) __E,
+					      (__v16si) __A,
+					      (const __v4si *) __F);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_4dpwssds_epi32 (__m512i __A, __mmask16 __U, __m512i __B,
+			    __m512i __C, __m512i __D, __m512i __E,
+			    __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) __A,
+						   (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
+			     __m512i __C, __m512i __D, __m512i __E,
+			     __m128i *__F)
+{
+  return (__m512i) __builtin_ia32_vp4dpwssds_mask ((__v16si) __B,
+						   (__v16si) __C,
+						   (__v16si) __D,
+						   (__v16si) __E,
+						   (__v16si) __A,
+						   (const __v4si *) __F,
+						   (__v16si) _mm512_setzero_ps (),
+						   (__mmask16) __U);
+}
+
+#ifdef __DISABLE_AVX5124VNNIW__
+#undef __DISABLE_AVX5124VNNIW__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX5124VNNIW__ */
+
+#endif /* _AVX5124VNNIWINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index d610336..b734ce4 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -112,6 +112,7 @@
 ;;  f	x87 register when 80387 floating point arithmetic is enabled
 ;;  r	SSE regs not requiring REX prefix when prefixes avoidance is enabled
 ;;	and all SSE regs otherwise
+;;  h   EVEX encodable SSE register with number factor of four
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -160,6 +161,9 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yh" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "@internal Any EVEX encodable SSE register, which has number factor of four.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 2a946bf..abe7c62 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -60,6 +60,8 @@
 #define bit_MWAITX      (1 << 29)
 
 /* %edx */
+#define bit_AVX5124VNNIW (1 << 2)
+#define bit_AVX5124FMAPS (1 << 3)
 #define bit_MMXEXT	(1 << 22)
 #define bit_LM		(1 << 29)
 #define bit_3DNOWP	(1 << 30)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index e026482..f0d0e8f 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_mwaitx = 0, has_clzero = 0, has_pku = 0;
+  unsigned int has_avx5124fmaps = 0, has_avx5124vnniw = 0;
 
   bool arch;
 
@@ -501,6 +502,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       has_prefetchwt1 = ecx & bit_PREFETCHWT1;
       has_avx512vbmi = ecx & bit_AVX512VBMI;
       has_pku = ecx & bit_OSPKE;
+      has_avx5124vnniw = edx & bit_AVX5124VNNIW;
+      has_avx5124fmaps = edx & bit_AVX5124FMAPS;
     }
 
   if (max_level >= 13)
@@ -1021,6 +1024,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
       const char *avx512vl = has_avx512vl ? " -mavx512vl" : " -mno-avx512vl";
       const char *avx512ifma = has_avx512ifma ? " -mavx512ifma" : " -mno-avx512ifma";
       const char *avx512vbmi = has_avx512vbmi ? " -mavx512vbmi" : " -mno-avx512vbmi";
+      const char *avx5124vnniw = has_avx5124vnniw ? " -mavx5124vnniw" : " -mno-avx5124vnniw";
+      const char *avx5124fmaps = has_avx5124fmaps ? " -mavx5124fmaps" : " -mno-avx5124fmaps";
       const char *clwb = has_clwb ? " -mclwb" : " -mno-clwb";
       const char *mwaitx  = has_mwaitx  ? " -mmwaitx"  : " -mno-mwaitx"; 
       const char *clzero  = has_clzero  ? " -mclzero"  : " -mno-clzero";
@@ -1033,8 +1038,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 			fxsr, xsave, xsaveopt, avx512f, avx512er,
 			avx512cd, avx512pf, prefetchwt1, clflushopt,
 			xsavec, xsaves, avx512dq, avx512bw, avx512vl,
-			avx512ifma, avx512vbmi, clwb, mwaitx,
-			clzero, pku, NULL);
+			avx512ifma, avx512vbmi, avx5124fmaps, avx5124vnniw,
+			clwb, mwaitx, clzero, pku, NULL);
     }
 
 done:
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index b34cfda..4a38c12 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -526,6 +526,15 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (VOID, PV8DI, V8DI)
 
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF, V16SF, UHI)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V16SF, V16SF, V16SF, V16SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, V4SF, V4SF, PCV4SF, V4SF, UQI)
+
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI, V16SI, UHI)
+DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, V16SI, V16SI, V16SI, PCV4SI)
+
+
 # Instructions returning mask
 DEF_FUNCTION_TYPE (UHI, UHI)
 DEF_FUNCTION_TYPE (UHI, V16QI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 227526b..b23b70c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2482,7 +2482,24 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_truncv8dfv8di2_mask_round, "__bui
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
-BDESC_END (ROUND_ARGS, MPX)
+BDESC_END (ROUND_ARGS, ARGS2)
+
+/* AVX512_4FMAPS and AVX512_4VNNIW builtins with variable number of arguments. Defined in additional ix86_isa_flags2.  */
+BDESC_FIRST (args2, ARGS2,
+       OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps_mask, "__builtin_ia32_4fmaddps_mask", IX86_BUILTIN_4FMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddps, "__builtin_ia32_4fmaddps", IX86_BUILTIN_4FMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss, "__builtin_ia32_4fmaddss", IX86_BUILTIN_4FMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fmaddss_mask, "__builtin_ia32_4fmaddss_mask", IX86_BUILTIN_4FMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps_mask, "__builtin_ia32_4fnmaddps_mask", IX86_BUILTIN_4FNMAPS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddps, "__builtin_ia32_4fnmaddps", IX86_BUILTIN_4FNMAPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss, "__builtin_ia32_4fnmaddss", IX86_BUILTIN_4FNMASS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF)
+BDESC (OPTION_MASK_ISA_AVX5124FMAPS, CODE_FOR_avx5124fmaddps_4fnmaddss_mask, "__builtin_ia32_4fnmaddss_mask", IX86_BUILTIN_4FNMASS_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd, "__builtin_ia32_vp4dpwssd", IX86_BUILTIN_4DPWSSD, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssd_mask, "__builtin_ia32_vp4dpwssd_mask", IX86_BUILTIN_4DPWSSD_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds, "__builtin_ia32_vp4dpwssds", IX86_BUILTIN_4DPWSSDS, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI)
+BDESC (OPTION_MASK_ISA_AVX5124VNNIW, CODE_FOR_avx5124vnniw_vp4dpwssds_mask, "__builtin_ia32_vp4dpwssds_mask", IX86_BUILTIN_4DPWSSDS_MASK, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI)
+
+BDESC_END (ARGS2, MPX)
 
 /* Builtins for MPX.  */
 BDESC_FIRST (mpx, MPX,
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 9bb80c0..6e56c83 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -28,14 +28,14 @@ along with GCC; see the file COPYING3.  If not see
 
 static bool ix86_pragma_target_parse (tree, tree);
 static void ix86_target_macros_internal
-  (HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
+  (HOST_WIDE_INT, HOST_WIDE_INT, enum processor_type, enum processor_type, enum fpmath_unit,
    void (*def_or_undef) (cpp_reader *, const char *));
 
-\f
 /* Internal function to either define or undef the appropriate system
    macros.  */
 static void
 ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
+			     HOST_WIDE_INT isa_flag2,
 			     enum processor_type arch,
 			     enum processor_type tune,
 			     enum fpmath_unit fpmath,
@@ -376,6 +376,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__AVX512VBMI__");
   if (isa_flag & OPTION_MASK_ISA_AVX512IFMA)
     def_or_undef (parse_in, "__AVX512IFMA__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124VNNIW)
+    def_or_undef (parse_in, "__AVX5124VNNIW__");
+  if (isa_flag2 & OPTION_MASK_ISA_AVX5124FMAPS)
+    def_or_undef (parse_in, "__AVX5124FMAPS__");
   if (isa_flag & OPTION_MASK_ISA_FMA)
     def_or_undef (parse_in, "__FMA__");
   if (isa_flag & OPTION_MASK_ISA_RTM)
@@ -462,6 +466,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   HOST_WIDE_INT prev_isa;
   HOST_WIDE_INT cur_isa;
   HOST_WIDE_INT diff_isa;
+  HOST_WIDE_INT prev_isa2;
+  HOST_WIDE_INT cur_isa2;
+  HOST_WIDE_INT diff_isa2;
   enum processor_type prev_arch;
   enum processor_type prev_tune;
   enum processor_type cur_arch;
@@ -494,6 +501,9 @@ ix86_pragma_target_parse (tree args, tree pop_target)
   prev_isa  = prev_opt->x_ix86_isa_flags;
   cur_isa   = cur_opt->x_ix86_isa_flags;
   diff_isa  = (prev_isa ^ cur_isa);
+  prev_isa2  = prev_opt->x_ix86_isa_flags2;
+  cur_isa2   = cur_opt->x_ix86_isa_flags2;
+  diff_isa2  = (prev_isa2 ^ cur_isa2);
   prev_arch = (enum processor_type) prev_opt->arch;
   prev_tune = (enum processor_type) prev_opt->tune;
   cur_arch  = (enum processor_type) cur_opt->arch;
@@ -509,6 +519,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Undef all of the macros for that are no longer current.  */
   ix86_target_macros_internal (prev_isa & diff_isa,
+			       prev_isa2 & diff_isa2,
 			       prev_arch,
 			       prev_tune,
 			       (enum fpmath_unit) prev_opt->x_ix86_fpmath,
@@ -523,6 +534,7 @@ ix86_pragma_target_parse (tree args, tree pop_target)
 
   /* Define all of the macros for new options that were just turned on.  */
   ix86_target_macros_internal (cur_isa & diff_isa,
+			       cur_isa2 & diff_isa2,
 			       cur_arch,
 			       cur_tune,
 			       (enum fpmath_unit) cur_opt->x_ix86_fpmath,
@@ -583,6 +595,7 @@ ix86_target_macros (void)
   cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__");
 
   ix86_target_macros_internal (ix86_isa_flags,
+			       ix86_isa_flags2,
 			       ix86_arch,
 			       ix86_tune,
 			       ix86_fpmath,
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index d524313..1899d06 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -79,11 +79,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
 VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
 VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
 VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
-VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
-VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
-VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
-VECTOR_MODES (FLOAT, 128);    /*      V64HF V32SF V16DF */
+VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
+VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
+VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
+VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
+VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
@@ -91,6 +92,7 @@ VECTOR_MODE (INT, QI, 2);     /*                   V2QI */
 VECTOR_MODE (INT, QI, 12);    /*                  V12QI */
 VECTOR_MODE (INT, QI, 14);    /*                  V14QI */
 VECTOR_MODE (INT, HI, 6);     /*                   V6HI */
+VECTOR_MODE (INT, SI, 64);    /* 		  V64SI */
 
 POINTER_BOUNDS_MODE (BND32, 8);
 POINTER_BOUNDS_MODE (BND64, 16);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..1da1abc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2579,7 +2579,7 @@ static int ix86_function_regparm (const_tree, const_tree);
 static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode,
 						 rtx, rtx, int);
-static void ix86_add_new_builtins (HOST_WIDE_INT);
+static void ix86_add_new_builtins (HOST_WIDE_INT, HOST_WIDE_INT);
 static tree ix86_canonical_va_list_type (tree);
 static void predict_jump (int);
 static unsigned int split_stack_prologue_scratch_regno (void);
@@ -2592,8 +2592,9 @@ enum ix86_function_specific_strings
   IX86_FUNCTION_SPECIFIC_MAX
 };
 
-static char *ix86_target_string (HOST_WIDE_INT, int, int, const char *,
-				 const char *, enum fpmath_unit, bool);
+static char *ix86_target_string (HOST_WIDE_INT, HOST_WIDE_INT, int, int,
+				 const char *, const char *, enum fpmath_unit,
+				 bool);
 static void ix86_function_specific_save (struct cl_target_option *,
 					 struct gcc_options *opts);
 static void ix86_function_specific_restore (struct gcc_options *opts,
@@ -4188,8 +4189,8 @@ ix86_using_red_zone (void)
    responsible for freeing the string.  */
 
 static char *
-ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
-		    const char *arch, const char *tune,
+ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, int flags,
+		    int ix86_flags, const char *arch, const char *tune,
 		    enum fpmath_unit fpmath, bool add_nl_p)
 {
   struct ix86_target_opts
@@ -4257,7 +4258,12 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mclzero",	OPTION_MASK_ISA_CLZERO  },
     { "-mpku",		OPTION_MASK_ISA_PKU  },
   };
-
+  /* Additional structure for isa flags.  */
+  static struct ix86_target_opts isa_opts2[] =
+  {
+    { "-mavx5124vnniw", OPTION_MASK_ISA_AVX5124VNNIW },
+    { "-mavx5124fmaps", OPTION_MASK_ISA_AVX5124FMAPS },
+  };
   /* Flag options.  */
   static struct ix86_target_opts flag_opts[] =
   {
@@ -4298,8 +4304,8 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
     { "-mgeneral-regs-only",		OPTION_MASK_GENERAL_REGS_ONLY },
   };
 
-  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts)
-		   + ARRAY_SIZE (ix86_flag_opts) + 6][2];
+  const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa_opts2)
+		   + ARRAY_SIZE (flag_opts) + ARRAY_SIZE (ix86_flag_opts) + 6][2];
 
   char isa_other[40];
   char target_other[40];
@@ -4361,6 +4367,16 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
 	       isa);
     }
 
+  /* Pick out the options in isa2 options.  */
+  for (i = 0; i < ARRAY_SIZE (isa_opts2); i++)
+    {
+      if ((isa2 & isa_opts2[i].mask) != 0)
+	{
+	  opts[num++][0] = isa_opts2[i].option;
+	  isa &= ~ isa_opts2[i].mask;
+	}
+    }
+
   /* Add flag options.  */
   for (i = 0; i < ARRAY_SIZE (flag_opts); i++)
     {
@@ -4486,9 +4502,9 @@ ix86_profile_before_prologue (void)
 void ATTRIBUTE_UNUSED
 ix86_debug_options (void)
 {
-  char *opts = ix86_target_string (ix86_isa_flags, target_flags,
-				   ix86_target_flags,
-				   ix86_arch_string, ix86_tune_string,
+  char *opts = ix86_target_string (ix86_isa_flags, ix86_isa_flags2,
+				   target_flags, ix86_target_flags,
+				   ix86_arch_string,ix86_tune_string,
 				   ix86_fpmath, true);
 
   if (opts)
@@ -4844,6 +4860,8 @@ ix86_option_override_internal (bool main_args_p,
 #define PTA_CLZERO		(HOST_WIDE_INT_1 << 57)
 #define PTA_NO_80387		(HOST_WIDE_INT_1 << 58)
 #define PTA_PKU		(HOST_WIDE_INT_1 << 59)
+#define PTA_AVX5124VNNIW	(HOST_WIDE_INT_1 << 60)
+#define PTA_AVX5124FMAPS	(HOST_WIDE_INT_1 << 61)
 
 #define PTA_CORE2 \
   (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
@@ -5499,6 +5517,14 @@ ix86_option_override_internal (bool main_args_p,
 	if (processor_alias_table[i].flags & PTA_AVX512IFMA
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512IFMA))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA;
+
+	if (processor_alias_table[i].flags & PTA_AVX5124VNNIW
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124VNNIW))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124VNNIW;
+	if (processor_alias_table[i].flags & PTA_AVX5124FMAPS
+	    && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA_AVX5124FMAPS))
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX5124FMAPS;
+
 	if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE))
 	  x86_prefetch_sse = true;
 	if (processor_alias_table[i].flags & PTA_MWAITX
@@ -6298,6 +6324,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
+  ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
   ptr->x_ix86_arch_string = opts->x_ix86_arch_string;
   ptr->x_ix86_tune_string = opts->x_ix86_tune_string;
@@ -6354,6 +6381,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
+  opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
   opts->x_ix86_arch_string = ptr->x_ix86_arch_string;
   opts->x_ix86_tune_string = ptr->x_ix86_tune_string;
@@ -6459,9 +6487,9 @@ ix86_function_specific_print (FILE *file, int indent,
 			      struct cl_target_option *ptr)
 {
   char *target_string
-    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_target_flags,
-			  ptr->x_ix86_target_flags, NULL, NULL,
-			  ptr->x_ix86_fpmath, false);
+    = ix86_target_string (ptr->x_ix86_isa_flags, ptr->x_ix86_isa_flags2,
+			  ptr->x_target_flags, ptr->x_ix86_target_flags,
+			  NULL, NULL, ptr->x_ix86_fpmath, false);
 
   gcc_assert (ptr->arch < PROCESSOR_max);
   fprintf (file, "%*sarch = %d (%s)\n",
@@ -6538,6 +6566,8 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[],
     IX86_ATTR_ISA ("avx512dq",	OPT_mavx512dq),
     IX86_ATTR_ISA ("avx512bw",	OPT_mavx512bw),
     IX86_ATTR_ISA ("avx512vl",	OPT_mavx512vl),
+    IX86_ATTR_ISA ("avx5124fmaps",	OPT_mavx5124fmaps),
+    IX86_ATTR_ISA ("avx5124vnniw",	OPT_mavx5124vnniw),
     IX86_ATTR_ISA ("mmx",	OPT_mmmx),
     IX86_ATTR_ISA ("pclmul",	OPT_mpclmul),
     IX86_ATTR_ISA ("popcnt",	OPT_mpopcnt),
@@ -6796,6 +6826,7 @@ ix86_valid_target_attribute_tree (tree args,
      The string options are attribute options, and will be undone
      when we copy the save structure.  */
   if (opts->x_ix86_isa_flags != def->x_ix86_isa_flags
+      || opts->x_ix86_isa_flags2 != def->x_ix86_isa_flags2
       || opts->x_target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
@@ -6814,7 +6845,7 @@ ix86_valid_target_attribute_tree (tree args,
 				     | OPTION_MASK_ABI_64
 				     | OPTION_MASK_ABI_X32
 				     | OPTION_MASK_CODE16);
-
+	  opts->x_ix86_isa_flags &= 0;
 	}
       else if (!orig_arch_specified)
 	opts->x_ix86_arch_string = NULL;
@@ -6848,7 +6879,7 @@ ix86_valid_target_attribute_tree (tree args,
 	}
 
       /* Add any builtin functions with the new isa if any.  */
-      ix86_add_new_builtins (opts->x_ix86_isa_flags);
+      ix86_add_new_builtins (opts->x_ix86_isa_flags, opts->x_ix86_isa_flags2);
 
       /* Save the current options unless we are validating options for
 	 #pragma.  */
@@ -6953,8 +6984,10 @@ ix86_can_inline_p (tree caller, tree callee)
       /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
 	 can inline a SSE2 function but a SSE2 function can't inline a SSE4
 	 function.  */
-      if ((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-	  != callee_opts->x_ix86_isa_flags)
+      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+	  != callee_opts->x_ix86_isa_flags) &
+	  ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+	  != callee_opts->x_ix86_isa_flags2))
 	ret = false;
 
       /* See if we have the same non-isa options.  */
@@ -12078,6 +12111,15 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
 	      && df_regs_ever_live_p (regno)));
 }
 
+/* Return true if register class CL should be an additional allocno
+   class.  */
+
+static bool
+ix86_additional_allocno_class_p (reg_class_t cl)
+{
+  return cl == MOD4_SSE_REGS;
+}
+
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
@@ -30836,6 +30878,7 @@ struct builtin_isa {
   const char *name;		/* function name */
   enum ix86_builtin_func_type tcode; /* type to use in the declaration */
   HOST_WIDE_INT isa;		/* isa_flags this builtin is defined for */
+  HOST_WIDE_INT isa2;		/* additional isa_flags this builtin is defined for */
   bool const_p;			/* true if the declaration is constant */
   bool leaf_p;			/* true if the declaration has leaf attribute */
   bool nothrow_p;		/* true if the declaration has nothrow attribute */
@@ -30846,6 +30889,7 @@ static struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
 
 /* Bits that can still enable any inclusion of a builtin.  */
 static HOST_WIDE_INT deferred_isa_values = 0;
+static HOST_WIDE_INT deferred_isa_values2 = 0;
 
 /* Add an ix86 target builtin function with CODE, NAME and TYPE.  Save the MASK
    of which isa_flags to use in the ix86_builtins_isa array.  Stores the
@@ -30928,18 +30972,75 @@ def_builtin_const (HOST_WIDE_INT mask, const char *name,
   return decl;
 }
 
+/* Like def_builtin, but for additional isa2 flags.  */
+
+static inline tree
+def_builtin2 (HOST_WIDE_INT mask, const char *name,
+	     enum ix86_builtin_func_type tcode,
+	     enum ix86_builtins code)
+{
+  tree decl = NULL_TREE;
+
+  ix86_builtins_isa[(int) code].isa2 = mask;
+
+  if (mask == 0
+      || (mask & ix86_isa_flags2) != 0
+      || (lang_hooks.builtin_function
+	  == lang_hooks.builtin_function_ext_scope))
+
+    {
+      tree type = ix86_get_builtin_func_type (tcode);
+      decl = add_builtin_function (name, type, code, BUILT_IN_MD,
+				   NULL, NULL_TREE);
+	  ix86_builtins[(int) code] = decl;
+	  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+    }
+  else
+    {
+      /* Just a MASK where set_and_not_built_p == true can potentially
+	 include a builtin.  */
+      deferred_isa_values2 |= mask;
+      ix86_builtins[(int) code] = NULL_TREE;
+      ix86_builtins_isa[(int) code].tcode = tcode;
+      ix86_builtins_isa[(int) code].name = name;
+      ix86_builtins_isa[(int) code].leaf_p = false;
+      ix86_builtins_isa[(int) code].nothrow_p = false;
+      ix86_builtins_isa[(int) code].const_p = false;
+      ix86_builtins_isa[(int) code].set_and_not_built_p = true;
+    }
+
+  return decl;
+}
+
+/* Like def_builtin, but also marks the function decl "const".  */
+
+static inline tree
+def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
+		   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+{
+  tree decl = def_builtin2 (mask, name, tcode, code);
+  if (decl)
+    TREE_READONLY (decl) = 1;
+  else
+    ix86_builtins_isa[(int) code].const_p = true;
+
+  return decl;
+}
+
 /* Add any new builtin functions for a given ISA that may not have been
    declared.  This saves a bit of space compared to adding all of the
    declarations to the tree, even if we didn't use them.  */
 
 static void
-ix86_add_new_builtins (HOST_WIDE_INT isa)
+ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if ((isa & deferred_isa_values) == 0)
+  if (((isa & deferred_isa_values) == 0)
+      && ((isa2 & deferred_isa_values2) == 0))
     return;
 
   /* Bits in ISA value can be removed from potential isa values.  */
   deferred_isa_values &= ~isa;
+  deferred_isa_values2 &= ~isa2;
 
   int i;
   tree saved_current_target_pragma = current_target_pragma;
@@ -30947,7 +31048,7 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)
 
   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
     {
-      if ((ix86_builtins_isa[i].isa & isa) != 0
+      if ((((ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0))
 	  && ix86_builtins_isa[i].set_and_not_built_p)
 	{
 	  tree decl, type;
@@ -31185,8 +31286,10 @@ BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_SPECIAL_ARGS_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_ROUND_ARGS_FIRST,
 	       IX86_BUILTIN__BDESC_ARGS_LAST, 1);
-BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_ARGS2_FIRST,
 	       IX86_BUILTIN__BDESC_ROUND_ARGS_LAST, 1);
+BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_FIRST,
+	       IX86_BUILTIN__BDESC_ARGS2_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MPX_CONST_FIRST,
 	       IX86_BUILTIN__BDESC_MPX_LAST, 1);
 BDESC_VERIFYS (IX86_BUILTIN__BDESC_MULTI_ARG_FIRST,
@@ -31237,6 +31340,18 @@ ix86_init_mmx_sse_builtins (void)
 		 IX86_BUILTIN__BDESC_ARGS_FIRST,
 		 ARRAY_SIZE (bdesc_args) - 1);
 
+  /* Add all builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_args2;
+       i < ARRAY_SIZE (bdesc_args2);
+       i++, d++)
+    {
+      if (d->name == 0)
+	continue;
+
+      ftype = (enum ix86_builtin_func_type) d->flag;
+      def_builtin_const2 (d->mask, d->name, ftype, d->code);
+    }
+
   /* Add all builtins with rounding.  */
   for (i = 0, d = bdesc_round_args;
        i < ARRAY_SIZE (bdesc_round_args);
@@ -36428,10 +36543,13 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
      current ISA based on the command line switches.  With function specific
      options, we need to check in the context of the function making the call
      whether it is supported.  */
-  if (ix86_builtins_isa[fcode].isa
-      && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+  if ((ix86_builtins_isa[fcode].isa
+       && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
+      && (ix86_builtins_isa[fcode].isa2
+	  && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
     {
-      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, 0,
+      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
+				       ix86_builtins_isa[fcode].isa2, 0, 0,
 				       NULL, NULL, (enum fpmath_unit) 0,
 				       false);
       if (!opts)
@@ -38091,6 +38209,246 @@ rdseed_step:
 	}
     }
 
+  if (fcode >= IX86_BUILTIN__BDESC_ARGS2_FIRST
+      && fcode <= IX86_BUILTIN__BDESC_ARGS2_LAST)
+    {
+      i = fcode - IX86_BUILTIN__BDESC_ARGS2_FIRST;
+      rtx (*fcn) (rtx, rtx, rtx, rtx);
+      rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
+      rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
+      rtx (*msk_mov) (rtx, rtx, rtx, rtx);
+      int masked = 1;
+      machine_mode mode, wide_mode, nar_mode;
+
+      nar_mode  = V4SFmode;
+      mode      = V16SFmode;
+      wide_mode = V64SFmode;
+      msk_mov   = gen_avx512f_loadv16sf_mask;
+      fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
+      fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
+
+      switch (fcode)
+	{
+	case IX86_BUILTIN_4FMAPS:
+	  fcn = gen_avx5124fmaddps_4fmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssd;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn = gen_avx5124vnniw_vp4dpwssds;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS:
+	  fcn = gen_avx5124fmaddps_4fnmaddps;
+	  masked = 0;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FNMAPS_MASK:
+	  fcn_mask  = gen_avx5124fmaddps_4fnmaddps_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddps_maskz;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSD_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4DPWSSDS_MASK:
+	  nar_mode  = V4SImode;
+	  mode      = V16SImode;
+	  wide_mode = V64SImode;
+	  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
+	  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
+	  msk_mov   = gen_avx512f_loadv16si_mask;
+	  goto v4fma_expand;
+
+	case IX86_BUILTIN_4FMAPS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+v4fma_expand:
+	    wide_reg = gen_reg_rtx (wide_mode);
+	    for (i = 0; i < 4; i++)
+	      {
+	        args[i] = CALL_EXPR_ARG (exp, i);
+		ops[i] = expand_normal (args[i]);
+
+		emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
+				  ops[i]);
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (mode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (nar_mode, addr);
+
+	    target = gen_reg_rtx (mode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+	        rtx merge, mask;
+		merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		if (CONST_INT_P (mask))
+		  mask = fixup_modeless_constant (mask, HImode);
+
+		mask = force_reg (HImode, mask);
+
+		if (GET_MODE (mask) != HImode)
+		  mask = gen_rtx_SUBREG (HImode, mask, 0);
+
+		/* If merge is 0 then we're about to emit z-masked variant.  */
+		if (const0_operand (merge, mode))
+		  emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		/* If merge is the same as accum then emit merge-masked variant.  */
+		else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		  {
+		    merge = force_reg (mode, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		  }
+	        /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		else
+		  {
+		    rtx tmp = target;
+		    emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		    target = force_reg (mode, merge);
+		    emit_insn (msk_mov (target, tmp, target, mask));
+		  }
+	      }
+	      return target;
+	    }
+
+	case IX86_BUILTIN_4FNMASS:
+	  fcn = gen_avx5124fmaddps_4fnmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS:
+	  fcn = gen_avx5124fmaddps_4fmaddss;
+	  masked = 0;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FNMASS_MASK:
+	  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
+	  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
+	  msk_mov   = gen_avx512vl_loadv4sf_mask;
+	  goto s4fma_expand;
+
+	case IX86_BUILTIN_4FMASS_MASK:
+	  {
+	    tree args[4];
+	    rtx ops[4];
+	    rtx wide_reg;
+	    rtx accum;
+	    rtx addr;
+	    rtx mem;
+
+	    fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
+	    fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
+	    msk_mov   = gen_avx512vl_loadv4sf_mask;
+
+s4fma_expand:
+	    mode = V4SFmode;
+	    wide_reg = gen_reg_rtx (V64SFmode);
+	    for (i = 0; i < 4; i++)
+	      {
+		 rtx tmp;
+		 args[i] = CALL_EXPR_ARG (exp, i);
+		 ops[i] = expand_normal (args[i]);
+
+		 tmp = gen_reg_rtx (SFmode);
+		 emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+
+		 emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
+				  gen_rtx_SUBREG (V16SFmode, tmp, 0));
+	      }
+
+	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
+	    accum = force_reg (V4SFmode, accum);
+
+	    addr = expand_normal (CALL_EXPR_ARG (exp, 5));
+	    addr = force_reg (Pmode, addr);
+
+	    mem = gen_rtx_MEM (V4SFmode, addr);
+
+	    target = gen_reg_rtx (V4SFmode);
+
+	    emit_move_insn (target, accum);
+
+	    if (! masked)
+	      emit_insn (fcn (target, accum, wide_reg, mem));
+	    else
+	      {
+		 rtx merge, mask;
+		 merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+
+		 mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		 if (CONST_INT_P (mask))
+		   mask = fixup_modeless_constant (mask, QImode);
+
+		 mask = force_reg (QImode, mask);
+
+		 if (GET_MODE (mask) != QImode)
+		   mask = gen_rtx_SUBREG (QImode, mask, 0);
+
+		 /* If merge is 0 then we're about to emit z-masked variant.  */
+		 if (const0_operand (merge, mode))
+		   emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		 /* If merge is the same as accum then emit merge-masked variant.  */
+		 else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		   {
+		     merge = force_reg (mode, merge);
+		     emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		   }
+		 /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		 else
+		   {
+		     rtx tmp = target;
+		     emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
+
+		     target = force_reg (mode, merge);
+		     emit_insn (msk_mov (target, tmp, target, mask));
+		   }
+		}
+	      return target;
+	    }
+	  default:
+	    return ix86_expand_args_builtin (bdesc_args2 + i, exp, target);
+	  }
+    }
+
   if (fcode >= IX86_BUILTIN__BDESC_COMI_FIRST
       && fcode <= IX86_BUILTIN__BDESC_COMI_LAST)
     {
@@ -38151,7 +38509,8 @@ static tree ix86_get_builtin (enum ix86_builtins code)
 
   opts = TREE_TARGET_OPTION (target_tree);
 
-  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+  if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
+	&& (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
     return ix86_builtin_decl (code, true);
   else
     return NULL_TREE;
@@ -39735,6 +40094,18 @@ ix86_hard_regno_mode_ok (int regno, machine_mode mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
+      /* For AVX-5124FMAPS allow V64SFmode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SFmode)
+	return true;
+
+      /* For AVX-5124VNNIW allow V64SImode for special regnos.  */
+      if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
+	  && MOD4_SSE_REGNO_P (regno)
+	  && mode == V64SImode)
+	return true;
+
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
@@ -51134,6 +51505,9 @@ ix86_run_selftests (void)
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
 
+#undef TARGET_ADDITIONAL_ALLOCNO_CLASS_P
+#define TARGET_ADDITIONAL_ALLOCNO_CLASS_P ix86_additional_allocno_class_p
+
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..10533eb 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -81,6 +81,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_AVX512VBMI_P(x)	TARGET_ISA_AVX512VBMI_P(x)
 #define TARGET_AVX512IFMA	TARGET_ISA_AVX512IFMA
 #define TARGET_AVX512IFMA_P(x)	TARGET_ISA_AVX512IFMA_P(x)
+#define TARGET_AVX5124FMAPS	TARGET_ISA_AVX5124FMAPS
+#define TARGET_AVX5124FMAPS_P(x) TARGET_ISA_AVX5124FMAPS_P(x)
+#define TARGET_AVX5124VNNIW	TARGET_ISA_AVX5124VNNIW
+#define TARGET_AVX5124VNNIW_P(x) TARGET_ISA_AVX5124VNNIW_P(x)
 #define TARGET_FMA	TARGET_ISA_FMA
 #define TARGET_FMA_P(x)	TARGET_ISA_FMA_P(x)
 #define TARGET_SSE4A	TARGET_ISA_SSE4A
@@ -1089,7 +1093,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define HARD_REGNO_NREGS(REGNO, MODE)					\
   (STACK_REGNO_P (REGNO) || SSE_REGNO_P (REGNO) || MMX_REGNO_P (REGNO)	\
    || MASK_REGNO_P (REGNO) || BND_REGNO_P (REGNO)			\
-   ? (COMPLEX_MODE_P (MODE) ? 2 : 1)					\
+   ? (COMPLEX_MODE_P (MODE) ? 2 :					\
+      (((MODE == V64SFmode) || (MODE == V64SImode)) ? 4 : 1))		\
    : ((MODE) == XFmode							\
       ? (TARGET_64BIT ? 2 : 3)						\
       : ((MODE) == XCmode						\
@@ -1365,6 +1370,7 @@ enum reg_class
   FLOAT_INT_SSE_REGS,
   MASK_EVEX_REGS,
   MASK_REGS,
+  MOD4_SSE_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1425,6 +1431,7 @@ enum reg_class
    "FLOAT_INT_SSE_REGS",		\
    "MASK_EVEX_REGS",			\
    "MASK_REGS",				\
+   "MOD4_SSE_REGS"			\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1465,9 +1472,10 @@ enum reg_class
 {   0x11ffff,    0x1fe0,    0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,   0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,   0x1f },       /* FLOAT_INT_SSE_REGS */        \
-       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0, 0x1fc0 },       /* MASK_EVEX_REGS */            \
        { 0x0,       0x0, 0x1fe0 },       /* MASK_REGS */                 \
-{ 0xffffffff,0xffffffff,0x1ffff }                                        \
+{ 0x1fe00000,0xffffe000,   0x1f },       /* MOD4_SSE_REGS */		 \
+{ 0xffffffff,0xffffffff,0x1ffff }		\
 }
 
 /* The same information, inverted:
@@ -1533,6 +1541,16 @@ enum reg_class
 #define BND_REG_P(X) (REG_P (X) && BND_REGNO_P (REGNO (X)))
 #define BND_REGNO_P(N) IN_RANGE ((N), FIRST_BND_REG, LAST_BND_REG)
 
+#define MOD4_SSE_REG_P(X) (REG_P (X) && MOD4_SSE_REGNO_P (REGNO (X)))
+#define MOD4_SSE_REGNO_P(N) ((N) == XMM0_REG  \
+			     || (N) == XMM4_REG  \
+			     || (N) == XMM8_REG  \
+			     || (N) == XMM12_REG \
+			     || (N) == XMM16_REG \
+			     || (N) == XMM20_REG \
+			     || (N) == XMM24_REG \
+			     || (N) == XMM28_REG)
+
 /* First floating point reg */
 #define FIRST_FLOAT_REG FIRST_STACK_REG
 #define STACK_TOP_P(X) (REG_P (X) && REGNO (X) == FIRST_FLOAT_REG)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..390412a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -25,11 +25,17 @@ config/i386/i386-opts.h
 Variable
 HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT | TARGET_SUBTARGET_ISA_DEFAULT
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2 = 0
+
 ; A mask of ix86_isa_flags that includes bit X if X was set or cleared
 ; on the command line.
 Variable
 HOST_WIDE_INT ix86_isa_flags_explicit
 
+Variable
+HOST_WIDE_INT ix86_isa_flags2_explicit
+
 ; Additional target flags
 Variable
 int ix86_target_flags
@@ -74,6 +80,10 @@ unsigned char branch_cost
 
 ;; which flags were passed by the user
 TargetSave
+HOST_WIDE_INT x_ix86_isa_flags2_explicit
+
+;; which flags were passed by the user
+TargetSave
 HOST_WIDE_INT x_ix86_isa_flags_explicit
 
 ;; whether -mtune was not specified
@@ -687,6 +697,14 @@ mavx512vbmi
 Target Report Mask(ISA_AVX512VBMI) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512VBMI built-in functions and code generation.
 
+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX5124VNNIW built-in functions and code generation.
+
 mfma
 Target Report Mask(ISA_FMA) Var(ix86_isa_flags) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 9333111..3fd3c9c 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -68,6 +68,10 @@
 
 #include <avx512vbmivlintrin.h>
 
+#include <avx5124fmapsintrin.h>
+
+#include <avx5124vnniwintrin.h>
+
 #include <shaintrin.h>
 
 #include <lzcntintrin.h>
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14fcd67..81fcc1d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,6 +146,12 @@
 
   ;; For AVX512VBMI support
   UNSPEC_VPMULTISHIFT
+
+  ;; For AVX5124FMAPS/AVX5124VNNIW support
+  UNSPEC_VP4FMADD
+  UNSPEC_VP4FNMADD
+  UNSPEC_VP4DPWSSD
+  UNSPEC_VP4DPWSSDS
 ])
 
 (define_c_enum "unspecv" [
@@ -19397,3 +19403,274 @@
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_mode_iterator IMOD4
+  [(V64SF "TARGET_AVX5124FMAPS") (V64SI "TARGET_AVX5124VNNIW")])
+
+(define_mode_attr imod4_narrow
+  [(V64SF "V16SF") (V64SI "V16SI")])
+
+(define_insn "mov<mode>"
+  [(set (match_operand:IMOD4 0 "nonimmediate_operand")
+	(match_operand:IMOD4 1 "general_operand"))]
+  "TARGET_AVX512F"
+  "#")
+
+(define_split
+  [(set (match_operand:IMOD4 0 "register_operand")
+	(match_operand:IMOD4 1 "nonimmediate_operand"))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (subreg:<imod4_narrow> (match_dup 0) 0)
+	(subreg:<imod4_narrow> (match_dup 1) 0))
+   (set (subreg:<imod4_narrow> (match_dup 0) 64)
+	(subreg:<imod4_narrow> (match_dup 1) 64))
+   (set (subreg:<imod4_narrow> (match_dup 0) 128)
+	(subreg:<imod4_narrow> (match_dup 1) 128))
+   (set (subreg:<imod4_narrow> (match_dup 0) 192)
+	(subreg:<imod4_narrow> (match_dup 1) 192))])
+
+(define_insn "avx5124fmaddps_4fmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "Yh")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "Yh")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(unspec:V16SF
+	  [(match_operand:V16SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_mask"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	     [(match_operand:V64SF 1 "register_operand" "Yh")
+	      (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddps_maskz"
+  [(set (match_operand:V16SF 0 "register_operand" "=v")
+	(vec_merge:V16SF
+	  (unspec:V16SF
+	    [(match_operand:V16SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V16SF 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddps\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("V16SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(unspec:V4SF
+	  [(match_operand:V4SF 1 "register_operand" "0")
+	   (match_operand:V64SF 2 "register_operand" "Yh")
+	   (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0|%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V64SF 1 "register_operand" "Yh")
+	     (match_operand:V4SF 2 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 3 "register_operand" "0")
+	  (match_operand:QI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%2, %x1, %0%{%4%}|%{%4%}%0, %x1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124fmaddps_4fnmaddss_maskz"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+	(vec_merge:V4SF
+	  (unspec:V4SF
+	    [(match_operand:V4SF 1 "register_operand" "0")
+	     (match_operand:V64SF 2 "register_operand" "Yh")
+	     (match_operand:V4SF 3 "memory_operand" "m")] UNSPEC_VP4FNMADD)
+	  (match_operand:V4SF 4 "const0_operand" "C")
+	  (match_operand:QI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124FMAPS"
+  "v4fnmaddss\t{%3, %x2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %x2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("SF"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "Yh")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "Yh")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssd_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "Yh")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSD)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssd\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(unspec:V16SI
+	  [(match_operand:V16SI 1 "register_operand" "0")
+	   (match_operand:V64SI 2 "register_operand" "Yh")
+	   (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0|%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_mask"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	     [(match_operand:V64SI 1 "register_operand" "Yh")
+	      (match_operand:V4SI 2 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 3 "register_operand" "0")
+	  (match_operand:HI 4 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%2, %g1, %0%{%4%}|%{%4%}%0, %g1, %2}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
+
+(define_insn "avx5124vnniw_vp4dpwssds_maskz"
+  [(set (match_operand:V16SI 0 "register_operand" "=v")
+	(vec_merge:V16SI
+	  (unspec:V16SI
+	    [(match_operand:V16SI 1 "register_operand" "0")
+	     (match_operand:V64SI 2 "register_operand" "Yh")
+	     (match_operand:V4SI 3 "memory_operand" "m")] UNSPEC_VP4DPWSSDS)
+	  (match_operand:V16SI 4 "const0_operand" "C")
+	  (match_operand:HI 5 "register_operand" "Yk")))]
+  "TARGET_AVX5124VNNIW"
+  "vp4dpwssds\t{%3, %g2, %0%{%5%}%{z%}|%{%5%}%{z%}%0, %g2, %3}"
+   [(set_attr ("type") ("ssemuladd"))
+    (set_attr ("prefix") ("evex"))
+    (set_attr ("mode") ("TI"))])
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 92ca055..42ab5f0 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -973,10 +973,10 @@ inline __attribute__((__always_inline__))\n\
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+unsigned short\n\
 mode_size_inline (machine_mode mode)\n\
 {\n\
-  extern %sunsigned char mode_size[NUM_MACHINE_MODES];\n\
+  extern %sunsigned short mode_size[NUM_MACHINE_MODES];\n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
     {\n", adj_bytesize ? "" : "const ");
@@ -1301,7 +1301,7 @@ emit_mode_size (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char", "mode_size",
+  print_maybe_const_decl ("%sunsigned short", "mode_size",
 			  "NUM_MACHINE_MODES", bytesize);
 
   for_all_modes (c, m)
@@ -1492,7 +1492,7 @@ emit_mode_base_align (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char",
+  print_maybe_const_decl ("%sunsigned short",
 			  "mode_base_align", "NUM_MACHINE_MODES",
 			  alignment);
 
diff --git a/gcc/init-regs.c b/gcc/init-regs.c
index 3fbaee1..2ee4bd4 100644
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
@@ -104,6 +104,7 @@ initialize_uninitialized_regs (void)
 		  bitmap_set_bit (already_genned, regno);
 
 		  start_sequence ();
+		  emit_clobber (reg);
 		  emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
 		  move_insn = get_insns ();
 		  end_sequence ();
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 3dcadd8..d924e83 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -179,7 +179,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
 
 /* Get the size in bytes and bits of an object of mode MODE.  */
 
-extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
+extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 #if GCC_VERSION >= 4001
 #define GET_MODE_SIZE(MODE) \
   ((unsigned short) (__builtin_constant_p (MODE) \
@@ -330,7 +330,7 @@ extern machine_mode get_best_mode (int, int,
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
 
-extern CONST_MODE_BASE_ALIGN unsigned char mode_base_align[NUM_MACHINE_MODES];
+extern CONST_MODE_BASE_ALIGN unsigned short mode_base_align[NUM_MACHINE_MODES];
 
 extern unsigned get_mode_alignment (machine_mode);
 
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index b6b3559..701051d 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,9 +1,10 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
-   popcntintrin.h, fmaintrin.h, pkuintrin.h and mm_malloc.h.h are usable with
+   popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
+   avx5124vnniwintrin.h and mm_malloc.h.h are usable with
    -O -pedantic-errors.  */
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 994ed28..cd8f217 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,9 +1,10 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
-   popcntintrin.h, fmaintrin.h, pkuintrin.h and mm_malloc.h are usable with
+   popcntintrin.h, fmaintrin.h, pkuintrin.h, avx5124fmapsintrin.h,
+   avx5124vnniwintrin.h and mm_malloc.h are usable with
    -O -fkeep-inline-functions.  */
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
new file mode 100644
index 0000000..1035f25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
new file mode 100644
index 0000000..f977b65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	+ (double)src1[i] * (double)mult[0]
+	+ (double)src2[i] * (double)mult[1]
+	+ (double)src3[i] * (double)mult[2]
+	+ (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
new file mode 100644
index 0000000..2f1a558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fmaddss-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
new file mode 100644
index 0000000..45bd7da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512 a, b, c, d, e, f, g, x1, x2, x3;
+__m128 *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4fnmadd_ps (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4fnmadd_ps (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4fnmadd_ps (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
new file mode 100644
index 0000000..3c75fcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-require-effective-target avx5124fmaps } */
+
+#define ESP_FLOAT 1.0
+
+#define AVX5124FMAPS
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (float *src1, float* src2, float *src3,
+      float *src4, float* prev_dst, float *mult, float *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      dst[i] = (double)prev_dst[i]
+	- (double)src1[i] * (double)mult[0]
+	- (double)src2[i] * (double)mult[1]
+	- (double)src3[i] * (double)mult[2]
+	- (double)src4[i] * (double)mult[3];
+    }
+}
+
+void
+TEST (void)
+{
+  int i, sign;
+  UNION_TYPE (AVX512F_LEN,) src1, src2, src3, src4, src5, dst, res1, res2, res3;
+  UNION_TYPE (128,) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+
+  sign = -1;
+  for (i = 0; i < SIZE; i++)
+    {
+      src1.a[i] = 1.5 + 34.67 * i * sign;
+      src2.a[i] = -22.17 * i * sign;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+      sign = sign * -1;
+    }
+  for (i = 0; i < 4; i++)
+    mult.a[i] = 3.1415 + i * 2.71828;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4fnmadd_ps)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4fnmadd_ps)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4fnmadd_ps) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, SIZE);
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
new file mode 100644
index 0000000..1755afb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124fmaps" } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "v4fnmaddss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+
+#include <x86intrin.h>
+
+__m128 a, b, c, d, e, f, x1, x2, x3;
+__m128 *mem;
+__mmask8 m;
+
+int foo ()
+{
+  x1 = _mm_4fnmadd_ss (a, b, c, d, e, mem);
+  x2 = _mm_mask_4fnmadd_ss (a, m, b, c, d, e, mem);
+  x3 = _mm_maskz_4fnmadd_ss (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
new file mode 100644
index 0000000..eba93cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124fmaps-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124fmaps_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124fmaps_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4FMAPS test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124FMAPS) == bit_AVX5124FMAPS))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
new file mode 100644
index 0000000..a706cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-check.h
@@ -0,0 +1,47 @@
+#include <stdlib.h>
+#include "cpuid.h"
+#include "m512-check.h"
+#include "avx512f-os-support.h"
+
+static void avx5124vnniw_test (void);
+
+static void __attribute__ ((noinline)) do_test (void)
+{
+  avx5124vnniw_test ();
+}
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AVX512_4VNNIW test only if host has the support.  */
+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+    {
+      if (__get_cpuid_max (0, NULL) < 7)
+	return 0;
+
+      __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+      if ((avx512f_os_support ()) && ((edx & bit_AVX5124VNNIW) == bit_AVX5124VNNIW))
+	{
+	  do_test ();
+#ifdef DEBUG
+	  printf ("PASSED\n");
+#endif
+	  return 0;
+	}
+#ifdef DEBUG
+      printf ("SKIPPED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
new file mode 100644
index 0000000..a234fdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssd_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssd_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssd_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
new file mode 100644
index 0000000..a0a6825
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssd_epi32)       (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssd_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssd_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
new file mode 100644
index 0000000..d1bed37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vp4dpwssds\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <x86intrin.h>
+
+__m512i a, b, c, d, e, f, g, x1, x2, x3;
+__m128i *mem;
+__mmask16 m;
+
+int foo ()
+{
+  x1 = _mm512_4dpwssds_epi32 (a, b, c, d, e, mem);
+  x2 = _mm512_mask_4dpwssds_epi32 (a, m, b, c, d, e, mem);
+  x3 = _mm512_maskz_4dpwssds_epi32 (m, a, b, c, d, e, mem);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
new file mode 100644
index 0000000..e1e5536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx5124vnniw" } */
+/* { dg-require-effective-target avx5124vnniw } */
+
+#define DEFAULT_VALUE 0x7ffffffe
+
+#define AVX5124VNNIW
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+
+#include "avx512f-mask-type.h"
+
+void
+CALC (short *src1, short* src2, short *src3,
+      short *src4, int* prev_dst, short *mult, int *dst)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      int p1dword, p2dword;
+      long long int tmp;
+      dst[i] = prev_dst[i];
+      p1dword = (int)(src1[2*i  ]) * (int)(mult[0]);
+      p2dword = (int)(src1[2*i+1]) * (int)(mult[1]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src2[2*i  ]) * (int)(mult[2]);
+      p2dword = (int)(src2[2*i+1]) * (int)(mult[3]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src3[2*i  ]) * (int)(mult[4]);
+      p2dword = (int)(src3[2*i+1]) * (int)(mult[5]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+
+      p1dword = (int)(src4[2*i  ]) * (int)(mult[6]);
+      p2dword = (int)(src4[2*i+1]) * (int)(mult[7]);
+      tmp = (long long)dst[i] + p1dword + p2dword;
+      if (tmp > 0x7fffffff)
+	dst[i] = 0x7fffffff;
+      else
+	dst[i] += p1dword + p2dword;
+    }
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, i_w) src1, src2, src3, src4;
+  UNION_TYPE (AVX512F_LEN, i_d) src5, dst, res1, res2, res3;
+  UNION_TYPE (128, i_w) mult;
+  MASK_TYPE mask = MASK_VALUE;
+  int res_ref[SIZE];
+
+  for (i = 0; i < SIZE * 2; i++)
+    {
+      src1.a[i] = 2 + 7 * i % 291;
+      src2.a[i] = 3 + 11 * (i % 377) * i;
+      src3.a[i] = src1.a[i] * src1.a[i];
+      src4.a[i] = src2.a[i] * src2.a[i];
+    }
+  for (i = 0; i < 8; i++)
+    mult.a[i] = 3 + i * 2;
+
+  for (i = 0; i < SIZE; i++)
+    src5.a[i] = DEFAULT_VALUE;
+
+  CALC (src1.a, src2.a, src3.a, src4.a, src5.a, mult.a, res_ref);
+
+  res1.x = INTRINSIC (_4dpwssds_epi32)	     (      src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res2.x = INTRINSIC (_mask_4dpwssds_epi32)  (src5.x, mask, src1.x, src2.x, src3.x, src4.x, &mult.x);
+  res3.x = INTRINSIC (_maskz_4dpwssds_epi32) (mask, src5.x, src1.x, src2.x, src3.x, src4.x, &mult.x);
+
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-helper.h b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
index 5923085..6aca0d6 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512f-helper.h
@@ -22,6 +22,10 @@
 #include "avx512ifma-check.h"
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 #include "avx512vbmi-check.h"
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+#include "avx5124fmaps-check.h"
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+#include "avx5124vnniw-check.h"
 #elif defined (AVX512VL)
 #include "avx512vl-check.h"
 #endif
@@ -33,7 +37,9 @@
 /* Value to be written into destination.
    We have one value for all types so it must be small enough
    to fit into signed char.  */
+#ifndef DEFAULT_VALUE
 #define DEFAULT_VALUE 117
+#endif
 
 #define MAKE_MASK_MERGE(NAME, TYPE)				      \
 static void							      \
@@ -132,6 +138,12 @@ avx512ifma_test (void) { test_512 (); }
 #elif defined (AVX512VBMI) && !defined (AVX512VL)
 void
 avx512vbmi_test (void) { test_512 (); }
+#elif defined (AVX5124FMAPS) && !defined (AVX512VL)
+void
+avx5124fmaps_test (void) { test_512 (); }
+#elif defined (AVX5124VNNIW) && !defined (AVX512VL)
+void
+avx5124vnniw_test (void) { test_512 (); }
 #elif defined (AVX512VL)
 void
 avx512vl_test (void) { test_256 (); test_128 (); }
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
index 877d224..4057240 100644
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -366,6 +366,48 @@ proc check_effective_target_avx512vbmi { } {
     } "-mavx512vbmi" ]
 }
 
+# Return 1 if avx512_4fmaps instructions can be compiled.
+proc check_effective_target_avx5124fmaps { } {
+    return [check_no_compiler_messages avx5124fmaps object {
+	typedef float __v16sf __attribute__ ((__vector_size__ (64)));
+	typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+	__v16sf
+	_mm512_mask_4fmadd_ps (__v16sf __DEST, __v16sf __A, __v16sf __B, __v16sf __C,
+			       __v16sf __D, __v16sf __E, __v4sf *__F)
+	{
+	    return (__v16sf) __builtin_ia32_4fmaddps_mask ((__v16sf) __A,
+							  (__v16sf) __B,
+							  (__v16sf) __C,
+							  (__v16sf) __D,
+							  (__v16sf) __E,
+							  (const __v4sf *) __F,
+							  (__v16sf) __DEST,
+							  0xffff);
+	}
+    } "-mavx5124fmaps" ]
+}
+
+# Return 1 if avx512_4vnniw instructions can be compiled.
+proc check_effective_target_avx5124vnniw { } {
+    return [check_no_compiler_messages avx5124vnniw object {
+	typedef int __v16si __attribute__ ((__vector_size__ (64)));
+	typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+	__v16si
+	_mm512_4dpwssd_epi32 (__v16si __A, __v16si __B, __v16si __C,
+			      __v16si __D, __v16si __E, __v4si *__F)
+	{
+	    return (__v16si) __builtin_ia32_vp4dpwssd ((__v16si) __B,
+						       (__v16si) __C,
+						       (__v16si) __D,
+						       (__v16si) __E,
+						       (__v16si) __A,
+						       (const __v4si *) __F);
+	}
+    } "-mavx5124vnniw" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/i386/m128-check.h b/gcc/testsuite/gcc.target/i386/m128-check.h
index abb792b..48b2332 100644
--- a/gcc/testsuite/gcc.target/i386/m128-check.h
+++ b/gcc/testsuite/gcc.target/i386/m128-check.h
@@ -108,8 +108,12 @@ CHECK_EXP (union128d, double, "%f")
 
 CHECK_EXP (union128, float, "%f")
 
+#ifndef ESP_FLOAT
 #define ESP_FLOAT 0.000001
+#endif
+#ifndef ESP_DOUBLE
 #define ESP_DOUBLE 0.000001
+#endif
 #define CHECK_ARRAY(ARRAY, TYPE, FMT)                   \
 static int                                              \
 __attribute__((noinline, unused))                       \
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index f0f5457..3e8417b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 80d8c20..67f3b93 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mclwb -mmwaitx -mclzero -mpku" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mclwb -mmwaitx -mclzero -mpku" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 9242493..256d933 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -8,7 +8,8 @@
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
    mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h,
-   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h and mm_malloc.h 
+   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h,
+   avx5124fmapsintrin.h, avx5124vnniwintrin.h and mm_malloc.h 
    that reference the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being
@@ -100,7 +101,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 4635fb0..61f1b00 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -7,7 +7,8 @@
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
    mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h,
-   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h and mm_malloc.h 
+   tbmintrin.h, lwpintrin.h, popcntintrin.h, fmaintrin.h,
+   avx5124fmapsintrin.h, avx5124vnniwintrin.h and mm_malloc.h 
    that reference the proper builtin functions.
 
    Defining away "extern" and "__inline" results in all of them being
@@ -594,6 +595,6 @@
 #define __builtin_ia32_extracti64x2_256_mask(A, E, C, D) __builtin_ia32_extracti64x2_256_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x2_256_mask(A, E, C, D) __builtin_ia32_extractf64x2_256_mask(A, 1, C, D)
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,clwb,mwaitx,clzero,pku")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,clwb,mwaitx,clzero,pku")
 
 #include <x86intrin.h>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-15 16:31           ` Andrew Senkevich
@ 2016-11-16 16:21             ` Bernd Schmidt
  2016-11-16 18:14               ` Andrew Senkevich
       [not found]               ` <CAMXFM3seRe2+yxLN7e99zcxPsSfLwYqHmT=7EweW4vZ4NV+b7A@mail.gmail.com>
  0 siblings, 2 replies; 29+ messages in thread
From: Bernd Schmidt @ 2016-11-16 16:21 UTC (permalink / raw)
  To: Andrew Senkevich, Jeff Law
  Cc: Uros Bizjak, Jakub Jelinek, gcc-patches, Vladimir Makarov, Kirill Yukhin

On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
> 2016-11-15 17:56 GMT+03:00 Jeff Law <law@redhat.com>:
>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>>
>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>
>>>> --- a/gcc/genmodes.c
>>>> +++ b/gcc/genmodes.c
>>>> --- a/gcc/init-regs.c
>>>> +++ b/gcc/init-regs.c
>>>> --- a/gcc/machmode.h
>>>> +++ b/gcc/machmode.h
>>>>
>>>> These are middle-end changes, you will need a separate review for these.
>>>
>>>
>>> Who could review these changes?
>>
>> I can.  I likely dropped the message because it looked x86 specific, so if
>> you could resend it'd be appreciated.
>
> Attached (diff with previous only in fixed comments typos).

Next time please split middle-end changes out from target-related stuff 
and send them separately.

These ones are OK.


Bernd

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-16 16:21             ` Bernd Schmidt
@ 2016-11-16 18:14               ` Andrew Senkevich
       [not found]               ` <CAMXFM3seRe2+yxLN7e99zcxPsSfLwYqHmT=7EweW4vZ4NV+b7A@mail.gmail.com>
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-16 18:14 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Jeff Law, Uros Bizjak, Jakub Jelinek, gcc-patches,
	Vladimir Makarov, Kirill Yukhin

2016-11-16 19:21 GMT+03:00 Bernd Schmidt <bschmidt@redhat.com>:
> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>
>> 2016-11-15 17:56 GMT+03:00 Jeff Law <law@redhat.com>:
>>>
>>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>>>
>>>>
>>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>
>>>>>
>>>>> --- a/gcc/genmodes.c
>>>>> +++ b/gcc/genmodes.c
>>>>> --- a/gcc/init-regs.c
>>>>> +++ b/gcc/init-regs.c
>>>>> --- a/gcc/machmode.h
>>>>> +++ b/gcc/machmode.h
>>>>>
>>>>> These are middle-end changes, you will need a separate review for
>>>>> these.
>>>>
>>>>
>>>>
>>>> Who could review these changes?
>>>
>>>
>>> I can.  I likely dropped the message because it looked x86 specific, so
>>> if
>>> you could resend it'd be appreciated.
>>
>>
>> Attached (diff with previous only in fixed comments typos).
>
>
> Next time please split middle-end changes out from target-related stuff and
> send them separately.

Ok.

> These ones are OK.
>
>
> Bernd

Thanks!

Who could commit it?


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
       [not found]               ` <CAMXFM3seRe2+yxLN7e99zcxPsSfLwYqHmT=7EweW4vZ4NV+b7A@mail.gmail.com>
@ 2016-11-17 22:19                 ` H.J. Lu
  2016-11-18 19:41                   ` Jakub Jelinek
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2016-11-17 22:19 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: GCC Patches

On Thu, Nov 17, 2016 at 4:20 AM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> 16 Ноя 2016 г. 19:21 пользователь "Bernd Schmidt" <bschmidt@redhat.com>
> написал:
>
>
>>
>> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>>
>>> 2016-11-15 17:56 GMT+03:00 Jeff Law <law@redhat.com>:
>>>>
>>>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>>>>
>>>>>
>>>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>>>>>>
>>>>>>
>>>>>> --- a/gcc/genmodes.c
>>>>>> +++ b/gcc/genmodes.c
>>>>>> --- a/gcc/init-regs.c
>>>>>> +++ b/gcc/init-regs.c
>>>>>> --- a/gcc/machmode.h
>>>>>> +++ b/gcc/machmode.h
>>>>>>
>>>>>> These are middle-end changes, you will need a separate review for
>>>>>> these.
>>>>>
>>>>>
>>>>>
>>>>> Who could review these changes?
>>>>
>>>>
>>>> I can.  I likely dropped the message because it looked x86 specific, so
>>>> if
>>>> you could resend it'd be appreciated.
>>>
>>>
>>> Attached (diff with previous only in fixed comments typos).
>>
>>
>> Next time please split middle-end changes out from target-related stuff
>> and send them separately.
>>
>> These ones are OK.
>>
>>
>> Bernd
>
> Hi HJ, could you please commit it?

Done.

-- 
H.J.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-17 22:19                 ` H.J. Lu
@ 2016-11-18 19:41                   ` Jakub Jelinek
  2016-11-18 20:30                     ` Jakub Jelinek
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-18 19:41 UTC (permalink / raw)
  To: H.J. Lu, Andrew Senkevich; +Cc: GCC Patches

Hi!

On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote:
> > Hi HJ, could you please commit it?
> 
> Done.

I'm seeing lots of ICEs with this.

E.g. reduced:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef unsigned char __mmask8;
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

static inline  __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}

 __m128
foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D, __m128 __E, __m128 *__F)
{
  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
      (__v4sf) __C,
      (__v4sf) __D,
      (__v4sf) __E,
      (__v4sf) __A,
      (const __v4sf *) __F,
      (__v4sf) _mm_setzero_ps (),
      (__mmask8) __U);
}

ICEs with -mavx5124fmaps -O0, but succeeds with
-mavx512vl -mavx5124fmaps -O0 or -mavx5124fmaps -O2.

            fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
            fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
            msk_mov   = gen_avx512vl_loadv4sf_mask;

looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't
imply -mavx512vl, so using -mavx512vl insns unconditionally is just wrong.
You need some fallback if avx512vl isn't available, perhaps use
avx512f 512-bit masked insns with bits in the mask forced to pick only the
ones you want?

Also, seems there are various formatting issues in the change,
e.g. shortly after s4fma_expand: there is indentation by 3 chars relative to
above { instead of 2, gen_rtx_SUBREG (V16SFmode, tmp, 0)); has extra 1 char
indentation, some lines too long.

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-18 19:41                   ` Jakub Jelinek
@ 2016-11-18 20:30                     ` Jakub Jelinek
  2016-11-19  8:05                       ` Jakub Jelinek
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-18 20:30 UTC (permalink / raw)
  To: H.J. Lu, Andrew Senkevich; +Cc: GCC Patches

On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> I'm seeing lots of ICEs with this.

Here is untested fix for that, will bootstrap/regtest it soon (after my
current set of bootstraps finishes).

2016-11-18  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
	don't initialize it, don't use it for the case where it isn't
	provable %{z} nor using the same argument, instead move merge
	argument into a new pseudo and use that as target.  Formatting fixes.

--- gcc/config/i386/i386.c.jj	2016-11-18 20:04:31.000000000 +0100
+++ gcc/config/i386/i386.c	2016-11-18 21:21:17.764190127 +0100
@@ -38220,14 +38220,12 @@ rdseed_step:
       rtx (*fcn) (rtx, rtx, rtx, rtx);
       rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
       rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
-      rtx (*msk_mov) (rtx, rtx, rtx, rtx);
       int masked = 1;
       machine_mode mode, wide_mode, nar_mode;
 
       nar_mode  = V4SFmode;
       mode      = V16SFmode;
       wide_mode = V64SFmode;
-      msk_mov   = gen_avx512f_loadv16sf_mask;
       fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
       fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
 
@@ -38270,7 +38268,6 @@ rdseed_step:
 	  wide_mode = V64SImode;
 	  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
 	  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
-	  msk_mov   = gen_avx512f_loadv16si_mask;
 	  goto v4fma_expand;
 
 	case IX86_BUILTIN_4DPWSSDS_MASK:
@@ -38279,7 +38276,6 @@ rdseed_step:
 	  wide_mode = V64SImode;
 	  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
 	  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
-	  msk_mov   = gen_avx512f_loadv16si_mask;
 	  goto v4fma_expand;
 
 	case IX86_BUILTIN_4FMAPS_MASK:
@@ -38295,11 +38291,11 @@ v4fma_expand:
 	    wide_reg = gen_reg_rtx (wide_mode);
 	    for (i = 0; i < 4; i++)
 	      {
-	        args[i] = CALL_EXPR_ARG (exp, i);
+		args[i] = CALL_EXPR_ARG (exp, i);
 		ops[i] = expand_normal (args[i]);
 
-		emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
-				  ops[i]);
+		emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, i * 64),
+				ops[i]);
 	      }
 
 	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
@@ -38318,7 +38314,7 @@ v4fma_expand:
 	      emit_insn (fcn (target, accum, wide_reg, mem));
 	    else
 	      {
-	        rtx merge, mask;
+		rtx merge, mask;
 		merge = expand_normal (CALL_EXPR_ARG (exp, 6));
 
 		mask = expand_normal (CALL_EXPR_ARG (exp, 7));
@@ -38340,18 +38336,16 @@ v4fma_expand:
 		    merge = force_reg (mode, merge);
 		    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
 		  }
-	        /* Merge with something unknown might happen if we z-mask w/ -O0.  */
+		/* Merge with something unknown might happen if we z-mask w/ -O0.  */
 		else
 		  {
-		    rtx tmp = target;
-		    emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
-
-		    target = force_reg (mode, merge);
-		    emit_insn (msk_mov (target, tmp, target, mask));
+		    target = gen_reg_rtx (mode);
+		    emit_move_insn (target, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, target, mask));
 		  }
 	      }
-	      return target;
-	    }
+	    return target;
+	  }
 
 	case IX86_BUILTIN_4FNMASS:
 	  fcn = gen_avx5124fmaddps_4fnmaddss;
@@ -38366,7 +38360,6 @@ v4fma_expand:
 	case IX86_BUILTIN_4FNMASS_MASK:
 	  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
 	  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
-	  msk_mov   = gen_avx512vl_loadv4sf_mask;
 	  goto s4fma_expand;
 
 	case IX86_BUILTIN_4FMASS_MASK:
@@ -38380,22 +38373,21 @@ v4fma_expand:
 
 	    fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
 	    fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
-	    msk_mov   = gen_avx512vl_loadv4sf_mask;
 
 s4fma_expand:
 	    mode = V4SFmode;
 	    wide_reg = gen_reg_rtx (V64SFmode);
 	    for (i = 0; i < 4; i++)
 	      {
-		 rtx tmp;
-		 args[i] = CALL_EXPR_ARG (exp, i);
-		 ops[i] = expand_normal (args[i]);
+		rtx tmp;
+		args[i] = CALL_EXPR_ARG (exp, i);
+		ops[i] = expand_normal (args[i]);
 
-		 tmp = gen_reg_rtx (SFmode);
-		 emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+		tmp = gen_reg_rtx (SFmode);
+		emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
 
-		 emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
-				  gen_rtx_SUBREG (V16SFmode, tmp, 0));
+		emit_move_insn (gen_rtx_SUBREG (V16SFmode, wide_reg, i * 64),
+				gen_rtx_SUBREG (V16SFmode, tmp, 0));
 	      }
 
 	    accum = expand_normal (CALL_EXPR_ARG (exp, 4));
@@ -38414,37 +38406,37 @@ s4fma_expand:
 	      emit_insn (fcn (target, accum, wide_reg, mem));
 	    else
 	      {
-		 rtx merge, mask;
-		 merge = expand_normal (CALL_EXPR_ARG (exp, 6));
+		rtx merge, mask;
+		merge = expand_normal (CALL_EXPR_ARG (exp, 6));
 
-		 mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+		mask = expand_normal (CALL_EXPR_ARG (exp, 7));
+
+		if (CONST_INT_P (mask))
+		  mask = fixup_modeless_constant (mask, QImode);
 
-		 if (CONST_INT_P (mask))
-		   mask = fixup_modeless_constant (mask, QImode);
+		mask = force_reg (QImode, mask);
 
-		 mask = force_reg (QImode, mask);
-
-		 if (GET_MODE (mask) != QImode)
-		   mask = gen_rtx_SUBREG (QImode, mask, 0);
-
-		 /* If merge is 0 then we're about to emit z-masked variant.  */
-		 if (const0_operand (merge, mode))
-		   emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
-		 /* If merge is the same as accum then emit merge-masked variant.  */
-		 else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
-		   {
-		     merge = force_reg (mode, merge);
-		     emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
-		   }
-		 /* Merge with something unknown might happen if we z-mask w/ -O0.  */
-		 else
-		   {
-		     rtx tmp = target;
-		     emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
-
-		     target = force_reg (mode, merge);
-		     emit_insn (msk_mov (target, tmp, target, mask));
-		   }
+		if (GET_MODE (mask) != QImode)
+		  mask = gen_rtx_SUBREG (QImode, mask, 0);
+
+		/* If merge is 0 then we're about to emit z-masked variant.  */
+		if (const0_operand (merge, mode))
+		  emit_insn (fcn_maskz (target, accum, wide_reg, mem, merge, mask));
+		/* If merge is the same as accum then emit merge-masked
+		   variant.  */
+		else if (CALL_EXPR_ARG (exp, 6) == CALL_EXPR_ARG (exp, 4))
+		  {
+		    merge = force_reg (mode, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
+		  }
+		/* Merge with something unknown might happen if we z-mask
+		   w/ -O0.  */
+		else
+		  {
+		    target = gen_reg_rtx (mode);
+		    emit_move_insn (target, merge);
+		    emit_insn (fcn_mask (target, wide_reg, mem, target, mask));
+		  }
 		}
 	      return target;
 	    }


	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-18 20:30                     ` Jakub Jelinek
@ 2016-11-19  8:05                       ` Jakub Jelinek
  2016-11-19  8:28                         ` Jakub Jelinek
  2016-11-19 10:18                         ` Uros Bizjak
  0 siblings, 2 replies; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-19  8:05 UTC (permalink / raw)
  To: H.J. Lu, Andrew Senkevich, Uros Bizjak; +Cc: GCC Patches

On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> > I'm seeing lots of ICEs with this.
> 
> Here is untested fix for that, will bootstrap/regtest it soon (after my
> current set of bootstraps finishes).
> 
> 2016-11-18  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
> 	don't initialize it, don't use it for the case where it isn't
> 	provable %{z} nor using the same argument, instead move merge
> 	argument into a new pseudo and use that as target.  Formatting fixes.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
fixed a couple of FAILs, but not tons of others.

Here is another patch I'm going to test which fixes many other FAILs, but
still some are left:
FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
FAIL: gcc.target/i386/mvc1.c (internal compiler error)
FAIL: gcc.target/i386/mvc1.c (test for excess errors)
FAIL: gcc.target/i386/mvc6.c (internal compiler error)
FAIL: gcc.target/i386/mvc6.c (test for excess errors)
FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
FAIL: gcc.target/i386/mvc8.c (internal compiler error)
FAIL: gcc.target/i386/mvc8.c (test for excess errors)
FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
Will debug even those.

2016-11-19  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2,
	ix86_add_new_builtins): Formatting fixes.
	(ix86_expand_builtin): Use || instead of && for isa vs. isa2.
	(ix86_get_builtin): Likewise.

--- gcc/config/i386/i386.c.jj	2016-11-18 22:30:16.000000000 +0100
+++ gcc/config/i386/i386.c	2016-11-19 08:37:45.748175866 +0100
@@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c
 	 means that *both* cpuid bits must be set for the built-in to be available.
 	 Handle this here.  */
       if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL)
-	  mask &= ~OPTION_MASK_ISA_AVX512VL;
+	mask &= ~OPTION_MASK_ISA_AVX512VL;
 
       mask &= ~OPTION_MASK_ISA_64BIT;
       if (mask == 0
@@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c
 
 static inline tree
 def_builtin2 (HOST_WIDE_INT mask, const char *name,
-	     enum ix86_builtin_func_type tcode,
-	     enum ix86_builtins code)
+	      enum ix86_builtin_func_type tcode,
+	      enum ix86_builtins code)
 {
   tree decl = NULL_TREE;
 
@@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const
       tree type = ix86_get_builtin_func_type (tcode);
       decl = add_builtin_function (name, type, code, BUILT_IN_MD,
 				   NULL, NULL_TREE);
-	  ix86_builtins[(int) code] = decl;
-	  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+      ix86_builtins[(int) code] = decl;
+      ix86_builtins_isa[(int) code].set_and_not_built_p = false;
     }
   else
     {
@@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const
 
 static inline tree
 def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
-		   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+		    enum ix86_builtin_func_type tcode, enum ix86_builtins code)
 {
   tree decl = def_builtin2 (mask, name, tcode, code);
   if (decl)
@@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask,
 static void
 ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if (((isa & deferred_isa_values) == 0)
-      && ((isa2 & deferred_isa_values2) == 0))
+  if ((isa & deferred_isa_values) == 0
+      && (isa2 & deferred_isa_values2) == 0)
     return;
 
   /* Bits in ISA value can be removed from potential isa values.  */
@@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa
 
   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
     {
-      if ((((ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0))
+      if (((ix86_builtins_isa[i].isa & isa) != 0
+	   || (ix86_builtins_isa[i].isa2 & isa2) != 0)
 	  && ix86_builtins_isa[i].set_and_not_built_p)
 	{
 	  tree decl, type;
@@ -36549,7 +36550,7 @@ ix86_expand_builtin (tree exp, rtx targe
      whether it is supported.  */
   if ((ix86_builtins_isa[fcode].isa
        && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
-      && (ix86_builtins_isa[fcode].isa2
+      || (ix86_builtins_isa[fcode].isa2
 	  && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
     {
       char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
@@ -38514,7 +38515,7 @@ static tree ix86_get_builtin (enum ix86_
   opts = TREE_TARGET_OPTION (target_tree);
 
   if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
-	&& (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
+      || (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
     return ix86_builtin_decl (code, true);
   else
     return NULL_TREE;


	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19  8:05                       ` Jakub Jelinek
@ 2016-11-19  8:28                         ` Jakub Jelinek
  2016-11-19 10:18                         ` Uros Bizjak
  1 sibling, 0 replies; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-19  8:28 UTC (permalink / raw)
  To: H.J. Lu, Andrew Senkevich, Uros Bizjak; +Cc: GCC Patches

On Sat, Nov 19, 2016 at 09:05:05AM +0100, Jakub Jelinek wrote:
> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
> > On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> > > I'm seeing lots of ICEs with this.
> > 
> > Here is untested fix for that, will bootstrap/regtest it soon (after my
> > current set of bootstraps finishes).
> > 
> > 2016-11-18  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
> > 	don't initialize it, don't use it for the case where it isn't
> > 	provable %{z} nor using the same argument, instead move merge
> > 	argument into a new pseudo and use that as target.  Formatting fixes.
> 
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
> fixed a couple of FAILs, but not tons of others.
> 
> Here is another patch I'm going to test which fixes many other FAILs, but
> still some are left:
> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
> Will debug even those.

The fix for that (still not bootstrapped/regtested) is below.
Will now bootstrap/regtest all 3 patches and hopefully all the 4fma*
introduced regressions will be gone.

2016-11-19  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (ix86_valid_target_attribute_tree): Don't
	clear opts->x_ix86_isa_flags, clear opts->x_ix86_isa_flags2
	instead and using = 0 instead of &= 0.

--- gcc/config/i386/i386.c.jj	2016-11-19 08:54:37.000000000 +0100
+++ gcc/config/i386/i386.c	2016-11-19 09:20:52.314913008 +0100
@@ -6845,7 +6845,7 @@ ix86_valid_target_attribute_tree (tree a
 				     | OPTION_MASK_ABI_64
 				     | OPTION_MASK_ABI_X32
 				     | OPTION_MASK_CODE16);
-	  opts->x_ix86_isa_flags &= 0;
+	  opts->x_ix86_isa_flags2 = 0;
 	}
       else if (!orig_arch_specified)
 	opts->x_ix86_arch_string = NULL;


	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19  8:05                       ` Jakub Jelinek
  2016-11-19  8:28                         ` Jakub Jelinek
@ 2016-11-19 10:18                         ` Uros Bizjak
  2016-11-19 11:28                           ` Jakub Jelinek
  2016-11-19 14:04                           ` Andrew Senkevich
  1 sibling, 2 replies; 29+ messages in thread
From: Uros Bizjak @ 2016-11-19 10:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
>> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
>> > I'm seeing lots of ICEs with this.
>>
>> Here is untested fix for that, will bootstrap/regtest it soon (after my
>> current set of bootstraps finishes).
>>
>> 2016-11-18  Jakub Jelinek  <jakub@redhat.com>
>>
>>       * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>>       don't initialize it, don't use it for the case where it isn't
>>       provable %{z} nor using the same argument, instead move merge
>>       argument into a new pseudo and use that as target.  Formatting fixes.
>
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
> fixed a couple of FAILs, but not tons of others.
>
> Here is another patch I'm going to test which fixes many other FAILs, but
> still some are left:
> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)

I wonder why patch submitter didn't get these failures during
regtesting. There are plenty of tests (the above multi-vrsioning
tests) that depend on correct handling of ISA variables. I assumed
that these tests passed and consequently didn't went deep into the
implementation, but rather requested a couple of additional tests that
exercised added functionality.some more.

> Will debug even those.

Thanks!

Uros.

> 2016-11-19  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2,
>         ix86_add_new_builtins): Formatting fixes.
>         (ix86_expand_builtin): Use || instead of && for isa vs. isa2.
>         (ix86_get_builtin): Likewise.
>
> --- gcc/config/i386/i386.c.jj   2016-11-18 22:30:16.000000000 +0100
> +++ gcc/config/i386/i386.c      2016-11-19 08:37:45.748175866 +0100
> @@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c
>          means that *both* cpuid bits must be set for the built-in to be available.
>          Handle this here.  */
>        if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL)
> -         mask &= ~OPTION_MASK_ISA_AVX512VL;
> +       mask &= ~OPTION_MASK_ISA_AVX512VL;
>
>        mask &= ~OPTION_MASK_ISA_64BIT;
>        if (mask == 0
> @@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c
>
>  static inline tree
>  def_builtin2 (HOST_WIDE_INT mask, const char *name,
> -            enum ix86_builtin_func_type tcode,
> -            enum ix86_builtins code)
> +             enum ix86_builtin_func_type tcode,
> +             enum ix86_builtins code)
>  {
>    tree decl = NULL_TREE;
>
> @@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const
>        tree type = ix86_get_builtin_func_type (tcode);
>        decl = add_builtin_function (name, type, code, BUILT_IN_MD,
>                                    NULL, NULL_TREE);
> -         ix86_builtins[(int) code] = decl;
> -         ix86_builtins_isa[(int) code].set_and_not_built_p = false;
> +      ix86_builtins[(int) code] = decl;
> +      ix86_builtins_isa[(int) code].set_and_not_built_p = false;
>      }
>    else
>      {
> @@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const
>
>  static inline tree
>  def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
> -                  enum ix86_builtin_func_type tcode, enum ix86_builtins code)
> +                   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
>  {
>    tree decl = def_builtin2 (mask, name, tcode, code);
>    if (decl)
> @@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask,
>  static void
>  ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
>  {
> -  if (((isa & deferred_isa_values) == 0)
> -      && ((isa2 & deferred_isa_values2) == 0))
> +  if ((isa & deferred_isa_values) == 0
> +      && (isa2 & deferred_isa_values2) == 0)
>      return;
>
>    /* Bits in ISA value can be removed from potential isa values.  */
> @@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa
>
>    for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
>      {
> -      if ((((ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0))
> +      if (((ix86_builtins_isa[i].isa & isa) != 0
> +          || (ix86_builtins_isa[i].isa2 & isa2) != 0)
>           && ix86_builtins_isa[i].set_and_not_built_p)
>         {
>           tree decl, type;
> @@ -36549,7 +36550,7 @@ ix86_expand_builtin (tree exp, rtx targe
>       whether it is supported.  */
>    if ((ix86_builtins_isa[fcode].isa
>         && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
> -      && (ix86_builtins_isa[fcode].isa2
> +      || (ix86_builtins_isa[fcode].isa2
>           && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
>      {
>        char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
> @@ -38514,7 +38515,7 @@ static tree ix86_get_builtin (enum ix86_
>    opts = TREE_TARGET_OPTION (target_tree);
>
>    if ((ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
> -       && (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
> +      || (ix86_builtins_isa[(int) code].isa2 & opts->x_ix86_isa_flags2))
>      return ix86_builtin_decl (code, true);
>    else
>      return NULL_TREE;
>
>
>         Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19 10:18                         ` Uros Bizjak
@ 2016-11-19 11:28                           ` Jakub Jelinek
  2016-11-19 17:24                             ` Jakub Jelinek
  2016-11-19 14:04                           ` Andrew Senkevich
  1 sibling, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-19 11:28 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On Sat, Nov 19, 2016 at 11:17:55AM +0100, Uros Bizjak wrote:
> > Here is another patch I'm going to test which fixes many other FAILs, but
> > still some are left:
> > FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> > FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> > FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> > FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> > FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> > FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> > FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> > FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> > FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
> 
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-vrsioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't went deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality.some more.

Dunno, clearly the patch has not been tested at all, at least not in
the form that has been checked in.
I've now bootstrapped/regtested on x86_64-linux and i686-linux all these
3 patches:
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01992.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02026.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02027.html
, e.g. on x86_64-linux they fix:
-FAIL: gcc.target/i386/avx-2.c (internal compiler error)
-FAIL: gcc.target/i386/avx-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx2-gather-2.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 16
-FAIL: gcc.target/i386/avx2-gather-6.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 1
-FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-ceil-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-floor-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-gather-2.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 16
-FAIL: gcc.target/i386/avx512f-gather-5.c scan-assembler-times gather[^\\n]*zmm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c scan-assembler-times vcvtpd2dq[^\\n]+ymm[0-9](?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c scan-assembler-times vinserti64x4[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-rintf-sfix-vec-2.c scan-assembler-times vcvtps2dq[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 2
-FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-trunc-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/avx512f-truncf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ \\\\t]+#) 1
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 104)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 123)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 142)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 161)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 28)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 47)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 66)
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 85)
-FAIL: gcc.target/i386/funcspec-8.c (internal compiler error)
-FAIL: gcc.target/i386/funcspec-8.c (test for excess errors)
-FAIL: gcc.target/i386/mvc1.c (internal compiler error)
-FAIL: gcc.target/i386/mvc1.c (test for excess errors)
-UNRESOLVED: gcc.target/i386/mvc1.c compilation failed to produce executable
-FAIL: gcc.target/i386/mvc6.c (test for excess errors)
-UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler punpcklbw
-UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler vpshufb
-FAIL: gcc.target/i386/mvc8.c (internal compiler error)
-FAIL: gcc.target/i386/mvc8.c (test for excess errors)
-UNRESOLVED: gcc.target/i386/mvc8.c scan-assembler-not constprop
 FAIL: gcc.target/i386/pr45685.c scan-assembler-times cmov 6
-FAIL: gcc.target/i386/pr67995-1.c  (test for errors, line 11)
-FAIL: gcc.target/i386/pr67995-1.c (internal compiler error)
-FAIL: gcc.target/i386/pr67995-1.c (test for excess errors)
-FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
-FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
-FAIL: gcc.target/i386/pr69255-1.c  (test for errors, line 13)
-FAIL: gcc.target/i386/pr69255-1.c  (test for warnings, line 13)
-FAIL: gcc.target/i386/pr69255-1.c (internal compiler error)
-FAIL: gcc.target/i386/pr69255-1.c (test for excess errors)
-FAIL: gcc.target/i386/pr69255-2.c  (test for errors, line 13)
-FAIL: gcc.target/i386/pr69255-2.c  (test for warnings, line 13)
-FAIL: gcc.target/i386/pr69255-2.c  (test for warnings, line 13)
-FAIL: gcc.target/i386/pr69255-2.c (internal compiler error)
-FAIL: gcc.target/i386/pr69255-2.c (test for excess errors)
-FAIL: gcc.target/i386/pr69255-3.c  (test for errors, line 13)
-FAIL: gcc.target/i386/pr69255-3.c  (test for warnings, line 13)
-FAIL: gcc.target/i386/pr69255-3.c (internal compiler error)
-FAIL: gcc.target/i386/pr69255-3.c (test for excess errors)
-FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
-FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
-FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
-FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
-FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
-WARNING: program timed out.
-FAIL: g++.dg/ext/mv6.C  -std=gnu++11 (internal compiler error)
-FAIL: g++.dg/ext/mv6.C  -std=gnu++11 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mv6.C  -std=gnu++11 compilation failed to produce executable
-FAIL: g++.dg/ext/mv6.C  -std=gnu++14 (internal compiler error)
-FAIL: g++.dg/ext/mv6.C  -std=gnu++14 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mv6.C  -std=gnu++14 compilation failed to produce executable
-FAIL: g++.dg/ext/mv6.C  -std=gnu++98 (internal compiler error)
-FAIL: g++.dg/ext/mv6.C  -std=gnu++98 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mv6.C  -std=gnu++98 compilation failed to produce executable
-FAIL: g++.dg/ext/mvc1.C  -std=c++11 (internal compiler error)
-FAIL: g++.dg/ext/mvc1.C  -std=c++11 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++11 compilation failed to produce executable
-FAIL: g++.dg/ext/mvc1.C  -std=c++14 (internal compiler error)
-FAIL: g++.dg/ext/mvc1.C  -std=c++14 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++14 compilation failed to produce executable
-FAIL: g++.dg/ext/mvc1.C  -std=c++98 (internal compiler error)
-FAIL: g++.dg/ext/mvc1.C  -std=c++98 (test for excess errors)
-UNRESOLVED: g++.dg/ext/mvc1.C  -std=c++98 compilation failed to produce executable
-FAIL: g++.dg/ext/pr57548.C  -std=c++11 (internal compiler error)
-FAIL: g++.dg/ext/pr57548.C  -std=c++11 (test for excess errors)
-FAIL: g++.dg/ext/pr57548.C  -std=c++14 (internal compiler error)
-FAIL: g++.dg/ext/pr57548.C  -std=c++14 (test for excess errors)
-FAIL: g++.dg/ext/pr57548.C  -std=c++98 (internal compiler error)
-FAIL: g++.dg/ext/pr57548.C  -std=c++98 (test for excess errors)

On x86_64-linux with the 3 patches I'm not seeing any new FAILs
compared to before r242569, on i686-linux there is still:
+FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
+FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
compared to pre-r242569 (so some further fix is needed).

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19 10:18                         ` Uros Bizjak
  2016-11-19 11:28                           ` Jakub Jelinek
@ 2016-11-19 14:04                           ` Andrew Senkevich
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-19 14:04 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, H.J. Lu, GCC Patches

2016-11-19 13:17 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
>>> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
>>> > I'm seeing lots of ICEs with this.
>>>
>>> Here is untested fix for that, will bootstrap/regtest it soon (after my
>>> current set of bootstraps finishes).
>>>
>>> 2016-11-18  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>       * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>>>       don't initialize it, don't use it for the case where it isn't
>>>       provable %{z} nor using the same argument, instead move merge
>>>       argument into a new pseudo and use that as target.  Formatting fixes.
>>
>> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
>> fixed a couple of FAILs, but not tons of others.
>>
>> Here is another patch I'm going to test which fixes many other FAILs, but
>> still some are left:
>> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
>> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
>> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
>> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
>> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
>> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
>> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
>> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
>> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
>
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-vrsioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't went deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality.some more.

Completely my bad. Starting from addition last intrinsics testing gone wrong.
Will double check next time to avoid repeating in the future.

>> Will debug even those.

Thank you, Jakub.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19 11:28                           ` Jakub Jelinek
@ 2016-11-19 17:24                             ` Jakub Jelinek
  2016-11-19 18:52                               ` Uros Bizjak
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-19 17:24 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
> compared to before r242569, on i686-linux there is still:
> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
> compared to pre-r242569 (so some further fix is needed).

And finally here is yet another patch that fixes pr57756 on i686-linux.
Ok for trunk together with the other 3 patches?

2016-11-19  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.c (ix86_can_inline_p): Use || instead of &
	when checking if callee's isa flags are subset of caller's isa flags.
	Fix comment wording.

--- gcc/config/i386/i386.c.jj	2016-11-19 18:02:56.000000000 +0100
+++ gcc/config/i386/i386.c	2016-11-19 18:21:23.649463040 +0100
@@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
       struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
       struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
 
-      /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
-	 can inline a SSE2 function but a SSE2 function can't inline a SSE4
-	 function.  */
+      /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
+	 function can inline a SSE2 function but a SSE2 function can't inline
+	 a SSE4 function.  */
       if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-	  != callee_opts->x_ix86_isa_flags) &
-	  ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
-	  != callee_opts->x_ix86_isa_flags2))
+	   != callee_opts->x_ix86_isa_flags)
+	  || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+	      != callee_opts->x_ix86_isa_flags2))
 	ret = false;
 
       /* See if we have the same non-isa options.  */


	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19 17:24                             ` Jakub Jelinek
@ 2016-11-19 18:52                               ` Uros Bizjak
  2016-11-20 18:16                                 ` Uros Bizjak
  0 siblings, 1 reply; 29+ messages in thread
From: Uros Bizjak @ 2016-11-19 18:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>> compared to before r242569, on i686-linux there is still:
>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>> compared to pre-r242569 (so some further fix is needed).
>
> And finally here is yet another patch that fixes pr57756 on i686-linux.
> Ok for trunk together with the other 3 patches?

OK for the whole patch series.

Big thanks,
Uros.

>
> 2016-11-19  Jakub Jelinek  <jakub@redhat.com>
>
>         * config/i386/i386.c (ix86_can_inline_p): Use || instead of &
>         when checking if callee's isa flags are subset of caller's isa flags.
>         Fix comment wording.
>
> --- gcc/config/i386/i386.c.jj   2016-11-19 18:02:56.000000000 +0100
> +++ gcc/config/i386/i386.c      2016-11-19 18:21:23.649463040 +0100
> @@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
>        struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
>        struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
>
> -      /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
> -        can inline a SSE2 function but a SSE2 function can't inline a SSE4
> -        function.  */
> +      /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
> +        function can inline a SSE2 function but a SSE2 function can't inline
> +        a SSE4 function.  */
>        if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> -         != callee_opts->x_ix86_isa_flags) &
> -         ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> -         != callee_opts->x_ix86_isa_flags2))
> +          != callee_opts->x_ix86_isa_flags)
> +         || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> +             != callee_opts->x_ix86_isa_flags2))
>         ret = false;
>
>        /* See if we have the same non-isa options.  */
>
>
>         Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-19 18:52                               ` Uros Bizjak
@ 2016-11-20 18:16                                 ` Uros Bizjak
  2016-11-21 17:12                                   ` Martin Sebor
  0 siblings, 1 reply; 29+ messages in thread
From: Uros Bizjak @ 2016-11-20 18:16 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>> compared to before r242569, on i686-linux there is still:
>>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>>> compared to pre-r242569 (so some further fix is needed).
>>
>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>> Ok for trunk together with the other 3 patches?
>
> OK for the whole patch series.

Hm, I still see (both, 32bit and 64bit targets):

In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
                 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
                 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
'_mm512_maskz_4fmadd_ps':^M
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch^M
In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
                 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
                 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
called from here^M
compiler exited with status 1
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
Excess errors:
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch

Uros.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-20 18:16                                 ` Uros Bizjak
@ 2016-11-21 17:12                                   ` Martin Sebor
  2016-11-21 17:41                                     ` Andrew Senkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Sebor @ 2016-11-21 17:12 UTC (permalink / raw)
  To: Uros Bizjak, Jakub Jelinek; +Cc: H.J. Lu, Andrew Senkevich, GCC Patches

On 11/20/2016 11:16 AM, Uros Bizjak wrote:
> On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>>> compared to before r242569, on i686-linux there is still:
>>>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>>>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>>>> compared to pre-r242569 (so some further fix is needed).
>>>
>>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>>> Ok for trunk together with the other 3 patches?
>>
>> OK for the whole patch series.
>
> Hm, I still see (both, 32bit and 64bit targets):
>
> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
>                  from
> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>                  from
> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
> '_mm512_maskz_4fmadd_ps':^M
> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
> failed in call to always_inline '_mm512_setzero_ps': target specific
> option mismatch^M
> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
>                  from
> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>                  from
> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
> called from here^M
> compiler exited with status 1
> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
> Excess errors:
> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
> failed in call to always_inline '_mm512_setzero_ps': target specific
> option mismatch

FWIW, I came across the same error in my own testing and raised
bug 78451.

Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-21 17:12                                   ` Martin Sebor
@ 2016-11-21 17:41                                     ` Andrew Senkevich
  2016-11-21 18:45                                       ` Jakub Jelinek
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Senkevich @ 2016-11-21 17:41 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Uros Bizjak, Jakub Jelinek, H.J. Lu, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3861 bytes --]

2016-11-21 20:12 GMT+03:00 Martin Sebor <msebor@gmail.com>:
> On 11/20/2016 11:16 AM, Uros Bizjak wrote:
>>
>> On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>
>>> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>
>>>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>>>>
>>>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>>>> compared to before r242569, on i686-linux there is still:
>>>>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>>>>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>>>>> compared to pre-r242569 (so some further fix is needed).
>>>>
>>>>
>>>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>>>> Ok for trunk together with the other 3 patches?
>>>
>>>
>>> OK for the whole patch series.
>>
>>
>> Hm, I still see (both, 32bit and 64bit targets):
>>
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
>>                  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>>                  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
>> '_mm512_maskz_4fmadd_ps':^M
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
>> failed in call to always_inline '_mm512_setzero_ps': target specific
>> option mismatch^M
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
>>                  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>>                  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
>> called from here^M
>> compiler exited with status 1
>> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
>> Excess errors:
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
>> failed in call to always_inline '_mm512_setzero_ps': target specific
>> option mismatch
>
>
> FWIW, I came across the same error in my own testing and raised
> bug 78451.

Can we fix it with the following patch? Regtesting in progress.

    PR target/78451
    * gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
    _mm512_setzero_ps.
    * gcc/config/i386/avx5124vnniwintrin.h: Ditto.

diff --git a/gcc/config/i386/avx5124fmapsintrin.h
b/gcc/config/i386/avx5124fmapsintrin.h
index 6113ee9..dd9a322
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -74,7 +74,9 @@ _mm512_maskz_4fmadd_ps (__mmask16 __U,
  (__v16sf) __E,
  (__v16sf) __A,
  (const __v4sf *) __F,
- (__v16sf) _mm512_setzero_ps (),
+ (__v16sf) {0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0},
  (__mmask16) __U);
 }

@@ -161,7 +163,9 @@ _mm512_maskz_4fnmadd_ps (__mmask16 __U,
  (__v16sf) __E,
  (__v16sf) __A,
  (const __v4sf *) __F,
- (__v16sf) _mm512_setzero_ps (),
+ (__v16sf) {0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0},
  (__mmask16) __U);
 }

diff --git a/gcc/config/i386/avx5124vnniwintrin.h
b/gcc/config/i386/avx5124vnniwintrin.h
index 392c6a5..a4faa24
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -75,7 +75,9 @@ _mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i
__A, __m512i __B,
   (__v16si) __E,
   (__v16si) __A,
   (const __v4si *) __F,
-  (__v16si) _mm512_setzero_ps (),
+  (__v16si) {0, 0, 0, 0,
+  0, 0, 0, 0, 0, 0, 0, 0,
+  0, 0, 0, 0},
   (__mmask16) __U);
 }

@@ -120,7 +122,9 @@ _mm512_maskz_4dpwssds_epi32 (__mmask16 __U,
__m512i __A, __m512i __B,
    (__v16si) __E,
    (__v16si) __A,
    (const __v4si *) __F,
-   (__v16si) _mm512_setzero_ps (),
+   (__v16si) {0, 0, 0, 0,
+   0, 0, 0, 0, 0, 0, 0, 0,
+   0, 0, 0, 0},
    (__mmask16) __U);
 }


--
WBR,
Andrew

[-- Attachment #2: sse-22a-fix.patch --]
[-- Type: application/octet-stream, Size: 1630 bytes --]

diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
index 6113ee9..dd9a322
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -74,7 +74,9 @@ _mm512_maskz_4fmadd_ps (__mmask16 __U,
 						(__v16sf) __E,
 						(__v16sf) __A,
 						(const __v4sf *) __F,
-						(__v16sf) _mm512_setzero_ps (),
+						(__v16sf) {0, 0, 0, 0,
+						0, 0, 0, 0, 0, 0, 0, 0,
+						0, 0, 0, 0},
 						(__mmask16) __U);
 }
 
@@ -161,7 +163,9 @@ _mm512_maskz_4fnmadd_ps (__mmask16 __U,
 						 (__v16sf) __E,
 						 (__v16sf) __A,
 						 (const __v4sf *) __F,
-						 (__v16sf) _mm512_setzero_ps (),
+						 (__v16sf) {0, 0, 0, 0,
+						 0, 0, 0, 0, 0, 0, 0, 0,
+						 0, 0, 0, 0},
 						 (__mmask16) __U);
 }
 
diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
index 392c6a5..a4faa24
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -75,7 +75,9 @@ _mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
 						  (__v16si) __E,
 						  (__v16si) __A,
 						  (const __v4si *) __F,
-						  (__v16si) _mm512_setzero_ps (),
+						  (__v16si) {0, 0, 0, 0,
+						  0, 0, 0, 0, 0, 0, 0, 0,
+						  0, 0, 0, 0},
 						  (__mmask16) __U);
 }
 
@@ -120,7 +122,9 @@ _mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
 						   (__v16si) __E,
 						   (__v16si) __A,
 						   (const __v4si *) __F,
-						   (__v16si) _mm512_setzero_ps (),
+						   (__v16si) {0, 0, 0, 0,
+						   0, 0, 0, 0, 0, 0, 0, 0,
+						   0, 0, 0, 0},
 						   (__mmask16) __U);
 }
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
  2016-11-21 17:41                                     ` Andrew Senkevich
@ 2016-11-21 18:45                                       ` Jakub Jelinek
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Jelinek @ 2016-11-21 18:45 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Martin Sebor, Uros Bizjak, H.J. Lu, GCC Patches

On Mon, Nov 21, 2016 at 08:40:37PM +0300, Andrew Senkevich wrote:
> > FWIW, I came across the same error in my own testing and raised
> > bug 78451.
> 
> Can we fix it with the following patch? Regtesting in progress.
> 
>     PR target/78451
>     * gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
>     _mm512_setzero_ps.
>     * gcc/config/i386/avx5124vnniwintrin.h: Ditto.

That is just a workaround, we want to fix the real bug.  I'll have a look
tomorrow.

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-11-21 18:45 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-10 16:27 [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions Andrew Senkevich
2016-11-10 16:36 ` Jakub Jelinek
2016-11-10 17:18   ` Andrew Senkevich
2016-11-11 11:16     ` Uros Bizjak
2016-11-14 18:28       ` Andrew Senkevich
2016-11-15 10:04         ` Uros Bizjak
2016-11-15 12:55       ` Andrew Senkevich
2016-11-15 14:56         ` Jeff Law
2016-11-15 16:31           ` Andrew Senkevich
2016-11-16 16:21             ` Bernd Schmidt
2016-11-16 18:14               ` Andrew Senkevich
     [not found]               ` <CAMXFM3seRe2+yxLN7e99zcxPsSfLwYqHmT=7EweW4vZ4NV+b7A@mail.gmail.com>
2016-11-17 22:19                 ` H.J. Lu
2016-11-18 19:41                   ` Jakub Jelinek
2016-11-18 20:30                     ` Jakub Jelinek
2016-11-19  8:05                       ` Jakub Jelinek
2016-11-19  8:28                         ` Jakub Jelinek
2016-11-19 10:18                         ` Uros Bizjak
2016-11-19 11:28                           ` Jakub Jelinek
2016-11-19 17:24                             ` Jakub Jelinek
2016-11-19 18:52                               ` Uros Bizjak
2016-11-20 18:16                                 ` Uros Bizjak
2016-11-21 17:12                                   ` Martin Sebor
2016-11-21 17:41                                     ` Andrew Senkevich
2016-11-21 18:45                                       ` Jakub Jelinek
2016-11-19 14:04                           ` Andrew Senkevich
2016-11-10 17:14 ` Vladimir N Makarov
2016-11-10 17:19   ` Andrew Senkevich
2016-11-11 11:30 ` Jakub Jelinek
2016-11-14 18:29   ` Andrew Senkevich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).