public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 2/4] i386: Enable AVX512 memory broadcast for FNMADD
  2018-10-20  8:35 [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB H.J. Lu
@ 2018-10-20  7:31 ` H.J. Lu
  2018-10-20  9:03 ` [PATCH 4/4] i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests H.J. Lu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: H.J. Lu @ 2018-10-20  7:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FNMADD operations.  In order to
support AVX512 memory broadcast for FNMADD, FNMADD builtin functions are
also added, instead of passing the negated value to FMA builtin functions.

gcc/

	PR target/72782
	* config/i386/avx512fintrin.h (_mm512_fnmadd_round_pd): Use
	__builtin_ia32_vfnmaddpd512_mask.
	(_mm512_mask_fnmadd_round_pd): Likewise.
	(_mm512_fnmadd_pd): Likewise.
	(_mm512_mask_fnmadd_pd): Likewise.
	(_mm512_maskz_fnmadd_round_pd): Use
	__builtin_ia32_vfnmaddpd512_maskz.
	(_mm512_maskz_fnmadd_pd): Likewise.
	(_mm512_fnmadd_round_ps): Use __builtin_ia32_vfnmaddps512_mask.
	(_mm512_mask_fnmadd_round_ps): Likewise.
	(_mm512_fnmadd_ps): Likewise.
	(_mm512_mask_fnmadd_ps): Likewise.
	(_mm512_maskz_fnmadd_round_ps): Use
	__builtin_ia32_vfnmaddps512_maskz.
	(_mm512_maskz_fnmadd_ps): Likewise.
	* config/i386/avx512vlintrin.h (_mm256_mask_fnmadd_pd): Use
	__builtin_ia32_vfnmaddpd256_mask.
	(_mm256_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256_maskz.
	(_mm_mask_fnmadd_pd): Use __builtin_ia32_vfmaddpd128_mask
	(_mm_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd128_maskz.
	(_mm256_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_mask.
	(_mm256_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_mask.
	(_mm256_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_maskz.
	(_mm_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_mask.
	(_mm_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_maskz.
	* config/i386/fmaintrin.h (_mm_fnmadd_pd): Use
	__builtin_ia32_vfnmaddpd.
	(_mm256_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256.
	(_mm_fnmadd_ps): Use __builtin_ia32_vfnmaddps.
	(_mm256_fnmadd_ps): Use __builtin_ia32_vfnmaddps256.
	(_mm_fnmadd_sd): Use __builtin_ia32_vfnmaddsd3.
	(_mm_fnmadd_ss): Use __builtin_ia32_vfnmaddss3.
	* config/i386/i386-builtin.def: Add
	__builtin_ia32_vfnmaddpd256_mask,
	__builtin_ia32_vfnmaddpd256_maskz,
	__builtin_ia32_vfnmaddpd128_mask,
	__builtin_ia32_vfnmaddpd128_maskz,
	__builtin_ia32_vfnmaddps256_mask,
	__builtin_ia32_vfnmaddps256_maskz,
	__builtin_ia32_vfnmaddps128_mask,
	__builtin_ia32_vfnmaddps128_maskz,
	__builtin_ia32_vfnmaddpd512_mask,
	__builtin_ia32_vfnmaddpd512_maskz,
	__builtin_ia32_vfnmaddps512_mask,
	__builtin_ia32_vfnmaddps512_maskz, __builtin_ia32_vfnmaddss3,
	__builtin_ia32_vfnmaddsd3, __builtin_ia32_vfnmaddps,
	__builtin_ia32_vfnmaddpd, __builtin_ia32_vfnmaddps256 and.
	__builtin_ia32_vfnmaddpd256.
	* config/i386/sse.md (fma4i_fnmadd_<mode>): New.
	(<avx512>_fnmadd_<mode>_maskz<round_expand_name>): Likewise.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1):
	Likewise.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2):
	Likewise.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3):
	Likewise.
	(fmai_vmfnmadd_<mode><round_name>): Likewise.

gcc/testsuite/

	PR target/72782
	* gcc.target/i386/avx512f-fnmadd-df-zmm-1.c: New test.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Likewise.
	* gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c: Likewise.
	* gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c: Likewise.
	* gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/avx512fintrin.h               | 124 +++++++++---------
 gcc/config/i386/avx512vlintrin.h              |  64 ++++-----
 gcc/config/i386/fmaintrin.h                   |  24 ++--
 gcc/config/i386/i386-builtin.def              |  20 +++
 gcc/config/i386/sse.md                        |  77 +++++++++++
 .../gcc.target/i386/avx512f-fnmadd-df-zmm-1.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c |  12 ++
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c |  12 ++
 .../i386/avx512vl-fnmadd-sf-xmm-1.c           |  12 ++
 .../i386/avx512vl-fnmadd-sf-ymm-1.c           |  12 ++
 16 files changed, 335 insertions(+), 106 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index c0c8fa1efd0..1445e9e8b3c 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -3613,10 +3613,10 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1, __R);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1, __R);
 }
 
 extern __inline __m512d
@@ -3635,10 +3635,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C,
 			      __mmask8 __U, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 (-(__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
 extern __inline __m512d
@@ -3646,20 +3646,20 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
 			      __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1, __R);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1, __R);
 }
 
 extern __inline __m512
@@ -3678,10 +3678,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C,
 			      __mmask16 __U, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask3 (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512
@@ -3689,10 +3689,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 			      __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 
 extern __inline __m512d
@@ -3878,28 +3878,28 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
     (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R)
 
 #define _mm512_fnmadd_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(-(A), B, C, -1, R)
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fnmadd_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfnmaddpd512_mask(-(A), B, C, U, R)
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R)
 
 #define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask3(-(A), B, C, U, R)
+    (__m512d)__builtin_ia32_vfnmaddpd512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_maskz(-(A), B, C, U, R)
+    (__m512d)__builtin_ia32_vfnmaddpd512_maskz(A, B, C, U, R)
 
 #define _mm512_fnmadd_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddps512_mask(-(A), B, C, -1, R)
+    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fnmadd_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfnmaddps512_mask(-(A), B, C, U, R)
+    (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R)
 
 #define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_mask3(-(A), B, C, U, R)
+    (__m512)__builtin_ia32_vfnmaddps512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_maskz(-(A), B, C, U, R)
+    (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
 
 #define _mm512_fnmsub_round_pd(A, B, C, R)            \
     (__m512d)__builtin_ia32_vfmaddpd512_mask(-(A), B, -(C), -1, R)
@@ -12680,11 +12680,11 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A,
-						    (__v8df) __B,
-						    (__v8df) __C,
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
@@ -12702,33 +12702,33 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask3 (-(__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_mask3 ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A,
-						     (__v8df) __B,
-						     (__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmaddpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A,
-						   (__v16sf) __B,
-						   (__v16sf) __C,
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
@@ -12746,22 +12746,22 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask3 (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmaddps512_mask3 ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    (__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmaddps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h
index 1e2e6da29bd..0135652eb8b 100644
--- a/gcc/config/i386/avx512vlintrin.h
+++ b/gcc/config/i386/avx512vlintrin.h
@@ -4525,10 +4525,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_mask3_fnmadd_pd (__m256d __A, __m256d __B, __m256d __C,
 			__mmask8 __U)
 {
-  return (__m256d) __builtin_ia32_vfmaddpd256_mask3 (-(__v4df) __A,
-						     (__v4df) __B,
-						     (__v4df) __C,
-						     (__mmask8) __U);
+  return (__m256d) __builtin_ia32_vfnmaddpd256_mask3 ((__v4df) __A,
+						      (__v4df) __B,
+						      (__v4df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m256d
@@ -4536,10 +4536,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fnmadd_pd (__mmask8 __U, __m256d __A, __m256d __B,
 			__m256d __C)
 {
-  return (__m256d) __builtin_ia32_vfmaddpd256_maskz (-(__v4df) __A,
-						     (__v4df) __B,
-						     (__v4df) __C,
-						     (__mmask8) __U);
+  return (__m256d) __builtin_ia32_vfnmaddpd256_maskz ((__v4df) __A,
+						      (__v4df) __B,
+						      (__v4df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m128d
@@ -4558,10 +4558,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mask3_fnmadd_pd (__m128d __A, __m128d __B, __m128d __C,
 		     __mmask8 __U)
 {
-  return (__m128d) __builtin_ia32_vfmaddpd128_mask3 (-(__v2df) __A,
-						     (__v2df) __B,
-						     (__v2df) __C,
-						     (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfnmaddpd128_mask3 ((__v2df) __A,
+						      (__v2df) __B,
+						      (__v2df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m128d
@@ -4569,10 +4569,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fnmadd_pd (__mmask8 __U, __m128d __A, __m128d __B,
 		     __m128d __C)
 {
-  return (__m128d) __builtin_ia32_vfmaddpd128_maskz (-(__v2df) __A,
-						     (__v2df) __B,
-						     (__v2df) __C,
-						     (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfnmaddpd128_maskz ((__v2df) __A,
+						      (__v2df) __B,
+						      (__v2df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m256
@@ -4591,10 +4591,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_mask3_fnmadd_ps (__m256 __A, __m256 __B, __m256 __C,
 			__mmask8 __U)
 {
-  return (__m256) __builtin_ia32_vfmaddps256_mask3 (-(__v8sf) __A,
-						    (__v8sf) __B,
-						    (__v8sf) __C,
-						    (__mmask8) __U);
+  return (__m256) __builtin_ia32_vfnmaddps256_mask3 ((__v8sf) __A,
+						     (__v8sf) __B,
+						     (__v8sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m256
@@ -4602,10 +4602,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fnmadd_ps (__mmask8 __U, __m256 __A, __m256 __B,
 			__m256 __C)
 {
-  return (__m256) __builtin_ia32_vfmaddps256_maskz (-(__v8sf) __A,
-						    (__v8sf) __B,
-						    (__v8sf) __C,
-						    (__mmask8) __U);
+  return (__m256) __builtin_ia32_vfnmaddps256_maskz ((__v8sf) __A,
+						     (__v8sf) __B,
+						     (__v8sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m128
@@ -4622,20 +4622,20 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mask3_fnmadd_ps (__m128 __A, __m128 __B, __m128 __C, __mmask8 __U)
 {
-  return (__m128) __builtin_ia32_vfmaddps128_mask3 (-(__v4sf) __A,
-						    (__v4sf) __B,
-						    (__v4sf) __C,
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfnmaddps128_mask3 ((__v4sf) __A,
+						     (__v4sf) __B,
+						     (__v4sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fnmadd_ps (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128) __builtin_ia32_vfmaddps128_maskz (-(__v4sf) __A,
-						    (__v4sf) __B,
-						    (__v4sf) __C,
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfnmaddps128_maskz ((__v4sf) __A,
+						     (__v4sf) __B,
+						     (__v4sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m256d
diff --git a/gcc/config/i386/fmaintrin.h b/gcc/config/i386/fmaintrin.h
index 2eddd896579..0a2f4a7f676 100644
--- a/gcc/config/i386/fmaintrin.h
+++ b/gcc/config/i386/fmaintrin.h
@@ -134,48 +134,48 @@ extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_pd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B,
-                                           (__v2df)__C);
+  return (__m128d)__builtin_ia32_vfnmaddpd ((__v2df)__A, (__v2df)__B,
+					    (__v2df)__C);
 }
 
 extern __inline __m256d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fnmadd_pd (__m256d __A, __m256d __B, __m256d __C)
 {
-  return (__m256d)__builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B,
-                                              (__v4df)__C);
+  return (__m256d)__builtin_ia32_vfnmaddpd256 ((__v4df)__A, (__v4df)__B,
+					       (__v4df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_ps (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B,
-                                          (__v4sf)__C);
+  return (__m128)__builtin_ia32_vfnmaddps ((__v4sf)__A, (__v4sf)__B,
+					   (__v4sf)__C);
 }
 
 extern __inline __m256
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fnmadd_ps (__m256 __A, __m256 __B, __m256 __C)
 {
-  return (__m256)__builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B,
-                                             (__v8sf)__C);
+  return (__m256)__builtin_ia32_vfnmaddps256 ((__v8sf)__A, (__v8sf)__B,
+					      (__v8sf)__C);
 }
 
 extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_sd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B,
-                                            (__v2df)__C);
+  return (__m128d)__builtin_ia32_vfnmaddsd3 ((__v2df)__A, (__v2df)__B,
+					     (__v2df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_ss (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B,
-                                           (__v4sf)__C);
+  return (__m128)__builtin_ia32_vfnmaddss3 ((__v4sf)__A, (__v4sf)__B,
+					    (__v4sf)__C);
 }
 
 extern __inline __m128d
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index f5b5e56a01c..74343db1cbe 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1916,9 +1916,19 @@ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask, "__builtin_i
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask3, "__builtin_ia32_vfmsubps128_mask3", IX86_BUILTIN_VFMSUBPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_maskz, "__builtin_ia32_vfmsubps128_maskz", IX86_BUILTIN_VFMSUBPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_mask, "__builtin_ia32_vfnmaddpd256_mask", IX86_BUILTIN_VFNMADDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_mask3, "__builtin_ia32_vfnmaddpd256_mask3", IX86_BUILTIN_VFNMADDPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_maskz,
+"__builtin_ia32_vfnmaddpd256_maskz", IX86_BUILTIN_VFNMADDPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_mask, "__builtin_ia32_vfnmaddpd128_mask", IX86_BUILTIN_VFNMADDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_mask3, "__builtin_ia32_vfnmaddpd128_mask3", IX86_BUILTIN_VFNMADDPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_maskz,
+"__builtin_ia32_vfnmaddpd128_maskz", IX86_BUILTIN_VFNMADDPD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_mask, "__builtin_ia32_vfnmaddps256_mask", IX86_BUILTIN_VFNMADDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_mask3, "__builtin_ia32_vfnmaddps256_mask3", IX86_BUILTIN_VFNMADDPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_maskz, "__builtin_ia32_vfnmaddps256_maskz", IX86_BUILTIN_VFNMADDPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4sf_mask, "__builtin_ia32_vfnmaddps128_mask", IX86_BUILTIN_VFNMADDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4sf_mask3, "__builtin_ia32_vfnmaddps128_mask3", IX86_BUILTIN_VFNMADDPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4sf_maskz, "__builtin_ia32_vfnmaddps128_maskz", IX86_BUILTIN_VFNMADDPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4df_mask, "__builtin_ia32_vfnmsubpd256_mask", IX86_BUILTIN_VFNMSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4df_mask3, "__builtin_ia32_vfnmsubpd256_mask3", IX86_BUILTIN_VFNMSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v2df_mask, "__builtin_ia32_vfnmsubpd128_mask", IX86_BUILTIN_VFNMSUBPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
@@ -2783,7 +2793,11 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__buil
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_mask3_round, "__builtin_ia32_vfnmaddpd512_mask3", IX86_BUILTIN_VFNMADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_maskz_round, "__builtin_ia32_vfnmaddpd512_maskz", IX86_BUILTIN_VFNMADDPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__builtin_ia32_vfnmaddps512_mask3", IX86_BUILTIN_VFNMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
@@ -2869,6 +2883,8 @@ BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v4sf, "__builtin_ia32_vfmaddss
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v2df, "__builtin_ia32_vfmaddsd3", IX86_BUILTIN_VFMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v4sf, "__builtin_ia32_vfmsubss3", IX86_BUILTIN_VFMSUBSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v2df, "__builtin_ia32_vfmsubsd3", IX86_BUILTIN_VFMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmadd_v4sf, "__builtin_ia32_vfnmaddss3", IX86_BUILTIN_VFNMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmadd_v2df, "__builtin_ia32_vfnmaddsd3", IX86_BUILTIN_VFNMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
@@ -2878,6 +2894,10 @@ BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4sf, "_
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v2df, "__builtin_ia32_vfmsubpd", IX86_BUILTIN_VFMSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v8sf, "__builtin_ia32_vfmsubps256", IX86_BUILTIN_VFMSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4df, "__builtin_ia32_vfmsubpd256", IX86_BUILTIN_VFMSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v4sf, "__builtin_ia32_vfnmaddps", IX86_BUILTIN_VFNMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v2df, "__builtin_ia32_vfnmaddpd", IX86_BUILTIN_VFNMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v8sf, "__builtin_ia32_vfnmaddps256", IX86_BUILTIN_VFNMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v4df, "__builtin_ia32_vfnmaddpd256", IX86_BUILTIN_VFNMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0e9d3541ccc..9ed03fa85e2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3768,6 +3768,14 @@
 	  (neg:FMAMODE_AVX512
 	    (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))])
 
+(define_expand "fma4i_fnmadd_<mode>"
+  [(set (match_operand:FMAMODE_AVX512 0 "register_operand")
+	(fma:FMAMODE_AVX512
+	  (neg:FMAMODE_AVX512
+	    (match_operand:FMAMODE_AVX512 1 "nonimmediate_operand"))
+	  (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
+	  (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand")))])
+
 (define_expand "<avx512>_fmadd_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_AVX512VL 0 "register_operand")
    (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
@@ -4028,6 +4036,20 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "<avx512>_fnmadd_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_AVX512VL 0 "register_operand")
+   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F && <round_mode512bit_condition>"
+{
+  emit_insn (gen_fma_fnmadd_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
 (define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
 	(fma:VF_SF_AVX512VL
@@ -4043,6 +4065,49 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 1 "register_operand" "0,v"))
+	  (match_operand:VF_AVX512 2 "register_operand" "v,0")
+	  (vec_duplicate:VF_AVX512
+	    (match_operand:<ssescalarmode> 3 "memory_operand" "m,m"))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "vfnmadd213<ssemodesuffix>\t{%3<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (vec_duplicate:VF_AVX512
+	      (match_operand:<ssescalarmode> 1 "memory_operand" "m,m")))
+	  (match_operand:VF_AVX512 2 "register_operand" "0,v")
+	  (match_operand:VF_AVX512 3 "register_operand" "v,0")))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{%1<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %1<avx512bcst>}
+   vfnmadd231<ssemodesuffix>\t{%1<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %1<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 1 "register_operand" "0,v"))
+	  (vec_duplicate:VF_AVX512
+	    (match_operand:<ssescalarmode> 2 "memory_operand" "m,m"))
+	  (match_operand:VF_AVX512 3 "register_operand" "v,0")))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{%2<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<avx512bcst>}
+   vfnmadd231<ssemodesuffix>\t{%2<avx512bcst>, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "<avx512>_fnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
 	(vec_merge:VF_AVX512VL
@@ -4338,6 +4403,18 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
+(define_expand "fmai_vmfnmadd_<mode><round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (fma:VF_128
+	    (neg:VF_128
+	      (match_operand:VF_128 2 "<round_nimm_predicate>"))
+	    (match_operand:VF_128 1 "<round_nimm_predicate>")
+	    (match_operand:VF_128 3 "<round_nimm_predicate>"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_FMA")
+
 (define_insn "*fmai_fmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-df-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-df-zmm-1.c
new file mode 100644
index 00000000000..d4db6f74ff8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-df-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...pd\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastsd\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512d
+#define vec 512
+#define op fnmadd
+#define suffix pd
+#define SCALAR double
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c
new file mode 100644
index 00000000000..9da0cad2f09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c
new file mode 100644
index 00000000000..cf355d16dd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-2.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c
new file mode 100644
index 00000000000..a4e723e2320
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-3.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c
new file mode 100644
index 00000000000..ceeba440ca5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-4.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c
new file mode 100644
index 00000000000..4733dce087e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-5.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c
new file mode 100644
index 00000000000..cbf90ca7587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-6.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
new file mode 100644
index 00000000000..db5c34678c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c
new file mode 100644
index 00000000000..909ab25796b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+, %zmm0" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-8.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c
new file mode 100644
index 00000000000..ec68fa9e084
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%xmm\[0-9\]+" } } */
+
+#define type __m128
+#define vec
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c
new file mode 100644
index 00000000000..d668d0ee319
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%ymm\[0-9\]+" } } */
+
+#define type __m256
+#define vec 256
+#define op fnmadd
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB
@ 2018-10-20  8:35 H.J. Lu
  2018-10-20  7:31 ` [PATCH 2/4] i386: Enable AVX512 memory broadcast for FNMADD H.J. Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: H.J. Lu @ 2018-10-20  8:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FMSUB operations.  In order to
support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
also added, instead of passing the negated value to FMA builtin functions.

gcc/

	PR target/72782
	* config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use
	__builtin_ia32_vfmsubpd512_mask.
	(_mm512_mask_fmsub_round_pd): Likewise.
	(_mm512_fmsub_pd): Likewise.
	(_mm512_mask_fmsub_pd): Likewise.
	(_mm512_maskz_fmsub_round_pd): Use
	__builtin_ia32_vfmsubpd512_maskz.
	(_mm512_maskz_fmsub_pd): Likewise.
	(_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask.
	(_mm512_mask_fmsub_round_ps): Likewise.
	(_mm512_fmsub_ps): Likewise.
	(_mm512_mask_fmsub_ps): Likewise.
	(_mm512_maskz_fmsub_round_ps): Use
	__builtin_ia32_vfmsubps512_maskz.
	(_mm512_maskz_fmsub_ps): Likewise.
	* config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use
	__builtin_ia32_vfmsubpd256_mask.
	(_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz.
	(_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask
	(_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz.
	(_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
	(_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
	(_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz.
	(_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask.
	(_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz.
	* config/i386/fmaintrin.h (_mm_fmsub_pd): Use
	__builtin_ia32_vfmsubpd.
	(_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256.
	(_mm_fmsub_ps): Use __builtin_ia32_vfmsubps.
	(_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256.
	(_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3.
	(_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3.
	* config/i386/i386-builtin.def: Add
	__builtin_ia32_vfmsubpd256_mask,
	__builtin_ia32_vfmsubpd256_maskz,
	__builtin_ia32_vfmsubpd128_mask,
	__builtin_ia32_vfmsubpd128_maskz,
	__builtin_ia32_vfmsubps256_mask,
	__builtin_ia32_vfmsubps256_maskz,
	__builtin_ia32_vfmsubps128_mask,
	__builtin_ia32_vfmsubps128_maskz,
	__builtin_ia32_vfmsubpd512_mask,
	__builtin_ia32_vfmsubpd512_maskz,
	__builtin_ia32_vfmsubps512_mask,
	__builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3,
	__builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps,
	__builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and.
	__builtin_ia32_vfmsubpd256.
	* config/i386/sse.md (fma4i_fmsub_<mode>): New.
	(<avx512>_fmsub_<mode>_maskz<round_expand_name>): Likewise.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
	Likewise.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
	Likewise.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
	Likewise.
	(fmai_vmfmsub_<mode><round_name>): Likewise.

gcc/testsuite/

	PR target/72782
	* gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise.
	* gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise.
	* gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise.
	* gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/avx512fintrin.h               | 60 +++++++--------
 gcc/config/i386/avx512vlintrin.h              | 32 ++++----
 gcc/config/i386/fmaintrin.h                   | 24 +++---
 gcc/config/i386/i386-builtin.def              | 18 +++++
 gcc/config/i386/sse.md                        | 77 +++++++++++++++++++
 .../gcc.target/i386/avx512f-fmsub-df-zmm-1.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-1.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-2.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-3.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-4.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-5.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-6.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c  | 12 +++
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-8.c  | 12 +++
 .../gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c | 12 +++
 .../gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c | 12 +++
 16 files changed, 285 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 8473cd0d26c..c0c8fa1efd0 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -3355,9 +3355,9 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
 						    (__v8df) __B,
-						    -(__v8df) __C,
+						    (__v8df) __C,
 						    (__mmask8) -1, __R);
 }
 
@@ -3366,9 +3366,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
 			    __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
 						    (__v8df) __B,
-						    -(__v8df) __C,
+						    (__v8df) __C,
 						    (__mmask8) __U, __R);
 }
 
@@ -3388,9 +3388,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
 			     __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
 						     (__v8df) __B,
-						     -(__v8df) __C,
+						     (__v8df) __C,
 						     (__mmask8) __U, __R);
 }
 
@@ -3398,9 +3398,9 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
 						   (__v16sf) __B,
-						   -(__v16sf) __C,
+						   (__v16sf) __C,
 						   (__mmask16) -1, __R);
 }
 
@@ -3409,9 +3409,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
 			    __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
 						   (__v16sf) __B,
-						   -(__v16sf) __C,
+						   (__v16sf) __C,
 						   (__mmask16) __U, __R);
 }
 
@@ -3431,9 +3431,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 			     __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
 						    (__v16sf) __B,
-						    -(__v16sf) __C,
+						    (__v16sf) __C,
 						    (__mmask16) __U, __R);
 }
 
@@ -3806,28 +3806,28 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
     (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
 
 #define _mm512_fmsub_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), -1, R)
+    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fmsub_round_pd(A, U, B, C, R)    \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), U, R)
+    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
 
 #define _mm512_mask3_fmsub_round_pd(A, B, C, U, R)   \
     (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fmsub_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, -(C), U, R)
+    (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
 
 #define _mm512_fmsub_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), -1, R)
+    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fmsub_round_ps(A, U, B, C, R)    \
-    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), U, R)
+    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
 
 #define _mm512_mask3_fmsub_round_ps(A, B, C, U, R)   \
     (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fmsub_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, -(C), U, R)
+    (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
 
 #define _mm512_fmaddsub_round_pd(A, B, C, R)            \
     (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
@@ -12416,9 +12416,9 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
 						    (__v8df) __B,
-						    -(__v8df) __C,
+						    (__v8df) __C,
 						    (__mmask8) -1,
 						    _MM_FROUND_CUR_DIRECTION);
 }
@@ -12427,9 +12427,9 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
 						    (__v8df) __B,
-						    -(__v8df) __C,
+						    (__v8df) __C,
 						    (__mmask8) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
@@ -12449,9 +12449,9 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
+  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
 						     (__v8df) __B,
-						     -(__v8df) __C,
+						     (__v8df) __C,
 						     (__mmask8) __U,
 						     _MM_FROUND_CUR_DIRECTION);
 }
@@ -12460,9 +12460,9 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
 						   (__v16sf) __B,
-						   -(__v16sf) __C,
+						   (__v16sf) __C,
 						   (__mmask16) -1,
 						   _MM_FROUND_CUR_DIRECTION);
 }
@@ -12471,9 +12471,9 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
 						   (__v16sf) __B,
-						   -(__v16sf) __C,
+						   (__v16sf) __C,
 						   (__mmask16) __U,
 						   _MM_FROUND_CUR_DIRECTION);
 }
@@ -12493,9 +12493,9 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
+  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
 						    (__v16sf) __B,
-						    -(__v16sf) __C,
+						    (__v16sf) __C,
 						    (__mmask16) __U,
 						    _MM_FROUND_CUR_DIRECTION);
 }
diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h
index 68b5537845b..1e2e6da29bd 100644
--- a/gcc/config/i386/avx512vlintrin.h
+++ b/gcc/config/i386/avx512vlintrin.h
@@ -4117,9 +4117,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_mask_fmsub_pd (__m256d __A, __mmask8 __U, __m256d __B,
 		      __m256d __C)
 {
-  return (__m256d) __builtin_ia32_vfmaddpd256_mask ((__v4df) __A,
+  return (__m256d) __builtin_ia32_vfmsubpd256_mask ((__v4df) __A,
 						    (__v4df) __B,
-						    -(__v4df) __C,
+						    (__v4df) __C,
 						    (__mmask8) __U);
 }
 
@@ -4139,9 +4139,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fmsub_pd (__mmask8 __U, __m256d __A, __m256d __B,
 		       __m256d __C)
 {
-  return (__m256d) __builtin_ia32_vfmaddpd256_maskz ((__v4df) __A,
+  return (__m256d) __builtin_ia32_vfmsubpd256_maskz ((__v4df) __A,
 						     (__v4df) __B,
-						     -(__v4df) __C,
+						     (__v4df) __C,
 						     (__mmask8) __U);
 }
 
@@ -4149,9 +4149,9 @@ extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mask_fmsub_pd (__m128d __A, __mmask8 __U, __m128d __B, __m128d __C)
 {
-  return (__m128d) __builtin_ia32_vfmaddpd128_mask ((__v2df) __A,
+  return (__m128d) __builtin_ia32_vfmsubpd128_mask ((__v2df) __A,
 						    (__v2df) __B,
-						    -(__v2df) __C,
+						    (__v2df) __C,
 						    (__mmask8) __U);
 }
 
@@ -4171,9 +4171,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fmsub_pd (__mmask8 __U, __m128d __A, __m128d __B,
 		    __m128d __C)
 {
-  return (__m128d) __builtin_ia32_vfmaddpd128_maskz ((__v2df) __A,
+  return (__m128d) __builtin_ia32_vfmsubpd128_maskz ((__v2df) __A,
 						     (__v2df) __B,
-						     -(__v2df) __C,
+						     (__v2df) __C,
 						     (__mmask8) __U);
 }
 
@@ -4181,9 +4181,9 @@ extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_mask_fmsub_ps (__m256 __A, __mmask8 __U, __m256 __B, __m256 __C)
 {
-  return (__m256) __builtin_ia32_vfmaddps256_mask ((__v8sf) __A,
+  return (__m256) __builtin_ia32_vfmsubps256_mask ((__v8sf) __A,
 						   (__v8sf) __B,
-						   -(__v8sf) __C,
+						   (__v8sf) __C,
 						   (__mmask8) __U);
 }
 
@@ -4203,9 +4203,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fmsub_ps (__mmask8 __U, __m256 __A, __m256 __B,
 		       __m256 __C)
 {
-  return (__m256) __builtin_ia32_vfmaddps256_maskz ((__v8sf) __A,
+  return (__m256) __builtin_ia32_vfmsubps256_maskz ((__v8sf) __A,
 						    (__v8sf) __B,
-						    -(__v8sf) __C,
+						    (__v8sf) __C,
 						    (__mmask8) __U);
 }
 
@@ -4213,9 +4213,9 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mask_fmsub_ps (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C)
 {
-  return (__m128) __builtin_ia32_vfmaddps128_mask ((__v4sf) __A,
+  return (__m128) __builtin_ia32_vfmsubps128_mask ((__v4sf) __A,
 						   (__v4sf) __B,
-						   -(__v4sf) __C,
+						   (__v4sf) __C,
 						   (__mmask8) __U);
 }
 
@@ -4233,9 +4233,9 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fmsub_ps (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128) __builtin_ia32_vfmaddps128_maskz ((__v4sf) __A,
+  return (__m128) __builtin_ia32_vfmsubps128_maskz ((__v4sf) __A,
 						    (__v4sf) __B,
-						    -(__v4sf) __C,
+						    (__v4sf) __C,
 						    (__mmask8) __U);
 }
 
diff --git a/gcc/config/i386/fmaintrin.h b/gcc/config/i386/fmaintrin.h
index 660d3453590..2eddd896579 100644
--- a/gcc/config/i386/fmaintrin.h
+++ b/gcc/config/i386/fmaintrin.h
@@ -86,48 +86,48 @@ extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fmsub_pd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B,
-                                           -(__v2df)__C);
+  return (__m128d)__builtin_ia32_vfmsubpd ((__v2df)__A, (__v2df)__B,
+                                           (__v2df)__C);
 }
 
 extern __inline __m256d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fmsub_pd (__m256d __A, __m256d __B, __m256d __C)
 {
-  return (__m256d)__builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B,
-                                              -(__v4df)__C);
+  return (__m256d)__builtin_ia32_vfmsubpd256 ((__v4df)__A, (__v4df)__B,
+                                              (__v4df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fmsub_ps (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B,
-                                          -(__v4sf)__C);
+  return (__m128)__builtin_ia32_vfmsubps ((__v4sf)__A, (__v4sf)__B,
+                                          (__v4sf)__C);
 }
 
 extern __inline __m256
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fmsub_ps (__m256 __A, __m256 __B, __m256 __C)
 {
-  return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B,
-                                             -(__v8sf)__C);
+  return (__m256)__builtin_ia32_vfmsubps256 ((__v8sf)__A, (__v8sf)__B,
+                                             (__v8sf)__C);
 }
 
 extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fmsub_sd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, (__v2df)__B,
-                                            -(__v2df)__C);
+  return (__m128d)__builtin_ia32_vfmsubsd3 ((__v2df)__A, (__v2df)__B,
+                                            (__v2df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fmsub_ss (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, (__v4sf)__B,
-                                           -(__v4sf)__C);
+  return (__m128)__builtin_ia32_vfmsubss3 ((__v4sf)__A, (__v4sf)__B,
+                                           (__v4sf)__C);
 }
 
 extern __inline __m128d
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index dc4c70c7ea3..f5b5e56a01c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1903,10 +1903,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v8sf_maskz, "__builtin_
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask, "__builtin_ia32_vfmaddps128_mask", IX86_BUILTIN_VFMADDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask3, "__builtin_ia32_vfmaddps128_mask3", IX86_BUILTIN_VFMADDPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_maskz, "__builtin_ia32_vfmaddps128_maskz", IX86_BUILTIN_VFMADDPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask, "__builtin_ia32_vfmsubpd256_mask", IX86_BUILTIN_VFMSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask3, "__builtin_ia32_vfmsubpd256_mask3", IX86_BUILTIN_VFMSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_maskz, "__builtin_ia32_vfmsubpd256_maskz", IX86_BUILTIN_VFMSUBPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask, "__builtin_ia32_vfmsubpd128_mask", IX86_BUILTIN_VFMSUBPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask3, "__builtin_ia32_vfmsubpd128_mask3", IX86_BUILTIN_VFMSUBPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_maskz, "__builtin_ia32_vfmsubpd128_maskz", IX86_BUILTIN_VFMSUBPD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask, "__builtin_ia32_vfmsubps256_mask", IX86_BUILTIN_VFMSUBPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask3, "__builtin_ia32_vfmsubps256_mask3", IX86_BUILTIN_VFMSUBPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_maskz, "__builtin_ia32_vfmsubps256_maskz", IX86_BUILTIN_VFMSUBPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask, "__builtin_ia32_vfmsubps128_mask", IX86_BUILTIN_VFMSUBPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask3, "__builtin_ia32_vfmsubps128_mask3", IX86_BUILTIN_VFMSUBPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_maskz, "__builtin_ia32_vfmsubps128_maskz", IX86_BUILTIN_VFMSUBPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_mask, "__builtin_ia32_vfnmaddpd256_mask", IX86_BUILTIN_VFNMADDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_mask, "__builtin_ia32_vfnmaddpd128_mask", IX86_BUILTIN_VFNMADDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_mask, "__builtin_ia32_vfnmaddps256_mask", IX86_BUILTIN_VFNMADDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
@@ -2768,8 +2776,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
@@ -2855,11 +2867,17 @@ BDESC_FIRST (multi_arg, MULTI_ARG,
 BDESC (OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmadd_v2df, "__builtin_ia32_vfmaddsd", IX86_BUILTIN_VFMADDSD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v4sf, "__builtin_ia32_vfmaddss3", IX86_BUILTIN_VFMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v2df, "__builtin_ia32_vfmaddsd3", IX86_BUILTIN_VFMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v4sf, "__builtin_ia32_vfmsubss3", IX86_BUILTIN_VFMSUBSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v2df, "__builtin_ia32_vfmsubsd3", IX86_BUILTIN_VFMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v8sf, "__builtin_ia32_vfmaddps256", IX86_BUILTIN_VFMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4df, "__builtin_ia32_vfmaddpd256", IX86_BUILTIN_VFMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4sf, "__builtin_ia32_vfmsubps", IX86_BUILTIN_VFMSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v2df, "__builtin_ia32_vfmsubpd", IX86_BUILTIN_VFMSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v8sf, "__builtin_ia32_vfmsubps256", IX86_BUILTIN_VFMSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4df, "__builtin_ia32_vfmsubpd256", IX86_BUILTIN_VFMSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 635a6902d33..0e9d3541ccc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3760,6 +3760,14 @@
 	  (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
 	  (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand")))])
 
+(define_expand "fma4i_fmsub_<mode>"
+  [(set (match_operand:FMAMODE_AVX512 0 "register_operand")
+	(fma:FMAMODE_AVX512
+	  (match_operand:FMAMODE_AVX512 1 "nonimmediate_operand")
+	  (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
+	  (neg:FMAMODE_AVX512
+	    (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))])
+
 (define_expand "<avx512>_fmadd_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_AVX512VL 0 "register_operand")
    (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
@@ -3898,6 +3906,20 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "<avx512>_fmsub_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_AVX512VL 0 "register_operand")
+   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F && <round_mode512bit_condition>"
+{
+  emit_insn (gen_fma_fmsub_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
 (define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
 	(fma:VF_SF_AVX512VL
@@ -3913,6 +3935,49 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (match_operand:VF_AVX512 1 "register_operand" "0,v")
+	  (match_operand:VF_AVX512 2 "register_operand" "v,0")
+	  (neg:VF_AVX512
+	    (vec_duplicate:VF_AVX512
+	      (match_operand:<ssescalarmode> 3 "memory_operand" "m,m")))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "vfmsub213<ssemodesuffix>\t{%3<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (vec_duplicate:VF_AVX512
+	    (match_operand:<ssescalarmode> 1 "memory_operand" "m,m"))
+	  (match_operand:VF_AVX512 2 "register_operand" "0,v")
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 3 "register_operand" "v,0"))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfmsub132<ssemodesuffix>\t{%1<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %1<avx512bcst>}
+   vfmsub231<ssemodesuffix>\t{%1<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %1<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (match_operand:VF_AVX512 1 "register_operand" "0,v")
+	  (vec_duplicate:VF_AVX512
+	    (match_operand:<ssescalarmode> 2 "memory_operand" "m,m"))
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 3 "nonimmediate_operand" "v,0"))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfmsub132<ssemodesuffix>\t{%2<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<avx512bcst>}
+   vfmsub231<ssemodesuffix>\t{%2<avx512bcst>, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "<avx512>_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
 	(vec_merge:VF_AVX512VL
@@ -4261,6 +4326,18 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
+(define_expand "fmai_vmfmsub_<mode><round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (fma:VF_128
+	    (match_operand:VF_128 1 "<round_nimm_predicate>")
+	    (match_operand:VF_128 2 "<round_nimm_predicate>")
+	    (neg:VF_128
+	      (match_operand:VF_128 3 "<round_nimm_predicate>")))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_FMA")
+
 (define_insn "*fmai_fmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
new file mode 100644
index 00000000000..840888a2d81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...pd\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastsd\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512d
+#define vec 512
+#define op fmsub
+#define suffix pd
+#define SCALAR double
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
new file mode 100644
index 00000000000..0cb675b7628
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
new file mode 100644
index 00000000000..10212d471b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-2.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
new file mode 100644
index 00000000000..feb34077085
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-3.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
new file mode 100644
index 00000000000..4305fffe628
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-4.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
new file mode 100644
index 00000000000..d57251f83be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-5.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
new file mode 100644
index 00000000000..b26a9ee7eeb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-6.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
new file mode 100644
index 00000000000..cc705af8ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
new file mode 100644
index 00000000000..2b929fa11e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+, %zmm0" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-8.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
new file mode 100644
index 00000000000..70efbcc98f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%xmm\[0-9\]+" } } */
+
+#define type __m128
+#define vec
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c
new file mode 100644
index 00000000000..a7c1b370b74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%ymm\[0-9\]+" } } */
+
+#define type __m256
+#define vec 256
+#define op fmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 4/4] i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests
  2018-10-20  8:35 [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB H.J. Lu
  2018-10-20  7:31 ` [PATCH 2/4] i386: Enable AVX512 memory broadcast for FNMADD H.J. Lu
@ 2018-10-20  9:03 ` H.J. Lu
  2018-10-20  9:16 ` [PATCH 3/4] i386: Enable AVX512 memory broadcast for FNMSUB H.J. Lu
  2018-10-21 20:39 ` [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB Uros Bizjak
  3 siblings, 0 replies; 6+ messages in thread
From: H.J. Lu @ 2018-10-20  9:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Update AVX512 tests to test the newly added FMSUB, FNMADD and FNMSUB
builtin functions.

	PR target/72782
	* gcc.target/i386/avx-1.c (__builtin_ia32_vfmsubpd512_mask): New.
	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfmsubps512_mask): Likewise.
	(__builtin_ia32_vfmsubps512_maskz): Likewise.
	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
	* testsuite/gcc.target/i386/sse-13.c
	(__builtin_ia32_vfmsubpd512_mask): Likewise.
	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfmsubps512_mask): Likewise.
	(__builtin_ia32_vfmsubps512_maskz): Likewise.
	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
	* testsuite/gcc.target/i386/sse-23.c
	(__builtin_ia32_vfmsubpd512_mask): Likewise.
	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfmsubps512_mask): Likewise.
	(__builtin_ia32_vfmsubps512_maskz): Likewise.
	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
---
 gcc/testsuite/gcc.target/i386/avx-1.c  | 10 ++++++++++
 gcc/testsuite/gcc.target/i386/sse-13.c | 10 ++++++++++
 gcc/testsuite/gcc.target/i386/sse-23.c | 10 ++++++++++
 3 files changed, 30 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index c877f9996b3..f67bc5f5044 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -351,16 +351,26 @@
 #define __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubss3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubss3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vpermilpd512_mask(A, E, C, D) __builtin_ia32_vpermilpd512_mask(A, 1, C, D)
 #define __builtin_ia32_vpermilps512_mask(A, E, C, D) __builtin_ia32_vpermilps512_mask(A, 1, C, D)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 310ebfff73a..64da3cd1992 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -368,16 +368,26 @@
 #define __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubss3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubss3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vpermilpd512_mask(A, E, C, D) __builtin_ia32_vpermilpd512_mask(A, 1, C, D)
 #define __builtin_ia32_vpermilps512_mask(A, E, C, D) __builtin_ia32_vpermilps512_mask(A, 1, C, D)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index cb5cdd8cd10..f9d372c47e2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -369,14 +369,24 @@
 #define __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddpd512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubpd512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubps512_maskz(A, B, C, D, 8)
 #define __builtin_ia32_vpermilpd512_mask(A, E, C, D) __builtin_ia32_vpermilpd512_mask(A, 1, C, D)
 #define __builtin_ia32_vpermilps512_mask(A, E, C, D) __builtin_ia32_vpermilps512_mask(A, 1, C, D)
 
-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3/4] i386: Enable AVX512 memory broadcast for FNMSUB
  2018-10-20  8:35 [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB H.J. Lu
  2018-10-20  7:31 ` [PATCH 2/4] i386: Enable AVX512 memory broadcast for FNMADD H.J. Lu
  2018-10-20  9:03 ` [PATCH 4/4] i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests H.J. Lu
@ 2018-10-20  9:16 ` H.J. Lu
  2018-10-21 20:39 ` [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB Uros Bizjak
  3 siblings, 0 replies; 6+ messages in thread
From: H.J. Lu @ 2018-10-20  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FNMSUB operations.  In order to
support AVX512 memory broadcast for FNMSUB, FNMSUB builtin functions are
also added, instead of passing the negated value to FMA builtin functions.

gcc/

	PR target/72782
	* config/i386/avx512fintrin.h (_mm512_fnmsub_round_pd): Use
	__builtin_ia32_vfnmsubpd512_mask.
	(_mm512_mask_fnmsub_round_pd): Likewise.
	(_mm512_fnmsub_pd): Likewise.
	(_mm512_mask_fnmsub_pd): Likewise.
	(_mm512_maskz_fnmsub_round_pd): Use
	__builtin_ia32_vfnmsubpd512_maskz.
	(_mm512_maskz_fnmsub_pd): Likewise.
	(_mm512_fnmsub_round_ps): Use __builtin_ia32_vfnmsubps512_mask.
	(_mm512_mask_fnmsub_round_ps): Likewise.
	(_mm512_fnmsub_ps): Likewise.
	(_mm512_mask_fnmsub_ps): Likewise.
	(_mm512_maskz_fnmsub_round_ps): Use
	__builtin_ia32_vfnmsubps512_maskz.
	(_mm512_maskz_fnmsub_ps): Likewise.
	* config/i386/avx512vlintrin.h (_mm256_mask_fnmsub_pd): Use
	__builtin_ia32_vfnmsubpd256_mask.
	(_mm256_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256_maskz.
	(_mm_mask_fnmsub_pd): Use __builtin_ia32_vfmaddpd128_mask
	(_mm_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd128_maskz.
	(_mm256_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_mask.
	(_mm256_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_mask.
	(_mm256_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_maskz.
	(_mm_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_mask.
	(_mm_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_maskz.
	* config/i386/fmaintrin.h (_mm_fnmsub_pd): Use
	__builtin_ia32_vfnmsubpd.
	(_mm256_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256.
	(_mm_fnmsub_ps): Use __builtin_ia32_vfnmsubps.
	(_mm256_fnmsub_ps): Use __builtin_ia32_vfnmsubps256.
	(_mm_fnmsub_sd): Use __builtin_ia32_vfnmsubsd3.
	(_mm_fnmsub_ss): Use __builtin_ia32_vfnmsubss3.
	* config/i386/i386-builtin.def: Add
	__builtin_ia32_vfnmsubpd256_mask,
	__builtin_ia32_vfnmsubpd256_maskz,
	__builtin_ia32_vfnmsubpd128_mask,
	__builtin_ia32_vfnmsubpd128_maskz,
	__builtin_ia32_vfnmsubps256_mask,
	__builtin_ia32_vfnmsubps256_maskz,
	__builtin_ia32_vfnmsubps128_mask,
	__builtin_ia32_vfnmsubps128_maskz,
	__builtin_ia32_vfnmsubpd512_mask,
	__builtin_ia32_vfnmsubpd512_maskz,
	__builtin_ia32_vfnmsubps512_mask,
	__builtin_ia32_vfnmsubps512_maskz, __builtin_ia32_vfnmsubss3,
	__builtin_ia32_vfnmsubsd3, __builtin_ia32_vfnmsubps,
	__builtin_ia32_vfnmsubpd, __builtin_ia32_vfnmsubps256 and.
	__builtin_ia32_vfnmsubpd256.
	* config/i386/sse.md (fma4i_fnmsub_<mode>): New.
	(<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Likewise.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1):
	Likewise.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2):
	Likewise.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3):
	Likewise.
	(fmai_vmfnmsub_<mode><round_name>): Likewise.

gcc/testsuite/

	PR target/72782
	* gcc.target/i386/avx512f-fnmsub-df-zmm-1.c: New test.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Likewise.
	* gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c: Likewise.
	* gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c: Likewise.
	* gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/avx512fintrin.h               | 80 +++++++++---------
 gcc/config/i386/avx512vlintrin.h              | 32 ++++----
 gcc/config/i386/fmaintrin.h                   | 24 +++---
 gcc/config/i386/i386-builtin.def              | 12 +++
 gcc/config/i386/sse.md                        | 82 +++++++++++++++++++
 .../gcc.target/i386/avx512f-fnmsub-df-zmm-1.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c | 12 +++
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c | 12 +++
 .../i386/avx512vl-fnmsub-sf-xmm-1.c           | 12 +++
 .../i386/avx512vl-fnmsub-sf-ymm-1.c           | 12 +++
 16 files changed, 294 insertions(+), 68 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 1445e9e8b3c..001d610bc81 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -3699,10 +3699,10 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A,
-						    (__v8df) __B,
-						    -(__v8df) __C,
-						    (__mmask8) -1, __R);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1, __R);
 }
 
 extern __inline __m512d
@@ -3732,20 +3732,20 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
 			      __m512d __C, const int __R)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A,
-						     (__v8df) __B,
-						     -(__v8df) __C,
-						     (__mmask8) __U, __R);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U, __R);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A,
-						   (__v16sf) __B,
-						   -(__v16sf) __C,
-						   (__mmask16) -1, __R);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1, __R);
 }
 
 extern __inline __m512
@@ -3775,10 +3775,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 			      __m512 __C, const int __R)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    -(__v16sf) __C,
-						    (__mmask16) __U, __R);
+  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U, __R);
 }
 #else
 #define _mm512_fmadd_round_pd(A, B, C, R)            \
@@ -3902,7 +3902,7 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
     (__m512)__builtin_ia32_vfnmaddps512_maskz(A, B, C, U, R)
 
 #define _mm512_fnmsub_round_pd(A, B, C, R)            \
-    (__m512d)__builtin_ia32_vfmaddpd512_mask(-(A), B, -(C), -1, R)
+    (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fnmsub_round_pd(A, U, B, C, R)    \
     (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R)
@@ -3911,10 +3911,10 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
     (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R)   \
-    (__m512d)__builtin_ia32_vfmaddpd512_maskz(-(A), B, -(C), U, R)
+    (__m512d)__builtin_ia32_vfnmsubpd512_maskz(A, B, C, U, R)
 
 #define _mm512_fnmsub_round_ps(A, B, C, R)            \
-    (__m512)__builtin_ia32_vfmaddps512_mask(-(A), B, -(C), -1, R)
+    (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, -1, R)
 
 #define _mm512_mask_fnmsub_round_ps(A, U, B, C, R)    \
     (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R)
@@ -3923,7 +3923,7 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
     (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R)
 
 #define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R)   \
-    (__m512)__builtin_ia32_vfmaddps512_maskz(-(A), B, -(C), U, R)
+    (__m512)__builtin_ia32_vfnmsubps512_maskz(A, B, C, U, R)
 #endif
 
 extern __inline __m512i
@@ -12768,11 +12768,11 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A,
-						    (__v8df) __B,
-						    -(__v8df) __C,
-						    (__mmask8) -1,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A,
+						     (__v8df) __B,
+						     (__v8df) __C,
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512d
@@ -12801,22 +12801,22 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
 {
-  return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A,
-						     (__v8df) __B,
-						     -(__v8df) __C,
-						     (__mmask8) __U,
-						     _MM_FROUND_CUR_DIRECTION);
+  return (__m512d) __builtin_ia32_vfnmsubpd512_maskz ((__v8df) __A,
+						      (__v8df) __B,
+						      (__v8df) __C,
+						      (__mmask8) __U,
+						      _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A,
-						   (__v16sf) __B,
-						   -(__v16sf) __C,
-						   (__mmask16) -1,
-						   _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A,
+						    (__v16sf) __B,
+						    (__v16sf) __C,
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512
@@ -12845,11 +12845,11 @@ extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
 {
-  return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A,
-						    (__v16sf) __B,
-						    -(__v16sf) __C,
-						    (__mmask16) __U,
-						    _MM_FROUND_CUR_DIRECTION);
+  return (__m512) __builtin_ia32_vfnmsubps512_maskz ((__v16sf) __A,
+						     (__v16sf) __B,
+						     (__v16sf) __C,
+						     (__mmask16) __U,
+						     _MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m256i
diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h
index 0135652eb8b..c7151787d83 100644
--- a/gcc/config/i386/avx512vlintrin.h
+++ b/gcc/config/i386/avx512vlintrin.h
@@ -4665,10 +4665,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fnmsub_pd (__mmask8 __U, __m256d __A, __m256d __B,
 			__m256d __C)
 {
-  return (__m256d) __builtin_ia32_vfmaddpd256_maskz (-(__v4df) __A,
-						     (__v4df) __B,
-						     -(__v4df) __C,
-						     (__mmask8) __U);
+  return (__m256d) __builtin_ia32_vfnmsubpd256_maskz ((__v4df) __A,
+						      (__v4df) __B,
+						      (__v4df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m128d
@@ -4698,10 +4698,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fnmsub_pd (__mmask8 __U, __m128d __A, __m128d __B,
 		     __m128d __C)
 {
-  return (__m128d) __builtin_ia32_vfmaddpd128_maskz (-(__v2df) __A,
-						     (__v2df) __B,
-						     -(__v2df) __C,
-						     (__mmask8) __U);
+  return (__m128d) __builtin_ia32_vfnmsubpd128_maskz ((__v2df) __A,
+						      (__v2df) __B,
+						      (__v2df) __C,
+						      (__mmask8) __U);
 }
 
 extern __inline __m256
@@ -4731,10 +4731,10 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_fnmsub_ps (__mmask8 __U, __m256 __A, __m256 __B,
 			__m256 __C)
 {
-  return (__m256) __builtin_ia32_vfmaddps256_maskz (-(__v8sf) __A,
-						    (__v8sf) __B,
-						    -(__v8sf) __C,
-						    (__mmask8) __U);
+  return (__m256) __builtin_ia32_vfnmsubps256_maskz ((__v8sf) __A,
+						     (__v8sf) __B,
+						     (__v8sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m128
@@ -4761,10 +4761,10 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_fnmsub_ps (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128) __builtin_ia32_vfmaddps128_maskz (-(__v4sf) __A,
-						    (__v4sf) __B,
-						    -(__v4sf) __C,
-						    (__mmask8) __U);
+  return (__m128) __builtin_ia32_vfnmsubps128_maskz ((__v4sf) __A,
+						     (__v4sf) __B,
+						     (__v4sf) __C,
+						     (__mmask8) __U);
 }
 
 extern __inline __m128i
diff --git a/gcc/config/i386/fmaintrin.h b/gcc/config/i386/fmaintrin.h
index 0a2f4a7f676..4f13e81cf34 100644
--- a/gcc/config/i386/fmaintrin.h
+++ b/gcc/config/i386/fmaintrin.h
@@ -182,48 +182,48 @@ extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_pd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B,
-                                           -(__v2df)__C);
+  return (__m128d)__builtin_ia32_vfnmsubpd ((__v2df)__A, (__v2df)__B,
+					    (__v2df)__C);
 }
 
 extern __inline __m256d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fnmsub_pd (__m256d __A, __m256d __B, __m256d __C)
 {
-  return (__m256d)__builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B,
-                                              -(__v4df)__C);
+  return (__m256d)__builtin_ia32_vfnmsubpd256 ((__v4df)__A, (__v4df)__B,
+					       (__v4df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_ps (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B,
-                                          -(__v4sf)__C);
+  return (__m128)__builtin_ia32_vfnmsubps ((__v4sf)__A, (__v4sf)__B,
+					   (__v4sf)__C);
 }
 
 extern __inline __m256
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_fnmsub_ps (__m256 __A, __m256 __B, __m256 __C)
 {
-  return (__m256)__builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B,
-                                             -(__v8sf)__C);
+  return (__m256)__builtin_ia32_vfnmsubps256 ((__v8sf)__A, (__v8sf)__B,
+					      (__v8sf)__C);
 }
 
 extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_sd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B,
-                                            -(__v2df)__C);
+  return (__m128d)__builtin_ia32_vfnmsubsd3 ((__v2df)__A, (__v2df)__B,
+					     (__v2df)__C);
 }
 
 extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_ss (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B,
-                                           -(__v4sf)__C);
+  return (__m128)__builtin_ia32_vfnmsubss3 ((__v4sf)__A, (__v4sf)__B,
+					    (__v4sf)__C);
 }
 
 extern __inline __m128d
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 74343db1cbe..df0f7e975ac 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1931,12 +1931,16 @@ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4sf_mask3, "__builtin
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4sf_maskz, "__builtin_ia32_vfnmaddps128_maskz", IX86_BUILTIN_VFNMADDPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4df_mask, "__builtin_ia32_vfnmsubpd256_mask", IX86_BUILTIN_VFNMSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4df_mask3, "__builtin_ia32_vfnmsubpd256_mask3", IX86_BUILTIN_VFNMSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4df_maskz, "__builtin_ia32_vfnmsubpd256_maskz", IX86_BUILTIN_VFNMSUBPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v2df_mask, "__builtin_ia32_vfnmsubpd128_mask", IX86_BUILTIN_VFNMSUBPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v2df_mask3, "__builtin_ia32_vfnmsubpd128_mask3", IX86_BUILTIN_VFNMSUBPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v2df_maskz, "__builtin_ia32_vfnmsubpd128_maskz", IX86_BUILTIN_VFNMSUBPD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v8sf_mask, "__builtin_ia32_vfnmsubps256_mask", IX86_BUILTIN_VFNMSUBPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v8sf_mask3, "__builtin_ia32_vfnmsubps256_mask3", IX86_BUILTIN_VFNMSUBPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v8sf_maskz, "__builtin_ia32_vfnmsubps256_maskz", IX86_BUILTIN_VFNMSUBPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4sf_mask, "__builtin_ia32_vfnmsubps128_mask", IX86_BUILTIN_VFNMSUBPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4sf_mask3, "__builtin_ia32_vfnmsubps128_mask3", IX86_BUILTIN_VFNMSUBPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmsub_v4sf_maskz, "__builtin_ia32_vfnmsubps128_maskz", IX86_BUILTIN_VFNMSUBPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmaddsub_v4df_mask, "__builtin_ia32_vfmaddsubpd256_mask", IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmaddsub_v4df_mask3, "__builtin_ia32_vfmaddsubpd256_mask3", IX86_BUILTIN_VFMADDSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmaddsub_v4df_maskz, "__builtin_ia32_vfmaddsubpd256_maskz", IX86_BUILTIN_VFMADDSUBPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
@@ -2800,8 +2804,10 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask3_round, "__bu
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_maskz_round, "__builtin_ia32_vfnmaddps512_maskz", IX86_BUILTIN_VFNMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask3_round, "__builtin_ia32_vfnmsubpd512_mask3", IX86_BUILTIN_VFNMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_maskz_round, "__builtin_ia32_vfnmsubpd512_maskz", IX86_BUILTIN_VFNMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v16sf_mask_round, "__builtin_ia32_vfnmsubps512_mask", IX86_BUILTIN_VFNMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v16sf_mask3_round, "__builtin_ia32_vfnmsubps512_mask3", IX86_BUILTIN_VFNMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v16sf_maskz_round, "__builtin_ia32_vfnmsubps512_maskz", IX86_BUILTIN_VFNMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
 
 /* AVX512ER */
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_exp2v8df_mask_round, "__builtin_ia32_exp2pd_mask", IX86_BUILTIN_EXP2PD_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT)
@@ -2885,6 +2891,8 @@ BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v4sf, "__builtin_ia32_vfmsubss
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v2df, "__builtin_ia32_vfmsubsd3", IX86_BUILTIN_VFMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmadd_v4sf, "__builtin_ia32_vfnmaddss3", IX86_BUILTIN_VFNMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmadd_v2df, "__builtin_ia32_vfnmaddsd3", IX86_BUILTIN_VFNMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmsub_v4sf, "__builtin_ia32_vfnmsubss3", IX86_BUILTIN_VFNMSUBSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfnmsub_v2df, "__builtin_ia32_vfnmsubsd3", IX86_BUILTIN_VFNMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
@@ -2898,6 +2906,10 @@ BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v4sf, "
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v2df, "__builtin_ia32_vfnmaddpd", IX86_BUILTIN_VFNMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v8sf, "__builtin_ia32_vfnmaddps256", IX86_BUILTIN_VFNMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmadd_v4df, "__builtin_ia32_vfnmaddpd256", IX86_BUILTIN_VFNMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsub_v4sf, "__builtin_ia32_vfnmsubps", IX86_BUILTIN_VFNMSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsub_v2df, "__builtin_ia32_vfnmsubpd", IX86_BUILTIN_VFNMSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsub_v8sf, "__builtin_ia32_vfnmsubps256", IX86_BUILTIN_VFNMSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
+BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsub_v4df, "__builtin_ia32_vfnmsubpd256", IX86_BUILTIN_VFNMSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
 
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
 BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9ed03fa85e2..6faa2e1f653 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3776,6 +3776,15 @@
 	  (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
 	  (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand")))])
 
+(define_expand "fma4i_fnmsub_<mode>"
+  [(set (match_operand:FMAMODE_AVX512 0 "register_operand")
+	(fma:FMAMODE_AVX512
+	  (neg:FMAMODE_AVX512
+	    (match_operand:FMAMODE_AVX512 1 "nonimmediate_operand"))
+	  (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
+	  (neg:FMAMODE_AVX512
+	    (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))])
+
 (define_expand "<avx512>_fmadd_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_AVX512VL 0 "register_operand")
    (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
@@ -4159,6 +4168,20 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "<avx512>_fnmsub_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_AVX512VL 0 "register_operand")
+   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F && <round_mode512bit_condition>"
+{
+  emit_insn (gen_fma_fnmsub_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
 (define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
 	(fma:VF_SF_AVX512VL
@@ -4175,6 +4198,52 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 1 "register_operand" "0,v"))
+	  (match_operand:VF_AVX512 2 "register_operand" "v,0")
+	  (neg:VF_AVX512
+	    (vec_duplicate:VF_AVX512
+	      (match_operand:<ssescalarmode> 3 "memory_operand" "m,m")))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "vfnmsub213<ssemodesuffix>\t{%3<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (vec_duplicate:VF_AVX512
+	      (match_operand:<ssescalarmode> 1 "memory_operand" "m,m")))
+	  (match_operand:VF_AVX512 2 "register_operand" "0,v")
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 3 "register_operand" "v,0"))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfnmsub132<ssemodesuffix>\t{%1<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %1<avx512bcst>}
+   vfnmsub231<ssemodesuffix>\t{%1<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %1<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+	(fma:VF_AVX512
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 1 "register_operand" "0,v"))
+	  (vec_duplicate:VF_AVX512
+	    (match_operand:<ssescalarmode> 2 "memory_operand" "m,m"))
+	  (neg:VF_AVX512
+	    (match_operand:VF_AVX512 3 "register_operand" "v,0"))))]
+  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
+  "@
+   vfnmsub132<ssemodesuffix>\t{%2<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<avx512bcst>}
+   vfnmsub231<ssemodesuffix>\t{%2<avx512bcst>, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<avx512bcst>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "<avx512>_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
 	(vec_merge:VF_AVX512VL
@@ -4415,6 +4484,19 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
+(define_expand "fmai_vmfnmsub_<mode><round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (fma:VF_128
+	    (neg:VF_128
+	      (match_operand:VF_128 2 "<round_nimm_predicate>"))
+	    (match_operand:VF_128 1 "<round_nimm_predicate>")
+	    (neg:VF_128
+	      (match_operand:VF_128 3 "<round_nimm_predicate>")))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_FMA")
+
 (define_insn "*fmai_fmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-df-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-df-zmm-1.c
new file mode 100644
index 00000000000..4d4d31e1375
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-df-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...pd\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastsd\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512d
+#define vec 512
+#define op fnmsub
+#define suffix pd
+#define SCALAR double
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c
new file mode 100644
index 00000000000..ff9632a12cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c
new file mode 100644
index 00000000000..42563501e9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-2.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c
new file mode 100644
index 00000000000..6966cea551e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-3.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c
new file mode 100644
index 00000000000..0748c12c335
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-4.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c
new file mode 100644
index 00000000000..d24a80fb74e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-5.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c
new file mode 100644
index 00000000000..ee36dc90389
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-6.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
new file mode 100644
index 00000000000..7815251b82d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-7.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c
new file mode 100644
index 00000000000..8bd669e1c4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+, %zmm0" 1 } } */
+
+#define type __m512
+#define vec 512
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-8.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c
new file mode 100644
index 00000000000..0a63df8aeae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%xmm\[0-9\]+" } } */
+
+#define type __m128
+#define vec
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c
new file mode 100644
index 00000000000..d8d48d62a74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mfma -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
+/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%ymm\[0-9\]+" } } */
+
+#define type __m256
+#define vec 256
+#define op fnmsub
+#define suffix ps
+#define SCALAR float
+
+#include "avx512-fma-1.h"
-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB
  2018-10-20  8:35 [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB H.J. Lu
                   ` (2 preceding siblings ...)
  2018-10-20  9:16 ` [PATCH 3/4] i386: Enable AVX512 memory broadcast for FNMSUB H.J. Lu
@ 2018-10-21 20:39 ` Uros Bizjak
  2018-10-21 22:37   ` Uros Bizjak
  3 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2018-10-21 20:39 UTC (permalink / raw)
  To: H. J. Lu; +Cc: gcc-patches

On Sat, Oct 20, 2018 at 8:46 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Many AVX512 vector operations can broadcast from a scalar memory source.
> This patch enables memory broadcast for FMSUB operations.  In order to
> support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
> also added, instead of passing the negated value to FMA builtin functions.
>
> gcc/
>
>         PR target/72782
>         * config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use
>         __builtin_ia32_vfmsubpd512_mask.
>         (_mm512_mask_fmsub_round_pd): Likewise.
>         (_mm512_fmsub_pd): Likewise.
>         (_mm512_mask_fmsub_pd): Likewise.
>         (_mm512_maskz_fmsub_round_pd): Use
>         __builtin_ia32_vfmsubpd512_maskz.
>         (_mm512_maskz_fmsub_pd): Likewise.
>         (_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask.
>         (_mm512_mask_fmsub_round_ps): Likewise.
>         (_mm512_fmsub_ps): Likewise.
>         (_mm512_mask_fmsub_ps): Likewise.
>         (_mm512_maskz_fmsub_round_ps): Use
>         __builtin_ia32_vfmsubps512_maskz.
>         (_mm512_maskz_fmsub_ps): Likewise.
>         * config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use
>         __builtin_ia32_vfmsubpd256_mask.
>         (_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz.
>         (_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask
>         (_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz.
>         (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
>         (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
>         (_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz.
>         (_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask.
>         (_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz.
>         * config/i386/fmaintrin.h (_mm_fmsub_pd): Use
>         __builtin_ia32_vfmsubpd.
>         (_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256.
>         (_mm_fmsub_ps): Use __builtin_ia32_vfmsubps.
>         (_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256.
>         (_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3.
>         (_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3.
>         * config/i386/i386-builtin.def: Add
>         __builtin_ia32_vfmsubpd256_mask,
>         __builtin_ia32_vfmsubpd256_maskz,
>         __builtin_ia32_vfmsubpd128_mask,
>         __builtin_ia32_vfmsubpd128_maskz,
>         __builtin_ia32_vfmsubps256_mask,
>         __builtin_ia32_vfmsubps256_maskz,
>         __builtin_ia32_vfmsubps128_mask,
>         __builtin_ia32_vfmsubps128_maskz,
>         __builtin_ia32_vfmsubpd512_mask,
>         __builtin_ia32_vfmsubpd512_maskz,
>         __builtin_ia32_vfmsubps512_mask,
>         __builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3,
>         __builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps,
>         __builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and.
>         __builtin_ia32_vfmsubpd256.
>         * config/i386/sse.md (fma4i_fmsub_<mode>): New.
>         (<avx512>_fmsub_<mode>_maskz<round_expand_name>): Likewise.
>         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
>         Likewise.
>         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
>         Likewise.
>         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
>         Likewise.
>         (fmai_vmfmsub_<mode><round_name>): Likewise.
>
> gcc/testsuite/
>
>         PR target/72782
>         * gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise.
>         * gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise.
>         * gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise.
>         * gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/avx512fintrin.h               | 60 +++++++--------
>  gcc/config/i386/avx512vlintrin.h              | 32 ++++----
>  gcc/config/i386/fmaintrin.h                   | 24 +++---
>  gcc/config/i386/i386-builtin.def              | 18 +++++
>  gcc/config/i386/sse.md                        | 77 +++++++++++++++++++
>  .../gcc.target/i386/avx512f-fmsub-df-zmm-1.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-1.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-2.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-3.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-4.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-5.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-6.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c  | 12 +++
>  .../gcc.target/i386/avx512f-fmsub-sf-zmm-8.c  | 12 +++
>  .../gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c | 12 +++
>  .../gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c | 12 +++
>  16 files changed, 285 insertions(+), 58 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c
>
> diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
> index 8473cd0d26c..c0c8fa1efd0 100644
> --- a/gcc/config/i386/avx512fintrin.h
> +++ b/gcc/config/i386/avx512fintrin.h
> @@ -3355,9 +3355,9 @@ extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
>                                                     (__v8df) __B,
> -                                                   -(__v8df) __C,
> +                                                   (__v8df) __C,
>                                                     (__mmask8) -1, __R);
>  }
>
> @@ -3366,9 +3366,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B,
>                             __m512d __C, const int __R)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
>                                                     (__v8df) __B,
> -                                                   -(__v8df) __C,
> +                                                   (__v8df) __C,
>                                                     (__mmask8) __U, __R);
>  }
>
> @@ -3388,9 +3388,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B,
>                              __m512d __C, const int __R)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
>                                                      (__v8df) __B,
> -                                                    -(__v8df) __C,
> +                                                    (__v8df) __C,
>                                                      (__mmask8) __U, __R);
>  }
>
> @@ -3398,9 +3398,9 @@ extern __inline __m512
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
>                                                    (__v16sf) __B,
> -                                                  -(__v16sf) __C,
> +                                                  (__v16sf) __C,
>                                                    (__mmask16) -1, __R);
>  }
>
> @@ -3409,9 +3409,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B,
>                             __m512 __C, const int __R)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
>                                                    (__v16sf) __B,
> -                                                  -(__v16sf) __C,
> +                                                  (__v16sf) __C,
>                                                    (__mmask16) __U, __R);
>  }
>
> @@ -3431,9 +3431,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
>                              __m512 __C, const int __R)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
>                                                     (__v16sf) __B,
> -                                                   -(__v16sf) __C,
> +                                                   (__v16sf) __C,
>                                                     (__mmask16) __U, __R);
>  }
>
> @@ -3806,28 +3806,28 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
>      (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R)
>
>  #define _mm512_fmsub_round_pd(A, B, C, R)            \
> -    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), -1, R)
> +    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R)
>
>  #define _mm512_mask_fmsub_round_pd(A, U, B, C, R)    \
> -    (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), U, R)
> +    (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R)
>
>  #define _mm512_mask3_fmsub_round_pd(A, B, C, U, R)   \
>      (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R)
>
>  #define _mm512_maskz_fmsub_round_pd(U, A, B, C, R)   \
> -    (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, -(C), U, R)
> +    (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R)
>
>  #define _mm512_fmsub_round_ps(A, B, C, R)            \
> -    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), -1, R)
> +    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R)
>
>  #define _mm512_mask_fmsub_round_ps(A, U, B, C, R)    \
> -    (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), U, R)
> +    (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R)
>
>  #define _mm512_mask3_fmsub_round_ps(A, B, C, U, R)   \
>      (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R)
>
>  #define _mm512_maskz_fmsub_round_ps(U, A, B, C, R)   \
> -    (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, -(C), U, R)
> +    (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R)
>
>  #define _mm512_fmaddsub_round_pd(A, B, C, R)            \
>      (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R)
> @@ -12416,9 +12416,9 @@ extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
>                                                     (__v8df) __B,
> -                                                   -(__v8df) __C,
> +                                                   (__v8df) __C,
>                                                     (__mmask8) -1,
>                                                     _MM_FROUND_CUR_DIRECTION);
>  }
> @@ -12427,9 +12427,9 @@ extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A,
>                                                     (__v8df) __B,
> -                                                   -(__v8df) __C,
> +                                                   (__v8df) __C,
>                                                     (__mmask8) __U,
>                                                     _MM_FROUND_CUR_DIRECTION);
>  }
> @@ -12449,9 +12449,9 @@ extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C)
>  {
> -  return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A,
> +  return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A,
>                                                      (__v8df) __B,
> -                                                    -(__v8df) __C,
> +                                                    (__v8df) __C,
>                                                      (__mmask8) __U,
>                                                      _MM_FROUND_CUR_DIRECTION);
>  }
> @@ -12460,9 +12460,9 @@ extern __inline __m512
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
>                                                    (__v16sf) __B,
> -                                                  -(__v16sf) __C,
> +                                                  (__v16sf) __C,
>                                                    (__mmask16) -1,
>                                                    _MM_FROUND_CUR_DIRECTION);
>  }
> @@ -12471,9 +12471,9 @@ extern __inline __m512
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A,
>                                                    (__v16sf) __B,
> -                                                  -(__v16sf) __C,
> +                                                  (__v16sf) __C,
>                                                    (__mmask16) __U,
>                                                    _MM_FROUND_CUR_DIRECTION);
>  }
> @@ -12493,9 +12493,9 @@ extern __inline __m512
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C)
>  {
> -  return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A,
> +  return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A,
>                                                     (__v16sf) __B,
> -                                                   -(__v16sf) __C,
> +                                                   (__v16sf) __C,
>                                                     (__mmask16) __U,
>                                                     _MM_FROUND_CUR_DIRECTION);
>  }
> diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h
> index 68b5537845b..1e2e6da29bd 100644
> --- a/gcc/config/i386/avx512vlintrin.h
> +++ b/gcc/config/i386/avx512vlintrin.h
> @@ -4117,9 +4117,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_mask_fmsub_pd (__m256d __A, __mmask8 __U, __m256d __B,
>                       __m256d __C)
>  {
> -  return (__m256d) __builtin_ia32_vfmaddpd256_mask ((__v4df) __A,
> +  return (__m256d) __builtin_ia32_vfmsubpd256_mask ((__v4df) __A,
>                                                     (__v4df) __B,
> -                                                   -(__v4df) __C,
> +                                                   (__v4df) __C,
>                                                     (__mmask8) __U);
>  }
>
> @@ -4139,9 +4139,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_maskz_fmsub_pd (__mmask8 __U, __m256d __A, __m256d __B,
>                        __m256d __C)
>  {
> -  return (__m256d) __builtin_ia32_vfmaddpd256_maskz ((__v4df) __A,
> +  return (__m256d) __builtin_ia32_vfmsubpd256_maskz ((__v4df) __A,
>                                                      (__v4df) __B,
> -                                                    -(__v4df) __C,
> +                                                    (__v4df) __C,
>                                                      (__mmask8) __U);
>  }
>
> @@ -4149,9 +4149,9 @@ extern __inline __m128d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_mask_fmsub_pd (__m128d __A, __mmask8 __U, __m128d __B, __m128d __C)
>  {
> -  return (__m128d) __builtin_ia32_vfmaddpd128_mask ((__v2df) __A,
> +  return (__m128d) __builtin_ia32_vfmsubpd128_mask ((__v2df) __A,
>                                                     (__v2df) __B,
> -                                                   -(__v2df) __C,
> +                                                   (__v2df) __C,
>                                                     (__mmask8) __U);
>  }
>
> @@ -4171,9 +4171,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_maskz_fmsub_pd (__mmask8 __U, __m128d __A, __m128d __B,
>                     __m128d __C)
>  {
> -  return (__m128d) __builtin_ia32_vfmaddpd128_maskz ((__v2df) __A,
> +  return (__m128d) __builtin_ia32_vfmsubpd128_maskz ((__v2df) __A,
>                                                      (__v2df) __B,
> -                                                    -(__v2df) __C,
> +                                                    (__v2df) __C,
>                                                      (__mmask8) __U);
>  }
>
> @@ -4181,9 +4181,9 @@ extern __inline __m256
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_mask_fmsub_ps (__m256 __A, __mmask8 __U, __m256 __B, __m256 __C)
>  {
> -  return (__m256) __builtin_ia32_vfmaddps256_mask ((__v8sf) __A,
> +  return (__m256) __builtin_ia32_vfmsubps256_mask ((__v8sf) __A,
>                                                    (__v8sf) __B,
> -                                                  -(__v8sf) __C,
> +                                                  (__v8sf) __C,
>                                                    (__mmask8) __U);
>  }
>
> @@ -4203,9 +4203,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_maskz_fmsub_ps (__mmask8 __U, __m256 __A, __m256 __B,
>                        __m256 __C)
>  {
> -  return (__m256) __builtin_ia32_vfmaddps256_maskz ((__v8sf) __A,
> +  return (__m256) __builtin_ia32_vfmsubps256_maskz ((__v8sf) __A,
>                                                     (__v8sf) __B,
> -                                                   -(__v8sf) __C,
> +                                                   (__v8sf) __C,
>                                                     (__mmask8) __U);
>  }
>
> @@ -4213,9 +4213,9 @@ extern __inline __m128
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_mask_fmsub_ps (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C)
>  {
> -  return (__m128) __builtin_ia32_vfmaddps128_mask ((__v4sf) __A,
> +  return (__m128) __builtin_ia32_vfmsubps128_mask ((__v4sf) __A,
>                                                    (__v4sf) __B,
> -                                                  -(__v4sf) __C,
> +                                                  (__v4sf) __C,
>                                                    (__mmask8) __U);
>  }
>
> @@ -4233,9 +4233,9 @@ extern __inline __m128
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_maskz_fmsub_ps (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C)
>  {
> -  return (__m128) __builtin_ia32_vfmaddps128_maskz ((__v4sf) __A,
> +  return (__m128) __builtin_ia32_vfmsubps128_maskz ((__v4sf) __A,
>                                                     (__v4sf) __B,
> -                                                   -(__v4sf) __C,
> +                                                   (__v4sf) __C,
>                                                     (__mmask8) __U);
>  }
>
> diff --git a/gcc/config/i386/fmaintrin.h b/gcc/config/i386/fmaintrin.h
> index 660d3453590..2eddd896579 100644
> --- a/gcc/config/i386/fmaintrin.h
> +++ b/gcc/config/i386/fmaintrin.h
> @@ -86,48 +86,48 @@ extern __inline __m128d
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fmsub_pd (__m128d __A, __m128d __B, __m128d __C)
>  {
> -  return (__m128d)__builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B,
> -                                           -(__v2df)__C);
> +  return (__m128d)__builtin_ia32_vfmsubpd ((__v2df)__A, (__v2df)__B,
> +                                           (__v2df)__C);
>  }
>
>  extern __inline __m256d
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_fmsub_pd (__m256d __A, __m256d __B, __m256d __C)
>  {
> -  return (__m256d)__builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B,
> -                                              -(__v4df)__C);
> +  return (__m256d)__builtin_ia32_vfmsubpd256 ((__v4df)__A, (__v4df)__B,
> +                                              (__v4df)__C);
>  }
>
>  extern __inline __m128
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fmsub_ps (__m128 __A, __m128 __B, __m128 __C)
>  {
> -  return (__m128)__builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B,
> -                                          -(__v4sf)__C);
> +  return (__m128)__builtin_ia32_vfmsubps ((__v4sf)__A, (__v4sf)__B,
> +                                          (__v4sf)__C);
>  }
>
>  extern __inline __m256
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_fmsub_ps (__m256 __A, __m256 __B, __m256 __C)
>  {
> -  return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B,
> -                                             -(__v8sf)__C);
> +  return (__m256)__builtin_ia32_vfmsubps256 ((__v8sf)__A, (__v8sf)__B,
> +                                             (__v8sf)__C);
>  }
>
>  extern __inline __m128d
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fmsub_sd (__m128d __A, __m128d __B, __m128d __C)
>  {
> -  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, (__v2df)__B,
> -                                            -(__v2df)__C);
> +  return (__m128d)__builtin_ia32_vfmsubsd3 ((__v2df)__A, (__v2df)__B,
> +                                            (__v2df)__C);
>  }
>
>  extern __inline __m128
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_fmsub_ss (__m128 __A, __m128 __B, __m128 __C)
>  {
> -  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, (__v4sf)__B,
> -                                           -(__v4sf)__C);
> +  return (__m128)__builtin_ia32_vfmsubss3 ((__v4sf)__A, (__v4sf)__B,
> +                                           (__v4sf)__C);
>  }
>
>  extern __inline __m128d
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index dc4c70c7ea3..f5b5e56a01c 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -1903,10 +1903,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v8sf_maskz, "__builtin_
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask, "__builtin_ia32_vfmaddps128_mask", IX86_BUILTIN_VFMADDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask3, "__builtin_ia32_vfmaddps128_mask3", IX86_BUILTIN_VFMADDPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_maskz, "__builtin_ia32_vfmaddps128_maskz", IX86_BUILTIN_VFMADDPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask, "__builtin_ia32_vfmsubpd256_mask", IX86_BUILTIN_VFMSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask3, "__builtin_ia32_vfmsubpd256_mask3", IX86_BUILTIN_VFMSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_maskz, "__builtin_ia32_vfmsubpd256_maskz", IX86_BUILTIN_VFMSUBPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask, "__builtin_ia32_vfmsubpd128_mask", IX86_BUILTIN_VFMSUBPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask3, "__builtin_ia32_vfmsubpd128_mask3", IX86_BUILTIN_VFMSUBPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_maskz, "__builtin_ia32_vfmsubpd128_maskz", IX86_BUILTIN_VFMSUBPD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask, "__builtin_ia32_vfmsubps256_mask", IX86_BUILTIN_VFMSUBPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask3, "__builtin_ia32_vfmsubps256_mask3", IX86_BUILTIN_VFMSUBPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_maskz, "__builtin_ia32_vfmsubps256_maskz", IX86_BUILTIN_VFMSUBPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask, "__builtin_ia32_vfmsubps128_mask", IX86_BUILTIN_VFMSUBPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask3, "__builtin_ia32_vfmsubps128_mask3", IX86_BUILTIN_VFMSUBPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_maskz, "__builtin_ia32_vfmsubps128_maskz", IX86_BUILTIN_VFMSUBPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_mask, "__builtin_ia32_vfnmaddpd256_mask", IX86_BUILTIN_VFNMADDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_mask, "__builtin_ia32_vfnmaddpd128_mask", IX86_BUILTIN_VFNMADDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_mask, "__builtin_ia32_vfnmaddps256_mask", IX86_BUILTIN_VFNMADDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
> @@ -2768,8 +2776,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
> +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
> +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
> +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
> +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT)
> @@ -2855,11 +2867,17 @@ BDESC_FIRST (multi_arg, MULTI_ARG,
>  BDESC (OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmadd_v2df, "__builtin_ia32_vfmaddsd", IX86_BUILTIN_VFMADDSD, UNKNOWN, (int)MULTI_ARG_3_DF)
>  BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v4sf, "__builtin_ia32_vfmaddss3", IX86_BUILTIN_VFMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
>  BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v2df, "__builtin_ia32_vfmaddsd3", IX86_BUILTIN_VFMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
> +BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v4sf, "__builtin_ia32_vfmsubss3", IX86_BUILTIN_VFMSUBSS3, UNKNOWN, (int)MULTI_ARG_3_SF)
> +BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v2df, "__builtin_ia32_vfmsubsd3", IX86_BUILTIN_VFMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF)
>
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF)
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF)
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v8sf, "__builtin_ia32_vfmaddps256", IX86_BUILTIN_VFMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4df, "__builtin_ia32_vfmaddpd256", IX86_BUILTIN_VFMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
> +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4sf, "__builtin_ia32_vfmsubps", IX86_BUILTIN_VFMSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
> +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v2df, "__builtin_ia32_vfmsubpd", IX86_BUILTIN_VFMSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
> +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v8sf, "__builtin_ia32_vfmsubps256", IX86_BUILTIN_VFMSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2)
> +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4df, "__builtin_ia32_vfmsubpd256", IX86_BUILTIN_VFMSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2)
>
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF)
>  BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF)
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 635a6902d33..0e9d3541ccc 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -3760,6 +3760,14 @@
>           (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
>           (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand")))])
>
> +(define_expand "fma4i_fmsub_<mode>"
> +  [(set (match_operand:FMAMODE_AVX512 0 "register_operand")
> +       (fma:FMAMODE_AVX512
> +         (match_operand:FMAMODE_AVX512 1 "nonimmediate_operand")
> +         (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand")
> +         (neg:FMAMODE_AVX512
> +           (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))])
> +
>  (define_expand "<avx512>_fmadd_<mode>_maskz<round_expand_name>"
>    [(match_operand:VF_AVX512VL 0 "register_operand")
>     (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
> @@ -3898,6 +3906,20 @@
>     (set_attr "type" "ssemuladd")
>     (set_attr "mode" "<MODE>")])
>
> +(define_expand "<avx512>_fmsub_<mode>_maskz<round_expand_name>"
> +  [(match_operand:VF_AVX512VL 0 "register_operand")
> +   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
> +   (match_operand:<avx512fmaskmode> 4 "register_operand")]
> +  "TARGET_AVX512F && <round_mode512bit_condition>"
> +{
> +  emit_insn (gen_fma_fmsub_<mode>_maskz_1<round_expand_name> (
> +    operands[0], operands[1], operands[2], operands[3],
> +    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
> +  DONE;
> +})
> +
>  (define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
>    [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
>         (fma:VF_SF_AVX512VL
> @@ -3913,6 +3935,49 @@
>    [(set_attr "type" "ssemuladd")
>     (set_attr "mode" "<MODE>")])
>
> +(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1"
> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
> +       (fma:VF_AVX512
> +         (match_operand:VF_AVX512 1 "register_operand" "0,v")
> +         (match_operand:VF_AVX512 2 "register_operand" "v,0")
> +         (neg:VF_AVX512
> +           (vec_duplicate:VF_AVX512
> +             (match_operand:<ssescalarmode> 3 "memory_operand" "m,m")))))]
> +  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
> +  "vfmsub213<ssemodesuffix>\t{%3<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<avx512bcst>}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2"
> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
> +       (fma:VF_AVX512
> +         (vec_duplicate:VF_AVX512
> +           (match_operand:<ssescalarmode> 1 "memory_operand" "m,m"))
> +         (match_operand:VF_AVX512 2 "register_operand" "0,v")
> +         (neg:VF_AVX512
> +           (match_operand:VF_AVX512 3 "register_operand" "v,0"))))]
> +  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
> +  "@
> +   vfmsub132<ssemodesuffix>\t{%1<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %1<avx512bcst>}
> +   vfmsub231<ssemodesuffix>\t{%1<avx512bcst>, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %1<avx512bcst>}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3"
> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
> +       (fma:VF_AVX512
> +         (match_operand:VF_AVX512 1 "register_operand" "0,v")
> +         (vec_duplicate:VF_AVX512
> +           (match_operand:<ssescalarmode> 2 "memory_operand" "m,m"))
> +         (neg:VF_AVX512
> +           (match_operand:VF_AVX512 3 "nonimmediate_operand" "v,0"))))]
> +  "TARGET_AVX512F && <sd_mask_mode512bit_condition>"
> +  "@
> +   vfmsub132<ssemodesuffix>\t{%2<avx512bcst>, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<avx512bcst>}
> +   vfmsub231<ssemodesuffix>\t{%2<avx512bcst>, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<avx512bcst>}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])
> +
>  (define_insn "<avx512>_fmsub_<mode>_mask<round_name>"
>    [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
>         (vec_merge:VF_AVX512VL
> @@ -4261,6 +4326,18 @@
>           (const_int 1)))]
>    "TARGET_FMA")
>
> +(define_expand "fmai_vmfmsub_<mode><round_name>"
> +  [(set (match_operand:VF_128 0 "register_operand")
> +       (vec_merge:VF_128
> +         (fma:VF_128
> +           (match_operand:VF_128 1 "<round_nimm_predicate>")
> +           (match_operand:VF_128 2 "<round_nimm_predicate>")
> +           (neg:VF_128
> +             (match_operand:VF_128 3 "<round_nimm_predicate>")))
> +         (match_dup 1)
> +         (const_int 1)))]
> +  "TARGET_FMA")
> +
>  (define_insn "*fmai_fmadd_<mode>"
>    [(set (match_operand:VF_128 0 "register_operand" "=v,v")
>          (vec_merge:VF_128
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
> new file mode 100644
> index 00000000000..840888a2d81
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...pd\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastsd\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512d
> +#define vec 512
> +#define op fmsub
> +#define suffix pd
> +#define SCALAR double
> +
> +#include "avx512-fma-1.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
> new file mode 100644
> index 00000000000..0cb675b7628
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-1.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
> new file mode 100644
> index 00000000000..10212d471b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-2.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
> new file mode 100644
> index 00000000000..feb34077085
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-3.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
> new file mode 100644
> index 00000000000..4305fffe628
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-4.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
> new file mode 100644
> index 00000000000..d57251f83be
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-5.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
> new file mode 100644
> index 00000000000..b26a9ee7eeb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-6.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
> new file mode 100644
> index 00000000000..cc705af8ea5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-7.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
> new file mode 100644
> index 00000000000..2b929fa11e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+, %zmm0" 1 } } */
> +
> +#define type __m512
> +#define vec 512
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-8.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
> new file mode 100644
> index 00000000000..70efbcc98f9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mfma -mavx512vl -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%xmm\[0-9\]+" } } */
> +
> +#define type __m128
> +#define vec
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-1.h"
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c
> new file mode 100644
> index 00000000000..a7c1b370b74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mfma -mavx512vl -O2" } */
> +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%ymm\[0-9\]+" } } */
> +
> +#define type __m256
> +#define vec 256
> +#define op fmsub
> +#define suffix ps
> +#define SCALAR float
> +
> +#include "avx512-fma-1.h"
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB
  2018-10-21 20:39 ` [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB Uros Bizjak
@ 2018-10-21 22:37   ` Uros Bizjak
  0 siblings, 0 replies; 6+ messages in thread
From: Uros Bizjak @ 2018-10-21 22:37 UTC (permalink / raw)
  To: H. J. Lu; +Cc: gcc-patches

On Sun, Oct 21, 2018 at 6:04 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Sat, Oct 20, 2018 at 8:46 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Many AVX512 vector operations can broadcast from a scalar memory source.
> > This patch enables memory broadcast for FMSUB operations.  In order to
> > support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
> > also added, instead of passing the negated value to FMA builtin functions.
> >
> > gcc/
> >
> >         PR target/72782
> >         * config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use
> >         __builtin_ia32_vfmsubpd512_mask.
> >         (_mm512_mask_fmsub_round_pd): Likewise.
> >         (_mm512_fmsub_pd): Likewise.
> >         (_mm512_mask_fmsub_pd): Likewise.
> >         (_mm512_maskz_fmsub_round_pd): Use
> >         __builtin_ia32_vfmsubpd512_maskz.
> >         (_mm512_maskz_fmsub_pd): Likewise.
> >         (_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask.
> >         (_mm512_mask_fmsub_round_ps): Likewise.
> >         (_mm512_fmsub_ps): Likewise.
> >         (_mm512_mask_fmsub_ps): Likewise.
> >         (_mm512_maskz_fmsub_round_ps): Use
> >         __builtin_ia32_vfmsubps512_maskz.
> >         (_mm512_maskz_fmsub_ps): Likewise.
> >         * config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use
> >         __builtin_ia32_vfmsubpd256_mask.
> >         (_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz.
> >         (_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask
> >         (_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz.
> >         (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
> >         (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
> >         (_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz.
> >         (_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask.
> >         (_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz.
> >         * config/i386/fmaintrin.h (_mm_fmsub_pd): Use
> >         __builtin_ia32_vfmsubpd.
> >         (_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256.
> >         (_mm_fmsub_ps): Use __builtin_ia32_vfmsubps.
> >         (_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256.
> >         (_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3.
> >         (_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3.
> >         * config/i386/i386-builtin.def: Add
> >         __builtin_ia32_vfmsubpd256_mask,
> >         __builtin_ia32_vfmsubpd256_maskz,
> >         __builtin_ia32_vfmsubpd128_mask,
> >         __builtin_ia32_vfmsubpd128_maskz,
> >         __builtin_ia32_vfmsubps256_mask,
> >         __builtin_ia32_vfmsubps256_maskz,
> >         __builtin_ia32_vfmsubps128_mask,
> >         __builtin_ia32_vfmsubps128_maskz,
> >         __builtin_ia32_vfmsubpd512_mask,
> >         __builtin_ia32_vfmsubpd512_maskz,
> >         __builtin_ia32_vfmsubps512_mask,
> >         __builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3,
> >         __builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps,
> >         __builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and.
> >         __builtin_ia32_vfmsubpd256.
> >         * config/i386/sse.md (fma4i_fmsub_<mode>): New.
> >         (<avx512>_fmsub_<mode>_maskz<round_expand_name>): Likewise.
> >         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
> >         Likewise.
> >         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
> >         Likewise.
> >         (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
> >         Likewise.
> >         (fmai_vmfmsub_<mode><round_name>): Likewise.
> >
> > gcc/testsuite/
> >
> >         PR target/72782
> >         * gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise.
> >         * gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise.
> >         * gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise.
> >         * gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise.
>
> LGTM.

LGTM for the whole patch serie (all patches implement the same approach).

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-10-21 16:07 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-20  8:35 [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB H.J. Lu
2018-10-20  7:31 ` [PATCH 2/4] i386: Enable AVX512 memory broadcast for FNMADD H.J. Lu
2018-10-20  9:03 ` [PATCH 4/4] i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests H.J. Lu
2018-10-20  9:16 ` [PATCH 3/4] i386: Enable AVX512 memory broadcast for FNMSUB H.J. Lu
2018-10-21 20:39 ` [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB Uros Bizjak
2018-10-21 22:37   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).