[PATCH i386 4/8] [AVX512] Add substed patterns.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH i386 4/8] [AVX512] Add substed patterns.
@ 2013-08-14  7:44 Kirill Yukhin
  2013-08-22 14:15 ` Kirill Yukhin
                   ` (5 more replies)
  0 siblings, 6 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-08-14  7:44 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,

This patch introduces support of patterns for EVEX Masking and Rounding fetures.

**********
MASKING
**********
For each insn we wanted to add its original variant, extended to 512 bit, its
masked variant, unmasked variant with rounding and masked variant with rounding.
Later we.d probably add variants with and without embedded broadcasting.  To do
that, we introduced define_subst and used it in the following way:
The most of insn patterns are something like
(set
  (match_operand /*dest*/ )
  (match_operand /*operation*/)),
so we made next subst for them:
(define_subst "mask"
  [(set (match_operand:SUBST_V 0)
        (match_operand:SUBST_V 1))]
  "TARGET_AVX512F"
  [(set (match_dup 0)
        (vec_merge:SUBST_V
            (match_dup 1)
            (match_operand:SUBST_V 2 "vector_move_operand" "0C")
            (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])

But that subst wasn.t enough to cover all patterns for instructions with
masking: we needed 3 more subst for masking. Those are:
* mask_scalar,
* mask_scalar_merge,
* sd

mask_scalar is used for scalar instructions, in which we need to merge not with
destination but with another source operand. Example of such instruction is
vsqrtss. We can.t use usual mask-subst here, as we need to firstly do vec_merge
for masking, and only then take the lowest element with the second vec_merge .
so the order of vec_merges matters here.  So, we specify the pattern in more
details for this subst:
(set
  (match_operand /*dest*/)
  (vec_merge
    (match_operand /*operation*/)
    (match_operand /*src operand from which we take upper bits*/)
    (const_int 1)))
We want to add our vec_merge (for masking) upon the operation, not on the
exsisting vec_merge

The next subst is mask_scalar_merge, it is used for new cmp-instructions, which
compares vectors and stores the result in a mask. If the operation is masked
itself, we need to AND the result-mask of comparison with the input mask - and
that's what the subst does:
(define_subst "mask_scalar_merge"
  [(set (match_operand:SUBST_S 0)
        (match_operand:SUBST_S 1))]
  "TARGET_AVX512F"
  [(set (match_dup 0)
        (and:SUBST_S
            (match_dup 1)
            (match_operand:SUBST_S 3 "register_operand" "k")))])

The last subst is called sd (Source-Destination). It's is almost the same, as
the usual mask-subst, but it's only used for zero-masking. The reason is that
some patterns already have an operand with constraint "0" and we can't add a new
operand with the same constraint. So we add only zero-masking here by subst and
manually write a pattern for merge-masking where we use match_dup instead of an
operand with constraint "0":
(define_insn "avx512f_fmadd_<mode>_mask<round_name>"
  [(set (match_operand:VF_512 0 "register_operand" "=T,T")
	(vec_merge:VF_512
	  (fma:VF_512
	    (match_operand:VF_512 1 "register_operand" "0,0")
	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,T")
	    (match_operand:VF_512 3 "nonimmediate_operand" "T,<round_constraint>"))
	  (match_dup 1) <<<<<------ Operand for merge
	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
  "TARGET_AVX512F"
  "@
   vfmadd132
   ..."
  [(...)])
Examples of such instruction are FMA-insns and some permutes.

We also added set of subst-attributes: they are used to:
1) modify name of insn-pattern:
(define_subst_attr "mask_name" "mask" "" "_mask")
2) properly add masking operands
(define_subst_attr "mask_operand3" "mask" "" "%M4%N3")
3) adjust operand's constraints
(define_subst_attr "store_mask_constraint" "mask" "Tm" "T")
4) hide unmasked version of pattern with '*'
(define_subst_attr "mask_codefor" "mask" "*" "")

It's not possible to share subst-attrs across different substs, so we created
such set of subst-attrs for each subst.

**********
ROUNDING
**********
Rounding is implemented by adding a const_int value in parallel with the
original pattern - however, that might be not the best approach.  define_subst
for this transformation are quite straightforward - we needed three of them to
cover all patterns we need to transform.  The most interesting part here is how
to determine which operand corresponds to the rounding immediate. To somehow
reflect this in the pattern, we allowed using attributes in other attributes,
i.e. attributes nesting (that's already in the trunk). The problem here arises
from the next: when subst for rounding is applied to a pattern after subst for
masking, it's actually applied to two patterns with different number of
operands. To find that number, we wrote next attributes: (define_subst_attr
"round_mask_operand2" "mask" "%R2" "%R4") (define_subst_attr "round_mask_op2"
"round" "" "<round_mask_operand2>") In result, we get either empty string (if
rounding-subst isn't applied), or "%R2" or "%R4" depending on whether mask-subst
was applied.

There are other similar attributes for different combinations of round- and
mask- substs.

    
It is applicable on top of [3/8] (which was rebased on new trunk today).
      

**********
TESTING
**********
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no regressions without -m512d option.

**********
ChangeLog entry
**********
2013-08-14  Alexander Ivchenko  <alexander.ivchenko@intel.com>
	    Maxim Kuznetsov  <maxim.kuznetsov@intel.com>
	    Sergey Lega  <sergey.s.lega@intel.com>
	    Anna Tikhonova  <anna.tikhonova@intel.com>
	    Ilya Tocar  <ilya.tocar@intel.com>
	    Andrey Turetskiy  <andrey.turetskiy@intel.com>
	    Ilya Verbin  <ilya.verbin@intel.com>
	    Kirill Yukhin  <kirill.yukhin@intel.com>
	    Michael Zolotukhin  <michael.v.zolotukhin@intel.com>

	* config/i386/i386.c (ix86_print_operand): Support masking and
	rounding.
	* config/i386/i386.md (define_constants): Support rounding.
	* config/i386/predicate.md (const_0_to_4_operand): New.
	(const_0_to_5_operand): Ditto.
	* config/i386/sse.md (avx512f_load<mode>_mask): New.
	(avx512f_store<mode>_mask): Ditto.
	(avx512f_storeu<ssemodesuffix>512_mask): Ditto.
	(avx512f_storedqu<mode>_mask): Ditto.
	(avx512f_moves<mode>_mask): Ditto.
	(avx512f_loads<mode>_mask): Ditto.
	(*avx512f_loads<mode>_mask): Ditto.
	(avx512f_stores<mode>_mask): Ditto.
	(avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Ditto.
	(avx512f_fmadd_<mode>_maskz<round_expand_name>): Ditto.
	(avx512f_fmadd_<mode>_mask<round_name>): Ditto.
	(avx512f_fmadd_<mode>_mask3<round_name>): Ditto.
	(avx512f_fmsub_<mode>_mask<round_name>): Ditto.
	(avx512f_fmsub_<mode>_mask3<round_name>): Ditto.
	(avx512f_fnmadd_<mode>_mask<round_name>): Ditto.
	(avx512f_fnmadd_<mode>_mask3<round_name>): Ditto.
	(avx512f_fnmsub_<mode>_mask<round_name>): Ditto.
	(avx512f_fnmsub_<mode>_mask<round_name>): Ditto.
	(avx512f_fmaddsub_<mode>_maskz<round_expand_name>): Ditto.
	(avx512f_fmaddsub_<mode>_mask<round_name>): Ditto.
	(avx512f_fmaddsub_<mode>_mask3<round_name>): Ditto.
	(avx512f_fmsubadd_<mode>_mask<round_name>): Ditto.
	(avx512f_fmsubadd_<mode>_mask3<round_name>): Ditto.
	(fmai_vmfmadd_<mode>_maskz<round_name>): Ditto.
	(fmai_vmfmadd_<mode>_mask<round_name>): Ditto.
	(fmai_vmfmadd_<mode>_mask3<round_name>): Ditto.
	(*fmai_fmadd_<mode>_maskz<round_name>): Ditto.
	(*fmai_fmsub_<mode>_mask<round_name>): Ditto.
	(fmai_vmfmsub_<mode>_mask3<round_name>): Ditto.
	(*fmai_fmsub_<mode>_maskz<round_name>): Ditto.
	(*fmai_vmfnmadd_<mode>_mask<round_name>): Ditto.
	(*fmai_vmfnmadd_<mode>_mask3<round_name>): Ditto.
	(*fmai_vmfnmadd_<mode>_maskz<round_name>): Ditto.
	(*fmai_fnmsub_<mode>_mask<round_name>): Ditto.
	(*fmai_fnmsub_<mode>_mask3<round_name>): Ditto.
	(*fmai_fnmsub_<mode>_maskz<round_name>): Ditto.
	(vec_unpacku_float_hi_v16si): Ditto.
	(vec_unpacku_float_lo_v16si): Ditto.
	(avx512f_vextract<shuffletype>32x4_mask): Ditto.
	(avx512f_vextract<shuffletype>32x4_1_maskm): Ditto.
	(avx512f_vextract<shuffletype>64x4_mask): Ditto.
	(vec_extract_lo_<mode>_maskm): Ditto.
	(vec_extract_hi_<mode>_maskm): Ditto.
	(avx512f_vternlog<mode>_maskz): Ditto.
	(avx512f_vternlog<mode>_mask): Ditto.
	(avx512f_shufps512_mask): Ditto.
	(avx512f_fixupimm<mode>_mask<round_saeonly_name>): Ditto.
	(avx512f_sfixupimm<mode>_maskz<round_saeonly_expand_name5>): Ditto.
	(avx512f_sfixupimm<mode>_mask<round_saeonly_name>): Ditto.
	(avx512f_shufpd512_mask): Ditto.
	(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
	(avx512f_<code>v8div16qi2_mask): Ditto.
	(*avx512f_<code>v8div16qi2_store_mask): Ditto.
	(ashr<mode>3<mask_name>): Ditto.
	(avx512f_vinsert<shuffletype>32x4_mask): Ditto.
	(avx512f_shuf_<shuffletype>64x2_mask): Ditto.
	(avx512f_shuf_<shuffletype>32x4_mask): Ditto.
	(avx512f_pshufdv3_mask): Ditto.
	(avx512f_perm<mode>_mask): Ditto.
	(avx512f_vpermi2var<mode>3_maskz): Ditto.
	(avx512f_vpermi2var<mode>3_mask): Ditto.
	(avx512f_vpermt2var<mode>3_maskz): Ditto.
	(avx512f_vpermt2var<mode>3_mask): Ditto.
	(avx512f_compress<mode>_mask): Ditto.
	(avx512f_compressstore<mode>_mask): Ditto.
	(avx512f_expand<mode>_maskz): Ditto.
	(avx512f_expand<mode>_mask): Ditto.
	(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>): Extend for
	masking and/or rounding.
	(<sse2_avx_avx512f>_loaddqu<mode><mask_name>): Ditto.
	(<plusminus_insn><mode>3<mask_name><round_name>): Ditto.
	(*<plusminus_insn><mode>3<mask_name><round_name>): Ditto.
	(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_name>): Ditto.
	(mul<mode>3<mask_name><round_name>): Ditto.
	(*mul<mode>3<mask_name><round_name>): Ditto.
	(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_name>): Ditto.
	(<sse>_div<mode>3<mask_name><round_name>): Ditto.
	(<mask_codefor>rcp14<mode><mask_name>): Ditto.
	(<mask_scalar_codefor>srcp14<mode><mask_scalar_name>): Ditto.
	(<sse>_sqrt<mode>2<mask_name><round_name>): Ditto.
	(<sse>_vmsqrt<mode>2<mask_scalar_name><round_name>): Ditto.
	(<mask_codefor>rsqrt14<mode><mask_name>): Ditto.
	(<mask_scalar_codefor>rsqrt14<mode><mask_scalar_name>): Ditto.
	(<code><mode>3<mask_name><round_saeonly_name>): Ditto.
	(*<code><mode>3_finite<mask_name><round_saeonly_name>): Ditto.
	(*<code><mode>3<mask_name><round_saeonly_name>): Ditto.
	(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_name>): Ditto.
	(avx512f_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>): Ditto.
	(avx512f_ucmp<mode>3<mask_scalar_merge_name>): Ditto.
	(avx512f_vmcmp<mode>3<round_saeonly_name>): Ditto.
	(<sse>_comi<round_saeonly_name>): Ditto.
	(<sse>_ucomi<round_saeonly_name>): Ditto.
	(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>): Ditto.
	(*fmai_fmadd_<mode>_maskz<round_name>): Ditto.
	(*fmai_fmadd_<mode><round_name>): Ditto.
	(*fmai_fmsub_<mode><round_name>): Ditto.
	(*fmai_fnmadd_<mode><round_name>): Ditto.
	(*fmai_fnmsub_<mode><round_name>): Ditto.
	(sse_cvtsi2ss<round_name>): Ditto.
	(sse_cvtsi2ssq<round_name>): Ditto.
	(sse_cvtss2si<round_name>): Ditto.
	(sse_cvtss2siq<round_name>): Ditto.
	(sse_cvttss2si<round_saeonly_name>): Ditto.
	(sse_cvttss2siq<round_saeonly_name>): Ditto.
	(cvtusi2<ssescalarmodesuffix>32<round_name>): Ditto.
	(cvtusi2<ssescalarmodesuffix>64<round_name>): Ditto.
	(float<sseintvecmodelower><mode>2<mask_name><round_name>): Ditto.
	(ufloatv16siv16sf2<mask_name><round_name>): Ditto.
	(<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
	(<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name><round_name>): Ditto.
	(<fixsuffix>fix_truncv16sfv16si2<mask_name><round_saeonly_name>): Ditto.
	(sse2_cvtsi2sdq<round_name>): Ditto.
	(avx512f_vcvtss2usi<round_name>): Ditto.
	(avx512f_vcvtss2usiq<round_name>): Ditto.
	(avx512f_vcvttss2usi<round_saeonly_name>): Ditto.
	(avx512f_vcvttss2usiq<round_saeonly_name>): Ditto.
	(avx512f_vcvtsd2usi<round_name>): Ditto.
	(avx512f_vcvtsd2usiq<round_name>): Ditto.
	(avx512f_vcvttsd2usi<round_saeonly_name>): Ditto.
	(avx512f_vcvttsd2usiq<round_saeonly_name>): Ditto.
	(sse2_cvtsd2si<round_name>): Ditto.
	(sse2_cvtsd2siq<round_name>): Ditto.
	(sse2_cvttsd2si<round_saeonly_name>): Ditto.
	(sse2_cvttsd2siq<round_saeonly_name>): Ditto.
	(float<si2dfmodelower><mode>2<mask_name>): Ditto.
	(ufloatv8siv8df<mask_name>): Ditto.
	(<mask_codefor>avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
	(avx512f_ufix_notruncv8dfv8si<mask_name><round_name>): Ditto.
	(<fixsuffix>fix_truncv8dfv8si2<mask_name><round_saeonly_name>): Ditto.
	(sse2_cvtsd2ss<mask_scalar_name><round_name>): Ditto.
	(sse2_cvtss2sd<mask_scalar_name><round_saeonly_name>): Ditto.
	(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
	(<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name><round_saeonly_name>): Ditto.
	(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
	(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
	(<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>): Ditto.
	(vec_extract_lo_<mode><mask_name>): Ditto.
	(vec_extract_hi_<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
	(avx512f_movddup512<mask_name>): Ditto.
	(avx512f_unpcklpd512<mask_name>): Ditto.
	(*avx512f_unpcklpd512<mask_name>): Ditto.
	(<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name><round_name>): Ditto.
	(avx512f_scalef<mode><mask_name><round_name>): Ditto.
	(avx512f_vternlog<mode><sd_maskz_name>): Ditto.
	(avx512f_getexp<mode><mask_name><round_saeonly_name>): Ditto.
	(avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_name>): Ditto.
	(<mask_codefor>avx512f_align<mode><mask_name>): Ditto.
	(avx512f_fixupimm<mode><sd_maskz_name><round_saeonly_name>): Ditto.
	(avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>): Ditto.
	(avx512f_rndscale<mode><mask_name><round_saeonly_name>): Ditto.
	(<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name><round_saeonly_name>): Ditto.
	(avx512f_shufps512_1<mask_name>): Ditto.
	(avx512f_shufpd512_1<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
	(<plusminus_insn><mode>3<mask_name>): Ditto.
	(*<plusminus_insn><mode>3<mask_name>): Ditto.
	(vec_widen_umult_even_v16si<mask_name>): Ditto.
	(*vec_widen_umult_even_v16si<mask_name>): Ditto.
	(vec_widen_smult_even_v16si<mask_name>): Ditto.
	(*vec_widen_smult_even_v16si<mask_name>): Ditto.
	(mul<mode>3<mask_name>): Ditto.
	(*<sse4_1_avx2>_mul<mode>3<mask_name>): Ditto.
	(<shift_insn><mode>3<mask_name>): Ditto.
	(avx512f_<rotate>v<mode><mask_name>): Ditto.
	(avx512f_<rotate><mode><mask_name>): Ditto.
	(<code><mode>3<mask_name><round_name>): Ditto.
	(*avx2_<code><mode>3<mask_name><round_name>): Ditto.
	(avx512f_eq<mode>3<mask_scalar_merge_name>): Ditto.
	(avx512f_eq<mode>3<mask_scalar_merge_name>_1): Ditto.
	(avx512f_gt<mode>3<mask_scalar_merge_name>): Ditto.
	(<sse2_avx2>_andnot<mode>3<mask_name>): Ditto.
	(*andnot<mode>3<mask_name>): Ditto.
	(<mask_codefor><code><mode>3<mask_name>): Ditto.
	(avx512f_testm<mode>3<mask_scalar_merge_name>): Ditto.
	(avx512f_testnm<mode>3<mask_scalar_merge_name>): Ditto.
	(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
	(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
	(<mask_codefor>avx512f_vinsert<shuffletype>32x4_1<mask_name>): Ditto.
	(avx512f_vinsert<shuffletype>64x4_mask): Ditto.
	(vec_set_lo_<mode><mask_name>): Ditto.
	(vec_set_hi_<mode><mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
	(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
	(avx512f_pshufd_1<mask_name>): Ditto.
	(abs<mode>2<mask_name>): Ditto.
	(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
	(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
	(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
	(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
	(avx512f_<code>v8siv8di2<mask_name>): Ditto.
	(avx512er_exp2<mode><mask_name><round_name>): Ditto.
	(<mask_codefor>avx512er_rcp28<mode><mask_name><round_name>): Ditto.
	(<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_name>): Ditto.
	(<avx2_avx512f>_permvar<mode><mask_name>): Ditto.
	(<avx2_avx512f>_perm<mode>_1<mask_name>): Ditto.
	(<mask_codefor>avx512f_vec_dup<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_broadcast<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_broadcast<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_vec_dup_gpr<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_vec_dup_mem<mode><mask_name>): Ditto.
	(<sse2_avx_avx512f>_vpermil<mode><mask_name>): Ditto.
	(<sse2_avx_avx512f>_vpermil<mode><mask_name>): Ditto.
	(*<sse2_avx_avx512f>_vpermilp<mode><mask_name>): Ditto.
	(<sse2_avx_avx512f>_vpermilvar<mode>3<mask_name>): Ditto.
	(avx512f_vpermi2var<mode>3<sd_maskz_name>): Ditto.
	(avx512f_vpermt2var<mode>3<sd_maskz_name>): Ditto.
	(<avx2_avx512f>_ashrv<mode><mask_name>): Ditto.
	(<avx2_avx512f>_<shift_insn>v<mode><mask_name>): Ditto.
	(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>): Ditto.
	(<mask_codefor>avx512f_vcvtps2ph512<mask_name>): Ditto.
	(avx512f_getmant<mode><mask_name><round_saeonly_name>): Ditto.
	(avx512f_getmant<mode><mask_scalar_name><round_saeonly_name>): Ditto.
	(clz<mode>2<mask_name>): Ditto.
	(<mask_codefor>conflict<mode><mask_name>): Ditto.
	* config/i386/subst.md (SUBST_V): New iterator.
	(SUBST_S): Ditto.
	(SUBST_S): Ditto.
	(mask): New subst with set of attributes.
	(mask_scalar): Ditto.
	(mask_scalar_merge): Ditto.
	(sd): Ditto.
	(round): Ditto.
	(round_saeonly): Ditto.
	(round_expand): Ditto.
	(round_saeonly_expand5): Ditto.


Is it ok to install to trunk?

Thanks, K

---
 gcc/config/i386/i386.c        |   37 +
 gcc/config/i386/i386.md       |   10 +
 gcc/config/i386/predicates.md |   10 +
 gcc/config/i386/sse.md        | 2323 ++++++++++++++++++++++++++++++++---------
 gcc/config/i386/subst.md      |  222 ++++
 5 files changed, 2115 insertions(+), 487 deletions(-)
 create mode 100644 gcc/config/i386/subst.md

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7851d12..29d4384 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15082,6 +15082,43 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	  /* We do not want to print value of the operand.  */
 	  return;
 
+	case 'N':
+	  if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x)))
+	    fputs ("{z}", file);
+	  return;
+
+	case 'R':
+	  gcc_assert (CONST_INT_P (x));
+
+	  if (ASSEMBLER_DIALECT == ASM_INTEL)
+	    fputs (", ", file);
+
+	  switch (INTVAL (x))
+	    {
+	    case ROUND_NEAREST_INT:
+	      fputs ("{rn-sae}", file);
+	      break;
+	    case ROUND_NEG_INF:
+	      fputs ("{rd-sae}", file);
+	      break;
+	    case ROUND_POS_INF:
+	      fputs ("{ru-sae}", file);
+	      break;
+	    case ROUND_ZERO:
+	      fputs ("{rz-sae}", file);
+	      break;
+	    case ROUND_SAE:
+	      fputs ("{sae}", file);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+
+	  if (ASSEMBLER_DIALECT == ASM_ATT)
+	    fputs (", ", file);
+
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bffe748..cae180f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -241,6 +241,16 @@
    (ROUND_NO_EXC		0x8)
   ])
 
+;; Constants to represent AVX512F embeded rounding
+(define_constants
+  [(ROUND_NEAREST_INT			0)
+   (ROUND_NEG_INF			1)
+   (ROUND_POS_INF			2)
+   (ROUND_ZERO				3)
+   (NO_ROUND				4)
+   (ROUND_SAE				5)
+  ])
+
 ;; Constants to represent pcomtrue/pcomfalse variants
 (define_constants
   [(PCOM_FALSE			0)
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index c029fa8..668ec49 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -672,6 +672,16 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
 
+;; Match 0 to 4.
+(define_predicate "const_0_to_4_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 4)")))
+
+;; Match 0 to 5.
+(define_predicate "const_0_to_5_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 5)")))
+
 ;; Match 0 to 7.
 (define_predicate "const_0_to_7_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6802031..e0de91b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -629,6 +629,9 @@
 (define_mode_attr bcstscalarsuff
   [(V16SI "d") (V16SF "ss") (V8DI "q") (V8DF "sd")])
 
+;; Include define_subst patterns for instructions with mask
+(include "subst.md")
+
 ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics.
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -768,6 +771,28 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_load<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	  (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "avx512f_blendm<mode>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_merge:VI48F_512
@@ -780,6 +805,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_store<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "register_operand" "v")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "sse2_movq128"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(vec_concat:V2DI
@@ -871,21 +918,21 @@
   DONE;
 })
 
-(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix>"
+(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v")
 	(unspec:VF_AVX512F
 	  [(match_operand:VF_AVX512F 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
     case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
-      return "%vmovups\t{%1, %0|%0, %1}";
+      return "%vmovups\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
-      return "%vmovu<ssemodesuffix>\t{%1, %0|%0, %1}";
+      return "%vmovu<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     }
 }
   [(set_attr "type" "ssemov")
@@ -932,12 +979,36 @@
 	      ]
 	      (const_string "<MODE>")))])
 
-(define_insn "<sse2_avx_avx512f>_loaddqu<mode>"
+(define_insn "avx512f_storeu<ssemodesuffix>512_mask"
+  [(set (match_operand:VF_512 0 "memory_operand" "=m")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V16SF:
+      return "vmovups\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovu<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
   [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v")
 	(unspec:VI_UNALIGNED_LOADSTORE
 	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
@@ -946,9 +1017,9 @@
       return "%vmovups\t{%1, %0|%0, %1}";
     case MODE_XI:
       if (<MODE>mode == V8DImode)
-	return "vmovdqu64\t{%1, %0|%0, %1}";
+	return "vmovdqu64\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
       else
-	return "vmovdqu32\t{%1, %0|%0, %1}";
+	return "vmovdqu32\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
       return "%vmovdqu\t{%1, %0|%0, %1}";
     }
@@ -1011,6 +1082,88 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_storedqu<mode>_mask"
+  [(set (match_operand:VI48_512 0 "memory_operand" "=m")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  if (<MODE>mode == V8DImode)
+    return "vmovdqu64\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+  else
+    return "vmovdqu32\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_moves<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 2 "register_operand" "v")
+	    (match_operand:VF_128 3 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
+	  (match_operand:VF_128 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand"))
+	    (match_operand:VF_128 2 "vector_move_operand")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand"))
+	  (match_dup 4)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[4] = CONST0_RTX (<MODE>mode);")
+
+(define_insn "*avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
+	    (match_operand:VF_128 2 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand" "k"))
+	  (match_operand:VF_128 4 "const0_operand")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_stores<mode>_mask"
+  [(set (match_operand:<ssescalarmode> 0 "memory_operand" "=m")
+	(vec_select:<ssescalarmode>
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 1 "register_operand" "v")
+	    (vec_duplicate:VF_128
+	      (match_dup 0))
+	    (match_operand:<avx512fmaskmode> 2 "register_operand" "k"))
+	  (parallel [(const_int 0)])))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "register_operand" "=x")
 	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
@@ -1138,83 +1291,83 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand")
 	(plusminus:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand")
 	  (match_operand:VF_AVX512F 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=x,v")
 	(plusminus:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand" "<comm>0,v")
-	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand")
 	(mult:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand")
 	  (match_operand:VF_AVX512F 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3"
+(define_insn "*mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=x,v")
 	(mult:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vmul<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1246,18 +1399,18 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3"
+(define_insn "<sse>_div<mode>3<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=x,v")
 	(div:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "register_operand" "0,v")
-	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE"
+	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vdiv<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_rcp<mode>2"
@@ -1290,18 +1443,18 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
-(define_insn "rcp14<mode>"
+(define_insn "<mask_codefor>rcp14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RCP14))]
   "TARGET_AVX512F"
-  "vrcp14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "srcp14<mode>"
+(define_insn "<mask_scalar_codefor>srcp14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1311,7 +1464,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1337,33 +1490,33 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2"
+(define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v")
-	(sqrt:VF_AVX512F (match_operand:VF_AVX512F 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE"
-  "%vsqrt<ssemodesuffix>\t{%1, %0|%0, %1}"
+	(sqrt:VF_AVX512F (match_operand:VF_AVX512F 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2"
+(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_operand:VF_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{<round_mask_scalar_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_sse_attr" "sqrt")
-   (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "rsqrt<mode>2"
@@ -1386,18 +1539,18 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "<mask_codefor>rsqrt14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RSQRT14))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "<mask_scalar_codefor>rsqrt14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1407,7 +1560,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1432,67 +1585,67 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand")
 	(smaxmin:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand")
 	  (match_operand:VF_AVX512F 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
 {
   if (!flag_finite_math_only)
     operands[1] = force_reg (<MODE>mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);
 })
 
-(define_insn "*<code><mode>3_finite"
+(define_insn "*<code><mode>3_finite<mask_name><round_saeonly_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=x,v")
 	(smaxmin:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && flag_finite_math_only
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-  "
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<code><mode>3"
+(define_insn "*<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=x,v")
 	(smaxmin:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "register_operand" "0,v")
-	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF_AVX512F 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && !flag_finite_math_only
-  "
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3"
+(define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>"))
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;; These versions of the min/max patterns implement exactly the operations
@@ -2008,21 +2161,21 @@
   [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
   (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")])
 
-(define_insn "avx512f_cmp<mode>3"
+(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
-	   (match_operand:VI48F_512 2 "nonimmediate_operand" "vm")
+	   (match_operand:VI48F_512 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
 	  UNSPEC_PCMP))]
-  "TARGET_AVX512F"
-  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "TARGET_AVX512F && <round_saeonly_mode512bit_condition_op1>"
+  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, <round_saeonly_mask_scalar_merge_op4>%2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2<round_saeonly_mask_scalar_merge_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_ucmp<mode>3"
+(define_insn "avx512f_ucmp<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "v")
@@ -2030,23 +2183,41 @@
 	   (match_operand:SI 3 "const_0_to_7_operand" "n")]
 	  UNSPEC_UNSIGNED_PCMP))]
   "TARGET_AVX512F"
-  "vpcmpu<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vpcmpu<ssemodesuffix>\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vmcmp<mode>3"
+(define_insn "avx512f_vmcmp<mode>3<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "avx512f_vmcmp<mode>3_mask<round_saeonly_name>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
+	(and:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
+	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
+	    UNSPEC_PCMP)
+	  (and:<avx512fmaskmode>
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k")
+	    (const_int 1))))]
+  "TARGET_AVX512F"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_saeonly_op5>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -2064,17 +2235,17 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse>_comi"
+(define_insn "<sse>_comi<round_saeonly_name>"
   [(set (reg:CCFP FLAGS_REG)
 	(compare:CCFP
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vcomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vcomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -2084,17 +2255,17 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_ucomi"
+(define_insn "<sse>_ucomi<round_saeonly_name>"
   [(set (reg:CCFPU FLAGS_REG)
 	(compare:CCFPU
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vucomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vucomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -2585,78 +2756,228 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "*fma_fmadd_<mode>"
+(define_expand "avx512f_fmadd_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  ""
-  "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fmsub_<mode>"
+(define_insn "avx512f_fmadd_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=x")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "x")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  ""
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fnmadd_<mode>"
+(define_insn "avx512f_fmsub_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  ""
-  "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE   3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fnmsub_<mode>"
+(define_insn "avx512f_fnmadd_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  ""
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfnmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fnmsub_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA parallel floating point multiply addsub and subadd operations.
 
 ;; It would be possible to represent these without the UNSPEC as
@@ -2677,47 +2998,146 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_insn "*fma_fmaddsub_<mode>"
+(define_expand "avx512f_fmaddsub_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF_AVX512F
-	  [(match_operand:VF_AVX512F 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF_AVX512F 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	   (match_operand:VF_AVX512F 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
+	  [(match_operand:VF_AVX512F 1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF_AVX512F 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	   (match_operand:VF_AVX512F 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmaddsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmaddsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmaddsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fmsubadd_<mode>"
+(define_insn "avx512f_fmaddsub_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	     (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmaddsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmaddsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmaddsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	     (match_operand:VF_512 3 "register_operand" "0")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmaddsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF_AVX512F
-	  [(match_operand:VF_AVX512F   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF_AVX512F   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  [(match_operand:VF_AVX512F   1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF_AVX512F   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	   (neg:VF_AVX512F
-	     (match_operand:VF_AVX512F 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
+	     (match_operand:VF_AVX512F 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmsubadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsubadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsubadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fmsubadd_<mode>_mask<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsubadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsubadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsubadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "register_operand" "0"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsubadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
-(define_expand "fmai_vmfmadd_<mode>"
+(define_expand "fmai_vmfmadd_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand")
+	      (match_operand:VF_128 2 "nonimmediate_operand")
+	      (match_operand:VF_128 3 "nonimmediate_operand"))
+	    (match_dup <round_opnum>)
+	    (match_operand:QI 4 "register_operand"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[<round_opnum>] = CONST0_RTX (<MODE>mode);")
+
+(define_expand "fmai_vmfmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
 	  (fma:VF_128
@@ -2728,71 +3148,303 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
-(define_insn "*fmai_fmadd_<mode>"
+(define_insn "fmai_vmfmadd_<mode>_mask<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>, v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+	      (match_operand:VF_128 3 "register_operand" "0"))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmadd231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmadd_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2<round_op6>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3<round_op6>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
-	    (match_operand:VF_128 3 "nonimmediate_operand" " v,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_mask<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fmsub_<mode>"
+(define_insn "*fmai_vmfnmadd_<mode>_mask<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+	      (match_operand:VF_128 3 "register_operand" "0"))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+		(match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "vm, v")
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmadd_<mode>"
+(define_insn "*fmai_fnmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " v,vm"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>"
+(define_insn "*fmai_fnmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -2913,18 +3565,18 @@
    (set_attr "prefix_rep" "0")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ss"
+(define_insn "sse_cvtsi2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    cvtsi2ss\t{%2, %0|%0, %2}
    cvtsi2ss\t{%2, %0|%0, %2}
-   vcvtsi2ss\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ss\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -2934,18 +3586,18 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ssq"
+(define_insn "sse_cvtsi2ssq<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE && TARGET_64BIT"
   "@
    cvtsi2ssq\t{%2, %0|%0, %2}
    cvtsi2ssq\t{%2, %0|%0, %2}
-   vcvtsi2ssq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ssq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -2957,15 +3609,15 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtss2si"
+(define_insn "sse_cvtss2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -2987,15 +3639,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvtss2siq"
+(define_insn "sse_cvtss2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvtss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvtss2si{q}\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3017,14 +3669,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse_cvttss2si"
+(define_insn "sse_cvttss2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE"
-  "%vcvttss2si\t{%1, %0|%0, %k1}"
+  "%vcvttss2si\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3033,14 +3685,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvttss2siq"
+(define_insn "sse_cvttss2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvttss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvttss2si{q}\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3049,50 +3701,50 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>32"
+(define_insn "cvtusi2<ssescalarmodesuffix>32<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:SI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:SI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
-  "TARGET_AVX512F"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <round_modev4sf_condition>"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>64"
+(define_insn "cvtusi2<ssescalarmodesuffix>64<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:DI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:DI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name><round_name>"
   [(set (match_operand:VF1_AVX512F 0 "register_operand" "=v")
 	(float:VF1_AVX512F
-	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2"
-  "%vcvtdq2ps\t{%1, %0|%0, %1}"
+	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vcvtdq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2"
+(define_insn "ufloatv16siv16sf2<mask_name><round_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
-	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SI 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0|%0, %1}"
+  "vcvtudq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3126,34 +3778,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_fix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0|%0, %1}"
+  "vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_ufix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0|%0, %1}"
+  "vcvtps2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<fixsuffix>fix_truncv16sfv16si2"
+(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
-	  (match_operand:V16SF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttps2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttps2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3261,18 +3913,18 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "sse2_cvtsi2sdq"
+(define_insn "sse2_cvtsi2sdq<round_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (vec_duplicate:V2DF
-	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
    cvtsi2sdq\t{%2, %0|%0, %2}
    cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2sdq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,direct,*")
@@ -3283,115 +3935,115 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_vcvtss2usi"
+(define_insn "avx512f_vcvtss2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtss2usiq"
+(define_insn "avx512f_vcvtss2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttss2usi"
+(define_insn "avx512f_vcvttss2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttss2usiq"
+(define_insn "avx512f_vcvttss2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvtsd2usi"
+(define_insn "avx512f_vcvtsd2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtsd2usiq"
+(define_insn "avx512f_vcvtsd2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttsd2usi"
+(define_insn "avx512f_vcvttsd2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttsd2usiq"
+(define_insn "avx512f_vcvttsd2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvtsd2si"
+(define_insn "sse2_cvtsd2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
-  "%vcvtsd2si\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3414,15 +4066,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvtsd2siq"
+(define_insn "sse2_cvtsd2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvtsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si{q}\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3444,14 +4096,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvttsd2si"
+(define_insn "sse2_cvttsd2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2"
-  "%vcvttsd2si\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3461,14 +4113,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvttsd2siq"
+(define_insn "sse2_cvttsd2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvttsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si{q}\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3483,20 +4135,21 @@
 (define_mode_attr si2dfmodelower
   [(V8DF "v8si") (V4DF "v4si")])
 
-(define_insn "float<si2dfmodelower><mode>2"
+(define_insn "float<si2dfmodelower><mode>2<mask_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float:VF2_512_256 (match_operand:<si2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtdq2pd\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vcvtdq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "ufloatv8siv8df"
+(define_insn "ufloatv8siv8df<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
-	(unsigned_float:V8DF (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
+	(unsigned_float:V8DF
+	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtudq2pd\t{%1, %0|%0, %1}"
+  "vcvtudq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -3541,12 +4194,13 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V2DF")])
 
-(define_insn "avx512f_cvtpd2dq512"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(unspec:V8SI [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
-		     UNSPEC_FIX_NOTRUNC))]
+	(unspec:V8SI
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
+	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0|%0, %1}"
+  "vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3614,22 +4268,23 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0|%0, %1}"
+  "vcvtpd2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
 
-(define_insn "<fixsuffix>fix_truncv8dfv8si2"
+(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(any_fix:V8SI (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	(any_fix:V8SI
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttpd2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttpd2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3690,34 +4345,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss"
+(define_insn "sse2_cvtsd2ss<mask_scalar_name><round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
 	    (float_truncate:V2SF
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,vm")))
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,<round_constraint>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2"
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0|%0, %1, %q2}"
+   vcvtsd2ss\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %q2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd"
+(define_insn "sse2_cvtss2sd<mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
 	    (vec_select:V2SF
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,vm")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,<round_saeonly_constraint>")
 	      (parallel [(const_int 0) (const_int 1)])))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
@@ -3725,22 +4380,22 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0|%0, %1, %k2}"
+   vcvtss2sd\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %k2<round_saeonly_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "athlon_decode" "direct,direct,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_cvtpd2ps512"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0|%0, %1}"
+  "vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -3790,12 +4445,12 @@
 (define_mode_attr sf2dfmode
   [(V8DF "V8SF") (V4DF "V4SF")])
 
-(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix>"
+(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name><round_saeonly_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float_extend:VF2_512_256
-	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtps2pd\t{%1, %0|%0, %1}"
+	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
+  "TARGET_AVX && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+  "vcvtps2pd\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
@@ -4119,6 +4774,32 @@
   DONE;
 })
 
+(define_expand "vec_unpacku_float_hi_v16si"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V16SI 1 "register_operand")]
+  "TARGET_AVX512F"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx k, x, tmp[4];
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
+  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
+  tmp[2] = gen_reg_rtx (V8DFmode);
+  tmp[3] = gen_reg_rtx (V8SImode);
+  k = gen_reg_rtx (QImode);
+
+  emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
+  emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
+  emit_insn (gen_rtx_SET (VOIDmode, k,
+			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
+  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
+  emit_move_insn (operands[0], tmp[2]);
+  DONE;
+})
+
 (define_expand "vec_unpacku_float_lo_v8si"
   [(match_operand:V4DF 0 "register_operand")
    (match_operand:V8SI 1 "nonimmediate_operand")]
@@ -4144,6 +4825,30 @@
   DONE;
 })
 
+(define_expand "vec_unpacku_float_lo_v16si"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")]
+  "TARGET_AVX512F"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx k, x, tmp[3];
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
+  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
+  tmp[2] = gen_reg_rtx (V8DFmode);
+  k = gen_reg_rtx (QImode);
+
+  emit_insn (gen_avx512f_cvtdq2pd512_2 (tmp[2], operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, k,
+			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
+  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
+  emit_move_insn (operands[0], tmp[2]);
+  DONE;
+})
+
 (define_expand "vec_pack_trunc_<mode>"
   [(set (match_dup 3)
 	(float_truncate:<sf2dfmode>
@@ -4431,7 +5136,7 @@
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
-(define_insn "avx512f_unpckhps512"
+(define_insn "<mask_codefor>avx512f_unpckhps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4446,7 +5151,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vunpckhps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4519,7 +5224,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_unpcklps512"
+(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4534,7 +5239,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vunpcklps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpcklps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4642,7 +5347,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movshdup512"
+(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4657,7 +5362,7 @@
 		     (const_int 13) (const_int 13)
 		     (const_int 15) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vmovshdup\t{%1, %0|%0, %1}"
+  "vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4695,7 +5400,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movsldup512"
+(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4710,7 +5415,7 @@
 		     (const_int 12) (const_int 12)
 		     (const_int 14) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vmovsldup\t{%1, %0|%0, %1}"
+  "vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -5244,8 +5949,71 @@
   operands[1] = adjust_address (operands[1], SFmode, INTVAL (operands[2]) * 4);
 })
 
-(define_insn "avx512f_vextract<shuffletype>32x4_1"
-  [(set (match_operand:<ssequartermode> 0 "nonimmediate_operand" "=vm")
+(define_expand "avx512f_vextract<shuffletype>32x4_mask"
+  [(match_operand:<ssequartermode> 0 "nonimmediate_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_3_operand")
+   (match_operand:<ssequartermode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (0), GEN_INT (1), GEN_INT (2),
+          GEN_INT (3), operands[3], operands[4]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (4), GEN_INT (5), GEN_INT (6),
+          GEN_INT (7), operands[3], operands[4]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (8), GEN_INT (9), GEN_INT (10),
+          GEN_INT (11), operands[3], operands[4]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (12), GEN_INT (13), GEN_INT (14),
+          GEN_INT (15), operands[3], operands[4]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+})
+
+(define_insn "avx512f_vextract<shuffletype>32x4_1_maskm"
+  [(set (match_operand:<ssequartermode> 0 "memory_operand" "=m")
+	(vec_merge:<ssequartermode>
+	  (vec_select:<ssequartermode>
+	    (match_operand:V16FI 1 "register_operand" "v")
+	    (parallel [(match_operand 2  "const_0_to_15_operand")
+	      (match_operand 3  "const_0_to_15_operand")
+	      (match_operand 4  "const_0_to_15_operand")
+	      (match_operand 5  "const_0_to_15_operand")]))
+	  (match_operand:<ssequartermode> 6 "memory_operand" "0")
+	  (match_operand:QI 7 "register_operand" "k")))]
+  "TARGET_AVX512F && (INTVAL (operands[2]) = INTVAL (operands[3]) - 1)
+  && (INTVAL (operands[3]) = INTVAL (operands[4]) - 1)
+  && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
+{
+  operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
+  return "vextract<shuffletype>32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}";
+}
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>"
+  [(set (match_operand:<ssequartermode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssequartermode>
 	  (match_operand:V16FI 1 "register_operand" "v")
 	  (parallel [(match_operand 2  "const_0_to_15_operand")
@@ -5257,7 +6025,7 @@
   && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
 {
   operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
-  return "vextract<shuffletype>32x4\t{%2, %1, %0|%0, %1, %2}";
+  return "vextract<shuffletype>32x4\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
@@ -5269,6 +6037,35 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_vextract<shuffletype>64x4_mask"
+  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_1_operand")
+   (match_operand:<ssehalfvecmode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  rtx (*insn)(rtx, rtx, rtx, rtx);
+
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      insn = gen_vec_extract_lo_<mode>_mask;
+      break;
+    case 1:
+      insn = gen_vec_extract_hi_<mode>_mask;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (insn (operands[0], operands[1], operands[3], operands[4]));
+  DONE;
+})
+
 (define_split
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
 	(vec_select:<ssehalfvecmode>
@@ -5288,14 +6085,36 @@
   DONE;
 })
 
-(define_insn "vec_extract_lo_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_lo_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 0) (const_int 1)
+	      (const_int 2) (const_int 3)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+"vextract<shuffletype>64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_lo_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "nonimmediate_operand" "vm")
 	  (parallel [(const_int 0) (const_int 1)
             (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-  "#"
+{
+  if (<mask_applied>)
+    return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
+  else
+    return "#";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5306,14 +6125,32 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_extract_hi_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_hi_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 4) (const_int 5)
+	      (const_int 6) (const_int 7)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_hi_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "register_operand" "v")
 	  (parallel [(const_int 4) (const_int 5)
             (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F"
-  "vextract<shuffletype>64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5665,7 +6502,7 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "avx512f_unpckhpd512"
+(define_insn "<mask_codefor>avx512f_unpckhpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5676,7 +6513,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5761,7 +6598,7 @@
    (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
    (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
 
-(define_expand "avx512f_movddup512"
+(define_expand "avx512f_movddup512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5773,7 +6610,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_expand "avx512f_unpcklpd512"
+(define_expand "avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5785,7 +6622,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_insn "*avx512f_unpcklpd512"
+(define_insn "*avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v,v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5797,8 +6634,8 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
   "@
-   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
-   vmovddup\t{%1, %0|%0, %1}"
+   vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}
+   vmovddup\t{%1, %0<mask_operand3>|%0<mask_operand3>, %1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5935,30 +6772,47 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "avx512f_vmscalef<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_SCALEF)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")]
+	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2<round_mask_scalar_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode>"
+(define_insn "avx512f_scalef<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(unspec:VF_512 [(match_operand:VF_512 1 "register_operand" "v")
-			(match_operand:VF_512 2 "nonimmediate_operand" "vm")]
-		       UNSPEC_SCALEF))]
+	(unspec:VF_512
+	  [(match_operand:VF_512 1 "register_operand" "v")
+	   (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")]
+	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
-(define_insn "avx512f_vternlog<mode>"
+(define_expand "avx512f_vternlog<mode>_maskz"
+  [(match_operand:VI48_512 0 "register_operand")
+   (match_operand:VI48_512 1 "register_operand")
+   (match_operand:VI48_512 2 "register_operand")
+   (match_operand:VI48_512 3 "nonimmediate_operand")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vternlog<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_vternlog<mode><sd_maskz_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(unspec:VI48_512
 	  [(match_operand:VI48_512 1 "register_operand" "0")
@@ -5967,103 +6821,220 @@
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_VTERNLOG))]
   "TARGET_AVX512F"
-  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_vternlog<mode>_mask"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "0")
+	     (match_operand:VI48_512 2 "register_operand" "v")
+	     (match_operand:VI48_512 3 "nonimmediate_operand" "vm")
+	     (match_operand:SI 4 "const_0_to_255_operand")]
+	    UNSPEC_VTERNLOG)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getexp<mode>"
+(define_insn "avx512f_getexp<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
-   "vgetexp<ssemodesuffix>\t{%1, %0|%0, %1}";
+   "vgetexp<ssemodesuffix>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode>"
+(define_insn "avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_GETEXP)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")]
+	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2<round_saeonly_mask_scalar_op3>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_align<mode>"
+(define_insn "<mask_codefor>avx512f_align<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
         (unspec:VI48_512 [(match_operand:VI48_512 1 "register_operand" "v")
 			  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")
 			  (match_operand:SI 3 "const_0_to_255_operand")]
 			 UNSPEC_ALIGN))]
   "TARGET_AVX512F"
-  "valign<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  "valign<ssemodesuffix>\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_fixupimm<mode>"
+(define_expand "avx512f_shufps512_mask"
+  [(match_operand:V16SF 0 "register_operand")
+   (match_operand:V16SF 1 "register_operand")
+   (match_operand:V16SF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16SF 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2],
+					  GEN_INT ((mask >> 0) & 3),
+					  GEN_INT ((mask >> 2) & 3),
+					  GEN_INT (((mask >> 4) & 3) + 16),
+					  GEN_INT (((mask >> 6) & 3) + 16),
+					  GEN_INT (((mask >> 0) & 3) + 4),
+					  GEN_INT (((mask >> 2) & 3) + 4),
+					  GEN_INT (((mask >> 4) & 3) + 20),
+					  GEN_INT (((mask >> 6) & 3) + 20),
+					  GEN_INT (((mask >> 0) & 3) + 8),
+					  GEN_INT (((mask >> 2) & 3) + 8),
+					  GEN_INT (((mask >> 4) & 3) + 24),
+					  GEN_INT (((mask >> 6) & 3) + 24),
+					  GEN_INT (((mask >> 0) & 3) + 12),
+					  GEN_INT (((mask >> 2) & 3) + 12),
+					  GEN_INT (((mask >> 4) & 3) + 28),
+					  GEN_INT (((mask >> 6) & 3) + 28),
+					  operands[4], operands[5]));
+  DONE;
+})
+
+
+(define_expand "avx512f_fixupimm<mode>_maskz<round_saeonly_expand_name5>"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "register_operand")
+   (match_operand:VF_512 2 "register_operand")
+   (match_operand:<ssefixupmode> 3 "<round_saeonly_expand_predicate5>")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
+	operands[0], operands[1], operands[2], operands[3],
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
+  DONE;
+})
+
+(define_insn "avx512f_fixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
           [(match_operand:VF_512 1 "register_operand" "0")
 	   (match_operand:VF_512 2 "register_operand" "v")
-           (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+           (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
            (match_operand:SI 4 "const_0_to_255_operand")]
            UNSPEC_FIXUPIMM))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fixupimm<mode>_mask<round_saeonly_name>"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+          (unspec:VF_512
+            [(match_operand:VF_512 1 "register_operand" "0")
+	     (match_operand:VF_512 2 "register_operand" "v")
+             (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
+             (match_operand:SI 4 "const_0_to_255_operand")]
+             UNSPEC_FIXUPIMM)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sfixupimm<mode>"
+(define_expand "avx512f_sfixupimm<mode>_maskz<round_saeonly_expand_name5>"
+  [(match_operand:VF_128 0 "register_operand")
+   (match_operand:VF_128 1 "register_operand")
+   (match_operand:VF_128 2 "register_operand")
+   (match_operand:<ssefixupmode> 3 "<round_saeonly_expand_predicate5>")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
+	operands[0], operands[1], operands[2], operands[3],
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
+  DONE;
+})
+
+(define_insn "avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
           (unspec:VF_128
             [(match_operand:VF_128 1 "register_operand" "0")
 	     (match_operand:VF_128 2 "register_operand" "v")
-	     (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+	     (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 4 "const_0_to_255_operand")]
 	    UNSPEC_FIXUPIMM)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+   "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "avx512f_sfixupimm<mode>_mask<round_saeonly_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (unspec:VF_128
+	       [(match_operand:VF_128 1 "register_operand" "0")
+		(match_operand:VF_128 2 "register_operand" "v")
+		(match_operand:<ssefixupmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
+		(match_operand:SI 4 "const_0_to_255_operand")]
+	       UNSPEC_FIXUPIMM)
+	    (match_dup 1)
+	    (const_int 1))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "avx512f_rndscale<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
-  "vrndscale<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrndscale<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_mask_scalar_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2<round_saeonly_mask_scalar_op4>, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
 ;; One bit in mask selects 2 elements.
-(define_insn "avx512f_shufps512_1"
+(define_insn "avx512f_shufps512_1<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -6106,14 +7077,37 @@
   mask |= (INTVAL (operands[6]) - 16) << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufps\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
 
-(define_insn "avx512f_shufpd512_1"
+(define_expand "avx512f_shufpd512_mask"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V8DF 1 "register_operand")
+   (match_operand:V8DF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8DF 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2],
+					GEN_INT (mask & 1),
+					GEN_INT (mask & 2 ? 9 : 8),
+					GEN_INT (mask & 4 ? 3 : 2),
+					GEN_INT (mask & 8 ? 11 : 10),
+					GEN_INT (mask & 16 ? 5 : 4),
+					GEN_INT (mask & 32 ? 13 : 12),
+					GEN_INT (mask & 64 ? 7 : 6),
+					GEN_INT (mask & 128 ? 15 : 14),
+					operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shufpd512_1<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -6140,7 +7134,7 @@
   mask |= (INTVAL (operands[10]) - 14) << 7;
   operands[3] = GEN_INT (mask);
 
-  return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufpd\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -6220,7 +7214,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv8di"
+(define_insn "<mask_codefor>avx512f_interleave_highv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6231,7 +7225,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6270,7 +7264,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv8di"
+(define_insn "<mask_codefor>avx512f_interleave_lowv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6281,7 +7275,7 @@
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpcklqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6652,6 +7646,20 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_<code><pmov_src_lower><mode>2_mask"
+  [(set (match_operand:PMOV_DST_MODE 0 "nonimmediate_operand" "=v,m")
+    (vec_merge:PMOV_DST_MODE
+      (any_truncate:PMOV_DST_MODE
+        (match_operand:<pmov_src_mode> 1 "register_operand" "v,v"))
+      (match_operand:PMOV_DST_MODE 2 "vector_move_operand" "0C,0")
+      (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix><pmov_suff>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "none,store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "*avx512f_<code>v8div16qi2"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_concat:V16QI
@@ -6685,6 +7693,55 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512f_<code>v8div16qi2_mask"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_operand:V16QI 2 "vector_move_operand" "0C")
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 3 "register_operand" "k"))
+      (const_vector:V8QI [(const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)])))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*avx512f_<code>v8div16qi2_store_mask"
+  [(set (match_operand:V16QI 0 "memory_operand" "=m")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_dup 0)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 2 "register_operand" "k"))
+      (vec_select:V8QI
+        (match_dup 0)
+        (parallel [(const_int 8) (const_int 9)
+                   (const_int 10) (const_int 11)
+                   (const_int 12) (const_int 13)
+                   (const_int 14) (const_int 15)]))))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
@@ -6699,27 +7756,27 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,v")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
    (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<sse2_avx2>_<plusminus_insn><mode>3"
@@ -6809,7 +7866,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "vec_widen_umult_even_v16si"
+(define_expand "vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6829,7 +7886,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_umult_even_v16si"
+(define_insn "*vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6847,7 +7904,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuludq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -6924,7 +7981,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "vec_widen_smult_even_v16si"
+(define_expand "vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6944,7 +8001,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_smult_even_v16si"
+(define_insn "*vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=x")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6962,7 +8019,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -7173,12 +8230,12 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "general_vector_operand")
 	  (match_operand:VI4_AVX512F 2 "general_vector_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   if (TARGET_SSE4_1)
     {
@@ -7195,19 +8252,19 @@
     }
 })
 
-(define_insn "*<sse4_1_avx2>_mul<mode>3"
+(define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
-   vpmulld\t{%2, %1, %0|%0, %1, %2}"
+   vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -7320,6 +8377,20 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "ashr<mode>3<mask_name>"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
+	(ashiftrt:VI48_512
+	  (match_operand:VI48_512 1 "nonimmediate_operand" "v,vm")
+	  (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set (attr "length_immediate")
+     (if_then_else (match_operand 2 "const_int_operand")
+       (const_string "1")
+       (const_string "0")))
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<shift_insn><mode>3"
   [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
 	(any_lshift:VI248_AVX2
@@ -7339,13 +8410,13 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<shift_insn><mode>3"
+(define_insn "<shift_insn><mode>3<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
 	(any_lshift:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v,m")
 	  (match_operand:SI 2 "nonmemory_operand" "vN,N")))]
-  "TARGET_AVX512F"
-  "vp<vshift><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vp<vshift><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseishft")
    (set (attr "length_immediate")
@@ -7355,6 +8426,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+
 (define_expand "vec_shl_<mode>"
   [(set (match_operand:VI_128 0 "register_operand")
 	(ashift:V1TI
@@ -7430,41 +8502,42 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate>v<mode>"
+(define_insn "avx512f_<rotate>v<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate><mode>"
+(define_insn "avx512f_<rotate><mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
   "TARGET_AVX512F"
-  "vp<rotate><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3"
+(define_insn "*avx2_<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
-	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -7682,7 +8755,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_expand "avx512f_eq<mode>3"
+(define_expand "avx512f_eq<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand")
@@ -7691,14 +8764,14 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
-(define_insn "avx512f_eq<mode>3_1"
+(define_insn "avx512f_eq<mode>3<mask_scalar_merge_name>_1"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "%v")
 	   (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	  UNSPEC_MASKED_EQ))]
   "TARGET_AVX512F && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
-  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
@@ -7778,13 +8851,13 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_gt<mode>3"
+(define_insn "avx512f_gt<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "v")
 	   (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_MASKED_GT))]
   "TARGET_AVX512F"
-  "vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vpcmpgt<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
@@ -8000,19 +9073,19 @@
   operands[2] = force_reg (<MODE>mode, gen_rtx_CONST_VECTOR (<MODE>mode, v));
 })
 
-(define_expand "<sse2_avx2>_andnot<mode>3"
+(define_expand "<sse2_avx2>_andnot<mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(and:VI_AVX2
 	  (not:VI_AVX2 (match_operand:VI_AVX2 1 "register_operand"))
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2")
+  "TARGET_SSE2 && <mask_mode512bit_condition>")
 
-(define_insn "*andnot<mode>3"
+(define_insn "*andnot<mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(and:VI
 	  (not:VI (match_operand:VI 1 "register_operand" "0,v"))
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE "
+  "TARGET_SSE  && <mask_mode512bit_condition>"
 {
   static char buf[64];
   const char *ops;
@@ -8052,7 +9125,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8069,7 +9142,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8097,12 +9170,12 @@
   DONE;
 })
 
-(define_insn "*<code><mode>3"
+(define_insn "<mask_codefor><code><mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(any_logic:VI
 	  (match_operand:VI 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE
+  "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
   static char buf[64];
@@ -8145,7 +9218,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8162,7 +9235,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8179,25 +9252,25 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
-(define_insn "avx512f_testm<mode>3"
+(define_insn "avx512f_testm<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	 [(match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	 UNSPEC_TESTM))]
   "TARGET_AVX512F"
-  "vptestm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vptestm<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<sseinsnmode>")])
 
-(define_insn "avx512f_testnm<mode>3"
+(define_insn "avx512f_testnm<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	 [(match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	 UNSPEC_TESTNM))]
   "TARGET_AVX512CD"
-  "%vptestnm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vptestnm<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<sseinsnmode>")])
 
@@ -8470,7 +9543,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv16si"
+(define_insn "<mask_codefor>avx512f_interleave_highv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8485,7 +9558,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8525,7 +9598,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv16si"
+(define_insn "<mask_codefor>avx512f_interleave_lowv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8540,7 +9613,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vpunpckldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8663,7 +9736,45 @@
    (set_attr "prefix" "orig,orig,vex,vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_vinsert<shuffletype>32x4_1"
+(define_expand "avx512f_vinsert<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:<ssequartermode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_3_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF), operands[4],
+	  operands[5]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xF0FF), operands[4],
+	  operands[5]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFF0F), operands[4],
+	  operands[5]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF0), operands[4],
+	  operands[5]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+
+})
+
+(define_insn "<mask_codefor>avx512f_vinsert<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_merge:V16FI
 	  (match_operand:V16FI 1 "register_operand" "v")
@@ -8686,14 +9797,35 @@
 
   operands[3] = GEN_INT (mask);
 
-  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_set_lo_<mode>"
+(define_expand "avx512f_vinsert<shuffletype>64x4_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_1_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  if (mask == 0)
+    emit_insn (gen_vec_set_lo_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  else
+    emit_insn (gen_vec_set_hi_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "vec_set_lo_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8702,13 +9834,13 @@
 	    (parallel [(const_int 4) (const_int 5)
               (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0|%0, %1, %2, $0x0}"
+  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x0}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_set_hi_<mode>"
+(define_insn "vec_set_hi_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8717,13 +9849,37 @@
 	    (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0|%0, %1, %2, $0x1}"
+  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x1}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_shuf_<shuffletype>64x2_1"
+(define_expand "avx512f_shuf_<shuffletype>64x2_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:V8FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>64x2_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 2),
+       GEN_INT (((mask >> 0) & 3) * 2 + 1),
+       GEN_INT (((mask >> 2) & 3) * 2),
+       GEN_INT (((mask >> 2) & 3) * 2 + 1),
+       GEN_INT (((mask >> 4) & 3) * 2 + 8),
+       GEN_INT (((mask >> 4) & 3) * 2 + 9),
+       GEN_INT (((mask >> 6) & 3) * 2 + 8),
+       GEN_INT (((mask >> 6) & 3) * 2 + 9),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>64x2_1<mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_select:V8FI
 	  (vec_concat:<ssedoublemode>
@@ -8750,14 +9906,46 @@
   mask |= (INTVAL (operands[9]) - 8) / 2 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_shuf_<shuffletype>32x4_1"
+(define_expand "avx512f_shuf_<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:V16FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>32x4_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 4),
+       GEN_INT (((mask >> 0) & 3) * 4 + 1),
+       GEN_INT (((mask >> 0) & 3) * 4 + 2),
+       GEN_INT (((mask >> 0) & 3) * 4 + 3),
+       GEN_INT (((mask >> 2) & 3) * 4),
+       GEN_INT (((mask >> 2) & 3) * 4 + 1),
+       GEN_INT (((mask >> 2) & 3) * 4 + 2),
+       GEN_INT (((mask >> 2) & 3) * 4 + 3),
+       GEN_INT (((mask >> 4) & 3) * 4 + 16),
+       GEN_INT (((mask >> 4) & 3) * 4 + 17),
+       GEN_INT (((mask >> 4) & 3) * 4 + 18),
+       GEN_INT (((mask >> 4) & 3) * 4 + 19),
+       GEN_INT (((mask >> 6) & 3) * 4 + 16),
+       GEN_INT (((mask >> 6) & 3) * 4 + 17),
+       GEN_INT (((mask >> 6) & 3) * 4 + 18),
+       GEN_INT (((mask >> 6) & 3) * 4 + 19),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_select:V16FI
 	  (vec_concat:<ssedoublemode>
@@ -8800,14 +9988,44 @@
   mask |= (INTVAL (operands[15]) - 16) / 4 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_pshufd_1"
+(define_expand "avx512f_pshufdv3_mask"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V16SI 3 "register_operand")
+   (match_operand:HI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1],
+				       GEN_INT ((mask >> 0) & 3),
+				       GEN_INT ((mask >> 2) & 3),
+				       GEN_INT ((mask >> 4) & 3),
+				       GEN_INT ((mask >> 6) & 3),
+				       GEN_INT (((mask >> 0) & 3) + 4),
+				       GEN_INT (((mask >> 2) & 3) + 4),
+				       GEN_INT (((mask >> 4) & 3) + 4),
+				       GEN_INT (((mask >> 6) & 3) + 4),
+				       GEN_INT (((mask >> 0) & 3) + 8),
+				       GEN_INT (((mask >> 2) & 3) + 8),
+				       GEN_INT (((mask >> 4) & 3) + 8),
+				       GEN_INT (((mask >> 6) & 3) + 8),
+				       GEN_INT (((mask >> 0) & 3) + 12),
+				       GEN_INT (((mask >> 2) & 3) + 12),
+				       GEN_INT (((mask >> 4) & 3) + 12),
+				       GEN_INT (((mask >> 6) & 3) + 12),
+				       operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_pshufd_1<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")
@@ -8848,7 +10066,7 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "vpshufd\t{%2, %1, %0|%0, %1, %2}";
+  return "vpshufd\t{%2, %1, %0<mask_operand18>|%0<mask_operand18>, %1, %2}";
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix" "evex")
@@ -10299,12 +11517,12 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI")])
 
-(define_insn "abs<mode>2"
+(define_insn "abs<mode>2<mask_name>"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(abs:VI124_AVX2_48_AVX512F
 	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSSE3"
-  "%vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "TARGET_SSSE3 && <mask_mode512bit_condition>"
+  "%vpabs<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set_attr "prefix_extra" "1")
@@ -10645,12 +11863,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16qiv16si2"
+(define_insn "<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bd\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10685,12 +11903,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16hiv16si2"
+(define_insn "avx512f_<code>v16hiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wd\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10720,7 +11938,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8qiv8di2"
+(define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (vec_select:V8QI
@@ -10730,7 +11948,7 @@
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bq\t{%1, %0|%0, %k1}"
+  "vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10762,12 +11980,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8hiv8di2"
+(define_insn "avx512f_<code>v8hiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wq\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10799,12 +12017,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8siv8di2"
+(define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>dq\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -11587,33 +12805,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512er_exp2<mode>"
+(define_insn "avx512er_exp2<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vexp2<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512er_rcp28<mode>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512er_rsqrt28<mode>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -12663,16 +13881,16 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_permvar<mode>"
+(define_insn "<avx2_avx512f>_permvar<mode><mask_name>"
   [(set (match_operand:VI48F_256_512 0 "register_operand" "=v")
 	(unspec:VI48F_256_512
 	  [(match_operand:VI48F_256_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:<sseintvecmode> 2 "register_operand" "v")]
 	  UNSPEC_VPERMVAR))]
-  "TARGET_AVX2"
-  "vperm<ssemodesuffix>\t{%1, %2, %0|%0, %2, %1}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vperm<ssemodesuffix>\t{%1, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1}"
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<avx2_avx512f>_perm<mode>"
@@ -12683,14 +13901,32 @@
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_<avx2_avx512f>_perm<mode>_1 (operands[0], operands[1],
-					    GEN_INT ((mask >> 0) & 3),
-					    GEN_INT ((mask >> 2) & 3),
-					    GEN_INT ((mask >> 4) & 3),
-					    GEN_INT ((mask >> 6) & 3)));
+					      GEN_INT ((mask >> 0) & 3),
+					      GEN_INT ((mask >> 2) & 3),
+					      GEN_INT ((mask >> 4) & 3),
+					      GEN_INT ((mask >> 6) & 3)));
+  DONE;
+})
+
+(define_expand "avx512f_perm<mode>_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V8FI 3 "vector_move_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_<avx2_avx512f>_perm<mode>_1_mask (operands[0], operands[1],
+						   GEN_INT ((mask >> 0) & 3),
+						   GEN_INT ((mask >> 2) & 3),
+						   GEN_INT ((mask >> 4) & 3),
+						   GEN_INT ((mask >> 6) & 3),
+						   operands[3], operands[4]));
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_perm<mode>_1"
+(define_insn "<avx2_avx512f>_perm<mode>_1<mask_name>"
   [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
 	(vec_select:VI8F_256_512
 	  (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
@@ -12698,7 +13934,7 @@
 		     (match_operand 3 "const_0_to_3_operand")
 		     (match_operand 4 "const_0_to_3_operand")
 		     (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -12706,10 +13942,10 @@
   mask |= INTVAL (operands[4]) << 4;
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
-  return "vperm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vperm<ssemodesuffix>\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_permv2ti"
@@ -12756,58 +13992,58 @@
    (set_attr "isa" "*,avx2,noavx2")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vec_dup<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48F_512
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V16FI
 	  (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0|%0, %g1, %g1, 0x0}
-   vbroadcast<shuffletype>32x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x0}
+   vbroadcast<shuffletype>32x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V8FI
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0|%0, %g1, %g1, 0x44}
-   vbroadcast<shuffletype>64x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x44}
+   vbroadcast<shuffletype>64x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_gpr<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_gpr<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48_512
 	  (match_operand:<ssescalarmode> 1 "register_operand" "r")))]
   "TARGET_AVX512F && (<MODE>mode != V8DImode || TARGET_64BIT)"
-  "vpbroadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_mem<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_mem<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=x")
 	(vec_duplicate:VI48F_512
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm")))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -12947,12 +14183,12 @@
 				elt * GET_MODE_SIZE (<ssescalarmode>mode));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF2_AVX512F 0 "register_operand")
 	(vec_select:VF2_AVX512F
 	  (match_operand:VF2_AVX512F 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12968,12 +14204,12 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF1_AVX512F 0 "register_operand")
 	(vec_select:VF1_AVX512F
 	  (match_operand:VF1_AVX512F 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12991,40 +14227,54 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_insn "*<sse2_avx_avx512f>_vpermilp<mode>"
+(define_insn "*<sse2_avx_avx512f>_vpermilp<mode><mask_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v")
 	(vec_select:VF_AVX512F
 	  (match_operand:VF_AVX512F 1 "nonimmediate_operand" "vm")
 	  (match_parallel 2 ""
 	    [(match_operand 3 "const_int_operand")])))]
-  "TARGET_AVX
+  "TARGET_AVX && <mask_mode512bit_condition>
    && avx_vpermilp_parallel (operands[2], <MODE>mode)"
 {
   int mask = avx_vpermilp_parallel (operands[2], <MODE>mode) - 1;
   operands[2] = GEN_INT (mask);
-  return "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3"
+(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3<mask_name>"
   [(set (match_operand:VF_AVX512F 0 "register_operand" "=v")
 	(unspec:VF_AVX512F
 	  [(match_operand:VF_AVX512F 1 "register_operand" "v")
 	   (match_operand:<sseintvecmode> 2 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMIL))]
-  "TARGET_AVX"
-  "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "btver2_decode" "vector")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vpermi2var<mode>3"
+(define_expand "avx512f_vpermi2var<mode>3_maskz"
+  [(match_operand:VI48F_512 0 "register_operand" "=v")
+   (match_operand:VI48F_512 1 "register_operand" "v")
+   (match_operand:<sseintvecmode> 2 "register_operand" "0")
+   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")
+   (match_operand:<avx512fmaskmode> 4 "register_operand" "k")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vpermi2var<mode>3_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_vpermi2var<mode>3<sd_maskz_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
@@ -13032,12 +14282,42 @@
 	   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMI2))]
   "TARGET_AVX512F"
-  "vpermi2<ssemodesuffix>\t{%3, %1, %0|%0, %1, %3}"
+  "vpermi2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_vpermi2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:VI48F_512 1 "register_operand" "v")
+	    (match_operand:<sseintvecmode> 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMI2_MASK)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermi2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vpermt2var<mode>3"
+(define_expand "avx512f_vpermt2var<mode>3_maskz"
+  [(match_operand:VI48F_512 0 "register_operand" "=v")
+   (match_operand:<sseintvecmode> 1 "register_operand" "v")
+   (match_operand:VI48F_512 2 "register_operand" "0")
+   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")
+   (match_operand:<avx512fmaskmode> 4 "register_operand" "k")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vpermt2var<mode>3_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_vpermt2var<mode>3<sd_maskz_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
 	  [(match_operand:<sseintvecmode> 1 "register_operand" "v")
@@ -13045,7 +14325,23 @@
 	   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMT2))]
   "TARGET_AVX512F"
-  "vpermt2<ssemodesuffix>\t{%3, %1, %0|%0, %1, %3}"
+  "vpermt2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_vpermt2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:<sseintvecmode> 1 "register_operand" "v")
+	    (match_operand:VI48F_512 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMT2)
+	  (match_dup 2)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermt2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -13440,24 +14736,24 @@
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_ashrv<mode>"
+(define_insn "<avx2_avx512f>_ashrv<mode><mask_name>"
   [(set (match_operand:VI48_AVX512F 0 "register_operand" "=v")
 	(ashiftrt:VI48_AVX512F
 	  (match_operand:VI48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vpsrav<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vpsrav<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_<shift_insn>v<mode>"
+(define_insn "<avx2_avx512f>_<shift_insn>v<mode><mask_name>"
   [(set (match_operand:VI48_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(any_lshift:VI48_AVX2_48_AVX512F
 	  (match_operand:VI48_AVX2_48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX2_48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -13540,12 +14836,13 @@
    (set_attr "btver2_decode" "double")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtph2ps512"
+(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
-	(unspec:V16SF [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
-		      UNSPEC_VCVTPH2PS))]
+	(unspec:V16SF
+	  [(match_operand:V16HI 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
+	  UNSPEC_VCVTPH2PS))]
   "TARGET_AVX512F"
-  "vcvtph2ps\t{%1, %0|%0, %1}"
+  "vcvtph2ps\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13596,13 +14893,14 @@
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtps2ph512"
+(define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(unspec:V16HI [(match_operand:V16SF 1 "register_operand" "v")
-		      (match_operand:SI 2 "const_0_to_255_operand" "N")]
-		     UNSPEC_VCVTPS2PH))]
+	(unspec:V16HI
+	  [(match_operand:V16SF 1 "register_operand" "v")
+	   (match_operand:SI 2 "const_0_to_255_operand" "N")]
+	  UNSPEC_VCVTPS2PH))]
   "TARGET_AVX512F"
-  "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13993,49 +15291,100 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_compress<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "v")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k")]
+	  UNSPEC_COMPRESS))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_compressstore<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "x")
+	   (match_dup 0)
+	   (match_operand:<avx512fmaskmode> 2 "register_operand" "k")]
+	  UNSPEC_COMPRESS_STORE))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512f_expand<mode>_maskz"
+  [(set (match_operand:VI48F_512 0 "register_operand")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "nonimmediate_operand")
+	   (match_operand:VI48F_512 2 "vector_move_operand")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand")]
+	  UNSPEC_EXPAND))]
+  "TARGET_AVX512F"
+  "operands[2] = CONST0_RTX (<MODE>mode);")
+
+(define_insn "avx512f_expand<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")]
+	  UNSPEC_EXPAND))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>expand<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_getmant<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
-  "vgetmant<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  "vgetmant<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_getmant<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_GETMANT)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, <round_saeonly_mask_scalar_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2<round_saeonly_mask_scalar_op4>, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "clz<mode>2"
+(define_insn "clz<mode>2<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(clz:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512CD"
-  "vplzcnt<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vplzcnt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "conflict<mode>"
+(define_insn "<mask_codefor>conflict<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(unspec:VI48_512
 	  [(match_operand:VI48_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_CONFLICT))]
   "TARGET_AVX512CD"
-  "vpconflict<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vpconflict<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
new file mode 100644
index 0000000..fcc5e8c
--- /dev/null
+++ b/gcc/config/i386/subst.md
@@ -0,0 +1,222 @@
+;; GCC machine description for AVX512F instructions
+;; Copyright (C) 2013 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Some iterators for extending subst as much as possible
+;; All vectors (Use it for destination)
+(define_mode_iterator SUBST_V
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF])
+
+(define_mode_iterator SUBST_S
+  [QI HI SI DI])
+
+(define_mode_iterator SUBST_A
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF
+   QI HI SI DI SF DF
+   CCFP CCFPU])
+
+(define_subst_attr "mask_name" "mask" "" "_mask")
+(define_subst_attr "mask_applied" "mask" "false" "true")
+(define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
+(define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3")
+(define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf
+(define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4")
+(define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6")
+(define_subst_attr "mask_operand11" "mask" "" "%{%12%}%N11")
+(define_subst_attr "mask_operand18" "mask" "" "%{%19%}%N18")
+(define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19")
+(define_subst_attr "mask_codefor" "mask" "*" "")
+(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+(define_subst_attr "store_mask_constraint" "mask" "vm" "v")
+(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand")
+(define_subst_attr "mask_prefix" "mask" "vex" "evex")
+(define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
+(define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+
+(define_subst "mask"
+  [(set (match_operand:SUBST_V 0)
+        (match_operand:SUBST_V 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (match_dup 1)
+	  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])
+
+(define_subst_attr "mask_scalar_name" "mask_scalar" "" "_mask")
+(define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N3")
+(define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N4")
+(define_subst_attr "mask_scalar_codefor" "mask_scalar" "*" "")
+(define_subst_attr "mask_scalar_prefix" "mask_scalar" "orig,vex" "evex")
+(define_subst_attr "mask_scalar_prefix2" "mask_scalar" "vex" "evex")
+
+(define_subst "mask_scalar"
+  [(set (match_operand:SUBST_V 0)
+	(vec_merge:SUBST_V
+	  (match_operand:SUBST_V 1)
+	  (match_operand:SUBST_V 2)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (vec_merge:SUBST_V
+	    (match_dup 1)
+	    (match_operand:SUBST_V 4 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 5 "register_operand" "k"))
+	  (match_dup 2)
+	  (const_int 1)))])
+
+(define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask")
+(define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}")
+(define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}")
+
+(define_subst "mask_scalar_merge"
+  [(set (match_operand:SUBST_S 0)
+        (match_operand:SUBST_S 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (and:SUBST_S
+	  (match_dup 1)
+	  (match_operand:SUBST_S 3 "register_operand" "k")))])
+
+(define_subst_attr "sd_maskz_name" "sd" "" "_maskz_1")
+(define_subst_attr "sd_mask_op4" "sd" "" "%{%5%}%N4")
+(define_subst_attr "sd_mask_op5" "sd" "" "%{%6%}%N5")
+(define_subst_attr "sd_mask_codefor" "sd" "*" "")
+(define_subst_attr "sd_mask_mode512bit_condition" "sd" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+
+(define_subst "sd"
+ [(set (match_operand:SUBST_V 0)
+       (match_operand:SUBST_V 1))]
+ ""
+ [(set (match_dup 0)
+       (vec_merge:SUBST_V
+	 (match_dup 1)
+	 (match_operand:SUBST_V 2 "const0_operand" "C")
+	 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))
+])
+
+(define_subst_attr "round_name" "round" "" "_round")
+(define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
+(define_subst_attr "round_op2" "round" "" "%R2")
+(define_subst_attr "round_op3" "round" "" "%R3")
+(define_subst_attr "round_op4" "round" "" "%R4")
+(define_subst_attr "round_op5" "round" "" "%R5")
+(define_subst_attr "round_op6" "round" "" "%R6")
+(define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
+(define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
+(define_subst_attr "round_mask_scalar_op3" "round" "" "<round_mask_scalar_operand3>")
+(define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
+(define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_constraint2" "round" "m" "v")
+(define_subst_attr "round_constraint3" "round" "rm" "r")
+(define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
+(define_subst_attr "round_codefor" "round" "*" "")
+(define_subst_attr "round_opnum" "round" "5" "6")
+
+(define_subst "round"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_saeonly_name" "round_saeonly" "" "_round")
+(define_subst_attr "round_saeonly_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_saeonly_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand4" "mask_scalar" "%R4" "%R6")
+(define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5")
+(define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7")
+(define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2")
+(define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%R4")
+(define_subst_attr "round_saeonly_op5" "round_saeonly" "" "%R5")
+(define_subst_attr "round_saeonly_op6" "round_saeonly" "" "%R6")
+(define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "<round_saeonly_mask_operand2>")
+(define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "<round_saeonly_mask_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "<round_saeonly_mask_scalar_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_operand4>")
+(define_subst_attr "round_saeonly_mask_scalar_merge_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_merge_operand4>")
+(define_subst_attr "round_saeonly_sd_mask_op5" "round_saeonly" "" "<round_saeonly_sd_mask_operand5>")
+(define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v")
+(define_subst_attr "round_saeonly_constraint2" "round_saeonly" "m" "v")
+(define_subst_attr "round_saeonly_mode512bit_condition" "round_saeonly" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_saeonly_mode512bit_condition_op1" "round_saeonly" "1" "(GET_MODE (operands[1]) == V16SFmode || GET_MODE (operands[1]) == V8DFmode)")
+
+(define_subst "round_saeonly"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_expand_name" "round_expand" "" "_round")
+(define_subst_attr "round_expand_predicate" "round_expand" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_expand_operand" "round_expand" "" ", operands[5]")
+
+(define_subst "round_expand"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_V 3)
+  (match_operand:SUBST_S 4)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])
+
+(define_subst_attr "round_saeonly_expand_name5" "round_saeonly_expand5" "" "_round")
+(define_subst_attr "round_saeonly_expand_predicate5" "round_saeonly_expand5" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_expand_operand6" "round_saeonly_expand5" "" ", operands[6]")
+
+(define_subst "round_saeonly_expand5"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_A 3)
+  (match_operand:SI 4)
+  (match_operand:SUBST_S 5)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (match_dup 5)
+   (unspec [(match_operand:SI 6 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])
-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] Add substed patterns.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
@ 2013-08-22 14:15 ` Kirill Yukhin
  2013-10-17 14:17   ` [PATCH i386 4/8] [AVX512] [1/n] " Kirill Yukhin
                     ` (2 more replies)
  2013-11-06  7:22 ` [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst Kirill Yukhin
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-08-22 14:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 14 Aug 11:44, Kirill Yukhin wrote:
PING?

Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-08-22 14:15 ` Kirill Yukhin
@ 2013-10-17 14:17   ` Kirill Yukhin
  2013-10-22  1:43     ` Richard Henderson
  2013-10-28 15:13   ` [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst Kirill Yukhin
  2013-10-28 15:20   ` [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst Kirill Yukhin
  2 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-17 14:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,
On 22 Aug 18:10, Kirill Yukhin wrote:
> On 14 Aug 11:44, Kirill Yukhin wrote:
> PING?

I've splitted this patch.
Each new subpatch will introduce one new subst.

I am starting from most important one, which introduce
EVEX's masking feature to the patterns.

Changes are massive, but they all similar to each other
and straightforfard (supposing one knows what is `define_subst').

Most common example:
@@ -10776,12 +11933,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])

-(define_insn "avx512f_<code>v8siv8di2"
+(define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
        (any_extend:V8DI
          (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>dq\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])

Some patterns cannot be substed for different reasons and `mask' version
was introduced explicitly for them.

Few patterns were introduced to support future built-ins, like this:
+(define_expand "avx512f_vextract<shuffletype>64x4_mask"

Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option.

Is it ok to install to main trunk?

--
Thanks, K

commit 11509122dda1c8e67a6c667f0a7b4baef65c71b4
Author: Ilya Tocar <ilya.tocar@intel.com>
Date:   Wed Oct 16 14:32:45 2013 +0400

    Add mask subst.
---
 gcc/config/i386/i386.c        |    5 +
 gcc/config/i386/predicates.md |   10 +
 gcc/config/i386/sse.md        | 1836 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/subst.md      |   56 ++
 4 files changed, 1614 insertions(+), 293 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index febceca..0e91500 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14780,6 +14780,11 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	  /* We do not want to print value of the operand.  */
 	  return;
 
+	case 'N':
+	  if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x)))
+	    fputs ("{z}", file);
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 261335d..00a203e 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -687,6 +687,16 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
 
+;; Match 0 to 4.
+(define_predicate "const_0_to_4_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 4)")))
+
+;; Match 0 to 5.
+(define_predicate "const_0_to_5_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 5)")))
+
 ;; Match 0 to 7.
 (define_predicate "const_0_to_7_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 939cc33..bf0e1ed 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -87,6 +87,7 @@
   ;; For AVX512F support
   UNSPEC_VPERMI2
   UNSPEC_VPERMT2
+  UNSPEC_VPERMI2_MASK
   UNSPEC_UNSIGNED_FIX_NOTRUNC
   UNSPEC_UNSIGNED_PCMP
   UNSPEC_TESTM
@@ -101,9 +102,15 @@
   UNSPEC_GETMANT
   UNSPEC_ALIGN
   UNSPEC_CONFLICT
+  UNSPEC_COMPRESS
+  UNSPEC_COMPRESS_STORE
+  UNSPEC_EXPAND
   UNSPEC_MASKED_EQ
   UNSPEC_MASKED_GT
 
+  ;; For embed. rounding feature
+  UNSPEC_EMBEDDED_ROUNDING
+
   ;; For AVX512PF support
   UNSPEC_GATHER_PREFETCH
   UNSPEC_SCATTER_PREFETCH
@@ -554,6 +561,12 @@
    (V8SF "7") (V4DF "3")
    (V4SF "3") (V2DF "1")])
 
+(define_mode_attr ssescalarsize
+  [(V8DI  "64") (V4DI  "64") (V2DI  "64")
+   (V32HI "16") (V16HI "16") (V8HI "16")
+   (V16SI "32") (V8SI "32") (V4SI "32")
+   (V16SF "16") (V8DF "64")])
+
 ;; SSE prefix for integer vector modes
 (define_mode_attr sseintprefix
   [(V2DI  "p") (V2DF  "")
@@ -610,6 +623,9 @@
 (define_mode_attr bcstscalarsuff
   [(V16SI "d") (V16SF "ss") (V8DI "q") (V8DF "sd")])
 
+;; Include define_subst patterns for instructions with mask
+(include "subst.md")
+
 ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics.
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -749,6 +765,28 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_load<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	  (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "avx512f_blendm<mode>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_merge:VI48F_512
@@ -761,6 +799,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_store<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "register_operand" "v")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "sse2_movq128"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(vec_concat:V2DI
@@ -852,21 +912,21 @@
   DONE;
 })
 
-(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix>"
+(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(unspec:VF
 	  [(match_operand:VF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
     case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
-      return "%vmovups\t{%1, %0|%0, %1}";
+      return "%vmovups\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
-      return "%vmovu<ssemodesuffix>\t{%1, %0|%0, %1}";
+      return "%vmovu<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     }
 }
   [(set_attr "type" "ssemov")
@@ -913,12 +973,36 @@
 	      ]
 	      (const_string "<MODE>")))])
 
-(define_insn "<sse2_avx_avx512f>_loaddqu<mode>"
+(define_insn "avx512f_storeu<ssemodesuffix>512_mask"
+  [(set (match_operand:VF_512 0 "memory_operand" "=m")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V16SF:
+      return "vmovups\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovu<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
   [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v")
 	(unspec:VI_UNALIGNED_LOADSTORE
 	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
@@ -927,9 +1011,9 @@
       return "%vmovups\t{%1, %0|%0, %1}";
     case MODE_XI:
       if (<MODE>mode == V8DImode)
-	return "vmovdqu64\t{%1, %0|%0, %1}";
+	return "vmovdqu64\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
       else
-	return "vmovdqu32\t{%1, %0|%0, %1}";
+	return "vmovdqu32\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
       return "%vmovdqu\t{%1, %0|%0, %1}";
     }
@@ -992,6 +1076,88 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_storedqu<mode>_mask"
+  [(set (match_operand:VI48_512 0 "memory_operand" "=m")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  if (<MODE>mode == V8DImode)
+    return "vmovdqu64\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+  else
+    return "vmovdqu32\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_moves<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 2 "register_operand" "v")
+	    (match_operand:VF_128 3 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
+	  (match_operand:VF_128 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand"))
+	    (match_operand:VF_128 2 "vector_move_operand")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand"))
+	  (match_dup 4)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[4] = CONST0_RTX (<MODE>mode);")
+
+(define_insn "*avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
+	    (match_operand:VF_128 2 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand" "k"))
+	  (match_operand:VF_128 4 "const0_operand")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_stores<mode>_mask"
+  [(set (match_operand:<ssescalarmode> 0 "memory_operand" "=m")
+	(vec_select:<ssescalarmode>
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 1 "register_operand" "v")
+	    (vec_duplicate:VF_128
+	      (match_dup 0))
+	    (match_operand:<avx512fmaskmode> 2 "register_operand" "k"))
+	  (parallel [(const_int 0)])))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "register_operand" "=x")
 	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
@@ -1119,26 +1285,26 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "<comm>0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_vm<plusminus_insn><mode>3"
@@ -1158,26 +1324,26 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3"
+(define_insn "*mul<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vmul<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
@@ -1195,7 +1361,7 @@
    v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,vex")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1225,18 +1391,18 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3"
+(define_insn "<sse>_div<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(div:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vdiv<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_rcp<mode>2"
@@ -1269,18 +1435,18 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
-(define_insn "rcp14<mode>"
+(define_insn "<mask_codefor>rcp14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RCP14))]
   "TARGET_AVX512F"
-  "vrcp14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "srcp14<mode>"
+(define_insn "*srcp14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1316,11 +1482,11 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2"
+(define_insn "<sse>_sqrt<mode>2<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE"
-  "%vsqrt<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "TARGET_SSE && <mask_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
@@ -1341,8 +1507,8 @@
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "orig,vex")
+   (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "rsqrt<mode>2"
@@ -1365,18 +1531,18 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "<mask_codefor>rsqrt14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RSQRT14))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "*rsqrt14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1411,47 +1577,49 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   if (!flag_finite_math_only)
     operands[1] = force_reg (<MODE>mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);
 })
 
-(define_insn "*<code><mode>3_finite"
+(define_insn "*<code><mode>3_finite<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
   "TARGET_SSE && flag_finite_math_only
-   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && <mask_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<code><mode>3"
+(define_insn "*<code><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && !flag_finite_math_only"
+  "TARGET_SSE && !flag_finite_math_only
+   && <mask_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_vm<code><mode>3"
@@ -2029,6 +2197,24 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
+(define_insn "avx512f_vmcmp<mode>3_mask"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
+	(and:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
+	    UNSPEC_PCMP)
+	  (and:<avx512fmaskmode>
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k")
+	    (const_int 1))))]
+  "TARGET_AVX512F"
+  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "avx512f_maskcmp<mode>3"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(match_operator:<avx512fmaskmode> 3 "sse_comparison_operator"
@@ -2565,9 +2751,9 @@
 (define_insn "*fma_fmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
+	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
+	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]
   ""
   "@
    vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
@@ -2579,13 +2765,45 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fmsub_<mode>"
+(define_insn "avx512f_fmadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=x")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "x")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm,v,vm,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x"))))]
   ""
   "@
    vfmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
@@ -2597,13 +2815,47 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fnmadd_<mode>"
+(define_insn "avx512f_fmsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fnmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm,v,vm,x,m")
+	  (match_operand:FMAMODE   3 "nonimmediate_operand" "v,vm,0,xm,x")))]
   ""
   "@
    vfnmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
@@ -2615,14 +2867,48 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fnmadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fma_fnmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm,v,vm,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x"))))]
   ""
   "@
    vfnmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
@@ -2634,6 +2920,42 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fnmsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA parallel floating point multiply addsub and subadd operations.
 
 ;; It would be possible to represent these without the UNSPEC as
@@ -2657,9 +2979,9 @@
 (define_insn "*fma_fmaddsub_<mode>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	   (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
+	  [(match_operand:VF 1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF 2 "nonimmediate_operand" "vm,v,vm,x,m")
+	   (match_operand:VF 3 "nonimmediate_operand" "v,vm,0,xm,x")]
 	  UNSPEC_FMADDSUB))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
   "@
@@ -2672,13 +2994,47 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fmaddsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmaddsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 3 "register_operand" "0")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmaddsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fma_fmsubadd_<mode>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  [(match_operand:VF   1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF   2 "nonimmediate_operand" "vm,v,vm,x,m")
 	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
+	     (match_operand:VF 3 "nonimmediate_operand" "v,vm,0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
   "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
   "@
@@ -2691,9 +3047,60 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fmsubadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsubadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "register_operand" "0"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsubadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
+(define_expand "fmai_vmfmadd_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand")
+	      (match_operand:VF_128 2 "nonimmediate_operand")
+	      (match_operand:VF_128 3 "nonimmediate_operand"))
+	    (match_dup 5)
+	    (match_operand:QI 4 "register_operand"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[5] = CONST0_RTX (<MODE>mode);")
+
 (define_expand "fmai_vmfmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
@@ -2705,13 +3112,68 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
+(define_insn "fmai_vmfmadd_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmadd_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (match_operand:VF_128 3 "register_operand" "0"))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmadd231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmadd_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2}
+   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fmai_fmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
-	    (match_operand:VF_128 3 "nonimmediate_operand" " v,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -2721,14 +3183,191 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*fmai_fmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmsub_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (match_operand:VF_128 3 "register_operand" "0"))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+		(match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fmai_fmsub_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "vm, v")
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   2 "nonimmediate_operand" "vm,v")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -2743,9 +3382,9 @@
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " v,vm"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   3 "nonimmediate_operand" "v,vm"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -2760,10 +3399,10 @@
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -3014,7 +3653,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE && TARGET_64BIT"
   "%vcvttss2si{q}\t{%1, %0|%0, %k1}"
@@ -3054,22 +3693,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name>"
   [(set (match_operand:VF1 0 "register_operand" "=v")
 	(float:VF1
 	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2"
-  "%vcvtdq2ps\t{%1, %0|%0, %1}"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
+  "%vcvtdq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2"
+(define_insn "ufloatv16siv16sf2<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
 	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0|%0, %1}"
+  "vcvtudq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3104,34 +3743,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_fix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
 	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0|%0, %1}"
+  "vcvtps2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_ufix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
 	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0|%0, %1}"
+  "vcvtps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<fixsuffix>fix_truncv16sfv16si2"
+(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
 	  (match_operand:V16SF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvttps2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttps2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3461,20 +4100,21 @@
 (define_mode_attr si2dfmodelower
   [(V8DF "v8si") (V4DF "v4si")])
 
-(define_insn "float<si2dfmodelower><mode>2"
+(define_insn "float<si2dfmodelower><mode>2<mask_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float:VF2_512_256 (match_operand:<si2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtdq2pd\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vcvtdq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "ufloatv8siv8df"
+(define_insn "ufloatv8siv8df<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
-	(unsigned_float:V8DF (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
+	(unsigned_float:V8DF
+	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtudq2pd\t{%1, %0|%0, %1}"
+  "vcvtudq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -3519,12 +4159,13 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V2DF")])
 
-(define_insn "avx512f_cvtpd2dq512"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(unspec:V8SI [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
-		     UNSPEC_FIX_NOTRUNC))]
+	(unspec:V8SI
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0|%0, %1}"
+  "vcvtpd2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3592,22 +4233,23 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
 	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0|%0, %1}"
+  "vcvtpd2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
 
-(define_insn "<fixsuffix>fix_truncv8dfv8si2"
+(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(any_fix:V8SI (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	(any_fix:V8SI
+	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvttpd2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttpd2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3686,8 +4328,8 @@
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
-   (set_attr "btver2_decode" "double,double,double")
    (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "SF")])
 
 (define_insn "sse2_cvtss2sd"
@@ -3713,12 +4355,12 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_cvtpd2ps512"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
 	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0|%0, %1}"
+  "vcvtpd2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -3768,12 +4410,12 @@
 (define_mode_attr sf2dfmode
   [(V8DF "V8SF") (V4DF "V4SF")])
 
-(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix>"
+(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float_extend:VF2_512_256
 	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtps2pd\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
@@ -4097,6 +4739,32 @@
   DONE;
 })
 
+(define_expand "vec_unpacku_float_hi_v16si"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V16SI 1 "register_operand")]
+  "TARGET_AVX512F"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx k, x, tmp[4];
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
+  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
+  tmp[2] = gen_reg_rtx (V8DFmode);
+  tmp[3] = gen_reg_rtx (V8SImode);
+  k = gen_reg_rtx (QImode);
+
+  emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
+  emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
+  emit_insn (gen_rtx_SET (VOIDmode, k,
+			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
+  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
+  emit_move_insn (operands[0], tmp[2]);
+  DONE;
+})
+
 (define_expand "vec_unpacku_float_lo_v8si"
   [(match_operand:V4DF 0 "register_operand")
    (match_operand:V8SI 1 "nonimmediate_operand")]
@@ -4122,6 +4790,30 @@
   DONE;
 })
 
+(define_expand "vec_unpacku_float_lo_v16si"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")]
+  "TARGET_AVX512F"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx k, x, tmp[3];
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
+  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
+  tmp[2] = gen_reg_rtx (V8DFmode);
+  k = gen_reg_rtx (QImode);
+
+  emit_insn (gen_avx512f_cvtdq2pd512_2 (tmp[2], operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, k,
+			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
+  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
+  emit_move_insn (operands[0], tmp[2]);
+  DONE;
+})
+
 (define_expand "vec_pack_trunc_<mode>"
   [(set (match_dup 3)
 	(float_truncate:<sf2dfmode>
@@ -4409,7 +5101,7 @@
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
-(define_insn "avx512f_unpckhps512"
+(define_insn "<mask_codefor>avx512f_unpckhps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4424,7 +5116,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vunpckhps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4497,7 +5189,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_unpcklps512"
+(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4512,7 +5204,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vunpcklps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpcklps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4620,7 +5312,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movshdup512"
+(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4635,7 +5327,7 @@
 		     (const_int 13) (const_int 13)
 		     (const_int 15) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vmovshdup\t{%1, %0|%0, %1}"
+  "vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4673,7 +5365,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movsldup512"
+(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4688,7 +5380,7 @@
 		     (const_int 12) (const_int 12)
 		     (const_int 14) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vmovsldup\t{%1, %0|%0, %1}"
+  "vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -5222,8 +5914,71 @@
   operands[1] = adjust_address (operands[1], SFmode, INTVAL (operands[2]) * 4);
 })
 
-(define_insn "avx512f_vextract<shuffletype>32x4_1"
-  [(set (match_operand:<ssequartermode> 0 "nonimmediate_operand" "=vm")
+(define_expand "avx512f_vextract<shuffletype>32x4_mask"
+  [(match_operand:<ssequartermode> 0 "nonimmediate_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_3_operand")
+   (match_operand:<ssequartermode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (0), GEN_INT (1), GEN_INT (2),
+          GEN_INT (3), operands[3], operands[4]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (4), GEN_INT (5), GEN_INT (6),
+          GEN_INT (7), operands[3], operands[4]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (8), GEN_INT (9), GEN_INT (10),
+          GEN_INT (11), operands[3], operands[4]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (12), GEN_INT (13), GEN_INT (14),
+          GEN_INT (15), operands[3], operands[4]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+})
+
+(define_insn "avx512f_vextract<shuffletype>32x4_1_maskm"
+  [(set (match_operand:<ssequartermode> 0 "memory_operand" "=m")
+	(vec_merge:<ssequartermode>
+	  (vec_select:<ssequartermode>
+	    (match_operand:V16FI 1 "register_operand" "v")
+	    (parallel [(match_operand 2  "const_0_to_15_operand")
+	      (match_operand 3  "const_0_to_15_operand")
+	      (match_operand 4  "const_0_to_15_operand")
+	      (match_operand 5  "const_0_to_15_operand")]))
+	  (match_operand:<ssequartermode> 6 "memory_operand" "0")
+	  (match_operand:QI 7 "register_operand" "k")))]
+  "TARGET_AVX512F && (INTVAL (operands[2]) = INTVAL (operands[3]) - 1)
+  && (INTVAL (operands[3]) = INTVAL (operands[4]) - 1)
+  && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
+{
+  operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
+  return "vextract<shuffletype>32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}";
+}
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>"
+  [(set (match_operand:<ssequartermode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssequartermode>
 	  (match_operand:V16FI 1 "register_operand" "v")
 	  (parallel [(match_operand 2  "const_0_to_15_operand")
@@ -5235,7 +5990,7 @@
   && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
 {
   operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
-  return "vextract<shuffletype>32x4\t{%2, %1, %0|%0, %1, %2}";
+  return "vextract<shuffletype>32x4\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
@@ -5247,6 +6002,35 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_vextract<shuffletype>64x4_mask"
+  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_1_operand")
+   (match_operand:<ssehalfvecmode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  rtx (*insn)(rtx, rtx, rtx, rtx);
+
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      insn = gen_vec_extract_lo_<mode>_mask;
+      break;
+    case 1:
+      insn = gen_vec_extract_hi_<mode>_mask;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (insn (operands[0], operands[1], operands[3], operands[4]));
+  DONE;
+})
+
 (define_split
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
 	(vec_select:<ssehalfvecmode>
@@ -5266,14 +6050,36 @@
   DONE;
 })
 
-(define_insn "vec_extract_lo_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_lo_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 0) (const_int 1)
+	      (const_int 2) (const_int 3)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+"vextract<shuffletype>64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_lo_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "nonimmediate_operand" "vm")
 	  (parallel [(const_int 0) (const_int 1)
             (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-  "#"
+{
+  if (<mask_applied>)
+    return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
+  else
+    return "#";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5284,14 +6090,32 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_extract_hi_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_hi_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 4) (const_int 5)
+	      (const_int 6) (const_int 7)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_hi_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "register_operand" "v")
 	  (parallel [(const_int 4) (const_int 5)
             (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F"
-  "vextract<shuffletype>64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5643,7 +6467,7 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "avx512f_unpckhpd512"
+(define_insn "<mask_codefor>avx512f_unpckhpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5654,7 +6478,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5739,7 +6563,7 @@
    (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
    (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
 
-(define_expand "avx512f_movddup512"
+(define_expand "avx512f_movddup512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5751,7 +6575,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_expand "avx512f_unpcklpd512"
+(define_expand "avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5763,7 +6587,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_insn "*avx512f_unpcklpd512"
+(define_insn "*avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v,v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5775,8 +6599,8 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
   "@
-   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
-   vmovddup\t{%1, %0|%0, %1}"
+   vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}
+   vmovddup\t{%1, %0<mask_operand3>|%0<mask_operand3>, %1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5913,12 +6737,13 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "avx512f_vmscalef<mode>"
+(define_insn "*avx512f_vmscalef<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_SCALEF)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
@@ -5926,13 +6751,14 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode>"
+(define_insn "avx512f_scalef<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(unspec:VF_512 [(match_operand:VF_512 1 "register_operand" "v")
-			(match_operand:VF_512 2 "nonimmediate_operand" "vm")]
-		       UNSPEC_SCALEF))]
+	(unspec:VF_512
+	  [(match_operand:VF_512 1 "register_operand" "v")
+	   (match_operand:VF_512 2 "nonimmediate_operand" "vm")]
+	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
@@ -5950,21 +6776,39 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getexp<mode>"
+(define_insn "avx512f_vternlog<mode>_mask"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "0")
+	     (match_operand:VI48_512 2 "register_operand" "v")
+	     (match_operand:VI48_512 3 "nonimmediate_operand" "vm")
+	     (match_operand:SI 4 "const_0_to_255_operand")]
+	    UNSPEC_VTERNLOG)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_getexp<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
-   "vgetexp<ssemodesuffix>\t{%1, %0|%0, %1}";
+   "vgetexp<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_sgetexp<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_GETEXP)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
@@ -5972,17 +6816,48 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_align<mode>"
+(define_insn "<mask_codefor>avx512f_align<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
         (unspec:VI48_512 [(match_operand:VI48_512 1 "register_operand" "v")
 			  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")
 			  (match_operand:SI 3 "const_0_to_255_operand")]
 			 UNSPEC_ALIGN))]
   "TARGET_AVX512F"
-  "valign<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  "valign<ssemodesuffix>\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_shufps512_mask"
+  [(match_operand:V16SF 0 "register_operand")
+   (match_operand:V16SF 1 "register_operand")
+   (match_operand:V16SF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16SF 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2],
+					  GEN_INT ((mask >> 0) & 3),
+					  GEN_INT ((mask >> 2) & 3),
+					  GEN_INT (((mask >> 4) & 3) + 16),
+					  GEN_INT (((mask >> 6) & 3) + 16),
+					  GEN_INT (((mask >> 0) & 3) + 4),
+					  GEN_INT (((mask >> 2) & 3) + 4),
+					  GEN_INT (((mask >> 4) & 3) + 20),
+					  GEN_INT (((mask >> 6) & 3) + 20),
+					  GEN_INT (((mask >> 0) & 3) + 8),
+					  GEN_INT (((mask >> 2) & 3) + 8),
+					  GEN_INT (((mask >> 4) & 3) + 24),
+					  GEN_INT (((mask >> 6) & 3) + 24),
+					  GEN_INT (((mask >> 0) & 3) + 12),
+					  GEN_INT (((mask >> 2) & 3) + 12),
+					  GEN_INT (((mask >> 4) & 3) + 28),
+					  GEN_INT (((mask >> 6) & 3) + 28),
+					  operands[4], operands[5]));
+  DONE;
+})
+
 (define_insn "avx512f_fixupimm<mode>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
@@ -5996,6 +6871,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fixupimm<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+          (unspec:VF_512
+            [(match_operand:VF_512 1 "register_operand" "0")
+	     (match_operand:VF_512 2 "register_operand" "v")
+             (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+             (match_operand:SI 4 "const_0_to_255_operand")]
+             UNSPEC_FIXUPIMM)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "avx512f_sfixupimm<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
@@ -6012,19 +6903,38 @@
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "avx512f_sfixupimm<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (unspec:VF_128
+	       [(match_operand:VF_128 1 "register_operand" "0")
+		(match_operand:VF_128 2 "register_operand" "v")
+		(match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+		(match_operand:SI 4 "const_0_to_255_operand")]
+	       UNSPEC_FIXUPIMM)
+	    (match_dup 1)
+	    (const_int 1))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "avx512f_rndscale<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
-  "vrndscale<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrndscale<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "*avx512f_rndscale<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6041,7 +6951,7 @@
    (set_attr "mode" "<MODE>")])
 
 ;; One bit in mask selects 2 elements.
-(define_insn "avx512f_shufps512_1"
+(define_insn "avx512f_shufps512_1<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -6084,14 +6994,37 @@
   mask |= (INTVAL (operands[6]) - 16) << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufps\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
 
-(define_insn "avx512f_shufpd512_1"
+(define_expand "avx512f_shufpd512_mask"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V8DF 1 "register_operand")
+   (match_operand:V8DF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8DF 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2],
+					GEN_INT (mask & 1),
+					GEN_INT (mask & 2 ? 9 : 8),
+					GEN_INT (mask & 4 ? 3 : 2),
+					GEN_INT (mask & 8 ? 11 : 10),
+					GEN_INT (mask & 16 ? 5 : 4),
+					GEN_INT (mask & 32 ? 13 : 12),
+					GEN_INT (mask & 64 ? 7 : 6),
+					GEN_INT (mask & 128 ? 15 : 14),
+					operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shufpd512_1<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -6118,7 +7051,7 @@
   mask |= (INTVAL (operands[10]) - 14) << 7;
   operands[3] = GEN_INT (mask);
 
-  return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufpd\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -6198,7 +7131,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv8di"
+(define_insn "<mask_codefor>avx512f_interleave_highv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6209,7 +7142,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6248,7 +7181,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv8di"
+(define_insn "<mask_codefor>avx512f_interleave_lowv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6259,7 +7192,7 @@
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpcklqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6630,6 +7563,20 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_<code><pmov_src_lower><mode>2_mask"
+  [(set (match_operand:PMOV_DST_MODE 0 "nonimmediate_operand" "=v,m")
+    (vec_merge:PMOV_DST_MODE
+      (any_truncate:PMOV_DST_MODE
+        (match_operand:<pmov_src_mode> 1 "register_operand" "v,v"))
+      (match_operand:PMOV_DST_MODE 2 "vector_move_operand" "0C,0")
+      (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix><pmov_suff>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "none,store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "*avx512f_<code>v8div16qi2"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_concat:V16QI
@@ -6663,6 +7610,55 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512f_<code>v8div16qi2_mask"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_operand:V16QI 2 "vector_move_operand" "0C")
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 3 "register_operand" "k"))
+      (const_vector:V8QI [(const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)])))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*avx512f_<code>v8div16qi2_store_mask"
+  [(set (match_operand:V16QI 0 "memory_operand" "=m")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_dup 0)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 2 "register_operand" "k"))
+      (vec_select:V8QI
+        (match_dup 0)
+        (parallel [(const_int 8) (const_int 9)
+                   (const_int 10) (const_int 11)
+                   (const_int 12) (const_int 13)
+                   (const_int 14) (const_int 15)]))))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
@@ -6677,27 +7673,27 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,v")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
    (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<sse2_avx2>_<plusminus_insn><mode>3"
@@ -6787,7 +7783,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "vec_widen_umult_even_v16si"
+(define_expand "vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6807,7 +7803,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_umult_even_v16si"
+(define_insn "*vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6825,7 +7821,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuludq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -6902,7 +7898,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "vec_widen_smult_even_v16si"
+(define_expand "vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6922,7 +7918,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_smult_even_v16si"
+(define_insn "*vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=x")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6940,7 +7936,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -7151,12 +8147,12 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "general_vector_operand")
 	  (match_operand:VI4_AVX512F 2 "general_vector_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   if (TARGET_SSE4_1)
     {
@@ -7173,19 +8169,19 @@
     }
 })
 
-(define_insn "*<sse4_1_avx2>_mul<mode>3"
+(define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
-   vpmulld\t{%2, %1, %0|%0, %1, %2}"
+   vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -7298,6 +8294,20 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "ashr<mode>3<mask_name>"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
+	(ashiftrt:VI48_512
+	  (match_operand:VI48_512 1 "nonimmediate_operand" "v,vm")
+	  (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set (attr "length_immediate")
+     (if_then_else (match_operand 2 "const_int_operand")
+       (const_string "1")
+       (const_string "0")))
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<shift_insn><mode>3"
   [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
 	(any_lshift:VI248_AVX2
@@ -7317,13 +8327,13 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<shift_insn><mode>3"
+(define_insn "<shift_insn><mode>3<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
 	(any_lshift:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v,m")
 	  (match_operand:SI 2 "nonmemory_operand" "vN,N")))]
-  "TARGET_AVX512F"
-  "vp<vshift><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vp<vshift><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseishft")
    (set (attr "length_immediate")
@@ -7333,6 +8343,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+
 (define_expand "vec_shl_<mode>"
   [(set (match_operand:VI_128 0 "register_operand")
 	(ashift:V1TI
@@ -7408,41 +8419,42 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate>v<mode>"
+(define_insn "avx512f_<rotate>v<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate><mode>"
+(define_insn "avx512f_<rotate><mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
   "TARGET_AVX512F"
-  "vp<rotate><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3"
+(define_insn "*avx2_<code><mode>3<mask_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && <mask_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -7978,19 +8990,19 @@
   operands[2] = force_reg (<MODE>mode, gen_rtx_CONST_VECTOR (<MODE>mode, v));
 })
 
-(define_expand "<sse2_avx2>_andnot<mode>3"
+(define_expand "<sse2_avx2>_andnot<mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(and:VI_AVX2
 	  (not:VI_AVX2 (match_operand:VI_AVX2 1 "register_operand"))
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2")
+  "TARGET_SSE2 && <mask_mode512bit_condition>")
 
-(define_insn "*andnot<mode>3"
+(define_insn "*andnot<mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(and:VI
 	  (not:VI (match_operand:VI 1 "register_operand" "0,v"))
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   static char buf[64];
   const char *ops;
@@ -8030,7 +9042,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8047,7 +9059,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8075,12 +9087,12 @@
   DONE;
 })
 
-(define_insn "*<code><mode>3"
+(define_insn "<mask_codefor><code><mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(any_logic:VI
 	  (match_operand:VI 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE
+  "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
   static char buf[64];
@@ -8122,7 +9134,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8139,7 +9151,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8447,7 +9459,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv16si"
+(define_insn "<mask_codefor>avx512f_interleave_highv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8462,7 +9474,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8502,7 +9514,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv16si"
+(define_insn "<mask_codefor>avx512f_interleave_lowv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8517,7 +9529,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vpunpckldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8640,7 +9652,45 @@
    (set_attr "prefix" "orig,orig,vex,vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_vinsert<shuffletype>32x4_1"
+(define_expand "avx512f_vinsert<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:<ssequartermode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_3_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF), operands[4],
+	  operands[5]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xF0FF), operands[4],
+	  operands[5]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFF0F), operands[4],
+	  operands[5]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF0), operands[4],
+	  operands[5]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+
+})
+
+(define_insn "<mask_codefor>avx512f_vinsert<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_merge:V16FI
 	  (match_operand:V16FI 1 "register_operand" "v")
@@ -8663,14 +9713,35 @@
 
   operands[3] = GEN_INT (mask);
 
-  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_set_lo_<mode>"
+(define_expand "avx512f_vinsert<shuffletype>64x4_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_1_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  if (mask == 0)
+    emit_insn (gen_vec_set_lo_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  else
+    emit_insn (gen_vec_set_hi_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "vec_set_lo_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8679,13 +9750,13 @@
 	    (parallel [(const_int 4) (const_int 5)
               (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0|%0, %1, %2, $0x0}"
+  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x0}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_set_hi_<mode>"
+(define_insn "vec_set_hi_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8694,13 +9765,37 @@
 	    (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0|%0, %1, %2, $0x1}"
+  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x1}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_shuf_<shuffletype>64x2_1"
+(define_expand "avx512f_shuf_<shuffletype>64x2_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:V8FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>64x2_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 2),
+       GEN_INT (((mask >> 0) & 3) * 2 + 1),
+       GEN_INT (((mask >> 2) & 3) * 2),
+       GEN_INT (((mask >> 2) & 3) * 2 + 1),
+       GEN_INT (((mask >> 4) & 3) * 2 + 8),
+       GEN_INT (((mask >> 4) & 3) * 2 + 9),
+       GEN_INT (((mask >> 6) & 3) * 2 + 8),
+       GEN_INT (((mask >> 6) & 3) * 2 + 9),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>64x2_1<mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_select:V8FI
 	  (vec_concat:<ssedoublemode>
@@ -8727,14 +9822,46 @@
   mask |= (INTVAL (operands[9]) - 8) / 2 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_shuf_<shuffletype>32x4_1"
+(define_expand "avx512f_shuf_<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:V16FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>32x4_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 4),
+       GEN_INT (((mask >> 0) & 3) * 4 + 1),
+       GEN_INT (((mask >> 0) & 3) * 4 + 2),
+       GEN_INT (((mask >> 0) & 3) * 4 + 3),
+       GEN_INT (((mask >> 2) & 3) * 4),
+       GEN_INT (((mask >> 2) & 3) * 4 + 1),
+       GEN_INT (((mask >> 2) & 3) * 4 + 2),
+       GEN_INT (((mask >> 2) & 3) * 4 + 3),
+       GEN_INT (((mask >> 4) & 3) * 4 + 16),
+       GEN_INT (((mask >> 4) & 3) * 4 + 17),
+       GEN_INT (((mask >> 4) & 3) * 4 + 18),
+       GEN_INT (((mask >> 4) & 3) * 4 + 19),
+       GEN_INT (((mask >> 6) & 3) * 4 + 16),
+       GEN_INT (((mask >> 6) & 3) * 4 + 17),
+       GEN_INT (((mask >> 6) & 3) * 4 + 18),
+       GEN_INT (((mask >> 6) & 3) * 4 + 19),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_select:V16FI
 	  (vec_concat:<ssedoublemode>
@@ -8777,14 +9904,44 @@
   mask |= (INTVAL (operands[15]) - 16) / 4 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_pshufd_1"
+(define_expand "avx512f_pshufdv3_mask"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V16SI 3 "register_operand")
+   (match_operand:HI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1],
+				       GEN_INT ((mask >> 0) & 3),
+				       GEN_INT ((mask >> 2) & 3),
+				       GEN_INT ((mask >> 4) & 3),
+				       GEN_INT ((mask >> 6) & 3),
+				       GEN_INT (((mask >> 0) & 3) + 4),
+				       GEN_INT (((mask >> 2) & 3) + 4),
+				       GEN_INT (((mask >> 4) & 3) + 4),
+				       GEN_INT (((mask >> 6) & 3) + 4),
+				       GEN_INT (((mask >> 0) & 3) + 8),
+				       GEN_INT (((mask >> 2) & 3) + 8),
+				       GEN_INT (((mask >> 4) & 3) + 8),
+				       GEN_INT (((mask >> 6) & 3) + 8),
+				       GEN_INT (((mask >> 0) & 3) + 12),
+				       GEN_INT (((mask >> 2) & 3) + 12),
+				       GEN_INT (((mask >> 4) & 3) + 12),
+				       GEN_INT (((mask >> 6) & 3) + 12),
+				       operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_pshufd_1<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")
@@ -8825,7 +9982,7 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "vpshufd\t{%2, %1, %0|%0, %1, %2}";
+  return "vpshufd\t{%2, %1, %0<mask_operand18>|%0<mask_operand18>, %1, %2}";
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix" "evex")
@@ -10276,12 +11433,12 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI")])
 
-(define_insn "abs<mode>2"
+(define_insn "abs<mode>2<mask_name>"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(abs:VI124_AVX2_48_AVX512F
 	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSSE3"
-  "%vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "TARGET_SSSE3 && <mask_mode512bit_condition>"
+  "%vpabs<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set_attr "prefix_extra" "1")
@@ -10622,12 +11779,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16qiv16si2"
+(define_insn "<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bd\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10662,12 +11819,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16hiv16si2"
+(define_insn "avx512f_<code>v16hiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wd\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10697,7 +11854,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8qiv8di2"
+(define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (vec_select:V8QI
@@ -10707,7 +11864,7 @@
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bq\t{%1, %0|%0, %k1}"
+  "vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10739,12 +11896,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8hiv8di2"
+(define_insn "avx512f_<code>v8hiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wq\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10776,12 +11933,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8siv8di2"
+(define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>dq\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -11564,33 +12721,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512er_exp2<mode>"
+(define_insn "avx512er_exp2<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vexp2<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512er_rcp28<mode>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512er_rsqrt28<mode>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -12640,16 +13797,16 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_permvar<mode>"
+(define_insn "<avx2_avx512f>_permvar<mode><mask_name>"
   [(set (match_operand:VI48F_256_512 0 "register_operand" "=v")
 	(unspec:VI48F_256_512
 	  [(match_operand:VI48F_256_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:<sseintvecmode> 2 "register_operand" "v")]
 	  UNSPEC_VPERMVAR))]
-  "TARGET_AVX2"
-  "vperm<ssemodesuffix>\t{%1, %2, %0|%0, %2, %1}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vperm<ssemodesuffix>\t{%1, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1}"
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<avx2_avx512f>_perm<mode>"
@@ -12660,14 +13817,32 @@
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_<avx2_avx512f>_perm<mode>_1 (operands[0], operands[1],
-					    GEN_INT ((mask >> 0) & 3),
-					    GEN_INT ((mask >> 2) & 3),
-					    GEN_INT ((mask >> 4) & 3),
-					    GEN_INT ((mask >> 6) & 3)));
+					      GEN_INT ((mask >> 0) & 3),
+					      GEN_INT ((mask >> 2) & 3),
+					      GEN_INT ((mask >> 4) & 3),
+					      GEN_INT ((mask >> 6) & 3)));
+  DONE;
+})
+
+(define_expand "avx512f_perm<mode>_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V8FI 3 "vector_move_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_<avx2_avx512f>_perm<mode>_1_mask (operands[0], operands[1],
+						   GEN_INT ((mask >> 0) & 3),
+						   GEN_INT ((mask >> 2) & 3),
+						   GEN_INT ((mask >> 4) & 3),
+						   GEN_INT ((mask >> 6) & 3),
+						   operands[3], operands[4]));
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_perm<mode>_1"
+(define_insn "<avx2_avx512f>_perm<mode>_1<mask_name>"
   [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
 	(vec_select:VI8F_256_512
 	  (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
@@ -12675,7 +13850,7 @@
 		     (match_operand 3 "const_0_to_3_operand")
 		     (match_operand 4 "const_0_to_3_operand")
 		     (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -12683,10 +13858,10 @@
   mask |= INTVAL (operands[4]) << 4;
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
-  return "vperm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vperm<ssemodesuffix>\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_permv2ti"
@@ -12733,58 +13908,58 @@
    (set_attr "isa" "*,avx2,noavx2")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vec_dup<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48F_512
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V16FI
 	  (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0|%0, %g1, %g1, 0x0}
-   vbroadcast<shuffletype>32x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x0}
+   vbroadcast<shuffletype>32x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V8FI
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0|%0, %g1, %g1, 0x44}
-   vbroadcast<shuffletype>64x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x44}
+   vbroadcast<shuffletype>64x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_gpr<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_gpr<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48_512
 	  (match_operand:<ssescalarmode> 1 "register_operand" "r")))]
   "TARGET_AVX512F && (<MODE>mode != V8DImode || TARGET_64BIT)"
-  "vpbroadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_mem<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_mem<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=x")
 	(vec_duplicate:VI48F_512
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm")))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -12924,12 +14099,12 @@
 				elt * GET_MODE_SIZE (<ssescalarmode>mode));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF2 0 "register_operand")
 	(vec_select:VF2
 	  (match_operand:VF2 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12945,12 +14120,12 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF1 0 "register_operand")
 	(vec_select:VF1
 	  (match_operand:VF1 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12968,37 +14143,37 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_insn "*<sse2_avx_avx512f>_vpermilp<mode>"
+(define_insn "*<sse2_avx_avx512f>_vpermilp<mode><mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(vec_select:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "vm")
 	  (match_parallel 2 ""
 	    [(match_operand 3 "const_int_operand")])))]
-  "TARGET_AVX
+  "TARGET_AVX && <mask_mode512bit_condition>
    && avx_vpermilp_parallel (operands[2], <MODE>mode)"
 {
   int mask = avx_vpermilp_parallel (operands[2], <MODE>mode) - 1;
   operands[2] = GEN_INT (mask);
-  return "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3"
+(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(unspec:VF
 	  [(match_operand:VF 1 "register_operand" "v")
 	   (match_operand:<sseintvecmode> 2 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMIL))]
-  "TARGET_AVX"
-  "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "btver2_decode" "vector")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx512f_vpermi2var<mode>3"
@@ -13014,6 +14189,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_vpermi2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:VI48F_512 1 "register_operand" "v")
+	    (match_operand:<sseintvecmode> 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMI2_MASK)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermi2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "avx512f_vpermt2var<mode>3"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
@@ -13027,6 +14218,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_vpermt2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:<sseintvecmode> 1 "register_operand" "v")
+	    (match_operand:VI48F_512 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMT2)
+	  (match_dup 2)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermt2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_expand "avx_vperm2f128<mode>3"
   [(set (match_operand:AVX256MODE2P 0 "register_operand")
 	(unspec:AVX256MODE2P
@@ -13417,24 +14624,24 @@
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_ashrv<mode>"
+(define_insn "<avx2_avx512f>_ashrv<mode><mask_name>"
   [(set (match_operand:VI48_AVX512F 0 "register_operand" "=v")
 	(ashiftrt:VI48_AVX512F
 	  (match_operand:VI48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vpsrav<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vpsrav<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_<shift_insn>v<mode>"
+(define_insn "<avx2_avx512f>_<shift_insn>v<mode><mask_name>"
   [(set (match_operand:VI48_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(any_lshift:VI48_AVX2_48_AVX512F
 	  (match_operand:VI48_AVX2_48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX2_48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -13517,12 +14724,13 @@
    (set_attr "btver2_decode" "double")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtph2ps512"
+(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
-	(unspec:V16SF [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
-		      UNSPEC_VCVTPH2PS))]
+	(unspec:V16SF
+	  [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_VCVTPH2PS))]
   "TARGET_AVX512F"
-  "vcvtph2ps\t{%1, %0|%0, %1}"
+  "vcvtph2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13573,13 +14781,14 @@
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtps2ph512"
+(define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(unspec:V16HI [(match_operand:V16SF 1 "register_operand" "v")
-		      (match_operand:SI 2 "const_0_to_255_operand" "N")]
-		     UNSPEC_VCVTPS2PH))]
+	(unspec:V16HI
+	  [(match_operand:V16SF 1 "register_operand" "v")
+	   (match_operand:SI 2 "const_0_to_255_operand" "N")]
+	  UNSPEC_VCVTPS2PH))]
   "TARGET_AVX512F"
-  "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13969,14 +15178,55 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_compress<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "v")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k")]
+	  UNSPEC_COMPRESS))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_compressstore<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "x")
+	   (match_dup 0)
+	   (match_operand:<avx512fmaskmode> 2 "register_operand" "k")]
+	  UNSPEC_COMPRESS_STORE))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_expand<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")]
+	  UNSPEC_EXPAND))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>expand<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_getmant<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
-  "vgetmant<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  "vgetmant<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -13995,23 +15245,23 @@
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "clz<mode>2"
+(define_insn "clz<mode>2<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(clz:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512CD"
-  "vplzcnt<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vplzcnt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "conflict<mode>"
+(define_insn "<mask_codefor>conflict<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(unspec:VI48_512
 	  [(match_operand:VI48_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_CONFLICT))]
   "TARGET_AVX512CD"
-  "vpconflict<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vpconflict<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
new file mode 100644
index 0000000..6b45d05
--- /dev/null
+++ b/gcc/config/i386/subst.md
@@ -0,0 +1,56 @@
+;; GCC machine description for AVX512F instructions
+;; Copyright (C) 2013 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Some iterators for extending subst as much as possible
+;; All vectors (Use it for destination)
+(define_mode_iterator SUBST_V
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF])
+
+(define_subst_attr "mask_name" "mask" "" "_mask")
+(define_subst_attr "mask_applied" "mask" "false" "true")
+(define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
+(define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3")
+(define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf
+(define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4")
+(define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6")
+(define_subst_attr "mask_operand11" "mask" "" "%{%12%}%N11")
+(define_subst_attr "mask_operand18" "mask" "" "%{%19%}%N18")
+(define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19")
+(define_subst_attr "mask_codefor" "mask" "*" "")
+(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+(define_subst_attr "store_mask_constraint" "mask" "vm" "v")
+(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand")
+(define_subst_attr "mask_prefix" "mask" "vex" "evex")
+(define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
+(define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+
+(define_subst "mask"
+  [(set (match_operand:SUBST_V 0)
+        (match_operand:SUBST_V 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (match_dup 1)
+	  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-17 14:17   ` [PATCH i386 4/8] [AVX512] [1/n] " Kirill Yukhin
@ 2013-10-22  1:43     ` Richard Henderson
  2013-10-22 15:58       ` Kirill Yukhin
  2013-11-01 12:20       ` Kirill Yukhin
  0 siblings, 2 replies; 63+ messages in thread
From: Richard Henderson @ 2013-10-22  1:43 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/17/2013 07:15 AM, Kirill Yukhin wrote:
> +(define_mode_attr ssescalarsize
> +  [(V8DI  "64") (V4DI  "64") (V2DI  "64")
> +   (V32HI "16") (V16HI "16") (V8HI "16")
> +   (V16SI "32") (V8SI "32") (V4SI "32")
> +   (V16SF "16") (V8DF "64")])

Error on V16SF.  Probably better to fill this out.

> +(define_insn "avx512f_load<mode>_mask"
> +  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
> +	(vec_merge:VI48F_512
> +	  (match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
> +	  (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
> +	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
> +  "TARGET_AVX512F"
> +{
> +  switch (get_attr_mode (insn))

Better to just use <sseinsnmode> here, as it's a compile-time constant.

> +    {
> +    case MODE_V8DF:
> +    case MODE_V16SF:
> +      return "vmova<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
> +    default:
> +      return "vmovdqa<ssescalarsize>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
> +    }
> +}
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "memory" "none,load")
> +   (set_attr "mode" "<sseinsnmode>")])
> +

> +(define_insn "avx512f_store<mode>_mask"

Likewise.

> +(define_insn "avx512f_moves<mode>_mask"
> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> +	(vec_merge:VF_128
> +	  (vec_merge:VF_128
> +	    (match_operand:VF_128 2 "register_operand" "v")
> +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
> +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
> +	  (match_operand:VF_128 1 "register_operand" "v")
> +	  (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<sseinsnmode>")])

Nested vec_merge?  That seems... odd to say the least.
How in the world does this get matched?

> +(define_insn "*avx512f_loads<mode>_mask"

Likewise.

> +(define_insn "avx512f_stores<mode>_mask"
> +  [(set (match_operand:<ssescalarmode> 0 "memory_operand" "=m")
> +	(vec_select:<ssescalarmode>
> +	  (vec_merge:VF_128
> +	    (match_operand:VF_128 1 "register_operand" "v")
> +	    (vec_duplicate:VF_128
> +	      (match_dup 0))
> +	    (match_operand:<avx512fmaskmode> 2 "register_operand" "k"))
> +	  (parallel [(const_int 0)])))]

This seems similar, though of course it's an extract.
I still can't imagine how it could be used.

> -(define_insn "rcp14<mode>"
> +(define_insn "<mask_codefor>rcp14<mode><mask_name>"

What, this name isn't used for non-masked anymore?

> -(define_insn "srcp14<mode>"
> +(define_insn "*srcp14<mode>"

Likewise.  These changes don't belong in this patch.

> -(define_insn "rsqrt14<mode>"
> +(define_insn "*rsqrt14<mode>"

Likewise.

> @@ -2565,9 +2751,9 @@
>  (define_insn "*fma_fmadd_<mode>"
>    [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
>  	(fma:FMAMODE
> -	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
> -	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
> -	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
> +	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
> +	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
> +	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]

Unrelated changes.  Repeated throughout the fma patterns.

> +(define_insn "*fmai_fmadd_<mode>_maskz"
> +  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
> +	(vec_merge:VF_128
> +	  (vec_merge:VF_128
> +	    (fma:VF_128
> +	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
> +	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
> +	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
> +	    (match_operand:VF_128 4 "const0_operand")
> +	    (match_operand:QI 5 "register_operand" "k,k"))
> +	  (match_dup 1)
> +	  (const_int 1)))]
> +  "TARGET_AVX512F"
> +  "@
> +   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2}
> +   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])

These seem like useless patterns.  If they're for builtins,
then they seem like useless builtins.  See above.

> @@ -3686,8 +4328,8 @@
>     (set_attr "athlon_decode" "vector,double,*")
>     (set_attr "amdfam10_decode" "vector,double,*")
>     (set_attr "bdver1_decode" "direct,direct,*")
> -   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "prefix" "orig,orig,vex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "SF")])

Unrelated changes.

> +(define_expand "vec_unpacku_float_hi_v16si"
> +  [(match_operand:V8DF 0 "register_operand")
> +   (match_operand:V16SI 1 "register_operand")]
> +  "TARGET_AVX512F"
> +{
> +  REAL_VALUE_TYPE TWO32r;
> +  rtx k, x, tmp[4];
> +
> +  real_ldexp (&TWO32r, &dconst1, 32);
> +  x = const_double_from_real_value (TWO32r, DFmode);
> +
> +  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
> +  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
> +  tmp[2] = gen_reg_rtx (V8DFmode);
> +  tmp[3] = gen_reg_rtx (V8SImode);
> +  k = gen_reg_rtx (QImode);
> +
> +  emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
> +  emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
> +  emit_insn (gen_rtx_SET (VOIDmode, k,
> +			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
> +  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
> +  emit_move_insn (operands[0], tmp[2]);
> +  DONE;
> +})

Separate patch.  And this is too complicated, since vcvtudq2pd exists.

> -(define_insn "avx512f_unpckhps512"
> +(define_insn "<mask_codefor>avx512f_unpckhps512<mask_name>"

Non-masked name change again.

> -(define_insn "avx512f_unpcklps512"
> +(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"

Ditto.

> -(define_insn "avx512f_movshdup512"
> +(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"

Ditto.

> -(define_insn "avx512f_movsldup512"
> +(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"

Ditto.

There's probably more, but that'll do for a first pass.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-22  1:43     ` Richard Henderson
@ 2013-10-22 15:58       ` Kirill Yukhin
  2013-10-22 15:59         ` Richard Henderson
  2013-11-01 12:20       ` Kirill Yukhin
  1 sibling, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-22 15:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello Richard,
Thanks for remarks, they all seems reasonable.

One question

On 21 Oct 16:01, Richard Henderson wrote:
> > +(define_insn "avx512f_moves<mode>_mask"
> > +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> > +	(vec_merge:VF_128
> > +	  (vec_merge:VF_128
> > +	    (match_operand:VF_128 2 "register_operand" "v")
> > +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
> > +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
> > +	  (match_operand:VF_128 1 "register_operand" "v")
> > +	  (const_int 1)))]
> > +  "TARGET_AVX512F"
> > +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> > +  [(set_attr "type" "ssemov")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "<sseinsnmode>")])
> 
> Nested vec_merge?  That seems... odd to say the least.
> How in the world does this get matched?

This is generic approach for all scalar `masked' instructions.

Reason is that we must save higher bits of vector (outer vec_merge)
and apply single-bit mask (inner vec_merge).


We may do it with unspecs though... But is it really better?

What do you think?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-22 15:58       ` Kirill Yukhin
@ 2013-10-22 15:59         ` Richard Henderson
  2013-10-28 13:10           ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-10-22 15:59 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
> Hello Richard,
> Thanks for remarks, they all seems reasonable.
> 
> One question
> 
> On 21 Oct 16:01, Richard Henderson wrote:
>>> +(define_insn "avx512f_moves<mode>_mask"
>>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
>>> +	(vec_merge:VF_128
>>> +	  (vec_merge:VF_128
>>> +	    (match_operand:VF_128 2 "register_operand" "v")
>>> +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
>>> +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
>>> +	  (match_operand:VF_128 1 "register_operand" "v")
>>> +	  (const_int 1)))]
>>> +  "TARGET_AVX512F"
>>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
>>> +  [(set_attr "type" "ssemov")
>>> +   (set_attr "prefix" "evex")
>>> +   (set_attr "mode" "<sseinsnmode>")])
>>
>> Nested vec_merge?  That seems... odd to say the least.
>> How in the world does this get matched?
> 
> This is generic approach for all scalar `masked' instructions.
> 
> Reason is that we must save higher bits of vector (outer vec_merge)
> and apply single-bit mask (inner vec_merge).
> 
> 
> We may do it with unspecs though... But is it really better?
> 
> What do you think?

What I think is that while it's an instruction that exists in the ISA,
does that mean we must model it in the compiler?

How would this pattern be used?


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-22 15:59         ` Richard Henderson
@ 2013-10-28 13:10           ` Kirill Yukhin
  2013-10-28 16:37             ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-28 13:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello Richard,
On 22 Oct 08:16, Richard Henderson wrote:
> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
> > Hello Richard,
> > Thanks for remarks, they all seems reasonable.
> > 
> > One question
> > 
> > On 21 Oct 16:01, Richard Henderson wrote:
> >>> +(define_insn "avx512f_moves<mode>_mask"
> >>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
> >>> +	(vec_merge:VF_128
> >>> +	  (vec_merge:VF_128
> >>> +	    (match_operand:VF_128 2 "register_operand" "v")
> >>> +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
> >>> +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
> >>> +	  (match_operand:VF_128 1 "register_operand" "v")
> >>> +	  (const_int 1)))]
> >>> +  "TARGET_AVX512F"
> >>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> >>> +  [(set_attr "type" "ssemov")
> >>> +   (set_attr "prefix" "evex")
> >>> +   (set_attr "mode" "<sseinsnmode>")])
> >>
> >> Nested vec_merge?  That seems... odd to say the least.
> >> How in the world does this get matched?
> > 
> > This is generic approach for all scalar `masked' instructions.
> > 
> > Reason is that we must save higher bits of vector (outer vec_merge)
> > and apply single-bit mask (inner vec_merge).
> > 
> > 
> > We may do it with unspecs though... But is it really better?
> > 
> > What do you think?
> 
> What I think is that while it's an instruction that exists in the ISA,
> does that mean we must model it in the compiler?
> 
> How would this pattern be used?

When we have all-1 mask then simplifier may reduce such pattern to simpler form
with single vec_merge.
This will be impossible if we put unspec there.

So, for example for thise code:
    __m128d
    foo (__m128d x, __m128d y)
    {
      return _mm_maskz_add_sd (-1, x, y);
    }

With unspec we will have:
foo:
.LFB2328:
        movl    $-1, %eax       # 10    *movqi_internal/2       [length = 5]
        kmovw   %eax, %k1       # 24    *movqi_internal/8       [length = 4]
        vaddsd  %xmm1, %xmm0, %xmm0{%k1}{z}     # 11    sse2_vmaddv2df3_mask/2 [length = 6]
        ret     # 27    simple_return_internal  [length = 1]

While for `semantic' version it will be simplified to:
foo:
.LFB2329:
        vaddsd  %xmm1, %xmm0, %xmm0     # 11    sse2_vmaddv2df3/2       [length = 4]
	ret     # 26    simple_return_internal  [length = 1]

So, we have short VEX insn vs. long EVEX one + mask creation insns.
That is why we want to expose semantics of such operations.

Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-08-22 14:15 ` Kirill Yukhin
  2013-10-17 14:17   ` [PATCH i386 4/8] [AVX512] [1/n] " Kirill Yukhin
@ 2013-10-28 15:13   ` Kirill Yukhin
  2013-11-13 14:44     ` Kirill Yukhin
  2013-10-28 15:20   ` [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst Kirill Yukhin
  2 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-28 15:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,

This patch introduces mask scalar subst.

Is it ok to commit to main trunk?

Testing pass.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 104 ++++++++++++++++++++++++++++++++---------------
 gcc/config/i386/subst.md |  23 +++++++++++
 2 files changed, 95 insertions(+), 32 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bf0e1ed..1f0d6fa 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1307,7 +1307,7 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
@@ -1318,10 +1318,10 @@
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "mul<mode>3<mask_name>"
@@ -1347,7 +1347,7 @@
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
@@ -1358,10 +1358,10 @@
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1446,7 +1446,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*srcp14<mode>"
+(define_insn "<mask_scalar_codefor>srcp14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1456,7 +1456,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1493,7 +1493,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2"
+(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
@@ -1503,11 +1503,11 @@
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1542,7 +1542,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*rsqrt14<mode>"
+(define_insn "<mask_scalar_codefor>rsqrt14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1552,7 +1552,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1622,7 +1622,7 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3"
+(define_insn "<sse>_vm<code><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
@@ -1633,11 +1633,11 @@
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;; These versions of the min/max patterns implement exactly the operations
@@ -2748,7 +2748,7 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "*fma_fmadd_<mode>"
+(define_insn "fma_fmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
@@ -2976,7 +2976,7 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_insn "*fma_fmaddsub_<mode>"
+(define_insn "fma_fmaddsub_<mode>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
 	  [(match_operand:VF 1 "nonimmediate_operand" "%0,0,v,x,x")
@@ -3241,6 +3241,46 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*fmai_fmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fmai_vmfnmadd_<mode>_mask"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
@@ -4310,7 +4350,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss"
+(define_insn "sse2_cvtsd2ss<mask_scalar_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
@@ -4322,17 +4362,17 @@
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0|%0, %1, %q2}"
+   vcvtsd2ss\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %q2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd"
+(define_insn "sse2_cvtss2sd<mask_scalar_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
@@ -4345,14 +4385,14 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0|%0, %1, %k2}"
+   vcvtss2sd\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %k2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "athlon_decode" "direct,direct,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "DF")])
 
 (define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
@@ -6737,7 +6777,7 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "*avx512f_vmscalef<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6747,7 +6787,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
@@ -6802,7 +6842,7 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode>"
+(define_insn "avx512f_sgetexp<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6812,7 +6852,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
@@ -6934,7 +6974,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_rndscale<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6945,7 +6985,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -15230,7 +15270,7 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_getmant<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -15241,7 +15281,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 6b45d05..532a3a1 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -54,3 +54,26 @@
 	  (match_dup 1)
 	  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
 	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])
+
+(define_subst_attr "mask_scalar_name" "mask_scalar" "" "_mask")
+(define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N3")
+(define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N4")
+(define_subst_attr "mask_scalar_codefor" "mask_scalar" "*" "")
+(define_subst_attr "mask_scalar_prefix" "mask_scalar" "orig,vex" "evex")
+(define_subst_attr "mask_scalar_prefix2" "mask_scalar" "vex" "evex")
+
+(define_subst "mask_scalar"
+  [(set (match_operand:SUBST_V 0)
+	(vec_merge:SUBST_V
+	  (match_operand:SUBST_V 1)
+	  (match_operand:SUBST_V 2)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (vec_merge:SUBST_V
+	    (match_dup 1)
+	    (match_operand:SUBST_V 4 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 5 "register_operand" "k"))
+	  (match_dup 2)
+	  (const_int 1)))])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst.
  2013-08-22 14:15 ` Kirill Yukhin
  2013-10-17 14:17   ` [PATCH i386 4/8] [AVX512] [1/n] " Kirill Yukhin
  2013-10-28 15:13   ` [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst Kirill Yukhin
@ 2013-10-28 15:20   ` Kirill Yukhin
  2013-10-28 20:15     ` Richard Henderson
  2 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-28 15:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,

This patch introduces "mask_scalar_merge"  subst.

Is it ok to commit to main trunk?

Testing pass.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 26 +++++++++++++-------------
 gcc/config/i386/subst.md | 16 ++++++++++++++++
 2 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1f0d6fa..f3cca59 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2153,7 +2153,7 @@
   [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
   (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")])
 
-(define_insn "avx512f_cmp<mode>3"
+(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
@@ -2161,13 +2161,13 @@
 	   (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
 	  UNSPEC_PCMP))]
   "TARGET_AVX512F"
-  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_ucmp<mode>3"
+(define_insn "avx512f_ucmp<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "v")
@@ -2175,7 +2175,7 @@
 	   (match_operand:SI 3 "const_0_to_7_operand" "n")]
 	  UNSPEC_UNSIGNED_PCMP))]
   "TARGET_AVX512F"
-  "vpcmpu<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vpcmpu<ssemodesuffix>\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -8712,7 +8712,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_expand "avx512f_eq<mode>3"
+(define_expand "avx512f_eq<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand")
@@ -8721,14 +8721,14 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
-(define_insn "avx512f_eq<mode>3_1"
+(define_insn "avx512f_eq<mode>3<mask_scalar_merge_name>_1"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "%v")
 	   (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	  UNSPEC_MASKED_EQ))]
   "TARGET_AVX512F && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
-  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
@@ -8808,13 +8808,13 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_gt<mode>3"
+(define_insn "avx512f_gt<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48_512 1 "register_operand" "v")
 	   (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_MASKED_GT))]
   "TARGET_AVX512F"
-  "vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vpcmpgt<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
@@ -9208,25 +9208,25 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
-(define_insn "avx512f_testm<mode>3"
+(define_insn "avx512f_testm<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	 [(match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	 UNSPEC_TESTM))]
   "TARGET_AVX512F"
-  "vptestm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vptestm<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<sseinsnmode>")])
 
-(define_insn "avx512f_testnm<mode>3"
+(define_insn "avx512f_testnm<mode>3<mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	 [(match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")]
 	 UNSPEC_TESTNM))]
   "TARGET_AVX512CD"
-  "%vptestnm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vptestnm<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<sseinsnmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 532a3a1..b537c5e 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -27,6 +27,9 @@
    V16SF V8SF  V4SF
    V8DF  V4DF  V2DF])
 
+(define_mode_iterator SUBST_S
+  [QI HI SI DI])
+
 (define_subst_attr "mask_name" "mask" "" "_mask")
 (define_subst_attr "mask_applied" "mask" "false" "true")
 (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
@@ -77,3 +80,16 @@
 	    (match_operand:<avx512fmaskmode> 5 "register_operand" "k"))
 	  (match_dup 2)
 	  (const_int 1)))])
+
+(define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask")
+(define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}")
+(define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}")
+
+(define_subst "mask_scalar_merge"
+  [(set (match_operand:SUBST_S 0)
+        (match_operand:SUBST_S 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (and:SUBST_S
+	  (match_dup 1)
+	  (match_operand:SUBST_S 3 "register_operand" "k")))])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-28 13:10           ` Kirill Yukhin
@ 2013-10-28 16:37             ` Richard Henderson
  2013-10-28 21:19               ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-10-28 16:37 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/28/2013 03:24 AM, Kirill Yukhin wrote:
> Hello Richard,
> On 22 Oct 08:16, Richard Henderson wrote:
>> On 10/22/2013 07:42 AM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> Thanks for remarks, they all seems reasonable.
>>>
>>> One question
>>>
>>> On 21 Oct 16:01, Richard Henderson wrote:
>>>>> +(define_insn "avx512f_moves<mode>_mask"
>>>>> +  [(set (match_operand:VF_128 0 "register_operand" "=v")
>>>>> +	(vec_merge:VF_128
>>>>> +	  (vec_merge:VF_128
>>>>> +	    (match_operand:VF_128 2 "register_operand" "v")
>>>>> +	    (match_operand:VF_128 3 "vector_move_operand" "0C")
>>>>> +	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
>>>>> +	  (match_operand:VF_128 1 "register_operand" "v")
>>>>> +	  (const_int 1)))]
>>>>> +  "TARGET_AVX512F"
>>>>> +  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
>>>>> +  [(set_attr "type" "ssemov")
>>>>> +   (set_attr "prefix" "evex")
>>>>> +   (set_attr "mode" "<sseinsnmode>")])
>>>>
>>>> Nested vec_merge?  That seems... odd to say the least.
>>>> How in the world does this get matched?
>>>
>>> This is generic approach for all scalar `masked' instructions.
>>>
>>> Reason is that we must save higher bits of vector (outer vec_merge)
>>> and apply single-bit mask (inner vec_merge).
>>>
>>>
>>> We may do it with unspecs though... But is it really better?
>>>
>>> What do you think?
>>
>> What I think is that while it's an instruction that exists in the ISA,
>> does that mean we must model it in the compiler?
>>
>> How would this pattern be used?
> 
> When we have all-1 mask then simplifier may reduce such pattern to simpler form
> with single vec_merge.
> This will be impossible if we put unspec there.
> 
> So, for example for thise code:
>     __m128d
>     foo (__m128d x, __m128d y)
>     {
>       return _mm_maskz_add_sd (-1, x, y);
>     }
> 
> With unspec we will have:
> foo:
> .LFB2328:
>         movl    $-1, %eax       # 10    *movqi_internal/2       [length = 5]
>         kmovw   %eax, %k1       # 24    *movqi_internal/8       [length = 4]
>         vaddsd  %xmm1, %xmm0, %xmm0{%k1}{z}     # 11    sse2_vmaddv2df3_mask/2 [length = 6]
>         ret     # 27    simple_return_internal  [length = 1]
> 
> While for `semantic' version it will be simplified to:
> foo:
> .LFB2329:
>         vaddsd  %xmm1, %xmm0, %xmm0     # 11    sse2_vmaddv2df3/2       [length = 4]
> 	ret     # 26    simple_return_internal  [length = 1]
> 
> So, we have short VEX insn vs. long EVEX one + mask creation insns.
> That is why we want to expose semantics of such operations.

This is not the question that I asked.

Why is a masked *scalar* operation useful?  The only way I can see
that one would get created is that builtin, which begs the question
of why the builtin exists.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst.
  2013-10-28 15:20   ` [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst Kirill Yukhin
@ 2013-10-28 20:15     ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-10-28 20:15 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/28/2013 07:53 AM, Kirill Yukhin wrote:
> This patch introduces "mask_scalar_merge"  subst.
> 
> Is it ok to commit to main trunk?

Ok.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-28 16:37             ` Richard Henderson
@ 2013-10-28 21:19               ` Kirill Yukhin
  2013-10-28 22:05                 ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-28 21:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello Richard,
On 28 Oct 08:20, Richard Henderson wrote:
> Why is a masked *scalar* operation useful?

The reason the instructions exist is so that
you can do fully fault correct predicated scalar algorithms.

I example.

In fact, with some hacky tricks, you can fully predicate
normal C code in the SIMD registers. 

One might want to have such region
as fast as possible while staying scalar:
(all vars are integers):
  if ( a[i] )
    b[i] += c[i];

Definetely to have max performace we want to have
the region fully predicated. This code cannot be
predicated correctly in IA pre-AVX-512:

  vmovd a[i], %xmm0
  vptestm %zmm0, %zmm0, %k1
  // hack because we didnâ€™t have masking for VPMOVD/Q
  vmovss b[i], %xmm0 {%k1}{z}
  // no scalar int add, hack, 128-bit works fine
  // because mask is sawed off in the right places
  vpaddd c[i], %zmm0, %zmm0 {%k1}{z}
  // vmpmovd/w hack again
  vmovss %xmm0, b[i] {%k1}

So, having such masked scalar insns allows us to have
non-branching scalar code.

II Example.

Perhaps one interesting case of scalar and mask, though not
to do with predication (and really narrow), is as an idiom to
generate a write mask value of 0x1:

  vcmpss k1, xmm0, xmm0, {sae}, 0xf

Currently we do:
  mov   1, %eax
  kmovw %eax, %k1

But we sometimes have to spill a GPR in order to introduce
this sequence. In that case the vcmpss idiom seems like a
better choice  (though you might worry about the additional
false dependency on xmm0).

And finally.
Itâ€™s kind of strange not to have complete ISA support.
What if someone would want to have equivalent code for
the vector version and the scalar remainder version?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-28 21:19               ` Kirill Yukhin
@ 2013-10-28 22:05                 ` Richard Henderson
  2013-10-29 10:22                   ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2013-10-28 22:05 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
> Hello Richard,
> On 28 Oct 08:20, Richard Henderson wrote:
>> Why is a masked *scalar* operation useful?
> 
> The reason the instructions exist is so that
> you can do fully fault correct predicated scalar algorithms.

Using VEC_MERGE isn't the proper representation for that.

If that's your real goal, then COND_EXEC is the only way to let
rtl know that faults are suppressed in the false condition.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-28 22:05                 ` Richard Henderson
@ 2013-10-29 10:22                   ` Kirill Yukhin
  2013-10-29 16:46                     ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-10-29 10:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello Richard,

On 28 Oct 14:45, Richard Henderson wrote:
> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
> > Hello Richard,
> > On 28 Oct 08:20, Richard Henderson wrote:
> >> Why is a masked *scalar* operation useful?
> > 
> > The reason the instructions exist is so that
> > you can do fully fault correct predicated scalar algorithms.
> 
> Using VEC_MERGE isn't the proper representation for that.
> 
> If that's your real goal, then COND_EXEC is the only way to let
> rtl know that faults are suppressed in the false condition.

I believe cond_exec approach supposed to look like this:
  (define_subst "mask_scalar"
    [(set (match_operand:SUBST_V 0)
          (vec_merge:SUBST_V
            (match_operand:SUBST_V 1)
            (match_operand:SUBST_V 2)
            (const_int 1)))]
    "TARGET_AVX512F"
    [(cond_exec (eq:CC
                  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
                  (const_int 1))
                (set (match_dup 0)
                     (vec_merge:SUBST_V
                       (match_dup 1)
                       (match_dup 2)
                       (const_int 1))))])

But this only will describe merge-masking in incorrect way.
We will need to add a clobber to signal that even for false
condition we will zero higher part of register.
Preferable zerro-masking will be indistinguishable from merge-
masking and will need to choose which mask mode to enable. Bad turn.

IMHO, we have 3 options to implement scalar masked insns:
  1. `vec_merge' over vec_merge (current approach).
     Pro.
       1. Precise semantic description
       2. Unified approach with vector patterns
       3. Freedom for simplifier to reduce EVEX to VEX for
       certain const masks
     Cons.
       1. Too precise semantic description and as a
       consequence complicated code in md-file

  2. `cond_exec' approach
    Pro.
      1. Look useful for compiler when trying to generate
      predicated code
    Cons.
      1. Not precise. Extra clobbers (?) needed: to signal
      that we're changing the register even for false
      condition in cond_exec 
      2. Unable to describe zero masking nicely
      3. Code still complicated as for option #1
      4. Simplifier won't work (clobber is always clobber)

  3. Make all masked scalar insns to be unspecs
    Pro.
      1. Straight-forward, not overweighted. Enough for
      intrinsics to work
    Cons.
      1. Since every unspec needs a code: substs won't be
      applied directly: huge volume of similar code
      2. Simplifier won't work
      3. Generation of predicated code become hard

Am I missing  some options, or thatâ€™s all we have?
If so, what option would you prefer?

Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-29 10:22                   ` Kirill Yukhin
@ 2013-10-29 16:46                     ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-10-29 16:46 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 10/29/2013 03:02 AM, Kirill Yukhin wrote:
> Hello Richard,
> 
> On 28 Oct 14:45, Richard Henderson wrote:
>> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> On 28 Oct 08:20, Richard Henderson wrote:
>>>> Why is a masked *scalar* operation useful?
>>>
>>> The reason the instructions exist is so that
>>> you can do fully fault correct predicated scalar algorithms.
>>
>> Using VEC_MERGE isn't the proper representation for that.
>>
>> If that's your real goal, then COND_EXEC is the only way to let
>> rtl know that faults are suppressed in the false condition.
> 
> I believe cond_exec approach supposed to look like this:
>   (define_subst "mask_scalar"
>     [(set (match_operand:SUBST_V 0)
>           (vec_merge:SUBST_V
>             (match_operand:SUBST_V 1)
>             (match_operand:SUBST_V 2)
>             (const_int 1)))]
>     "TARGET_AVX512F"
>     [(cond_exec (eq:CC
>                   (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
>                   (const_int 1))
>                 (set (match_dup 0)
>                      (vec_merge:SUBST_V
>                        (match_dup 1)
>                        (match_dup 2)
>                        (const_int 1))))])
> 
> But this only will describe merge-masking in incorrect way.
> We will need to add a clobber to signal that even for false
> condition we will zero higher part of register.
> Preferable zerro-masking will be indistinguishable from merge-
> masking and will need to choose which mask mode to enable. Bad turn.

No, a cond_exec approach to scalars would use scalar modes
not vector modes with vec_merge.  In that case the higher
part of the register is ignored and undefined, and the fact
that zeroing happens is irrelevant.

> 
> IMHO, we have 3 options to implement scalar masked insns:
>   1. `vec_merge' over vec_merge (current approach).
>      Pro.
>        1. Precise semantic description

False, as I've described above.  The compiler will believe
that all exceptions, especially memory exceptions, will be
taken.

>        2. Unified approach with vector patterns
>        3. Freedom for simplifier to reduce EVEX to VEX for
>        certain const masks
>      Cons.
>        1. Too precise semantic description and as a
>        consequence complicated code in md-file
> 
>   2. `cond_exec' approach
>     Pro.
>       1. Look useful for compiler when trying to generate
>       predicated code
>     Cons.
>       1. Not precise. Extra clobbers (?) needed: to signal
>       that we're changing the register even for false
>       condition in cond_exec 
>       2. Unable to describe zero masking nicely
>       3. Code still complicated as for option #1
>       4. Simplifier won't work (clobber is always clobber)
> 
>   3. Make all masked scalar insns to be unspecs
>     Pro.
>       1. Straight-forward, not overweighted. Enough for
>       intrinsics to work
>     Cons.
>       1. Since every unspec needs a code: substs won't be
>       applied directly: huge volume of similar code
>       2. Simplifier won't work
>       3. Generation of predicated code become hard
> 
> Am I missing  some options, or thatâ€™s all we have?
> If so, what option would you prefer?

As far as builtins are concerned, all three approaches are
functional.  But in order for the compiler to be able to
automatically create conditional code from normal scalar
code we'll have to use cond_exec.

Indeed, if we've done our job right, then the user-facing
inlines that we expose could be written

  inline double add_mask(double r, double x, double y, int m)
  {
    if (m & 1)
      r = x + y;
    return r;
  }

although honestly such inlines would be silly.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-10-22  1:43     ` Richard Henderson
  2013-10-22 15:58       ` Kirill Yukhin
@ 2013-11-01 12:20       ` Kirill Yukhin
  2013-11-05  9:19         ` Kirill Yukhin
                           ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-01 12:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello Richard,

On 21 Oct 16:01, Richard Henderson wrote:
> Error on V16SF.  Probably better to fill this out.
Thanks, fixed.

> Better to just use <sseinsnmode> here, as it's a compile-time constant.
Fixed.

> > +(define_insn "avx512f_store<mode>_mask"
> 
> Likewise.
Fixed.

> Nested vec_merge?  That seems... odd to say the least.
> How in the world does this get matched?
Moved to separate patch.

> > +(define_insn "*avx512f_loads<mode>_mask"
> 
> Likewise.
Moved to separate patch.

> > +(define_insn "avx512f_stores<mode>_mask"
> This seems similar, though of course it's an extract.
> I still can't imagine how it could be used.
Separate patch.

> > -(define_insn "rcp14<mode>"
> > +(define_insn "<mask_codefor>rcp14<mode><mask_name>"
> 
> What, this name isn't used for non-masked anymore?
Bogus original pattern. Introduced by me here:
      svn+ssh://gcc.gnu.org/svn/gcc/trunk@203609
This (and subseqent patterns you mention) are not used
by name. In general case for every built-in whether it is
masked or not we use masked version of built-in. E.g.:
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_rcp14_pd (__m512d __A)
{
  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
                                                   (__v8df)
                                                   _mm512_setzero_pd (),
                                                   (__mmask8) -1);
}
Then, while expanding, if we can prove that mask is const -1 we remove
redundant vec_merge getting non-masked variant of the built-in.
Same holds for all inssn you mentioned


> > -(define_insn "srcp14<mode>"
> Likewise.  These changes don't belong in this patch.
Ditto.

> > -(define_insn "rsqrt14<mode>"
> Likewise.
Ditto.

> > -	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
> > +	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
> > +	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
> > +	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]
> 
> Unrelated changes.  Repeated throughout the fma patterns.
Reverted.

> > +(define_insn "*fmai_fmadd_<mode>_maskz"
> > +  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
> > +	(vec_merge:VF_128
> > +	  (vec_merge:VF_128
> > +	    (fma:VF_128
> > +	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
> > +	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
> > +	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
> > +	    (match_operand:VF_128 4 "const0_operand")
> > +	    (match_operand:QI 5 "register_operand" "k,k"))
> > +	  (match_dup 1)
> > +	  (const_int 1)))]
> > +  "TARGET_AVX512F"
> > +  "@
> > +   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2}
> > +   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
> > +  [(set_attr "type" "ssemuladd")
> > +   (set_attr "mode" "<MODE>")])
> 
> These seem like useless patterns.  If they're for builtins,
> then they seem like useless builtins.  See above.
Moved to dedicated patch.

> > @@ -3686,8 +4328,8 @@
> >     (set_attr "athlon_decode" "vector,double,*")
> >     (set_attr "amdfam10_decode" "vector,double,*")
> >     (set_attr "bdver1_decode" "direct,direct,*")
> > -   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "prefix" "orig,orig,vex")
> > +   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "mode" "SF")])
> 
> Unrelated changes.
Ugh, reverted.

> > +(define_expand "vec_unpacku_float_hi_v16si"
> > +  [(match_operand:V8DF 0 "register_operand")
> > +   (match_operand:V16SI 1 "register_operand")]
> > +  "TARGET_AVX512F"
> > +{
> > +  REAL_VALUE_TYPE TWO32r;
> > +  rtx k, x, tmp[4];
> > +
> > +  real_ldexp (&TWO32r, &dconst1, 32);
> > +  x = const_double_from_real_value (TWO32r, DFmode);
> > +
> > +  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
> > +  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
> > +  tmp[2] = gen_reg_rtx (V8DFmode);
> > +  tmp[3] = gen_reg_rtx (V8SImode);
> > +  k = gen_reg_rtx (QImode);
> > +
> > +  emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
> > +  emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
> > +  emit_insn (gen_rtx_SET (VOIDmode, k,
> > +			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
> > +  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
> > +  emit_move_insn (operands[0], tmp[2]);
> > +  DONE;
> > +})
> 
> Separate patch.  And this is too complicated, since vcvtudq2pd exists.
Moved to [5/8] Extend hooks.

> Non-masked name change again.
See above.
> > +(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"
> 
> Ditto.
Above.

> > +(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"
> 
> Ditto.
Above.

> > +(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"
> 
> Ditto.
Above.

Updated patch in the bottom.

Bootstrapped.

Coould you pls take a look?

--
Thanks, K

---
 gcc/config/i386/i386.c        |    5 +
 gcc/config/i386/predicates.md |   10 +
 gcc/config/i386/sse.md        | 1440 +++++++++++++++++++++++++++++++++--------
 gcc/config/i386/subst.md      |   56 ++
 4 files changed, 1249 insertions(+), 262 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index febceca..0e91500 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14780,6 +14780,11 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	  /* We do not want to print value of the operand.  */
 	  return;
 
+	case 'N':
+	  if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x)))
+	    fputs ("{z}", file);
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 261335d..00a203e 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -687,6 +687,16 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
 
+;; Match 0 to 4.
+(define_predicate "const_0_to_4_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 4)")))
+
+;; Match 0 to 5.
+(define_predicate "const_0_to_5_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 5)")))
+
 ;; Match 0 to 7.
 (define_predicate "const_0_to_7_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 939cc33..ac7f108 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -87,6 +87,7 @@
   ;; For AVX512F support
   UNSPEC_VPERMI2
   UNSPEC_VPERMT2
+  UNSPEC_VPERMI2_MASK
   UNSPEC_UNSIGNED_FIX_NOTRUNC
   UNSPEC_UNSIGNED_PCMP
   UNSPEC_TESTM
@@ -101,9 +102,15 @@
   UNSPEC_GETMANT
   UNSPEC_ALIGN
   UNSPEC_CONFLICT
+  UNSPEC_COMPRESS
+  UNSPEC_COMPRESS_STORE
+  UNSPEC_EXPAND
   UNSPEC_MASKED_EQ
   UNSPEC_MASKED_GT
 
+  ;; For embed. rounding feature
+  UNSPEC_EMBEDDED_ROUNDING
+
   ;; For AVX512PF support
   UNSPEC_GATHER_PREFETCH
   UNSPEC_SCATTER_PREFETCH
@@ -554,6 +561,12 @@
    (V8SF "7") (V4DF "3")
    (V4SF "3") (V2DF "1")])
 
+(define_mode_attr ssescalarsize
+  [(V8DI  "64") (V4DI  "64") (V2DI  "64")
+   (V32HI "16") (V16HI "16") (V8HI "16")
+   (V16SI "32") (V8SI "32") (V4SI "32")
+   (V16SF "32") (V8DF "64")])
+
 ;; SSE prefix for integer vector modes
 (define_mode_attr sseintprefix
   [(V2DI  "p") (V2DF  "")
@@ -610,6 +623,9 @@
 (define_mode_attr bcstscalarsuff
   [(V16SI "d") (V16SF "ss") (V8DI "q") (V8DF "sd")])
 
+;; Include define_subst patterns for instructions with mask
+(include "subst.md")
+
 ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics.
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -749,6 +765,28 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_load<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	  (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+{
+  switch (<sseinsnmode>mode)
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "avx512f_blendm<mode>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_merge:VI48F_512
@@ -761,6 +799,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_store<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(vec_merge:VI48F_512
+	  (match_operand:VI48F_512 1 "register_operand" "v")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (<sseinsnmode>mode)
+    {
+    case MODE_V8DF:
+    case MODE_V16SF:
+      return "vmova<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovdqa<ssescalarsize>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "sse2_movq128"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(vec_concat:V2DI
@@ -852,21 +912,21 @@
   DONE;
 })
 
-(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix>"
+(define_insn "<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(unspec:VF
 	  [(match_operand:VF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
     case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
-      return "%vmovups\t{%1, %0|%0, %1}";
+      return "%vmovups\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
-      return "%vmovu<ssemodesuffix>\t{%1, %0|%0, %1}";
+      return "%vmovu<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     }
 }
   [(set_attr "type" "ssemov")
@@ -913,12 +973,36 @@
 	      ]
 	      (const_string "<MODE>")))])
 
-(define_insn "<sse2_avx_avx512f>_loaddqu<mode>"
+(define_insn "avx512f_storeu<ssemodesuffix>512_mask"
+  [(set (match_operand:VF_512 0 "memory_operand" "=m")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_V16SF:
+      return "vmovups\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    default:
+      return "vmovu<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+    }
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
   [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v")
 	(unspec:VI_UNALIGNED_LOADSTORE
 	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   switch (get_attr_mode (insn))
     {
@@ -927,9 +1011,9 @@
       return "%vmovups\t{%1, %0|%0, %1}";
     case MODE_XI:
       if (<MODE>mode == V8DImode)
-	return "vmovdqu64\t{%1, %0|%0, %1}";
+	return "vmovdqu64\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
       else
-	return "vmovdqu32\t{%1, %0|%0, %1}";
+	return "vmovdqu32\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     default:
       return "%vmovdqu\t{%1, %0|%0, %1}";
     }
@@ -992,6 +1076,27 @@
 	      ]
 	      (const_string "<sseinsnmode>")))])
 
+(define_insn "avx512f_storedqu<mode>_mask"
+  [(set (match_operand:VI48_512 0 "memory_operand" "=m")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "v")]
+	    UNSPEC_STOREU)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
+  "TARGET_AVX512F"
+{
+  if (<MODE>mode == V8DImode)
+    return "vmovdqu64\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+  else
+    return "vmovdqu32\t{%1, %0%{%2%}|%0%{%2%}, %1}";
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "movu" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "register_operand" "=x")
 	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
@@ -1119,26 +1224,26 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "<comm>0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_vm<plusminus_insn><mode>3"
@@ -1158,26 +1263,26 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3"
+(define_insn "*mul<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vmul<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
@@ -1195,7 +1300,7 @@
    v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,vex")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1225,18 +1330,18 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3"
+(define_insn "<sse>_div<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(div:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vdiv<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_rcp<mode>2"
@@ -1269,18 +1374,18 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
-(define_insn "rcp14<mode>"
+(define_insn "<mask_codefor>rcp14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RCP14))]
   "TARGET_AVX512F"
-  "vrcp14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "srcp14<mode>"
+(define_insn "*srcp14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1316,11 +1421,11 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2"
+(define_insn "<sse>_sqrt<mode>2<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE"
-  "%vsqrt<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "TARGET_SSE && <mask_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
@@ -1341,8 +1446,8 @@
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "orig,vex")
+   (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "rsqrt<mode>2"
@@ -1365,18 +1470,18 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "<mask_codefor>rsqrt14<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RSQRT14))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt14<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt14<mode>"
+(define_insn "*rsqrt14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1411,47 +1516,49 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   if (!flag_finite_math_only)
     operands[1] = force_reg (<MODE>mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);
 })
 
-(define_insn "*<code><mode>3_finite"
+(define_insn "*<code><mode>3_finite<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
   "TARGET_SSE && flag_finite_math_only
-   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && <mask_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<code><mode>3"
+(define_insn "*<code><mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
 	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && !flag_finite_math_only"
+  "TARGET_SSE && !flag_finite_math_only
+   && <mask_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_vm<code><mode>3"
@@ -2029,6 +2136,24 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
+(define_insn "avx512f_vmcmp<mode>3_mask"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
+	(and:<avx512fmaskmode>
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
+	    UNSPEC_PCMP)
+	  (and:<avx512fmaskmode>
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k")
+	    (const_int 1))))]
+  "TARGET_AVX512F"
+  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "avx512f_maskcmp<mode>3"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(match_operator:<avx512fmaskmode> 3 "sse_comparison_operator"
@@ -2579,7 +2704,39 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fmsub_<mode>"
+(define_insn "avx512f_fmadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=x")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "x")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
@@ -2597,7 +2754,41 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fma_fnmadd_<mode>"
+(define_insn "avx512f_fmsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "0,0")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (match_operand:VF_512 1 "register_operand" "v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fnmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
@@ -2615,6 +2806,40 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fnmadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 3 "register_operand" "0"))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fma_fnmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
@@ -2634,6 +2859,42 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fnmsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "0,0"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfnmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fnmsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (fma:VF_512
+	    (neg:VF_512
+	      (match_operand:VF_512 1 "register_operand" "v"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (neg:VF_512
+	      (match_operand:VF_512 3 "register_operand" "0")))
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA parallel floating point multiply addsub and subadd operations.
 
 ;; It would be possible to represent these without the UNSPEC as
@@ -2672,6 +2933,40 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fmaddsub_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmaddsub_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 3 "register_operand" "0")]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmaddsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fma_fmsubadd_<mode>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
@@ -2691,6 +2986,42 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fmsubadd_<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v,v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "0,0")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "@
+   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
+   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+  [(set_attr "isa" "fma_avx512f,fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512f_fmsubadd_<mode>_mask3"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+	  (unspec:VF_512
+	    [(match_operand:VF_512 1 "register_operand" "v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (neg:VF_512
+	       (match_operand:VF_512 3 "register_operand" "0"))]
+	    UNSPEC_FMADDSUB)
+	  (match_dup 3)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfmsubadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  [(set_attr "isa" "fma_avx512f")
+   (set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
@@ -3014,7 +3345,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE && TARGET_64BIT"
   "%vcvttss2si{q}\t{%1, %0|%0, %k1}"
@@ -3054,22 +3385,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name>"
   [(set (match_operand:VF1 0 "register_operand" "=v")
 	(float:VF1
 	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2"
-  "%vcvtdq2ps\t{%1, %0|%0, %1}"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
+  "%vcvtdq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2"
+(define_insn "ufloatv16siv16sf2<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
 	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0|%0, %1}"
+  "vcvtudq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3104,34 +3435,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_fix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
 	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0|%0, %1}"
+  "vcvtps2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_ufix_notruncv16sfv16si"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
 	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0|%0, %1}"
+  "vcvtps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<fixsuffix>fix_truncv16sfv16si2"
+(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
 	  (match_operand:V16SF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvttps2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttps2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3461,20 +3792,21 @@
 (define_mode_attr si2dfmodelower
   [(V8DF "v8si") (V4DF "v4si")])
 
-(define_insn "float<si2dfmodelower><mode>2"
+(define_insn "float<si2dfmodelower><mode>2<mask_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float:VF2_512_256 (match_operand:<si2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtdq2pd\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vcvtdq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "ufloatv8siv8df"
+(define_insn "ufloatv8siv8df<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
-	(unsigned_float:V8DF (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
+	(unsigned_float:V8DF
+	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtudq2pd\t{%1, %0|%0, %1}"
+  "vcvtudq2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -3519,12 +3851,13 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V2DF")])
 
-(define_insn "avx512f_cvtpd2dq512"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(unspec:V8SI [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
-		     UNSPEC_FIX_NOTRUNC))]
+	(unspec:V8SI
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0|%0, %1}"
+  "vcvtpd2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3592,22 +3925,23 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
 	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0|%0, %1}"
+  "vcvtpd2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
 
-(define_insn "<fixsuffix>fix_truncv8dfv8si2"
+(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
-	(any_fix:V8SI (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	(any_fix:V8SI
+	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvttpd2<fixsuffix>dq\t{%1, %0|%0, %1}"
+  "vcvttpd2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3713,12 +4047,12 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_cvtpd2ps512"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
 	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0|%0, %1}"
+  "vcvtpd2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -3768,12 +4102,12 @@
 (define_mode_attr sf2dfmode
   [(V8DF "V8SF") (V4DF "V4SF")])
 
-(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix>"
+(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float_extend:VF2_512_256
 	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX"
-  "vcvtps2pd\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
@@ -4122,6 +4456,30 @@
   DONE;
 })
 
+(define_expand "vec_unpacku_float_lo_v16si"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")]
+  "TARGET_AVX512F"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx k, x, tmp[3];
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
+  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
+  tmp[2] = gen_reg_rtx (V8DFmode);
+  k = gen_reg_rtx (QImode);
+
+  emit_insn (gen_avx512f_cvtdq2pd512_2 (tmp[2], operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, k,
+			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
+  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
+  emit_move_insn (operands[0], tmp[2]);
+  DONE;
+})
+
 (define_expand "vec_pack_trunc_<mode>"
   [(set (match_dup 3)
 	(float_truncate:<sf2dfmode>
@@ -4409,7 +4767,7 @@
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
-(define_insn "avx512f_unpckhps512"
+(define_insn "<mask_codefor>avx512f_unpckhps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4424,7 +4782,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vunpckhps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4497,7 +4855,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_unpcklps512"
+(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4512,7 +4870,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vunpcklps\t{%2, %1, %0|%0, %1, %2}"
+  "vunpcklps\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4620,7 +4978,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movshdup512"
+(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4635,7 +4993,7 @@
 		     (const_int 13) (const_int 13)
 		     (const_int 15) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vmovshdup\t{%1, %0|%0, %1}"
+  "vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -4673,7 +5031,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V4SF")])
 
-(define_insn "avx512f_movsldup512"
+(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -4688,7 +5046,7 @@
 		     (const_int 12) (const_int 12)
 		     (const_int 14) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vmovsldup\t{%1, %0|%0, %1}"
+  "vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -5222,8 +5580,71 @@
   operands[1] = adjust_address (operands[1], SFmode, INTVAL (operands[2]) * 4);
 })
 
-(define_insn "avx512f_vextract<shuffletype>32x4_1"
-  [(set (match_operand:<ssequartermode> 0 "nonimmediate_operand" "=vm")
+(define_expand "avx512f_vextract<shuffletype>32x4_mask"
+  [(match_operand:<ssequartermode> 0 "nonimmediate_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_3_operand")
+   (match_operand:<ssequartermode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (0), GEN_INT (1), GEN_INT (2),
+          GEN_INT (3), operands[3], operands[4]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (4), GEN_INT (5), GEN_INT (6),
+          GEN_INT (7), operands[3], operands[4]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (8), GEN_INT (9), GEN_INT (10),
+          GEN_INT (11), operands[3], operands[4]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vextract<shuffletype>32x4_1_mask (operands[0],
+          operands[1], GEN_INT (12), GEN_INT (13), GEN_INT (14),
+          GEN_INT (15), operands[3], operands[4]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+})
+
+(define_insn "avx512f_vextract<shuffletype>32x4_1_maskm"
+  [(set (match_operand:<ssequartermode> 0 "memory_operand" "=m")
+	(vec_merge:<ssequartermode>
+	  (vec_select:<ssequartermode>
+	    (match_operand:V16FI 1 "register_operand" "v")
+	    (parallel [(match_operand 2  "const_0_to_15_operand")
+	      (match_operand 3  "const_0_to_15_operand")
+	      (match_operand 4  "const_0_to_15_operand")
+	      (match_operand 5  "const_0_to_15_operand")]))
+	  (match_operand:<ssequartermode> 6 "memory_operand" "0")
+	  (match_operand:QI 7 "register_operand" "k")))]
+  "TARGET_AVX512F && (INTVAL (operands[2]) = INTVAL (operands[3]) - 1)
+  && (INTVAL (operands[3]) = INTVAL (operands[4]) - 1)
+  && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
+{
+  operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
+  return "vextract<shuffletype>32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}";
+}
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>"
+  [(set (match_operand:<ssequartermode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssequartermode>
 	  (match_operand:V16FI 1 "register_operand" "v")
 	  (parallel [(match_operand 2  "const_0_to_15_operand")
@@ -5235,7 +5656,7 @@
   && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)"
 {
   operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2);
-  return "vextract<shuffletype>32x4\t{%2, %1, %0|%0, %1, %2}";
+  return "vextract<shuffletype>32x4\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
@@ -5247,6 +5668,35 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_vextract<shuffletype>64x4_mask"
+  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:SI 2 "const_0_to_1_operand")
+   (match_operand:<ssehalfvecmode> 3 "nonimmediate_operand")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  rtx (*insn)(rtx, rtx, rtx, rtx);
+
+  if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR)
+    operands[0] = force_reg (<ssequartermode>mode, operands[0]);
+
+  switch (INTVAL (operands[2]))
+    {
+    case 0:
+      insn = gen_vec_extract_lo_<mode>_mask;
+      break;
+    case 1:
+      insn = gen_vec_extract_hi_<mode>_mask;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (insn (operands[0], operands[1], operands[3], operands[4]));
+  DONE;
+})
+
 (define_split
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
 	(vec_select:<ssehalfvecmode>
@@ -5266,14 +5716,36 @@
   DONE;
 })
 
-(define_insn "vec_extract_lo_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_lo_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 0) (const_int 1)
+	      (const_int 2) (const_int 3)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+"vextract<shuffletype>64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_lo_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "nonimmediate_operand" "vm")
 	  (parallel [(const_int 0) (const_int 1)
             (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
-  "#"
+{
+  if (<mask_applied>)
+    return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
+  else
+    return "#";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5284,14 +5756,32 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_extract_hi_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+(define_insn "vec_extract_hi_<mode>_maskm"
+  [(set (match_operand:<ssehalfvecmode> 0 "memory_operand" "=m")
+	(vec_merge:<ssehalfvecmode>
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V8FI 1 "register_operand" "v")
+	    (parallel [(const_int 4) (const_int 5)
+	      (const_int 6) (const_int 7)]))
+	  (match_operand:<ssehalfvecmode> 2 "memory_operand" "0")
+	  (match_operand:QI 3 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_extract_hi_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
 	(vec_select:<ssehalfvecmode>
 	  (match_operand:V8FI 1 "register_operand" "v")
 	  (parallel [(const_int 4) (const_int 5)
             (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F"
-  "vextract<shuffletype>64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
+  "vextract<shuffletype>64x4\t{$0x1, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -5643,7 +6133,7 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "avx512f_unpckhpd512"
+(define_insn "<mask_codefor>avx512f_unpckhpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5654,7 +6144,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vunpckhpd\t{%2, %1, %0|%0, %1, %2}"
+  "vunpckhpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5739,7 +6229,7 @@
    (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
    (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
 
-(define_expand "avx512f_movddup512"
+(define_expand "avx512f_movddup512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5751,7 +6241,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_expand "avx512f_unpcklpd512"
+(define_expand "avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5763,7 +6253,7 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F")
 
-(define_insn "*avx512f_unpcklpd512"
+(define_insn "*avx512f_unpcklpd512<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v,v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -5775,8 +6265,8 @@
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
   "@
-   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
-   vmovddup\t{%1, %0|%0, %1}"
+   vunpcklpd\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}
+   vmovddup\t{%1, %0<mask_operand3>|%0<mask_operand3>, %1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8DF")])
@@ -5913,12 +6403,13 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "avx512f_vmscalef<mode>"
+(define_insn "*avx512f_vmscalef<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_SCALEF)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
@@ -5926,13 +6417,14 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode>"
+(define_insn "avx512f_scalef<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(unspec:VF_512 [(match_operand:VF_512 1 "register_operand" "v")
-			(match_operand:VF_512 2 "nonimmediate_operand" "vm")]
-		       UNSPEC_SCALEF))]
+	(unspec:VF_512
+	  [(match_operand:VF_512 1 "register_operand" "v")
+	   (match_operand:VF_512 2 "nonimmediate_operand" "vm")]
+	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
@@ -5950,21 +6442,39 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getexp<mode>"
+(define_insn "avx512f_vternlog<mode>_mask"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v")
+	(vec_merge:VI48_512
+	  (unspec:VI48_512
+	    [(match_operand:VI48_512 1 "register_operand" "0")
+	     (match_operand:VI48_512 2 "register_operand" "v")
+	     (match_operand:VI48_512 3 "nonimmediate_operand" "vm")
+	     (match_operand:SI 4 "const_0_to_255_operand")]
+	    UNSPEC_VTERNLOG)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_getexp<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
-   "vgetexp<ssemodesuffix>\t{%1, %0|%0, %1}";
+   "vgetexp<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_sgetexp<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
-	  (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v")
-			  (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
-			 UNSPEC_GETEXP)
+	  (unspec:VF_128
+	    [(match_operand:VF_128 1 "register_operand" "v")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
@@ -5972,17 +6482,48 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_align<mode>"
+(define_insn "<mask_codefor>avx512f_align<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
         (unspec:VI48_512 [(match_operand:VI48_512 1 "register_operand" "v")
 			  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")
 			  (match_operand:SI 3 "const_0_to_255_operand")]
 			 UNSPEC_ALIGN))]
   "TARGET_AVX512F"
-  "valign<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  "valign<ssemodesuffix>\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_shufps512_mask"
+  [(match_operand:V16SF 0 "register_operand")
+   (match_operand:V16SF 1 "register_operand")
+   (match_operand:V16SF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16SF 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2],
+					  GEN_INT ((mask >> 0) & 3),
+					  GEN_INT ((mask >> 2) & 3),
+					  GEN_INT (((mask >> 4) & 3) + 16),
+					  GEN_INT (((mask >> 6) & 3) + 16),
+					  GEN_INT (((mask >> 0) & 3) + 4),
+					  GEN_INT (((mask >> 2) & 3) + 4),
+					  GEN_INT (((mask >> 4) & 3) + 20),
+					  GEN_INT (((mask >> 6) & 3) + 20),
+					  GEN_INT (((mask >> 0) & 3) + 8),
+					  GEN_INT (((mask >> 2) & 3) + 8),
+					  GEN_INT (((mask >> 4) & 3) + 24),
+					  GEN_INT (((mask >> 6) & 3) + 24),
+					  GEN_INT (((mask >> 0) & 3) + 12),
+					  GEN_INT (((mask >> 2) & 3) + 12),
+					  GEN_INT (((mask >> 4) & 3) + 28),
+					  GEN_INT (((mask >> 6) & 3) + 28),
+					  operands[4], operands[5]));
+  DONE;
+})
+
 (define_insn "avx512f_fixupimm<mode>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
@@ -5996,6 +6537,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "avx512f_fixupimm<mode>_mask"
+  [(set (match_operand:VF_512 0 "register_operand" "=v")
+	(vec_merge:VF_512
+          (unspec:VF_512
+            [(match_operand:VF_512 1 "register_operand" "0")
+	     (match_operand:VF_512 2 "register_operand" "v")
+             (match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+             (match_operand:SI 4 "const_0_to_255_operand")]
+             UNSPEC_FIXUPIMM)
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "avx512f_sfixupimm<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
@@ -6012,19 +6569,38 @@
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "avx512f_sfixupimm<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (unspec:VF_128
+	       [(match_operand:VF_128 1 "register_operand" "0")
+		(match_operand:VF_128 2 "register_operand" "v")
+		(match_operand:<ssefixupmode> 3 "nonimmediate_operand" "vm")
+		(match_operand:SI 4 "const_0_to_255_operand")]
+	       UNSPEC_FIXUPIMM)
+	    (match_dup 1)
+	    (const_int 1))
+	  (match_dup 1)
+	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "avx512f_rndscale<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
-  "vrndscale<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrndscale<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_rndscale<mode>"
+(define_insn "*avx512f_rndscale<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6041,7 +6617,7 @@
    (set_attr "mode" "<MODE>")])
 
 ;; One bit in mask selects 2 elements.
-(define_insn "avx512f_shufps512_1"
+(define_insn "avx512f_shufps512_1<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(vec_select:V16SF
 	  (vec_concat:V32SF
@@ -6084,14 +6660,37 @@
   mask |= (INTVAL (operands[6]) - 16) << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufps\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
 
-(define_insn "avx512f_shufpd512_1"
+(define_expand "avx512f_shufpd512_mask"
+  [(match_operand:V8DF 0 "register_operand")
+   (match_operand:V8DF 1 "register_operand")
+   (match_operand:V8DF 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8DF 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2],
+					GEN_INT (mask & 1),
+					GEN_INT (mask & 2 ? 9 : 8),
+					GEN_INT (mask & 4 ? 3 : 2),
+					GEN_INT (mask & 8 ? 11 : 10),
+					GEN_INT (mask & 16 ? 5 : 4),
+					GEN_INT (mask & 32 ? 13 : 12),
+					GEN_INT (mask & 64 ? 7 : 6),
+					GEN_INT (mask & 128 ? 15 : 14),
+					operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shufpd512_1<mask_name>"
   [(set (match_operand:V8DF 0 "register_operand" "=v")
 	(vec_select:V8DF
 	  (vec_concat:V16DF
@@ -6118,7 +6717,7 @@
   mask |= (INTVAL (operands[10]) - 14) << 7;
   operands[3] = GEN_INT (mask);
 
-  return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshufpd\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -6198,7 +6797,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv8di"
+(define_insn "<mask_codefor>avx512f_interleave_highv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6209,7 +6808,7 @@
 		     (const_int 5) (const_int 13)
 		     (const_int 7) (const_int 15)])))]
   "TARGET_AVX512F"
-  "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6248,7 +6847,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv8di"
+(define_insn "<mask_codefor>avx512f_interleave_lowv8di<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(vec_select:V8DI
 	  (vec_concat:V16DI
@@ -6259,7 +6858,7 @@
 		     (const_int 4) (const_int 12)
 		     (const_int 6) (const_int 14)])))]
   "TARGET_AVX512F"
-  "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpcklqdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -6630,6 +7229,20 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_<code><pmov_src_lower><mode>2_mask"
+  [(set (match_operand:PMOV_DST_MODE 0 "nonimmediate_operand" "=v,m")
+    (vec_merge:PMOV_DST_MODE
+      (any_truncate:PMOV_DST_MODE
+        (match_operand:<pmov_src_mode> 1 "register_operand" "v,v"))
+      (match_operand:PMOV_DST_MODE 2 "vector_move_operand" "0C,0")
+      (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix><pmov_suff>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "none,store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "*avx512f_<code>v8div16qi2"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 	(vec_concat:V16QI
@@ -6663,6 +7276,55 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512f_<code>v8div16qi2_mask"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_operand:V16QI 2 "vector_move_operand" "0C")
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 3 "register_operand" "k"))
+      (const_vector:V8QI [(const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)
+                          (const_int 0) (const_int 0)])))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*avx512f_<code>v8div16qi2_store_mask"
+  [(set (match_operand:V16QI 0 "memory_operand" "=m")
+    (vec_concat:V16QI
+      (vec_merge:V8QI
+        (any_truncate:V8QI
+          (match_operand:V8DI 1 "register_operand" "v"))
+        (vec_select:V8QI
+          (match_dup 0)
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)]))
+        (match_operand:QI 2 "register_operand" "k"))
+      (vec_select:V8QI
+        (match_dup 0)
+        (parallel [(const_int 8) (const_int 9)
+                   (const_int 10) (const_int 11)
+                   (const_int 12) (const_int 13)
+                   (const_int 14) (const_int 15)]))))]
+  "TARGET_AVX512F"
+  "vpmov<trunsuffix>qb\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "memory" "store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
@@ -6677,27 +7339,27 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
-(define_expand "<plusminus_insn><mode>3"
+(define_expand "<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3"
+(define_insn "*<plusminus_insn><mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
 	(plusminus:VI_AVX2
 	  (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,v")
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
    (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<sse2_avx2>_<plusminus_insn><mode>3"
@@ -6787,7 +7449,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "vec_widen_umult_even_v16si"
+(define_expand "vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6807,7 +7469,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_umult_even_v16si"
+(define_insn "*vec_widen_umult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
         (mult:V8DI
           (zero_extend:V8DI
@@ -6825,7 +7487,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuludq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -6902,7 +7564,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "vec_widen_smult_even_v16si"
+(define_expand "vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6922,7 +7584,7 @@
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);")
 
-(define_insn "*vec_widen_smult_even_v16si"
+(define_insn "*vec_widen_smult_even_v16si<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=x")
         (mult:V8DI
           (sign_extend:V8DI
@@ -6940,7 +7602,7 @@
                          (const_int 8) (const_int 10)
                          (const_int 12) (const_int 14)])))))]
   "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)"
-  "vpmuldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
@@ -7151,12 +7813,12 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_expand "mul<mode>3"
+(define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "general_vector_operand")
 	  (match_operand:VI4_AVX512F 2 "general_vector_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && <mask_mode512bit_condition>"
 {
   if (TARGET_SSE4_1)
     {
@@ -7173,19 +7835,19 @@
     }
 })
 
-(define_insn "*<sse4_1_avx2>_mul<mode>3"
+(define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
+  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
   "@
    pmulld\t{%2, %0|%0, %2}
-   vpmulld\t{%2, %1, %0|%0, %1, %2}"
+   vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -7298,6 +7960,20 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "ashr<mode>3<mask_name>"
+  [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
+	(ashiftrt:VI48_512
+	  (match_operand:VI48_512 1 "nonimmediate_operand" "v,vm")
+	  (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vpsra<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set (attr "length_immediate")
+     (if_then_else (match_operand 2 "const_int_operand")
+       (const_string "1")
+       (const_string "0")))
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<shift_insn><mode>3"
   [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
 	(any_lshift:VI248_AVX2
@@ -7317,13 +7993,13 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<shift_insn><mode>3"
+(define_insn "<shift_insn><mode>3<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v,v")
 	(any_lshift:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v,m")
 	  (match_operand:SI 2 "nonmemory_operand" "vN,N")))]
-  "TARGET_AVX512F"
-  "vp<vshift><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <mask_mode512bit_condition>"
+  "vp<vshift><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "avx512f")
    (set_attr "type" "sseishft")
    (set (attr "length_immediate")
@@ -7333,6 +8009,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+
 (define_expand "vec_shl_<mode>"
   [(set (match_operand:VI_128 0 "register_operand")
 	(ashift:V1TI
@@ -7408,41 +8085,42 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate>v<mode>"
+(define_insn "avx512f_<rotate>v<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "register_operand" "v")
 	  (match_operand:VI48_512 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_<rotate><mode>"
+(define_insn "avx512f_<rotate><mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(any_rotate:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
   "TARGET_AVX512F"
-  "vp<rotate><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vp<rotate><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3"
+(define_expand "<code><mode>3<mask_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3"
+(define_insn "*avx2_<code><mode>3<mask_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && <mask_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -7978,19 +8656,19 @@
   operands[2] = force_reg (<MODE>mode, gen_rtx_CONST_VECTOR (<MODE>mode, v));
 })
 
-(define_expand "<sse2_avx2>_andnot<mode>3"
+(define_expand "<sse2_avx2>_andnot<mode>3<mask_name>"
   [(set (match_operand:VI_AVX2 0 "register_operand")
 	(and:VI_AVX2
 	  (not:VI_AVX2 (match_operand:VI_AVX2 1 "register_operand"))
 	  (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2")
+  "TARGET_SSE2 && <mask_mode512bit_condition>")
 
-(define_insn "*andnot<mode>3"
+(define_insn "*andnot<mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(and:VI
 	  (not:VI (match_operand:VI 1 "register_operand" "0,v"))
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE"
+  "TARGET_SSE && <mask_mode512bit_condition>"
 {
   static char buf[64];
   const char *ops;
@@ -8030,7 +8708,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8047,7 +8725,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8075,12 +8753,12 @@
   DONE;
 })
 
-(define_insn "*<code><mode>3"
+(define_insn "<mask_codefor><code><mode>3<mask_name>"
   [(set (match_operand:VI 0 "register_operand" "=x,v")
 	(any_logic:VI
 	  (match_operand:VI 1 "nonimmediate_operand" "%0,v")
 	  (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE
+  "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
   static char buf[64];
@@ -8122,7 +8800,7 @@
       ops = "%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 1:
-      ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+      ops = "v%s\t{%%2, %%1, %%0<mask_operand3_1>|%%0<mask_operand3_1>, %%1, %%2}";
       break;
     default:
       gcc_unreachable ();
@@ -8139,7 +8817,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_prefix3>")
    (set (attr "mode")
 	(cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
 		 (const_string "<ssePSmode>")
@@ -8447,7 +9125,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_highv16si"
+(define_insn "<mask_codefor>avx512f_interleave_highv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8462,7 +9140,7 @@
 		     (const_int 14) (const_int 30)
 		     (const_int 15) (const_int 31)])))]
   "TARGET_AVX512F"
-  "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckhdq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8502,7 +9180,7 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx512f_interleave_lowv16si"
+(define_insn "<mask_codefor>avx512f_interleave_lowv16si<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (vec_concat:V32SI
@@ -8517,7 +9195,7 @@
 		     (const_int 12) (const_int 28)
 		     (const_int 13) (const_int 29)])))]
   "TARGET_AVX512F"
-  "vpunpckldq\t{%2, %1, %0|%0, %1, %2}"
+  "vpunpckldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -8640,7 +9318,45 @@
    (set_attr "prefix" "orig,orig,vex,vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_vinsert<shuffletype>32x4_1"
+(define_expand "avx512f_vinsert<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:<ssequartermode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_3_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  switch (INTVAL (operands[3]))
+    {
+    case 0:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF), operands[4],
+	  operands[5]));
+      break;
+    case 1:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xF0FF), operands[4],
+	  operands[5]));
+      break;
+    case 2:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFF0F), operands[4],
+	  operands[5]));
+      break;
+    case 3:
+      emit_insn (gen_avx512f_vinsert<shuffletype>32x4_1_mask (operands[0],
+          operands[1], operands[2], GEN_INT (0xFFF0), operands[4],
+	  operands[5]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  DONE;
+
+})
+
+(define_insn "<mask_codefor>avx512f_vinsert<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_merge:V16FI
 	  (match_operand:V16FI 1 "register_operand" "v")
@@ -8663,14 +9379,35 @@
 
   operands[3] = GEN_INT (mask);
 
-  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vinsert<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_set_lo_<mode>"
+(define_expand "avx512f_vinsert<shuffletype>64x4_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_1_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  if (mask == 0)
+    emit_insn (gen_vec_set_lo_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  else
+    emit_insn (gen_vec_set_hi_<mode>_mask
+      (operands[0], operands[1], operands[2],
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "vec_set_lo_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8679,13 +9416,13 @@
 	    (parallel [(const_int 4) (const_int 5)
               (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0|%0, %1, %2, $0x0}"
+  "vinsert<shuffletype>64x4\t{$0x0, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x0}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_set_hi_<mode>"
+(define_insn "vec_set_hi_<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_concat:V8FI
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
@@ -8694,13 +9431,37 @@
 	    (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)]))))]
   "TARGET_AVX512F"
-  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0|%0, %1, %2, $0x1}"
+  "vinsert<shuffletype>64x4\t{$0x1, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, $0x1}"
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512f_shuf_<shuffletype>64x2_1"
+(define_expand "avx512f_shuf_<shuffletype>64x2_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "register_operand")
+   (match_operand:V8FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V8FI 4 "register_operand")
+   (match_operand:QI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>64x2_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 2),
+       GEN_INT (((mask >> 0) & 3) * 2 + 1),
+       GEN_INT (((mask >> 2) & 3) * 2),
+       GEN_INT (((mask >> 2) & 3) * 2 + 1),
+       GEN_INT (((mask >> 4) & 3) * 2 + 8),
+       GEN_INT (((mask >> 4) & 3) * 2 + 9),
+       GEN_INT (((mask >> 6) & 3) * 2 + 8),
+       GEN_INT (((mask >> 6) & 3) * 2 + 9),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>64x2_1<mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v")
 	(vec_select:V8FI
 	  (vec_concat:<ssedoublemode>
@@ -8727,14 +9488,46 @@
   mask |= (INTVAL (operands[9]) - 8) / 2 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand11>|%0<mask_operand11>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_shuf_<shuffletype>32x4_1"
+(define_expand "avx512f_shuf_<shuffletype>32x4_mask"
+  [(match_operand:V16FI 0 "register_operand")
+   (match_operand:V16FI 1 "register_operand")
+   (match_operand:V16FI 2 "nonimmediate_operand")
+   (match_operand:SI 3 "const_0_to_255_operand")
+   (match_operand:V16FI 4 "register_operand")
+   (match_operand:HI 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_avx512f_shuf_<shuffletype>32x4_1_mask
+      (operands[0], operands[1], operands[2],
+       GEN_INT (((mask >> 0) & 3) * 4),
+       GEN_INT (((mask >> 0) & 3) * 4 + 1),
+       GEN_INT (((mask >> 0) & 3) * 4 + 2),
+       GEN_INT (((mask >> 0) & 3) * 4 + 3),
+       GEN_INT (((mask >> 2) & 3) * 4),
+       GEN_INT (((mask >> 2) & 3) * 4 + 1),
+       GEN_INT (((mask >> 2) & 3) * 4 + 2),
+       GEN_INT (((mask >> 2) & 3) * 4 + 3),
+       GEN_INT (((mask >> 4) & 3) * 4 + 16),
+       GEN_INT (((mask >> 4) & 3) * 4 + 17),
+       GEN_INT (((mask >> 4) & 3) * 4 + 18),
+       GEN_INT (((mask >> 4) & 3) * 4 + 19),
+       GEN_INT (((mask >> 6) & 3) * 4 + 16),
+       GEN_INT (((mask >> 6) & 3) * 4 + 17),
+       GEN_INT (((mask >> 6) & 3) * 4 + 18),
+       GEN_INT (((mask >> 6) & 3) * 4 + 19),
+       operands[4], operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_shuf_<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v")
 	(vec_select:V16FI
 	  (vec_concat:<ssedoublemode>
@@ -8777,14 +9570,44 @@
   mask |= (INTVAL (operands[15]) - 16) / 4 << 6;
   operands[3] = GEN_INT (mask);
 
-  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  return "vshuf<shuffletype>32x4\t{%3, %2, %1, %0<mask_operand19>|%0<mask_operand19>, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_pshufd_1"
+(define_expand "avx512f_pshufdv3_mask"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V16SI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V16SI 3 "register_operand")
+   (match_operand:HI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1],
+				       GEN_INT ((mask >> 0) & 3),
+				       GEN_INT ((mask >> 2) & 3),
+				       GEN_INT ((mask >> 4) & 3),
+				       GEN_INT ((mask >> 6) & 3),
+				       GEN_INT (((mask >> 0) & 3) + 4),
+				       GEN_INT (((mask >> 2) & 3) + 4),
+				       GEN_INT (((mask >> 4) & 3) + 4),
+				       GEN_INT (((mask >> 6) & 3) + 4),
+				       GEN_INT (((mask >> 0) & 3) + 8),
+				       GEN_INT (((mask >> 2) & 3) + 8),
+				       GEN_INT (((mask >> 4) & 3) + 8),
+				       GEN_INT (((mask >> 6) & 3) + 8),
+				       GEN_INT (((mask >> 0) & 3) + 12),
+				       GEN_INT (((mask >> 2) & 3) + 12),
+				       GEN_INT (((mask >> 4) & 3) + 12),
+				       GEN_INT (((mask >> 6) & 3) + 12),
+				       operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_pshufd_1<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(vec_select:V16SI
 	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")
@@ -8825,7 +9648,7 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "vpshufd\t{%2, %1, %0|%0, %1, %2}";
+  return "vpshufd\t{%2, %1, %0<mask_operand18>|%0<mask_operand18>, %1, %2}";
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix" "evex")
@@ -10276,12 +11099,12 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI")])
 
-(define_insn "abs<mode>2"
+(define_insn "abs<mode>2<mask_name>"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(abs:VI124_AVX2_48_AVX512F
 	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSSE3"
-  "%vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "TARGET_SSSE3 && <mask_mode512bit_condition>"
+  "%vpabs<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
    (set_attr "prefix_extra" "1")
@@ -10622,12 +11445,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16qiv16si2"
+(define_insn "<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16QI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bd\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10662,12 +11485,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v16hiv16si2"
+(define_insn "avx512f_<code>v16hiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
 	  (match_operand:V16HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wd\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10697,7 +11520,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8qiv8di2"
+(define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (vec_select:V8QI
@@ -10707,7 +11530,7 @@
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>bq\t{%1, %0|%0, %k1}"
+  "vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10739,12 +11562,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8hiv8di2"
+(define_insn "avx512f_<code>v8hiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>wq\t{%1, %0|%0, %q1}"
+  "vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -10776,12 +11599,12 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "avx512f_<code>v8siv8di2"
+(define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
 	  (match_operand:V8SI 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
-  "vpmov<extsuffix>dq\t{%1, %0|%0, %1}"
+  "vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -11564,33 +12387,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512er_exp2<mode>"
+(define_insn "avx512er_exp2<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vexp2<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512er_rcp28<mode>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrcp28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512er_rsqrt28<mode>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vrsqrt28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -12640,16 +13463,16 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_permvar<mode>"
+(define_insn "<avx2_avx512f>_permvar<mode><mask_name>"
   [(set (match_operand:VI48F_256_512 0 "register_operand" "=v")
 	(unspec:VI48F_256_512
 	  [(match_operand:VI48F_256_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:<sseintvecmode> 2 "register_operand" "v")]
 	  UNSPEC_VPERMVAR))]
-  "TARGET_AVX2"
-  "vperm<ssemodesuffix>\t{%1, %2, %0|%0, %2, %1}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vperm<ssemodesuffix>\t{%1, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1}"
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<avx2_avx512f>_perm<mode>"
@@ -12660,14 +13483,32 @@
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_<avx2_avx512f>_perm<mode>_1 (operands[0], operands[1],
-					    GEN_INT ((mask >> 0) & 3),
-					    GEN_INT ((mask >> 2) & 3),
-					    GEN_INT ((mask >> 4) & 3),
-					    GEN_INT ((mask >> 6) & 3)));
+					      GEN_INT ((mask >> 0) & 3),
+					      GEN_INT ((mask >> 2) & 3),
+					      GEN_INT ((mask >> 4) & 3),
+					      GEN_INT ((mask >> 6) & 3)));
+  DONE;
+})
+
+(define_expand "avx512f_perm<mode>_mask"
+  [(match_operand:V8FI 0 "register_operand")
+   (match_operand:V8FI 1 "nonimmediate_operand")
+   (match_operand:SI 2 "const_0_to_255_operand")
+   (match_operand:V8FI 3 "vector_move_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  int mask = INTVAL (operands[2]);
+  emit_insn (gen_<avx2_avx512f>_perm<mode>_1_mask (operands[0], operands[1],
+						   GEN_INT ((mask >> 0) & 3),
+						   GEN_INT ((mask >> 2) & 3),
+						   GEN_INT ((mask >> 4) & 3),
+						   GEN_INT ((mask >> 6) & 3),
+						   operands[3], operands[4]));
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_perm<mode>_1"
+(define_insn "<avx2_avx512f>_perm<mode>_1<mask_name>"
   [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
 	(vec_select:VI8F_256_512
 	  (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
@@ -12675,7 +13516,7 @@
 		     (match_operand 3 "const_0_to_3_operand")
 		     (match_operand 4 "const_0_to_3_operand")
 		     (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_AVX2"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -12683,10 +13524,10 @@
   mask |= INTVAL (operands[4]) << 4;
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
-  return "vperm<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vperm<ssemodesuffix>\t{%2, %1, %0<mask_operand6>|%0<mask_operand6>, %1, %2}";
 }
   [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_permv2ti"
@@ -12733,58 +13574,58 @@
    (set_attr "isa" "*,avx2,noavx2")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vec_dup<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48F_512
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V16FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V16FI
 	  (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0|%0, %g1, %g1, 0x0}
-   vbroadcast<shuffletype>32x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>32x4\t{$0x0, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x0}
+   vbroadcast<shuffletype>32x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_broadcast<mode>"
+(define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
   [(set (match_operand:V8FI 0 "register_operand" "=v,v")
 	(vec_duplicate:V8FI
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "v,m")))]
   "TARGET_AVX512F"
   "@
-   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0|%0, %g1, %g1, 0x44}
-   vbroadcast<shuffletype>64x4\t{%1, %0|%0, %1}"
+   vshuf<shuffletype>64x2\t{$0x44, %g1, %g1, %0<mask_operand2>|%0<mask_operand2>, %g1, %g1, 0x44}
+   vbroadcast<shuffletype>64x4\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_gpr<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_gpr<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(vec_duplicate:VI48_512
 	  (match_operand:<ssescalarmode> 1 "register_operand" "r")))]
   "TARGET_AVX512F && (<MODE>mode != V8DImode || TARGET_64BIT)"
-  "vpbroadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "vpbroadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vec_dup_mem<mode>"
+(define_insn "<mask_codefor>avx512f_vec_dup_mem<mode><mask_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=x")
 	(vec_duplicate:VI48F_512
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm")))]
   "TARGET_AVX512F"
-  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}"
+  "v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -12924,12 +13765,12 @@
 				elt * GET_MODE_SIZE (<ssescalarmode>mode));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF2 0 "register_operand")
 	(vec_select:VF2
 	  (match_operand:VF2 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12945,12 +13786,12 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_expand "<sse2_avx_avx512f>_vpermil<mode>"
+(define_expand "<sse2_avx_avx512f>_vpermil<mode><mask_name>"
   [(set (match_operand:VF1 0 "register_operand")
 	(vec_select:VF1
 	  (match_operand:VF1 1 "nonimmediate_operand")
 	  (match_operand:SI 2 "const_0_to_255_operand")))]
-  "TARGET_AVX"
+  "TARGET_AVX && <mask_mode512bit_condition>"
 {
   int mask = INTVAL (operands[2]);
   rtx perm[<ssescalarnum>];
@@ -12968,37 +13809,37 @@
     = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (<ssescalarnum>, perm));
 })
 
-(define_insn "*<sse2_avx_avx512f>_vpermilp<mode>"
+(define_insn "*<sse2_avx_avx512f>_vpermilp<mode><mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(vec_select:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "vm")
 	  (match_parallel 2 ""
 	    [(match_operand 3 "const_int_operand")])))]
-  "TARGET_AVX
+  "TARGET_AVX && <mask_mode512bit_condition>
    && avx_vpermilp_parallel (operands[2], <MODE>mode)"
 {
   int mask = avx_vpermilp_parallel (operands[2], <MODE>mode) - 1;
   operands[2] = GEN_INT (mask);
-  return "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  return "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}";
 }
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3"
+(define_insn "<sse2_avx_avx512f>_vpermilvar<mode>3<mask_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
 	(unspec:VF
 	  [(match_operand:VF 1 "register_operand" "v")
 	   (match_operand:<sseintvecmode> 2 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMIL))]
-  "TARGET_AVX"
-  "vpermil<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX && <mask_mode512bit_condition>"
+  "vpermil<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "btver2_decode" "vector")
-   (set_attr "prefix" "vex")
+   (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx512f_vpermi2var<mode>3"
@@ -13014,6 +13855,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_vpermi2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:VI48F_512 1 "register_operand" "v")
+	    (match_operand:<sseintvecmode> 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMI2_MASK)
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermi2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "avx512f_vpermt2var<mode>3"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
@@ -13027,6 +13884,22 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_vpermt2var<mode>3_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(vec_merge:VI48F_512
+	  (unspec:VI48F_512
+	    [(match_operand:<sseintvecmode> 1 "register_operand" "v")
+	    (match_operand:VI48F_512 2 "register_operand" "0")
+	    (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
+	    UNSPEC_VPERMT2)
+	  (match_dup 2)
+	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
+  "TARGET_AVX512F"
+  "vpermt2<ssemodesuffix>\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_expand "avx_vperm2f128<mode>3"
   [(set (match_operand:AVX256MODE2P 0 "register_operand")
 	(unspec:AVX256MODE2P
@@ -13417,24 +14290,24 @@
   DONE;
 })
 
-(define_insn "<avx2_avx512f>_ashrv<mode>"
+(define_insn "<avx2_avx512f>_ashrv<mode><mask_name>"
   [(set (match_operand:VI48_AVX512F 0 "register_operand" "=v")
 	(ashiftrt:VI48_AVX512F
 	  (match_operand:VI48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vpsrav<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vpsrav<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<avx2_avx512f>_<shift_insn>v<mode>"
+(define_insn "<avx2_avx512f>_<shift_insn>v<mode><mask_name>"
   [(set (match_operand:VI48_AVX2_48_AVX512F 0 "register_operand" "=v")
 	(any_lshift:VI48_AVX2_48_AVX512F
 	  (match_operand:VI48_AVX2_48_AVX512F 1 "register_operand" "v")
 	  (match_operand:VI48_AVX2_48_AVX512F 2 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX2"
-  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "vp<vshift>v<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -13517,12 +14390,13 @@
    (set_attr "btver2_decode" "double")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtph2ps512"
+(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
-	(unspec:V16SF [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
-		      UNSPEC_VCVTPH2PS))]
+	(unspec:V16SF
+	  [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_VCVTPH2PS))]
   "TARGET_AVX512F"
-  "vcvtph2ps\t{%1, %0|%0, %1}"
+  "vcvtph2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13573,13 +14447,14 @@
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "V8SF")])
 
-(define_insn "avx512f_vcvtps2ph512"
+(define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(unspec:V16HI [(match_operand:V16SF 1 "register_operand" "v")
-		      (match_operand:SI 2 "const_0_to_255_operand" "N")]
-		     UNSPEC_VCVTPS2PH))]
+	(unspec:V16HI
+	  [(match_operand:V16SF 1 "register_operand" "v")
+	   (match_operand:SI 2 "const_0_to_255_operand" "N")]
+	  UNSPEC_VCVTPS2PH))]
   "TARGET_AVX512F"
-  "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -13969,14 +14844,55 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_compress<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "v")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k")]
+	  UNSPEC_COMPRESS))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_compressstore<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "register_operand" "x")
+	   (match_dup 0)
+	   (match_operand:<avx512fmaskmode> 2 "register_operand" "k")]
+	  UNSPEC_COMPRESS_STORE))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>compress<ssemodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_expand<mode>_mask"
+  [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "nonimmediate_operand" "v,m")
+	   (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand" "k,k")]
+	  UNSPEC_EXPAND))]
+  "TARGET_AVX512F"
+  "v<sseintprefix>expand<ssemodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "none,load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_getmant<mode><mask_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
-  "vgetmant<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+  "vgetmant<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -13995,23 +14911,23 @@
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "clz<mode>2"
+(define_insn "clz<mode>2<mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(clz:VI48_512
 	  (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512CD"
-  "vplzcnt<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vplzcnt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "conflict<mode>"
+(define_insn "<mask_codefor>conflict<mode><mask_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(unspec:VI48_512
 	  [(match_operand:VI48_512 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_CONFLICT))]
   "TARGET_AVX512CD"
-  "vpconflict<ssemodesuffix>\t{%1, %0|%0, %1}"
+  "vpconflict<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
new file mode 100644
index 0000000..6b45d05
--- /dev/null
+++ b/gcc/config/i386/subst.md
@@ -0,0 +1,56 @@
+;; GCC machine description for AVX512F instructions
+;; Copyright (C) 2013 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Some iterators for extending subst as much as possible
+;; All vectors (Use it for destination)
+(define_mode_iterator SUBST_V
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF])
+
+(define_subst_attr "mask_name" "mask" "" "_mask")
+(define_subst_attr "mask_applied" "mask" "false" "true")
+(define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
+(define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3")
+(define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf
+(define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4")
+(define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6")
+(define_subst_attr "mask_operand11" "mask" "" "%{%12%}%N11")
+(define_subst_attr "mask_operand18" "mask" "" "%{%19%}%N18")
+(define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19")
+(define_subst_attr "mask_codefor" "mask" "*" "")
+(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+(define_subst_attr "store_mask_constraint" "mask" "vm" "v")
+(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand")
+(define_subst_attr "mask_prefix" "mask" "vex" "evex")
+(define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
+(define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex")
+
+(define_subst "mask"
+  [(set (match_operand:SUBST_V 0)
+        (match_operand:SUBST_V 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (match_dup 1)
+	  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-11-01 12:20       ` Kirill Yukhin
@ 2013-11-05  9:19         ` Kirill Yukhin
  2013-11-05 13:06         ` Kirill Yukhin
  2013-11-26  7:01         ` Richard Henderson
  2 siblings, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-05  9:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,

On 01 Nov 16:19, Kirill Yukhin wrote:
> Coould you pls take a look?

PING.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-11-01 12:20       ` Kirill Yukhin
  2013-11-05  9:19         ` Kirill Yukhin
@ 2013-11-05 13:06         ` Kirill Yukhin
  2013-11-11 13:52           ` Kirill Yukhin
  2013-11-11 23:34           ` Richard Henderson
  2013-11-26  7:01         ` Richard Henderson
  2 siblings, 2 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-05 13:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,
Small correction.

On 01 Nov 16:19, Kirill Yukhin wrote:
> +(define_insn "avx512f_store<mode>_mask"
> +  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
> +	(vec_merge:VI48F_512
> +	  (match_operand:VI48F_512 1 "register_operand" "v")
> +	  (match_dup 0)
> +	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
> +  "TARGET_AVX512F"
> +{
> +  switch (<sseinsnmode>mode)
Need to be MODE_<sseinsnmode>. Same for load.

Is it ok with that change?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
  2013-08-22 14:15 ` Kirill Yukhin
@ 2013-11-06  7:22 ` Kirill Yukhin
  2013-11-15 18:04   ` Kirill Yukhin
  2013-11-26  7:21   ` Richard Henderson
  2013-11-06  7:24 ` [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst Kirill Yukhin
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-06  7:22 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,
This small patch introduces `sd' subst.
`sd' (Source-Destination) subst is almost the same, as
the usual mask-subst, but it's only used for zero-masking. The reason is that
some patterns already have an operand with constraint "0" and we can't add a new
operand with the same constraint. So, we add only zero-masking here by subst and
manually write a pattern for merge-masking where we use match_dup instead of an
operand with constraint "0".

Bootstrap pass.

Is it ok for trunk?

--
Thanks, K

---
 gcc/config/i386/sse.md   | 180 ++++++++++++++++++++++++++++++++++++-----------
 gcc/config/i386/subst.md |  17 +++++
 2 files changed, 156 insertions(+), 41 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 298791d..6b41060 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2752,17 +2752,17 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "fma_fmadd_<mode>"
+(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
 	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
 	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  ""
+  "<sd_mask_mode512bit_condition>"
   "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -2801,18 +2801,18 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fmsub_<mode>"
+(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
 	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  ""
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "" v,vm, 0,xm,x"))))]
+  "<sd_mask_mode512bit_condition>"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -2853,18 +2853,18 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fnmadd_<mode>"
+(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
 	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
 	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
 	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  ""
+  "<sd_mask_mode512bit_condition>"
   "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfnmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfnmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfnmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -2905,7 +2905,7 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fnmsub_<mode>"
+(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
@@ -2913,11 +2913,11 @@
 	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
 	  (neg:FMAMODE
 	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  ""
+  "<sd_mask_mode512bit_condition>"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfnmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfnmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfnmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -2980,18 +2980,32 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_insn "fma_fmaddsub_<mode>"
+(define_expand "avx512f_fmaddsub_<mode>_maskz"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "nonimmediate_operand")
+   (match_operand:VF_512 2 "nonimmediate_operand")
+   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
 	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
 	   (match_operand:VF 2 "nonimmediate_operand" "vm, v,vm, x,m")
 	   (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -3032,7 +3046,7 @@
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma_fmsubadd_<mode>"
+(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
 	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0, v, x,x")
@@ -3040,11 +3054,11 @@
 	   (neg:VF
 	     (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
+   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
+   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
@@ -6706,7 +6720,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
-(define_insn "avx512f_vternlog<mode>"
+(define_expand "avx512f_vternlog<mode>_maskz"
+  [(match_operand:VI48_512 0 "register_operand")
+   (match_operand:VI48_512 1 "register_operand")
+   (match_operand:VI48_512 2 "register_operand")
+   (match_operand:VI48_512 3 "nonimmediate_operand")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vternlog<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_vternlog<mode><sd_maskz_name>"
   [(set (match_operand:VI48_512 0 "register_operand" "=v")
 	(unspec:VI48_512
 	  [(match_operand:VI48_512 1 "register_operand" "0")
@@ -6715,7 +6744,7 @@
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_VTERNLOG))]
   "TARGET_AVX512F"
-  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+  "vpternlog<ssemodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -6802,7 +6831,23 @@
   DONE;
 })
 
-(define_insn "avx512f_fixupimm<mode>"
+
+(define_expand "avx512f_fixupimm<mode>_maskz"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "register_operand")
+   (match_operand:VF_512 2 "register_operand")
+   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_fixupimm<mode><sd_maskz_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
           [(match_operand:VF_512 1 "register_operand" "0")
@@ -6811,7 +6856,7 @@
            (match_operand:SI 4 "const_0_to_255_operand")]
            UNSPEC_FIXUPIMM))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -6831,7 +6876,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sfixupimm<mode>"
+(define_expand "avx512f_sfixupimm<mode>_maskz"
+  [(match_operand:VF_128 0 "register_operand")
+   (match_operand:VF_128 1 "register_operand")
+   (match_operand:VF_128 2 "register_operand")
+   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:SI 4 "const_0_to_255_operand")
+   (match_operand:<avx512fmaskmode> 5 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+  DONE;
+})
+
+(define_insn "avx512f_sfixupimm<mode><sd_maskz_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
           (unspec:VF_128
@@ -6843,7 +6903,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+   "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -14138,7 +14198,21 @@
    (set_attr "prefix" "<mask_prefix>")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vpermi2var<mode>3"
+(define_expand "avx512f_vpermi2var<mode>3_maskz"
+  [(match_operand:VI48F_512 0 "register_operand" "=v")
+   (match_operand:VI48F_512 1 "register_operand" "v")
+   (match_operand:<sseintvecmode> 2 "register_operand" "0")
+   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")
+   (match_operand:<avx512fmaskmode> 4 "register_operand" "k")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vpermi2var<mode>3_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_vpermi2var<mode>3<sd_maskz_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
@@ -14146,7 +14220,7 @@
 	   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMI2))]
   "TARGET_AVX512F"
-  "vpermi2<ssemodesuffix>\t{%3, %1, %0|%0, %1, %3}"
+  "vpermi2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -14167,7 +14241,21 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vpermt2var<mode>3"
+(define_expand "avx512f_vpermt2var<mode>3_maskz"
+  [(match_operand:VI48F_512 0 "register_operand" "=v")
+   (match_operand:<sseintvecmode> 1 "register_operand" "v")
+   (match_operand:VI48F_512 2 "register_operand" "0")
+   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")
+   (match_operand:<avx512fmaskmode> 4 "register_operand" "k")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vpermt2var<mode>3_maskz_1 (
+	operands[0], operands[1], operands[2], operands[3],
+	CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "avx512f_vpermt2var<mode>3<sd_maskz_name>"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v")
 	(unspec:VI48F_512
 	  [(match_operand:<sseintvecmode> 1 "register_operand" "v")
@@ -14175,7 +14263,7 @@
 	   (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")]
 	  UNSPEC_VPERMT2))]
   "TARGET_AVX512F"
-  "vpermt2<ssemodesuffix>\t{%3, %1, %0|%0, %1, %3}"
+  "vpermt2<ssemodesuffix>\t{%3, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %3}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -15167,6 +15255,16 @@
    (set_attr "memory" "store")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "avx512f_expand<mode>_maskz"
+  [(set (match_operand:VI48F_512 0 "register_operand")
+	(unspec:VI48F_512
+	  [(match_operand:VI48F_512 1 "nonimmediate_operand")
+	   (match_operand:VI48F_512 2 "vector_move_operand")
+	   (match_operand:<avx512fmaskmode> 3 "register_operand")]
+	  UNSPEC_EXPAND))]
+  "TARGET_AVX512F"
+  "operands[2] = CONST0_RTX (<MODE>mode);")
+
 (define_insn "avx512f_expand<mode>_mask"
   [(set (match_operand:VI48F_512 0 "register_operand" "=v,v")
 	(unspec:VI48F_512
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index b537c5e..f81741f 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -93,3 +93,20 @@
         (and:SUBST_S
 	  (match_dup 1)
 	  (match_operand:SUBST_S 3 "register_operand" "k")))])
+
+(define_subst_attr "sd_maskz_name" "sd" "" "_maskz_1")
+(define_subst_attr "sd_mask_op4" "sd" "" "%{%5%}%N4")
+(define_subst_attr "sd_mask_op5" "sd" "" "%{%6%}%N5")
+(define_subst_attr "sd_mask_codefor" "sd" "*" "")
+(define_subst_attr "sd_mask_mode512bit_condition" "sd" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+
+(define_subst "sd"
+ [(set (match_operand:SUBST_V 0)
+       (match_operand:SUBST_V 1))]
+ ""
+ [(set (match_dup 0)
+       (vec_merge:SUBST_V
+	 (match_dup 1)
+	 (match_operand:SUBST_V 2 "const0_operand" "C")
+	 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))
+])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
  2013-08-22 14:15 ` Kirill Yukhin
  2013-11-06  7:22 ` [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst Kirill Yukhin
@ 2013-11-06  7:24 ` Kirill Yukhin
  2013-11-15 18:07   ` Kirill Yukhin
  2013-11-06  7:28 ` [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst Kirill Yukhin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-06  7:24 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,
On 14 Aug 11:44, Kirill Yukhin wrote:
> Rounding is implemented by adding a const_int value in parallel with the
> original pattern - however, that might be not the best approach.  define_subst
> for this transformation are quite straightforward - we needed three of them to
> cover all patterns we need to transform.  The most interesting part here is how
> to determine which operand corresponds to the rounding immediate. To somehow
> reflect this in the pattern, we allowed using attributes in other attributes,
> i.e. attributes nesting (that's already in the trunk). The problem here arises
> from the next: when subst for rounding is applied to a pattern after subst for
> masking, it's actually applied to two patterns with different number of
> operands. To find that number, we wrote next attributes: (define_subst_attr
> "round_mask_operand2" "mask" "%R2" "%R4") (define_subst_attr "round_mask_op2"
> "round" "" "<round_mask_operand2>") In result, we get either empty string (if
> rounding-subst isn't applied), or "%R2" or "%R4" depending on whether mask-subst
> was applied.

Corrsponding subst implementation is in the bottom.

Bootstrapped.

Is it ok for trunk?

--
Thanks, K

---
 gcc/config/i386/i386.c   |  32 +++
 gcc/config/i386/i386.md  |  10 +
 gcc/config/i386/sse.md   | 642 +++++++++++++++++++++++++----------------------
 gcc/config/i386/subst.md |  41 +++
 4 files changed, 426 insertions(+), 299 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a69e057..e172adf 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14839,6 +14839,38 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	    fputs ("{z}", file);
 	  return;
 
+	case 'R':
+	  gcc_assert (CONST_INT_P (x));
+
+	  if (ASSEMBLER_DIALECT == ASM_INTEL)
+	    fputs (", ", file);
+
+	  switch (INTVAL (x))
+	    {
+	    case ROUND_NEAREST_INT:
+	      fputs ("{rn-sae}", file);
+	      break;
+	    case ROUND_NEG_INF:
+	      fputs ("{rd-sae}", file);
+	      break;
+	    case ROUND_POS_INF:
+	      fputs ("{ru-sae}", file);
+	      break;
+	    case ROUND_ZERO:
+	      fputs ("{rz-sae}", file);
+	      break;
+	    case ROUND_SAE:
+	      fputs ("{sae}", file);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+
+	  if (ASSEMBLER_DIALECT == ASM_ATT)
+	    fputs (", ", file);
+
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c7ec0c1..344702b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -246,6 +246,16 @@
    (ROUND_NO_EXC		0x8)
   ])
 
+;; Constants to represent AVX512F embeded rounding
+(define_constants
+  [(ROUND_NEAREST_INT			0)
+   (ROUND_NEG_INF			1)
+   (ROUND_POS_INF			2)
+   (ROUND_ZERO				3)
+   (NO_ROUND				4)
+   (ROUND_SAE				5)
+  ])
+
 ;; Constants to represent pcomtrue/pcomfalse variants
 (define_constants
   [(PCOM_FALSE			0)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6b41060..61b96a1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1285,80 +1285,80 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3<mask_name>"
+(define_expand "<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3<mask_name>"
+(define_insn "*<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "<comm>0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name>"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3<mask_name>"
+(define_expand "mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3<mask_name>"
+(define_insn "*mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vmul<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name>"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
    (set_attr "prefix" "<mask_scalar_prefix>")
@@ -1391,15 +1391,15 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3<mask_name>"
+(define_insn "<sse>_div<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(div:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vdiv<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1482,28 +1482,28 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2<mask_name>"
+(define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
-	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
-  "%vsqrt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name>"
+(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_operand:VF_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{<round_mask_scalar_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
@@ -2752,210 +2752,224 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>"
+(define_expand "avx512f_fmadd_<mode>_maskz"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "nonimmediate_operand")
+   (match_operand:VF_512 2 "nonimmediate_operand")
+   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask"
+(define_insn "avx512f_fmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask3"
+(define_insn "avx512f_fmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=x")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "x")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" "" v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask"
+(define_insn "avx512f_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask3"
+(define_insn "avx512f_fmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "v")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE   3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask"
+(define_insn "avx512f_fnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask3"
+(define_insn "avx512f_fnmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfnmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask"
+(define_insn "avx512f_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask3"
+(define_insn "avx512f_fnmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2994,109 +3008,109 @@
   DONE;
 })
 
-(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	   (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
+	  [(match_operand:VF 1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	   (match_operand:VF 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmaddsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmaddsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmaddsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask"
+(define_insn "avx512f_fmaddsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	     (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")]
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	     (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmaddsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmaddsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask3"
+(define_insn "avx512f_fmaddsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	     (match_operand:VF_512 3 "register_operand" "0")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmaddsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmaddsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  [(match_operand:VF   1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
+	     (match_operand:VF 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsubadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsubadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsubadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask"
+(define_insn "avx512f_fmsubadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	     (neg:VF_512
-	       (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))]
+	       (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsubadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsubadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask3"
+(define_insn "avx512f_fmsubadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	     (neg:VF_512
 	       (match_operand:VF_512 3 "register_operand" "0"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsubadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsubadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -3104,7 +3118,22 @@
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
-(define_expand "fmai_vmfmadd_<mode>"
+(define_expand "fmai_vmfmadd_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand")
+	      (match_operand:VF_128 2 "nonimmediate_operand")
+	      (match_operand:VF_128 3 "nonimmediate_operand"))
+	    (match_dup <round_opnum>)
+	    (match_operand:QI 4 "register_operand"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[<round_opnum>] = CONST0_RTX (<MODE>mode);")
+
+(define_expand "fmai_vmfmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
 	  (fma:VF_128
@@ -3115,207 +3144,222 @@
 	  (const_int 1)))]
   "TARGET_FMA")
 
-(define_insn "*fmai_fmadd_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-        (vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
-	    (match_operand:VF_128 3 "nonimmediate_operand" " v,vm"))
-	  (match_dup 1)
-	  (const_int 1)))]
-  "TARGET_FMA || TARGET_AVX512F"
-  "@
-   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fmai_fmsub_<mode>_mask"
+(define_insn "fmai_vmfmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
-          (neg:VF_128
-	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>, v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+   vfmadd132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fmai_vmfmsub_<mode>_mask3"
+(define_insn "fmai_vmfmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
-          (neg:VF_128
-	        (match_operand:VF_128 3 "register_operand" "0")))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+	      (match_operand:VF_128 3 "register_operand" "0"))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "k"))
 	  (match_dup 3)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vfmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  "vfmadd231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fmsub_<mode>_maskz"
+(define_insn "*fmai_fmadd_<mode>_maskz<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
-          (neg:VF_128
-	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
 	    (match_operand:VF_128 4 "const0_operand")
 	    (match_operand:QI 5 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+   vfmadd132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2<round_op6>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3<round_op6>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmadd_<mode><round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+        (vec_merge:VF_128
+	  (fma:VF_128
+	    (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_FMA || TARGET_AVX512F"
+  "@
+   vfmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fmsub_<mode>_mask"
+(define_insn "*fmai_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
           (neg:VF_128
-	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmsub_<mode>_mask3<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fmsub_<mode>_maskz"
+(define_insn "*fmai_fmsub_<mode>_maskz<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v")
           (neg:VF_128
-	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	    (match_operand:VF_128 4 "const0_operand")
 	    (match_operand:QI 5 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_vmfnmadd_<mode>_mask"
+(define_insn "*fmai_vmfnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
-		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
-	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_vmfnmadd_<mode>_mask3"
+(define_insn "*fmai_vmfnmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
 		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
 	      (match_operand:VF_128 3 "register_operand" "0"))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "k"))
 	  (match_dup 3)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vfnmadd231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  "vfnmadd231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_vmfnmadd_<mode>_maskz"
+(define_insn "*fmai_vmfnmadd_<mode>_maskz<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
-		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
-	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>"))
 	    (match_operand:VF_128 4 "const0_operand")
 	    (match_operand:QI 5 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>_mask"
+(define_insn "*fmai_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
-		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
 	      (neg:VF_128
-		(match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+		(match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2<round_op5>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>_mask3"
+(define_insn "*fmai_fnmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
 		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")
 	      (neg:VF_128
 	        (match_operand:VF_128 3 "register_operand" "0")))
 	    (match_dup 3)
@@ -3323,80 +3367,80 @@
 	  (match_dup 3)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vfnmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  "vfnmsub231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2<round_op5>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>_maskz"
+(define_insn "*fmai_fnmsub_<mode>_maskz<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (fma:VF_128
 	      (neg:VF_128
-		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+		(match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
 	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
 	      (neg:VF_128
-	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,<round_constraint>")))
 	    (match_operand:VF_128 4 "const0_operand")
 	    (match_operand:QI 5 "register_operand" "k,k"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op6>%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2<round_op6>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op6>%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3<round_op6>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fmsub_<mode>"
+(define_insn "*fmai_fmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "vm, v")
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmadd_<mode>"
+(define_insn "*fmai_fnmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " v,vm"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>"
+(define_insn "*fmai_fnmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>, v"))
 	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -3517,18 +3561,18 @@
    (set_attr "prefix_rep" "0")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ss"
+(define_insn "sse_cvtsi2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    cvtsi2ss\t{%2, %0|%0, %2}
    cvtsi2ss\t{%2, %0|%0, %2}
-   vcvtsi2ss\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ss\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3538,18 +3582,18 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ssq"
+(define_insn "sse_cvtsi2ssq<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE && TARGET_64BIT"
   "@
    cvtsi2ssq\t{%2, %0|%0, %2}
    cvtsi2ssq\t{%2, %0|%0, %2}
-   vcvtsi2ssq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ssq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3561,15 +3605,15 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtss2si"
+(define_insn "sse_cvtss2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3591,15 +3635,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvtss2siq"
+(define_insn "sse_cvtss2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvtss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvtss2si{q}\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3653,50 +3697,50 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>32"
+(define_insn "cvtusi2<ssescalarmodesuffix>32<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:SI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:SI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
-  "TARGET_AVX512F"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <round_modev4sf_condition>"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>64"
+(define_insn "cvtusi2<ssescalarmodesuffix>64<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:DI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:DI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2<mask_name>"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name><round_name>"
   [(set (match_operand:VF1 0 "register_operand" "=v")
 	(float:VF1
-	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2 && <mask_mode512bit_condition>"
-  "%vcvtdq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vcvtdq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2<mask_name>"
+(define_insn "ufloatv16siv16sf2<mask_name><round_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
-	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SI 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtudq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3731,24 +3775,24 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3866,18 +3910,18 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "sse2_cvtsi2sdq"
+(define_insn "sse2_cvtsi2sdq<round_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (vec_duplicate:V2DF
-	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
    cvtsi2sdq\t{%2, %0|%0, %2}
    cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2sdq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,direct,*")
@@ -3888,28 +3932,28 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_vcvtss2usi"
+(define_insn "avx512f_vcvtss2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtss2usiq"
+(define_insn "avx512f_vcvtss2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3938,28 +3982,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvtsd2usi"
+(define_insn "avx512f_vcvtsd2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtsd2usiq"
+(define_insn "avx512f_vcvtsd2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3988,15 +4032,15 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvtsd2si"
+(define_insn "sse2_cvtsd2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
-  "%vcvtsd2si\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -4019,15 +4063,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvtsd2siq"
+(define_insn "sse2_cvtsd2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvtsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si{q}\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -4147,13 +4191,13 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V2DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4221,13 +4265,13 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name>"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4298,19 +4342,19 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss<mask_scalar_name>"
+(define_insn "sse2_cvtsd2ss<mask_scalar_name><round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
 	    (float_truncate:V2SF
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,vm")))
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,<round_constraint>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2"
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %q2}"
+   vcvtsd2ss\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %q2<round_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -4343,12 +4387,12 @@
    (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -6695,28 +6739,28 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name>"
+(define_insn "<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")]
 	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{<round_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2<round_mask_scalar_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode><mask_name>"
+(define_insn "avx512f_scalef<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "register_operand" "v")
-	   (match_operand:VF_512 2 "nonimmediate_operand" "vm")]
+	   (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
@@ -8446,22 +8490,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3<mask_name>"
+(define_expand "<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "TARGET_AVX2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3<mask_name>"
+(define_insn "*avx2_<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
-	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
+	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && <mask_mode512bit_condition>"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -12743,33 +12787,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512er_exp2<mode><mask_name>"
+(define_insn "avx512er_exp2<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vexp2<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrcp28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrsqrt28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index f81741f..3e6e788 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -30,6 +30,16 @@
 (define_mode_iterator SUBST_S
   [QI HI SI DI])
 
+(define_mode_iterator SUBST_A
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF
+   QI HI SI DI SF DF
+   CCFP CCFPU])
+
 (define_subst_attr "mask_name" "mask" "" "_mask")
 (define_subst_attr "mask_applied" "mask" "false" "true")
 (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
@@ -110,3 +120,34 @@
 	 (match_operand:SUBST_V 2 "const0_operand" "C")
 	 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))
 ])
+
+(define_subst_attr "round_name" "round" "" "_round")
+(define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
+(define_subst_attr "round_op2" "round" "" "%R2")
+(define_subst_attr "round_op3" "round" "" "%R3")
+(define_subst_attr "round_op4" "round" "" "%R4")
+(define_subst_attr "round_op5" "round" "" "%R5")
+(define_subst_attr "round_op6" "round" "" "%R6")
+(define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
+(define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
+(define_subst_attr "round_mask_scalar_op3" "round" "" "<round_mask_scalar_operand3>")
+(define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
+(define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_constraint2" "round" "m" "v")
+(define_subst_attr "round_constraint3" "round" "rm" "r")
+(define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
+(define_subst_attr "round_codefor" "round" "*" "")
+(define_subst_attr "round_opnum" "round" "5" "6")
+
+(define_subst "round"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
                   ` (2 preceding siblings ...)
  2013-11-06  7:24 ` [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst Kirill Yukhin
@ 2013-11-06  7:28 ` Kirill Yukhin
  2013-11-15 18:18   ` Kirill Yukhin
  2013-11-06  7:39 ` [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst Kirill Yukhin
  2013-11-06  7:40 ` [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only " Kirill Yukhin
  5 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-06  7:28 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,
This patch introduces `sae-only' (EVEX feature to Supress
Arithmetic Exceptions) subst.

Bootstrapped.

Is it ok for trunk?

--
Thanks, K

---
 gcc/config/i386/sse.md   | 198 +++++++++++++++++++++++------------------------
 gcc/config/i386/subst.md |  31 ++++++++
 2 files changed, 130 insertions(+), 99 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 61b96a1..2325328 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1577,63 +1577,63 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "<code><mode>3<mask_name>"
+(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
 {
   if (!flag_finite_math_only)
     operands[1] = force_reg (<MODE>mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);
 })
 
-(define_insn "*<code><mode>3_finite<mask_name>"
+(define_insn "*<code><mode>3_finite<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && flag_finite_math_only
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && <mask_mode512bit_condition>"
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<code><mode>3<mask_name>"
+(define_insn "*<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && !flag_finite_math_only
-   && <mask_mode512bit_condition>"
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3<mask_scalar_name>"
+(define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>"))
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
@@ -2153,15 +2153,15 @@
   [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
   (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")])
 
-(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name>"
+(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
-	   (match_operand:VI48F_512 2 "nonimmediate_operand" "vm")
+	   (match_operand:VI48F_512 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
 	  UNSPEC_PCMP))]
-  "TARGET_AVX512F"
-  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
+  "TARGET_AVX512F && <round_saeonly_mode512bit_condition_op1>"
+  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, <round_saeonly_mask_scalar_merge_op4>%2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2<round_saeonly_mask_scalar_merge_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -2181,35 +2181,35 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vmcmp<mode>3"
+(define_insn "avx512f_vmcmp<mode>3<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_vmcmp<mode>3_mask"
+(define_insn "avx512f_vmcmp<mode>3_mask<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (and:<avx512fmaskmode>
 	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k")
 	    (const_int 1))))]
   "TARGET_AVX512F"
-  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_saeonly_op5>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -2227,17 +2227,17 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse>_comi"
+(define_insn "<sse>_comi<round_saeonly_name>"
   [(set (reg:CCFP FLAGS_REG)
 	(compare:CCFP
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vcomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vcomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -2247,17 +2247,17 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_ucomi"
+(define_insn "<sse>_ucomi<round_saeonly_name>"
   [(set (reg:CCFPU FLAGS_REG)
 	(compare:CCFPU
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vucomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vucomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -3665,14 +3665,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse_cvttss2si"
+(define_insn "sse_cvttss2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE"
-  "%vcvttss2si\t{%1, %0|%0, %k1}"
+  "%vcvttss2si\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3681,14 +3681,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvttss2siq"
+(define_insn "sse_cvttss2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvttss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvttss2si{q}\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3797,12 +3797,12 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name>"
+(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
-	  (match_operand:V16SF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttps2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvttps2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3958,26 +3958,26 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttss2usi"
+(define_insn "avx512f_vcvttss2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttss2usiq"
+(define_insn "avx512f_vcvttss2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -4008,26 +4008,26 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttsd2usi"
+(define_insn "avx512f_vcvttsd2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttsd2usiq"
+(define_insn "avx512f_vcvttsd2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -4093,14 +4093,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvttsd2si"
+(define_insn "sse2_cvttsd2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2"
-  "%vcvttsd2si\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -4110,14 +4110,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvttsd2siq"
+(define_insn "sse2_cvttsd2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvttsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si{q}\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -4276,12 +4276,12 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
 
-(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name>"
+(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(any_fix:V8SI
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttpd2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvttpd2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4364,12 +4364,12 @@
    (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd<mask_scalar_name>"
+(define_insn "sse2_cvtss2sd<mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
 	    (vec_select:V2SF
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,vm")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,<round_saeonly_constraint>")
 	      (parallel [(const_int 0) (const_int 1)])))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
@@ -4377,7 +4377,7 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %k2}"
+   vcvtss2sd\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %k2<round_saeonly_mask_scalar_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
@@ -4442,12 +4442,12 @@
 (define_mode_attr sf2dfmode
   [(V8DF "V8SF") (V4DF "V4SF")])
 
-(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name>"
+(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name><round_saeonly_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float_extend:VF2_512_256
-	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX && <mask_mode512bit_condition>"
-  "vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
+  "TARGET_AVX && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+  "vcvtps2pd\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
@@ -6810,26 +6810,26 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getexp<mode><mask_name>"
+(define_insn "avx512f_getexp<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
-   "vgetexp<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
+   "vgetexp<ssemodesuffix>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode><mask_scalar_name>"
+(define_insn "avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")]
 	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{<round_saeonly_mask_scalar_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2<round_saeonly_mask_scalar_op3>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
@@ -6891,32 +6891,32 @@
   DONE;
 })
 
-(define_insn "avx512f_fixupimm<mode><sd_maskz_name>"
+(define_insn "avx512f_fixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
           [(match_operand:VF_512 1 "register_operand" "0")
 	   (match_operand:VF_512 2 "register_operand" "v")
-           (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+           (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
            (match_operand:SI 4 "const_0_to_255_operand")]
            UNSPEC_FIXUPIMM))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fixupimm<mode>_mask"
+(define_insn "avx512f_fixupimm<mode>_mask<round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
           (unspec:VF_512
             [(match_operand:VF_512 1 "register_operand" "0")
 	     (match_operand:VF_512 2 "register_operand" "v")
-             (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+             (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
              (match_operand:SI 4 "const_0_to_255_operand")]
              UNSPEC_FIXUPIMM)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -6935,30 +6935,30 @@
   DONE;
 })
 
-(define_insn "avx512f_sfixupimm<mode><sd_maskz_name>"
+(define_insn "avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
           (unspec:VF_128
             [(match_operand:VF_128 1 "register_operand" "0")
 	     (match_operand:VF_128 2 "register_operand" "v")
-	     (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+	     (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 4 "const_0_to_255_operand")]
 	    UNSPEC_FIXUPIMM)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
+   "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_sfixupimm<mode>_mask"
+(define_insn "avx512f_sfixupimm<mode>_mask<round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (unspec:VF_128
 	       [(match_operand:VF_128 1 "register_operand" "0")
 		(match_operand:VF_128 2 "register_operand" "v")
-		(match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+		(match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
 		(match_operand:SI 4 "const_0_to_255_operand")]
 	       UNSPEC_FIXUPIMM)
 	    (match_dup 1)
@@ -6966,34 +6966,34 @@
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_rndscale<mode><mask_name>"
+(define_insn "avx512f_rndscale<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
-  "vrndscale<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  "vrndscale<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name>"
+(define_insn "<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_mask_scalar_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2<round_saeonly_mask_scalar_op4>, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -14818,13 +14818,13 @@
    (set_attr "btver2_decode" "double")
    (set_attr "mode" "V8SF")])
 
-(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name>"
+(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unspec:V16SF
-	  [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16HI 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
 	  UNSPEC_VCVTPH2PS))]
   "TARGET_AVX512F"
-  "vcvtph2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtph2ps\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -15323,29 +15323,29 @@
    (set_attr "memory" "none,load")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getmant<mode><mask_name>"
+(define_insn "avx512f_getmant<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
-  "vgetmant<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}";
+  "vgetmant<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode><mask_scalar_name>"
+(define_insn "avx512f_getmant<mode><mask_scalar_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_GETMANT)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, <round_saeonly_mask_scalar_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2<round_saeonly_mask_scalar_op4>, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 3e6e788..825c6d0 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -151,3 +151,34 @@
      (set (match_dup 0)
           (match_dup 1))
      (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_saeonly_name" "round_saeonly" "" "_round")
+(define_subst_attr "round_saeonly_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_saeonly_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand4" "mask_scalar" "%R4" "%R6")
+(define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5")
+(define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7")
+(define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2")
+(define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%R4")
+(define_subst_attr "round_saeonly_op5" "round_saeonly" "" "%R5")
+(define_subst_attr "round_saeonly_op6" "round_saeonly" "" "%R6")
+(define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "<round_saeonly_mask_operand2>")
+(define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "<round_saeonly_mask_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "<round_saeonly_mask_scalar_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_operand4>")
+(define_subst_attr "round_saeonly_mask_scalar_merge_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_merge_operand4>")
+(define_subst_attr "round_saeonly_sd_mask_op5" "round_saeonly" "" "<round_saeonly_sd_mask_operand5>")
+(define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v")
+(define_subst_attr "round_saeonly_constraint2" "round_saeonly" "m" "v")
+(define_subst_attr "round_saeonly_mode512bit_condition" "round_saeonly" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_saeonly_mode512bit_condition_op1" "round_saeonly" "1" "(GET_MODE (operands[1]) == V16SFmode || GET_MODE (operands[1]) == V8DFmode)")
+
+(define_subst "round_saeonly"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])])


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
                   ` (3 preceding siblings ...)
  2013-11-06  7:28 ` [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst Kirill Yukhin
@ 2013-11-06  7:39 ` Kirill Yukhin
  2013-11-15 18:19   ` Kirill Yukhin
  2013-11-06  7:40 ` [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only " Kirill Yukhin
  5 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-06  7:39 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,
This patch introduces dedicated subst to add rounding to
structureless expands.

Bootstrapped.

Is it ok for trunk?

--
Thanks, K

---
 gcc/config/i386/sse.md   | 24 ++++++++++++------------
 gcc/config/i386/subst.md | 18 ++++++++++++++++++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2325328..5aa1563 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2752,17 +2752,17 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_expand "avx512f_fmadd_<mode>_maskz"
+(define_expand "avx512f_fmadd_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmadd_<mode>_maskz_1 (
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1<round_expand_name> (
     operands[0], operands[1], operands[2], operands[3],
-    CONST0_RTX (<MODE>mode), operands[4]));
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
   DONE;
 })
 
@@ -2994,17 +2994,17 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_expand "avx512f_fmaddsub_<mode>_maskz"
+(define_expand "avx512f_fmaddsub_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1 (
+  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1<round_expand_name> (
     operands[0], operands[1], operands[2], operands[3],
-    CONST0_RTX (<MODE>mode), operands[4]));
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 825c6d0..a3b2714 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -182,3 +182,21 @@
      (set (match_dup 0)
           (match_dup 1))
      (unspec [(match_operand:SI 2 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_expand_name" "round_expand" "" "_round")
+(define_subst_attr "round_expand_predicate" "round_expand" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_expand_operand" "round_expand" "" ", operands[5]")
+
+(define_subst "round_expand"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_V 3)
+  (match_operand:SUBST_S 4)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
                   ` (4 preceding siblings ...)
  2013-11-06  7:39 ` [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst Kirill Yukhin
@ 2013-11-06  7:40 ` Kirill Yukhin
  2013-11-15 18:20   ` Kirill Yukhin
  5 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-06  7:40 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek; +Cc: GCC Patches

Hello,
This patch introduces sae-only feature for
structureless expands.

Bootstrapped.

Is it ok for trunk?

--
Thanks, K

---
 gcc/config/i386/sse.md   | 18 ++++++++++--------
 gcc/config/i386/subst.md | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5aa1563..321d969 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6876,18 +6876,19 @@
 })
 
 
-(define_expand "avx512f_fixupimm<mode>_maskz"
+(define_expand "avx512f_fixupimm<mode>_maskz<round_saeonly_expand_name5>"
   [(match_operand:VF_512 0 "register_operand")
    (match_operand:VF_512 1 "register_operand")
    (match_operand:VF_512 2 "register_operand")
-   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:<sseintvecmode> 3 "<round_saeonly_expand_predicate5>")
    (match_operand:SI 4 "const_0_to_255_operand")
    (match_operand:<avx512fmaskmode> 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1 (
+  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
 	operands[0], operands[1], operands[2], operands[3],
-	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
   DONE;
 })
 
@@ -6920,18 +6921,19 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_expand "avx512f_sfixupimm<mode>_maskz"
+(define_expand "avx512f_sfixupimm<mode>_maskz<round_saeonly_expand_name5>"
   [(match_operand:VF_128 0 "register_operand")
    (match_operand:VF_128 1 "register_operand")
    (match_operand:VF_128 2 "register_operand")
-   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:<sseintvecmode> 3 "<round_saeonly_expand_predicate5>")
    (match_operand:SI 4 "const_0_to_255_operand")
    (match_operand:<avx512fmaskmode> 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1 (
+  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
 	operands[0], operands[1], operands[2], operands[3],
-	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index a3b2714..fcc5e8c 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -200,3 +200,23 @@
    (match_dup 3)
    (match_dup 4)
    (unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])
+
+(define_subst_attr "round_saeonly_expand_name5" "round_saeonly_expand5" "" "_round")
+(define_subst_attr "round_saeonly_expand_predicate5" "round_saeonly_expand5" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_expand_operand6" "round_saeonly_expand5" "" ", operands[6]")
+
+(define_subst "round_saeonly_expand5"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_A 3)
+  (match_operand:SI 4)
+  (match_operand:SUBST_S 5)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (match_dup 5)
+   (unspec [(match_operand:SI 6 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-11-05 13:06         ` Kirill Yukhin
@ 2013-11-11 13:52           ` Kirill Yukhin
  2013-11-11 23:34           ` Richard Henderson
  1 sibling, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-11 13:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,
On 05 Nov 16:05, Kirill Yukhin wrote:
> Is it ok with that change?

Ping?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-11-05 13:06         ` Kirill Yukhin
  2013-11-11 13:52           ` Kirill Yukhin
@ 2013-11-11 23:34           ` Richard Henderson
  1 sibling, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-11-11 23:34 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 11/05/2013 11:05 PM, Kirill Yukhin wrote:
> Hello,
> Small correction.
> 
> On 01 Nov 16:19, Kirill Yukhin wrote:
>> +(define_insn "avx512f_store<mode>_mask"
>> +  [(set (match_operand:VI48F_512 0 "memory_operand" "=m")
>> +	(vec_merge:VI48F_512
>> +	  (match_operand:VI48F_512 1 "register_operand" "v")
>> +	  (match_dup 0)
>> +	  (match_operand:<avx512fmaskmode> 2 "register_operand" "k")))]
>> +  "TARGET_AVX512F"
>> +{
>> +  switch (<sseinsnmode>mode)
> Need to be MODE_<sseinsnmode>. Same for load.
> 
> Is it ok with that change?

Yes, of course that's what I meant.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-10-28 15:13   ` [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst Kirill Yukhin
@ 2013-11-13 14:44     ` Kirill Yukhin
  2013-11-15 17:59       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-13 14:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

Hello,
> Is it ok to commit to main trunk?

I've moved to this patch few hunks from
[4/8] [1/n], which related to scalar insns.

Updated patch in the bottom.

Testing pass.

Could you pls comment on it?
I cannot imagine anything except double nested vec_merges.
We may introduce (non-substed) dedicated patterns of
such a kind:
(cond_exec (BI mode k reg)
	   (op:SF/DF (SF/DF operand1)
	   	     (SF/DF operand2)))
However this only applicable to merge variant of EVEX's
masking feature.
For zero-masking, when we need to put 0 with predicate is
false into destination (first element of vector) cond_exec
is not applicable. Maybe extend cond_exec? Maybe use 
if_then_else?

--
Thanks, K

---
 gcc/config/i386/sse.md   | 342 ++++++++++++++++++++++++++++++++++++++++++-----
 gcc/config/i386/subst.md |  23 ++++
 2 files changed, 333 insertions(+), 32 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6d6e16e..dddf907 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1097,6 +1097,67 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512f_moves<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 2 "register_operand" "v")
+	    (match_operand:VF_128 3 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k"))
+	  (match_operand:VF_128 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand"))
+	    (match_operand:VF_128 2 "vector_move_operand")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand"))
+	  (match_dup 4)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[4] = CONST0_RTX (<MODE>mode);")
+
+(define_insn "*avx512f_loads<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (vec_duplicate:VF_128
+	      (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
+	    (match_operand:VF_128 2 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 3 "register_operand" "k"))
+	  (match_operand:VF_128 4 "const0_operand")
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "load")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512f_stores<mode>_mask"
+  [(set (match_operand:<ssescalarmode> 0 "memory_operand" "=m")
+	(vec_select:<ssescalarmode>
+	  (vec_merge:VF_128
+	    (match_operand:VF_128 1 "register_operand" "v")
+	    (vec_duplicate:VF_128
+	      (match_dup 0))
+	    (match_operand:<avx512fmaskmode> 2 "register_operand" "k"))
+	  (parallel [(const_int 0)])))]
+  "TARGET_AVX512F"
+  "vmov<ssescalarmodesuffix>\t{%1, %0%{%2%}|%0%{%2%}, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "memory" "store")
+   (set_attr "mode" "<ssescalarmode>")])
+
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
   [(set (match_operand:VI1 0 "register_operand" "=x")
 	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
@@ -1246,7 +1307,7 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
@@ -1257,10 +1318,10 @@
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "mul<mode>3<mask_name>"
@@ -1286,7 +1347,7 @@
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
@@ -1297,10 +1358,10 @@
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1385,7 +1446,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*srcp14<mode>"
+(define_insn "<mask_scalar_codefor>srcp14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1395,7 +1456,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1432,7 +1493,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2"
+(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
@@ -1442,11 +1503,11 @@
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1481,7 +1542,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*rsqrt14<mode>"
+(define_insn "<mask_scalar_codefor>rsqrt14<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1491,7 +1552,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrsqrt14<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1561,7 +1622,7 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3"
+(define_insn "<sse>_vm<code><mode>3<mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
@@ -1572,11 +1633,11 @@
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<mask_scalar_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;; These versions of the min/max patterns implement exactly the operations
@@ -2691,7 +2752,7 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "*fma_fmadd_<mode>"
+(define_insn "fma_fmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
@@ -2919,7 +2980,7 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_insn "*fma_fmaddsub_<mode>"
+(define_insn "fma_fmaddsub_<mode>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
 	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
@@ -3056,6 +3117,223 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*fmai_fmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fmai_vmfmsub_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "%v")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
+          (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (match_operand:VF_128 3 "register_operand" "0"))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmadd231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_vmfnmadd_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+		(match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_dup 1)
+	    (match_operand:QI 4 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %<iptr>3, %<iptr>2}
+   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_mask3"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 1 "nonimmediate_operand" "%v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "register_operand" "0")))
+	    (match_dup 3)
+	    (match_operand:QI 4 "register_operand" "k"))
+	  (match_dup 3)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "vfnmsub231<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %<iptr>1, %<iptr>2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fmai_fnmsub_<mode>_maskz"
+  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (neg:VF_128
+		(match_operand:VF_128 2 "nonimmediate_operand" "vm,v"))
+	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
+	      (neg:VF_128
+	        (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")))
+	    (match_operand:VF_128 4 "const0_operand")
+	    (match_operand:QI 5 "register_operand" "k,k"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "@
+   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>3, %<iptr>2}
+   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %<iptr>2, %<iptr>3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*fmai_fmsub_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
@@ -4006,7 +4284,7 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss"
+(define_insn "sse2_cvtsd2ss<mask_scalar_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
@@ -4018,17 +4296,17 @@
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0|%0, %1, %q2}"
+   vcvtsd2ss\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %q2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd"
+(define_insn "sse2_cvtss2sd<mask_scalar_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
@@ -4041,14 +4319,14 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0|%0, %1, %k2}"
+   vcvtss2sd\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %k2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "athlon_decode" "direct,direct,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<mask_scalar_prefix2>")
    (set_attr "mode" "DF")])
 
 (define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
@@ -6403,7 +6681,7 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "*avx512f_vmscalef<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_vmscalef<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6413,7 +6691,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
@@ -6468,7 +6746,7 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode>"
+(define_insn "avx512f_sgetexp<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6478,7 +6756,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %2}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
@@ -6600,7 +6878,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_rndscale<mode>"
+(define_insn "<mask_scalar_codefor>avx512f_rndscale<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -6611,7 +6889,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -14914,7 +15192,7 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_getmant<mode><mask_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -14925,7 +15203,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 6b45d05..532a3a1 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -54,3 +54,26 @@
 	  (match_dup 1)
 	  (match_operand:SUBST_V 2 "vector_move_operand" "0C")
 	  (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))])
+
+(define_subst_attr "mask_scalar_name" "mask_scalar" "" "_mask")
+(define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N3")
+(define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N4")
+(define_subst_attr "mask_scalar_codefor" "mask_scalar" "*" "")
+(define_subst_attr "mask_scalar_prefix" "mask_scalar" "orig,vex" "evex")
+(define_subst_attr "mask_scalar_prefix2" "mask_scalar" "vex" "evex")
+
+(define_subst "mask_scalar"
+  [(set (match_operand:SUBST_V 0)
+	(vec_merge:SUBST_V
+	  (match_operand:SUBST_V 1)
+	  (match_operand:SUBST_V 2)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_V
+	  (vec_merge:SUBST_V
+	    (match_dup 1)
+	    (match_operand:SUBST_V 4 "vector_move_operand" "0C")
+	    (match_operand:<avx512fmaskmode> 5 "register_operand" "k"))
+	  (match_dup 2)
+	  (const_int 1)))])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-11-13 14:44     ` Kirill Yukhin
@ 2013-11-15 17:59       ` Kirill Yukhin
  2013-11-19  9:37         ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 17:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, law

Helllo,
On 13 Nov 16:18, Kirill Yukhin wrote:
> Testing pass.
Ping?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst.
  2013-11-06  7:22 ` [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst Kirill Yukhin
@ 2013-11-15 18:04   ` Kirill Yukhin
  2013-11-19  9:42     ` Kirill Yukhin
  2013-11-26  7:21   ` Richard Henderson
  1 sibling, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 18:04 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 06 Nov 10:15, Kirill Yukhin wrote:
> Bootstrap pass.
> 
> Is it ok for trunk?
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-11-06  7:24 ` [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst Kirill Yukhin
@ 2013-11-15 18:07   ` Kirill Yukhin
  2013-11-19  9:45     ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 18:07 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 06 Nov 10:18, Kirill Yukhin wrote:
> Corrsponding subst implementation is in the bottom.
> 
> Bootstrapped.
> 
> Is it ok for trunk?
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-11-06  7:28 ` [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst Kirill Yukhin
@ 2013-11-15 18:18   ` Kirill Yukhin
  2013-11-19  9:58     ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 18:18 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 06 Nov 10:21, Kirill Yukhin wrote:
> Bootstrapped.
> 
> Is it ok for trunk?
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-11-06  7:39 ` [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst Kirill Yukhin
@ 2013-11-15 18:19   ` Kirill Yukhin
  2013-11-19  9:53     ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 18:19 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 06 Nov 10:24, Kirill Yukhin wrote:
> Bootstrapped.
> 
> Is it ok for trunk?
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-11-06  7:40 ` [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only " Kirill Yukhin
@ 2013-11-15 18:20   ` Kirill Yukhin
  2013-11-19 10:06     ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-15 18:20 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 06 Nov 10:27, Kirill Yukhin wrote:
> Hello,
> This patch introduces sae-only feature for
> structureless expands.
> 
> Bootstrapped.
> 
> Is it ok for trunk?
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-11-15 17:59       ` Kirill Yukhin
@ 2013-11-19  9:37         ` Kirill Yukhin
  2013-12-02 13:10           ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, law

Hello,
On 15 Nov 20:03, Kirill Yukhin wrote:
> Ping?
Ping?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst.
  2013-11-15 18:04   ` Kirill Yukhin
@ 2013-11-19  9:42     ` Kirill Yukhin
  2013-11-19  9:48       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:42 UTC (permalink / raw)
  To: Urosrth, Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 15 Nov 20:05, Kirill Yukhin wrote:
> > Is it ok for trunk?
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-11-15 18:07   ` Kirill Yukhin
@ 2013-11-19  9:45     ` Kirill Yukhin
  2013-12-02 13:11       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:45 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 15 Nov 20:06, Kirill Yukhin wrote:
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst.
  2013-11-19  9:42     ` Kirill Yukhin
@ 2013-11-19  9:48       ` Kirill Yukhin
  0 siblings, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:48 UTC (permalink / raw)
  To: rth, ubizjak, Jakub Jelinek, law; +Cc: GCC Patches

Adding back Richard.

On 19 Nov 12:07, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:05, Kirill Yukhin wrote:
> > > Is it ok for trunk?
> > Ping.
> Ping.
> 
> --
> Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-11-15 18:19   ` Kirill Yukhin
@ 2013-11-19  9:53     ` Kirill Yukhin
  2013-12-02 13:13       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:53 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 15 Nov 20:08, Kirill Yukhin wrote:
> > Is it ok for trunk?
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-11-15 18:18   ` Kirill Yukhin
@ 2013-11-19  9:58     ` Kirill Yukhin
  2013-12-02 13:13       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19  9:58 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 15 Nov 20:07, Kirill Yukhin wrote:
> > Is it ok for trunk?
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-11-15 18:20   ` Kirill Yukhin
@ 2013-11-19 10:06     ` Kirill Yukhin
  2013-12-02 13:14       ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-11-19 10:06 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 15 Nov 20:09, Kirill Yukhin wrote:
> > Is it ok for trunk?
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
  2013-11-01 12:20       ` Kirill Yukhin
  2013-11-05  9:19         ` Kirill Yukhin
  2013-11-05 13:06         ` Kirill Yukhin
@ 2013-11-26  7:01         ` Richard Henderson
  2 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-11-26  7:01 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek

On 11/01/2013 10:19 PM, Kirill Yukhin wrote:
> Hello Richard,
> 
> On 21 Oct 16:01, Richard Henderson wrote:
>> Error on V16SF.  Probably better to fill this out.
> Thanks, fixed.
> 
>> Better to just use <sseinsnmode> here, as it's a compile-time constant.
> Fixed.
> 
>>> +(define_insn "avx512f_store<mode>_mask"
>>
>> Likewise.
> Fixed.
> 
>> Nested vec_merge?  That seems... odd to say the least.
>> How in the world does this get matched?
> Moved to separate patch.
> 
>>> +(define_insn "*avx512f_loads<mode>_mask"
>>
>> Likewise.
> Moved to separate patch.
> 
>>> +(define_insn "avx512f_stores<mode>_mask"
>> This seems similar, though of course it's an extract.
>> I still can't imagine how it could be used.
> Separate patch.
> 
>>> -(define_insn "rcp14<mode>"
>>> +(define_insn "<mask_codefor>rcp14<mode><mask_name>"
>>
>> What, this name isn't used for non-masked anymore?
> Bogus original pattern. Introduced by me here:
>       svn+ssh://gcc.gnu.org/svn/gcc/trunk@203609
> This (and subseqent patterns you mention) are not used
> by name. In general case for every built-in whether it is
> masked or not we use masked version of built-in. E.g.:
> extern __inline __m512d
> __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_rcp14_pd (__m512d __A)
> {
>   return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
>                                                    (__v8df)
>                                                    _mm512_setzero_pd (),
>                                                    (__mmask8) -1);
> }
> Then, while expanding, if we can prove that mask is const -1 we remove
> redundant vec_merge getting non-masked variant of the built-in.
> Same holds for all inssn you mentioned
> 
> 
>>> -(define_insn "srcp14<mode>"
>> Likewise.  These changes don't belong in this patch.
> Ditto.
> 
>>> -(define_insn "rsqrt14<mode>"
>> Likewise.
> Ditto.
> 
>>> -	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
>>> +	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
>>> +	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
>>> +	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]
>>
>> Unrelated changes.  Repeated throughout the fma patterns.
> Reverted.
> 
>>> +(define_insn "*fmai_fmadd_<mode>_maskz"
>>> +  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
>>> +	(vec_merge:VF_128
>>> +	  (vec_merge:VF_128
>>> +	    (fma:VF_128
>>> +	      (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
>>> +	      (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
>>> +	      (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
>>> +	    (match_operand:VF_128 4 "const0_operand")
>>> +	    (match_operand:QI 5 "register_operand" "k,k"))
>>> +	  (match_dup 1)
>>> +	  (const_int 1)))]
>>> +  "TARGET_AVX512F"
>>> +  "@
>>> +   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2}
>>> +   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
>>> +  [(set_attr "type" "ssemuladd")
>>> +   (set_attr "mode" "<MODE>")])
>>
>> These seem like useless patterns.  If they're for builtins,
>> then they seem like useless builtins.  See above.
> Moved to dedicated patch.
> 
>>> @@ -3686,8 +4328,8 @@
>>>     (set_attr "athlon_decode" "vector,double,*")
>>>     (set_attr "amdfam10_decode" "vector,double,*")
>>>     (set_attr "bdver1_decode" "direct,direct,*")
>>> -   (set_attr "btver2_decode" "double,double,double")
>>>     (set_attr "prefix" "orig,orig,vex")
>>> +   (set_attr "btver2_decode" "double,double,double")
>>>     (set_attr "mode" "SF")])
>>
>> Unrelated changes.
> Ugh, reverted.
> 
>>> +(define_expand "vec_unpacku_float_hi_v16si"
>>> +  [(match_operand:V8DF 0 "register_operand")
>>> +   (match_operand:V16SI 1 "register_operand")]
>>> +  "TARGET_AVX512F"
>>> +{
>>> +  REAL_VALUE_TYPE TWO32r;
>>> +  rtx k, x, tmp[4];
>>> +
>>> +  real_ldexp (&TWO32r, &dconst1, 32);
>>> +  x = const_double_from_real_value (TWO32r, DFmode);
>>> +
>>> +  tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
>>> +  tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
>>> +  tmp[2] = gen_reg_rtx (V8DFmode);
>>> +  tmp[3] = gen_reg_rtx (V8SImode);
>>> +  k = gen_reg_rtx (QImode);
>>> +
>>> +  emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
>>> +  emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
>>> +  emit_insn (gen_rtx_SET (VOIDmode, k,
>>> +			  gen_rtx_LT (QImode, tmp[2], tmp[0])));
>>> +  emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
>>> +  emit_move_insn (operands[0], tmp[2]);
>>> +  DONE;
>>> +})
>>
>> Separate patch.  And this is too complicated, since vcvtudq2pd exists.
> Moved to [5/8] Extend hooks.
> 
>> Non-masked name change again.
> See above.
>>> +(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>"
>>
>> Ditto.
> Above.
> 
>>> +(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>"
>>
>> Ditto.
> Above.
> 
>>> +(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>"
>>
>> Ditto.
> Above.
> 
> Updated patch in the bottom.
> 
> Bootstrapped.
> 
> Coould you pls take a look?

This is ok now.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst.
  2013-11-06  7:22 ` [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst Kirill Yukhin
  2013-11-15 18:04   ` Kirill Yukhin
@ 2013-11-26  7:21   ` Richard Henderson
  1 sibling, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2013-11-26  7:21 UTC (permalink / raw)
  To: Kirill Yukhin, Uros Bizjak, Jakub Jelinek; +Cc: GCC Patches

On 11/06/2013 05:15 PM, Kirill Yukhin wrote:
> Hello,
> This small patch introduces `sd' subst.
> `sd' (Source-Destination) subst is almost the same, as
> the usual mask-subst, but it's only used for zero-masking. The reason is that
> some patterns already have an operand with constraint "0" and we can't add a new
> operand with the same constraint. So, we add only zero-masking here by subst and
> manually write a pattern for merge-masking where we use match_dup instead of an
> operand with constraint "0".
> 
> Bootstrap pass.


Ok.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-11-19  9:37         ` Kirill Yukhin
@ 2013-12-02 13:10           ` Kirill Yukhin
  2013-12-24  4:56             ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-02 13:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, law

Hello,

On 19 Nov 12:05, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:03, Kirill Yukhin wrote:
> > Ping?
> Ping?
Ping?

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-11-19  9:45     ` Kirill Yukhin
@ 2013-12-02 13:11       ` Kirill Yukhin
  2013-12-18 13:00         ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-02 13:11 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 19 Nov 12:08, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:06, Kirill Yukhin wrote:
> > Ping.
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-11-19  9:53     ` Kirill Yukhin
@ 2013-12-02 13:13       ` Kirill Yukhin
  2013-12-18 13:04         ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-02 13:13 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 19 Nov 12:12, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:08, Kirill Yukhin wrote:
> > > Is it ok for trunk?
> > Ping.
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-11-19  9:58     ` Kirill Yukhin
@ 2013-12-02 13:13       ` Kirill Yukhin
  2013-12-18 13:02         ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-02 13:13 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 19 Nov 12:11, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:07, Kirill Yukhin wrote:
> > > Is it ok for trunk?
> > Ping.
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-11-19 10:06     ` Kirill Yukhin
@ 2013-12-02 13:14       ` Kirill Yukhin
  2013-12-18 13:06         ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-02 13:14 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 19 Nov 12:14, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:09, Kirill Yukhin wrote:
> > > Is it ok for trunk?
> > Ping.
> Ping.
Ping.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-12-02 13:11       ` Kirill Yukhin
@ 2013-12-18 13:00         ` Kirill Yukhin
  2013-12-23 16:11           ` Uros Bizjak
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-18 13:00 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,
On 02 Dec 16:09, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:08, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:06, Kirill Yukhin wrote:
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch in the bottom.

--
Thanks, K

---
 gcc/config/i386/i386.c   |  32 ++++
 gcc/config/i386/i386.md  |  10 ++
 gcc/config/i386/sse.md   | 457 +++++++++++++++++++++++++----------------------
 gcc/config/i386/subst.md |  41 +++++
 4 files changed, 326 insertions(+), 214 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ecf5e0b..a3dd307 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15041,6 +15041,38 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	    fputs ("{z}", file);
 	  return;
 
+	case 'R':
+	  gcc_assert (CONST_INT_P (x));
+
+	  if (ASSEMBLER_DIALECT == ASM_INTEL)
+	    fputs (", ", file);
+
+	  switch (INTVAL (x))
+	    {
+	    case ROUND_NEAREST_INT:
+	      fputs ("{rn-sae}", file);
+	      break;
+	    case ROUND_NEG_INF:
+	      fputs ("{rd-sae}", file);
+	      break;
+	    case ROUND_POS_INF:
+	      fputs ("{ru-sae}", file);
+	      break;
+	    case ROUND_ZERO:
+	      fputs ("{rz-sae}", file);
+	      break;
+	    case ROUND_SAE:
+	      fputs ("{sae}", file);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+
+	  if (ASSEMBLER_DIALECT == ASM_ATT)
+	    fputs (", ", file);
+
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ab5b33f..30b8d74 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -241,6 +241,16 @@
    (ROUND_NO_EXC		0x8)
   ])
 
+;; Constants to represent AVX512F embeded rounding
+(define_constants
+  [(ROUND_NEAREST_INT			0)
+   (ROUND_NEG_INF			1)
+   (ROUND_POS_INF			2)
+   (ROUND_ZERO				3)
+   (NO_ROUND				4)
+   (ROUND_SAE				5)
+  ])
+
 ;; Constants to represent pcomtrue/pcomfalse variants
 (define_constants
   [(PCOM_FALSE			0)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index adedf44..23edbd3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1229,23 +1229,23 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3<mask_name>"
+(define_expand "<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3<mask_name>"
+(define_insn "*<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(plusminus:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "<comm>0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1268,23 +1268,23 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3<mask_name>"
+(define_expand "mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3<mask_name>"
+(define_insn "*mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(mult:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vmul<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1335,15 +1335,15 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3<mask_name>"
+(define_insn "<sse>_div<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(div:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vdiv<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1427,11 +1427,11 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2<mask_name>"
+(define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
-	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
-  "%vsqrt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
@@ -2698,210 +2698,224 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>"
+(define_expand "avx512f_fmadd_<mode>_maskz"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "nonimmediate_operand")
+   (match_operand:VF_512 2 "nonimmediate_operand")
+   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
+	  (match_operand:FMAMODE 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask"
+(define_insn "avx512f_fmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask3"
+(define_insn "avx512f_fmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=x")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "x")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask"
+(define_insn "avx512f_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask3"
+(define_insn "avx512f_fmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "v")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE   3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask"
+(define_insn "avx512f_fnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask3"
+(define_insn "avx512f_fnmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfnmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask"
+(define_insn "avx512f_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask3"
+(define_insn "avx512f_fnmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2940,109 +2954,109 @@
   DONE;
 })
 
-(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	   (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
+	  [(match_operand:VF 1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF 2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
+	   (match_operand:VF 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x")]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmaddsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmaddsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmaddsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask"
+(define_insn "avx512f_fmaddsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	     (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")]
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
+	     (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmaddsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmaddsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask3"
+(define_insn "avx512f_fmaddsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	     (match_operand:VF_512 3 "register_operand" "0")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmaddsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmaddsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  [(match_operand:VF   1 "nonimmediate_operand" "%0,0,v,x,x")
+	   (match_operand:VF   2 "nonimmediate_operand" "<round_constraint>,v,<round_constraint>,x,m")
 	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
+	     (match_operand:VF 3 "nonimmediate_operand" "v,<round_constraint>,0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsubadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsubadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsubadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask"
+(define_insn "avx512f_fmsubadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>,v")
 	     (neg:VF_512
-	       (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))]
+	       (match_operand:VF_512 3 "nonimmediate_operand" "v,<round_constraint>"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsubadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsubadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask3"
+(define_insn "avx512f_fmsubadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")
 	     (neg:VF_512
 	       (match_operand:VF_512 3 "register_operand" "0"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsubadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsubadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -3050,7 +3064,22 @@
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
-(define_expand "fmai_vmfmadd_<mode>"
+(define_expand "fmai_vmfmadd_<mode>_maskz<round_name>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (vec_merge:VF_128
+	    (fma:VF_128
+	      (match_operand:VF_128 1 "nonimmediate_operand")
+	      (match_operand:VF_128 2 "nonimmediate_operand")
+	      (match_operand:VF_128 3 "nonimmediate_operand"))
+	    (match_dup <round_opnum>)
+	    (match_operand:QI 4 "register_operand"))
+	  (match_dup 1)
+	  (const_int 1)))]
+  "TARGET_AVX512F"
+  "operands[<round_opnum>] = CONST0_RTX (<MODE>mode);")
+
+(define_expand "fmai_vmfmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
 	  (fma:VF_128
@@ -3081,51 +3110,51 @@
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "vm, v")
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   2 "nonimmediate_operand" "<round_constraint>,v")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmadd_<mode>"
+(define_insn "*fmai_fnmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " v,vm"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>,v"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" "0,0")
+	    (match_operand:VF_128   3 "nonimmediate_operand" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>"
+(define_insn "*fmai_fnmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
+	      (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>, v"))
 	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "nonimmediate_operand" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -3246,18 +3275,18 @@
    (set_attr "prefix_rep" "0")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ss"
+(define_insn "sse_cvtsi2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    cvtsi2ss\t{%2, %0|%0, %2}
    cvtsi2ss\t{%2, %0|%0, %2}
-   vcvtsi2ss\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ss\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3267,18 +3296,18 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ssq"
+(define_insn "sse_cvtsi2ssq<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE && TARGET_64BIT"
   "@
    cvtsi2ssq\t{%2, %0|%0, %2}
    cvtsi2ssq\t{%2, %0|%0, %2}
-   vcvtsi2ssq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ssq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3290,15 +3319,15 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtss2si"
+(define_insn "sse_cvtss2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3320,15 +3349,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvtss2siq"
+(define_insn "sse_cvtss2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvtss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvtss2si{q}\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3382,50 +3411,50 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>32"
+(define_insn "cvtusi2<ssescalarmodesuffix>32<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:SI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:SI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
-  "TARGET_AVX512F"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <round_modev4sf_condition>"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>64"
+(define_insn "cvtusi2<ssescalarmodesuffix>64<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:DI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:DI 2 "nonimmediate_operand" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2<mask_name>"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name><round_name>"
   [(set (match_operand:VF1 0 "register_operand" "=v")
 	(float:VF1
-	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2 && <mask_mode512bit_condition>"
-  "%vcvtdq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "<round_constraint>")))]
+  "TARGET_SSE2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vcvtdq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2<mask_name>"
+(define_insn "ufloatv16siv16sf2<mask_name><round_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
-	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SI 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtudq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3460,24 +3489,24 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3595,18 +3624,18 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "sse2_cvtsi2sdq"
+(define_insn "sse2_cvtsi2sdq<round_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (vec_duplicate:V2DF
-	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,<round_constraint3>")))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
    cvtsi2sdq\t{%2, %0|%0, %2}
    cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2sdq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,direct,*")
@@ -3617,28 +3646,28 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_vcvtss2usi"
+(define_insn "avx512f_vcvtss2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtss2usiq"
+(define_insn "avx512f_vcvtss2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3667,28 +3696,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvtsd2usi"
+(define_insn "avx512f_vcvtsd2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtsd2usiq"
+(define_insn "avx512f_vcvtsd2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3717,15 +3746,15 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvtsd2si"
+(define_insn "sse2_cvtsd2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
-  "%vcvtsd2si\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3748,15 +3777,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvtsd2siq"
+(define_insn "sse2_cvtsd2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvtsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si{q}\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3877,13 +3906,13 @@
    (set_attr "ssememalign" "64")
    (set_attr "mode" "V2DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3951,13 +3980,13 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name>"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4073,12 +4102,12 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -6446,14 +6475,14 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode><mask_name>"
+(define_insn "avx512f_scalef<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "register_operand" "v")
-	   (match_operand:VF_512 2 "nonimmediate_operand" "vm")]
+	   (match_operand:VF_512 2 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
@@ -8187,22 +8216,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3<mask_name>"
+(define_expand "<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
 	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2 && <mask_mode512bit_condition>"
+  "TARGET_AVX2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3<mask_name>"
+(define_insn "*avx2_<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
 	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
-	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
+	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "<round_constraint>")))]
   "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && <mask_mode512bit_condition>"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -12500,33 +12529,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512er_exp2<mode><mask_name>"
+(define_insn "avx512er_exp2<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vexp2<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrcp28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrsqrt28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 594dc43..76c183c 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -30,6 +30,16 @@
 (define_mode_iterator SUBST_S
   [QI HI SI DI])
 
+(define_mode_iterator SUBST_A
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF
+   QI HI SI DI SF DF
+   CCFP CCFPU])
+
 (define_subst_attr "mask_name" "mask" "" "_mask")
 (define_subst_attr "mask_applied" "mask" "false" "true")
 (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
@@ -87,3 +97,34 @@
 	 (match_operand:SUBST_V 2 "const0_operand" "C")
 	 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))
 ])
+
+(define_subst_attr "round_name" "round" "" "_round")
+(define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
+(define_subst_attr "round_op2" "round" "" "%R2")
+(define_subst_attr "round_op3" "round" "" "%R3")
+(define_subst_attr "round_op4" "round" "" "%R4")
+(define_subst_attr "round_op5" "round" "" "%R5")
+(define_subst_attr "round_op6" "round" "" "%R6")
+(define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
+(define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
+(define_subst_attr "round_mask_scalar_op3" "round" "" "<round_mask_scalar_operand3>")
+(define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
+(define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_constraint2" "round" "m" "v")
+(define_subst_attr "round_constraint3" "round" "rm" "r")
+(define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
+(define_subst_attr "round_codefor" "round" "*" "")
+(define_subst_attr "round_opnum" "round" "5" "6")
+
+(define_subst "round"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-12-02 13:13       ` Kirill Yukhin
@ 2013-12-18 13:02         ` Kirill Yukhin
  2013-12-23 16:41           ` Uros Bizjak
  2014-01-09 15:35           ` H.J. Lu
  0 siblings, 2 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-18 13:02 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,

On 02 Dec 16:10, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:11, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:07, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch in the bottom.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 168 +++++++++++++++++++++++------------------------
 gcc/config/i386/subst.md |  31 +++++++++
 2 files changed, 115 insertions(+), 84 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 23edbd3..5b10cec 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1523,45 +1523,45 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "<code><mode>3<mask_name>"
+(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand")
 	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
 {
   if (!flag_finite_math_only)
     operands[1] = force_reg (<MODE>mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);
 })
 
-(define_insn "*<code><mode>3_finite<mask_name>"
+(define_insn "*<code><mode>3_finite<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && flag_finite_math_only
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && <mask_mode512bit_condition>"
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*<code><mode>3<mask_name>"
+(define_insn "*<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(smaxmin:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:VF 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>")))]
   "TARGET_SSE && !flag_finite_math_only
-   && <mask_mode512bit_condition>"
+   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<maxmin_float><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
@@ -2099,15 +2099,15 @@
   [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
   (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")])
 
-(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name>"
+(define_insn "avx512f_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
 	  [(match_operand:VI48F_512 1 "register_operand" "v")
-	   (match_operand:VI48F_512 2 "nonimmediate_operand" "vm")
+	   (match_operand:VI48F_512 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
 	  UNSPEC_PCMP))]
-  "TARGET_AVX512F"
-  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, %2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2, %3}"
+  "TARGET_AVX512F && <round_saeonly_mode512bit_condition_op1>"
+  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, <round_saeonly_mask_scalar_merge_op4>%2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2<round_saeonly_mask_scalar_merge_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -2127,35 +2127,35 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_vmcmp<mode>3"
+(define_insn "avx512f_vmcmp<mode>3<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_vmcmp<mode>3_mask"
+(define_insn "avx512f_vmcmp<mode>3_mask<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (and:<avx512fmaskmode>
 	    (match_operand:<avx512fmaskmode> 4 "register_operand" "k")
 	    (const_int 1))))]
   "TARGET_AVX512F"
-  "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}"
+  "vcmp<ssescalarmodesuffix>\t{%3, <round_saeonly_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_saeonly_op5>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -2173,17 +2173,17 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<sse>_comi"
+(define_insn "<sse>_comi<round_saeonly_name>"
   [(set (reg:CCFP FLAGS_REG)
 	(compare:CCFP
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vcomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vcomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -2193,17 +2193,17 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_ucomi"
+(define_insn "<sse>_ucomi<round_saeonly_name>"
   [(set (reg:CCFPU FLAGS_REG)
 	(compare:CCFPU
 	  (vec_select:MODEF
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (vec_select:MODEF
-	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "vm")
+	    (match_operand:<ssevecmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
-  "%vucomi<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "%vucomi<ssemodesuffix>\t{<round_saeonly_op2>%1, %0|%0, %<iptr>1<round_saeonly_op2>}"
   [(set_attr "type" "ssecomi")
    (set_attr "prefix" "maybe_vex")
    (set_attr "prefix_rep" "0")
@@ -3379,14 +3379,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse_cvttss2si"
+(define_insn "sse_cvttss2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE"
-  "%vcvttss2si\t{%1, %0|%0, %k1}"
+  "%vcvttss2si\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3395,14 +3395,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvttss2siq"
+(define_insn "sse_cvttss2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "v,vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "v,<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvttss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvttss2si{q}\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3511,12 +3511,12 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name>"
+(define_insn "<fixsuffix>fix_truncv16sfv16si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_fix:V16SI
-	  (match_operand:V16SF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttps2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvttps2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3672,26 +3672,26 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttss2usi"
+(define_insn "avx512f_vcvttss2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttss2usiq"
+(define_insn "avx512f_vcvttss2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:SF
-	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V4SF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttss2usi\t{%1, %0|%0, %1}"
+  "vcvttss2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3722,26 +3722,26 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvttsd2usi"
+(define_insn "avx512f_vcvttsd2usi<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unsigned_fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvttsd2usiq"
+(define_insn "avx512f_vcvttsd2usiq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unsigned_fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvttsd2usi\t{%1, %0|%0, %1}"
+  "vcvttsd2usi\t{<round_saeonly_op2>%1, %0|%0, %1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3807,14 +3807,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvttsd2si"
+(define_insn "sse2_cvttsd2si<round_saeonly_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(fix:SI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2"
-  "%vcvttsd2si\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3824,14 +3824,14 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvttsd2siq"
+(define_insn "sse2_cvttsd2siq<round_saeonly_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(fix:DI
 	  (vec_select:DF
-	    (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	    (match_operand:V2DF 1 "nonimmediate_operand" "v,<round_saeonly_constraint2>")
 	    (parallel [(const_int 0)]))))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvttsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvttsd2si{q}\t{<round_saeonly_op2>%1, %0|%0, %q1<round_saeonly_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "amdfam10_decode" "double,double")
@@ -3991,12 +3991,12 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
 
-(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name>"
+(define_insn "<fixsuffix>fix_truncv8dfv8si2<mask_name><round_saeonly_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(any_fix:V8SI
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
   "TARGET_AVX512F"
-  "vcvttpd2<fixsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvttpd2<fixsuffix>dq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4157,12 +4157,12 @@
 (define_mode_attr sf2dfmode
   [(V8DF "V8SF") (V4DF "V4SF")])
 
-(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name>"
+(define_insn "<sse2_avx_avx512f>_cvtps2pd<avxsizesuffix><mask_name><round_saeonly_name>"
   [(set (match_operand:VF2_512_256 0 "register_operand" "=v")
 	(float_extend:VF2_512_256
-	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_AVX && <mask_mode512bit_condition>"
-  "vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	  (match_operand:<sf2dfmode> 1 "nonimmediate_operand" "<round_saeonly_constraint>")))]
+  "TARGET_AVX && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+  "vcvtps2pd\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
@@ -6532,12 +6532,12 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getexp<mode><mask_name>"
+(define_insn "avx512f_getexp<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
-        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+        (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
-   "vgetexp<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}";
+   "vgetexp<ssemodesuffix>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
@@ -6613,32 +6613,32 @@
   DONE;
 })
 
-(define_insn "avx512f_fixupimm<mode><sd_maskz_name>"
+(define_insn "avx512f_fixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
         (unspec:VF_512
           [(match_operand:VF_512 1 "register_operand" "0")
 	   (match_operand:VF_512 2 "register_operand" "v")
-           (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+           (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
            (match_operand:SI 4 "const_0_to_255_operand")]
            UNSPEC_FIXUPIMM))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fixupimm<mode>_mask"
+(define_insn "avx512f_fixupimm<mode>_mask<round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
           (unspec:VF_512
             [(match_operand:VF_512 1 "register_operand" "0")
 	     (match_operand:VF_512 2 "register_operand" "v")
-             (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+             (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
              (match_operand:SI 4 "const_0_to_255_operand")]
              UNSPEC_FIXUPIMM)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfixupimm<ssemodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  "vfixupimm<ssemodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
@@ -6657,30 +6657,30 @@
   DONE;
 })
 
-(define_insn "avx512f_sfixupimm<mode><sd_maskz_name>"
+(define_insn "avx512f_sfixupimm<mode><sd_maskz_name><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
           (unspec:VF_128
             [(match_operand:VF_128 1 "register_operand" "0")
 	     (match_operand:VF_128 2 "register_operand" "v")
-	     (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+	     (match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 4 "const_0_to_255_operand")]
 	    UNSPEC_FIXUPIMM)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3, %4}";
+   "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_sd_mask_op5>%3, %2, %0<sd_mask_op5>|%0<sd_mask_op5>, %2, %3<round_saeonly_sd_mask_op5>, %4}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_sfixupimm<mode>_mask"
+(define_insn "avx512f_sfixupimm<mode>_mask<round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_merge:VF_128
 	    (unspec:VF_128
 	       [(match_operand:VF_128 1 "register_operand" "0")
 		(match_operand:VF_128 2 "register_operand" "v")
-		(match_operand:<sseintvecmode> 3 "nonimmediate_operand" "vm")
+		(match_operand:<sseintvecmode> 3 "nonimmediate_operand" "<round_saeonly_constraint>")
 		(match_operand:SI 4 "const_0_to_255_operand")]
 	       UNSPEC_FIXUPIMM)
 	    (match_dup 1)
@@ -6688,18 +6688,18 @@
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 5 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfixupimm<ssescalarmodesuffix>\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}";
+  "vfixupimm<ssescalarmodesuffix>\t{%4, <round_saeonly_op6>%3, %2, %0%{%5%}|%0%{%5%}, %2, %3<round_saeonly_op6>, %4}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "avx512f_rndscale<mode><mask_name>"
+(define_insn "avx512f_rndscale<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
-  "vrndscale<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  "vrndscale<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -14574,13 +14574,13 @@
    (set_attr "btver2_decode" "double")
    (set_attr "mode" "V8SF")])
 
-(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name>"
+(define_insn "<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unspec:V16SF
-	  [(match_operand:V16HI 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16HI 1 "nonimmediate_operand" "<round_saeonly_constraint>")]
 	  UNSPEC_VCVTPH2PS))]
   "TARGET_AVX512F"
-  "vcvtph2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtph2ps\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -15079,14 +15079,14 @@
    (set_attr "memory" "none,load")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx512f_getmant<mode><mask_name>"
+(define_insn "avx512f_getmant<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
+	  [(match_operand:VF_512 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
-  "vgetmant<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}";
+  "vgetmant<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}";
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 76c183c..0887f14 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -128,3 +128,34 @@
      (set (match_dup 0)
           (match_dup 1))
      (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_saeonly_name" "round_saeonly" "" "_round")
+(define_subst_attr "round_saeonly_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_saeonly_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_saeonly_mask_scalar_operand4" "mask_scalar" "%R4" "%R6")
+(define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5")
+(define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7")
+(define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2")
+(define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%R4")
+(define_subst_attr "round_saeonly_op5" "round_saeonly" "" "%R5")
+(define_subst_attr "round_saeonly_op6" "round_saeonly" "" "%R6")
+(define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "<round_saeonly_mask_operand2>")
+(define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "<round_saeonly_mask_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "<round_saeonly_mask_scalar_operand3>")
+(define_subst_attr "round_saeonly_mask_scalar_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_operand4>")
+(define_subst_attr "round_saeonly_mask_scalar_merge_op4" "round_saeonly" "" "<round_saeonly_mask_scalar_merge_operand4>")
+(define_subst_attr "round_saeonly_sd_mask_op5" "round_saeonly" "" "<round_saeonly_sd_mask_operand5>")
+(define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v")
+(define_subst_attr "round_saeonly_constraint2" "round_saeonly" "m" "v")
+(define_subst_attr "round_saeonly_mode512bit_condition" "round_saeonly" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_saeonly_mode512bit_condition_op1" "round_saeonly" "1" "(GET_MODE (operands[1]) == V16SFmode || GET_MODE (operands[1]) == V8DFmode)")
+
+(define_subst "round_saeonly"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-12-02 13:13       ` Kirill Yukhin
@ 2013-12-18 13:04         ` Kirill Yukhin
  2013-12-23 16:46           ` Uros Bizjak
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-18 13:04 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,

On 02 Dec 16:11, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:12, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:08, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch in the bottom.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 24 ++++++++++++------------
 gcc/config/i386/subst.md | 18 ++++++++++++++++++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5b10cec..e15e1b1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2698,17 +2698,17 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_expand "avx512f_fmadd_<mode>_maskz"
+(define_expand "avx512f_fmadd_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmadd_<mode>_maskz_1 (
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1<round_expand_name> (
     operands[0], operands[1], operands[2], operands[3],
-    CONST0_RTX (<MODE>mode), operands[4]));
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
   DONE;
 })
 
@@ -2940,17 +2940,17 @@
 	  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_expand "avx512f_fmaddsub_<mode>_maskz"
+(define_expand "avx512f_fmaddsub_<mode>_maskz<round_expand_name>"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "<round_expand_predicate>")
+   (match_operand:VF_512 2 "<round_expand_predicate>")
+   (match_operand:VF_512 3 "<round_expand_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1 (
+  emit_insn (gen_fma_fmaddsub_<mode>_maskz_1<round_expand_name> (
     operands[0], operands[1], operands[2], operands[3],
-    CONST0_RTX (<MODE>mode), operands[4]));
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 0887f14..c41da4a 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -159,3 +159,21 @@
      (set (match_dup 0)
           (match_dup 1))
      (unspec [(match_operand:SI 2 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_expand_name" "round_expand" "" "_round")
+(define_subst_attr "round_expand_predicate" "round_expand" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_expand_operand" "round_expand" "" ", operands[5]")
+
+(define_subst "round_expand"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_V 3)
+  (match_operand:SUBST_S 4)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-12-02 13:14       ` Kirill Yukhin
@ 2013-12-18 13:06         ` Kirill Yukhin
  2013-12-18 13:17           ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-18 13:06 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

Hello,

On 02 Dec 16:11, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:14, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:09, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch in the bottom.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-12-18 13:06         ` Kirill Yukhin
@ 2013-12-18 13:17           ` Kirill Yukhin
  2013-12-23 16:52             ` Uros Bizjak
  0 siblings, 1 reply; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-18 13:17 UTC (permalink / raw)
  To: rth, Uros Bizjak, Jakub Jelinek, law; +Cc: GCC Patches

> Rebased patch in the bottom.
Adding the patch.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 18 ++++++++++--------
 gcc/config/i386/subst.md | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e15e1b1..8620541 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6598,18 +6598,19 @@
 })
 
 
-(define_expand "avx512f_fixupimm<mode>_maskz"
+(define_expand "avx512f_fixupimm<mode>_maskz<round_saeonly_expand_name5>"
   [(match_operand:VF_512 0 "register_operand")
    (match_operand:VF_512 1 "register_operand")
    (match_operand:VF_512 2 "register_operand")
-   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:<sseintvecmode> 3 "<round_saeonly_expand_predicate5>")
    (match_operand:SI 4 "const_0_to_255_operand")
    (match_operand:<avx512fmaskmode> 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1 (
+  emit_insn (gen_avx512f_fixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
 	operands[0], operands[1], operands[2], operands[3],
-	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
   DONE;
 })
 
@@ -6642,18 +6643,19 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_expand "avx512f_sfixupimm<mode>_maskz"
+(define_expand "avx512f_sfixupimm<mode>_maskz<round_saeonly_expand_name5>"
   [(match_operand:VF_128 0 "register_operand")
    (match_operand:VF_128 1 "register_operand")
    (match_operand:VF_128 2 "register_operand")
-   (match_operand:<sseintvecmode> 3 "nonimmediate_operand")
+   (match_operand:<sseintvecmode> 3 "<round_saeonly_expand_predicate5>")
    (match_operand:SI 4 "const_0_to_255_operand")
    (match_operand:<avx512fmaskmode> 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1 (
+  emit_insn (gen_avx512f_sfixupimm<mode>_maskz_1<round_saeonly_expand_name5> (
 	operands[0], operands[1], operands[2], operands[3],
-	operands[4], CONST0_RTX (<MODE>mode), operands[5]));
+	operands[4], CONST0_RTX (<MODE>mode), operands[5]
+	<round_saeonly_expand_operand6>));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index c41da4a..11f2e5f 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -177,3 +177,23 @@
    (match_dup 3)
    (match_dup 4)
    (unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])
+
+(define_subst_attr "round_saeonly_expand_name5" "round_saeonly_expand5" "" "_round")
+(define_subst_attr "round_saeonly_expand_predicate5" "round_saeonly_expand5" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_expand_operand6" "round_saeonly_expand5" "" ", operands[6]")
+
+(define_subst "round_saeonly_expand5"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_A 3)
+  (match_operand:SI 4)
+  (match_operand:SUBST_S 5)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (match_dup 5)
+   (unspec [(match_operand:SI 6 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-12-18 13:00         ` Kirill Yukhin
@ 2013-12-23 16:11           ` Uros Bizjak
  2013-12-23 16:26             ` Uros Bizjak
  0 siblings, 1 reply; 63+ messages in thread
From: Uros Bizjak @ 2013-12-23 16:11 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

On Wed, Dec 18, 2013 at 2:00 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> On 02 Dec 16:09, Kirill Yukhin wrote:
>> Hello,
>> On 19 Nov 12:08, Kirill Yukhin wrote:
>> > Hello,
>> > On 15 Nov 20:06, Kirill Yukhin wrote:
>> > > Ping.
>> > Ping.
>> Ping.
> Ping.
>
> Rebased patch in the bottom.

At the end of the day, the patch looks fairly mechanical, adding
extensions to insn templates in a consistent way. The approach with
define_subst is already approved and used throughout the .md files.

I have reviewed the patch, and didn't find any obvious mistakes - and
there is a huge testsuite to find non-obvious ones, so I'm confident
enough to approve the patch.

So, OK for mainline, but I would kindly ask you to please wait a
couple of days for possible Richard's comments

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-12-23 16:11           ` Uros Bizjak
@ 2013-12-23 16:26             ` Uros Bizjak
  2013-12-26 13:10               ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Uros Bizjak @ 2013-12-23 16:26 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

On Mon, Dec 23, 2013 at 5:11 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Dec 18, 2013 at 2:00 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Hello,
>> On 02 Dec 16:09, Kirill Yukhin wrote:
>>> Hello,
>>> On 19 Nov 12:08, Kirill Yukhin wrote:
>>> > Hello,
>>> > On 15 Nov 20:06, Kirill Yukhin wrote:
>>> > > Ping.
>>> > Ping.
>>> Ping.
>> Ping.
>>
>> Rebased patch in the bottom.
>
> At the end of the day, the patch looks fairly mechanical, adding
> extensions to insn templates in a consistent way. The approach with
> define_subst is already approved and used throughout the .md files.
>
> I have reviewed the patch, and didn't find any obvious mistakes - and
> there is a huge testsuite to find non-obvious ones, so I'm confident
> enough to approve the patch.
>
> So, OK for mainline, but I would kindly ask you to please wait a
> couple of days for possible Richard's comments

There is one issue:

+(define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_constraint2" "round" "m" "v")
+(define_subst_attr "round_constraint3" "round" "rm" "r")

When substituting constraints, please also substitute corresponding
operand predicate:

nonimmediate_operand -> register_operand in 1st and 3rd case
memory_operand -> register_operand in 2nd case.

When you allow e.g. nonimmediate_operand in predicate, but only
register in operand constraint, reload will resolve it, however -
memory load will remain in the loop even if it is invariant. There is
no pass to hoist invariant loads after reload.

Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-12-18 13:02         ` Kirill Yukhin
@ 2013-12-23 16:41           ` Uros Bizjak
  2014-01-09 15:35           ` H.J. Lu
  1 sibling, 0 replies; 63+ messages in thread
From: Uros Bizjak @ 2013-12-23 16:41 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

On Wed, Dec 18, 2013 at 2:02 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
>
> On 02 Dec 16:10, Kirill Yukhin wrote:
>> Hello,
>> On 19 Nov 12:11, Kirill Yukhin wrote:
>> > Hello,
>> > On 15 Nov 20:07, Kirill Yukhin wrote:
>> > > > Is it ok for trunk?
>> > > Ping.
>> > Ping.
>> Ping.
> Ping.
>
> Rebased patch in the bottom.

+(define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v")
+(define_subst_attr "round_saeonly_constraint2" "round_saeonly" "m" "v")

The same comment as in previous patch. Please introduce corresponding
predicate substitution that will follow constraint changes.

+(define_subst_attr "round_saeonly_mode512bit_condition"
"round_saeonly" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE
(operands[0]) == V8DFmode)")
+(define_subst_attr "round_saeonly_mode512bit_condition_op1"
"round_saeonly" "1" "(GET_MODE (operands[1]) == V16SFmode || GET_MODE
(operands[1]) == V8DFmode)")

Use "<MODE>mode == ..." static checks in above conditions.

The patch is OK for mainline with these changes.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-12-18 13:04         ` Kirill Yukhin
@ 2013-12-23 16:46           ` Uros Bizjak
  2013-12-26  9:48             ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Uros Bizjak @ 2013-12-23 16:46 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

On Wed, Dec 18, 2013 at 2:04 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
>
> On 02 Dec 16:11, Kirill Yukhin wrote:
>> Hello,
>> On 19 Nov 12:12, Kirill Yukhin wrote:
>> > Hello,
>> > On 15 Nov 20:08, Kirill Yukhin wrote:
>> > > > Is it ok for trunk?
>> > > Ping.
>> > Ping.
>> Ping.
> Ping.
>
> Rebased patch in the bottom.

This "round_expand_predicate" is the predicate substitution I was
referred to in the review of 5/8. Please use it also in insn patterns,
perhaps renamed as "round_predicate", as it is not exclusive to
expanders. As mentioned, predicates should mirror constraints as close
as possible.

OK with these changes,
Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
  2013-12-18 13:17           ` Kirill Yukhin
@ 2013-12-23 16:52             ` Uros Bizjak
  0 siblings, 0 replies; 63+ messages in thread
From: Uros Bizjak @ 2013-12-23 16:52 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

On Wed, Dec 18, 2013 at 2:16 PM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Rebased patch in the bottom.
> Adding the patch.

The same comment as in 7/8 applies here. The predicate is not
exclusive to expanders, should also be used in insn patterns. The name
of the predicate is a bit weird, please name it simply
"round_saeonly_predicate".

OK with these changes.

Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-12-02 13:10           ` Kirill Yukhin
@ 2013-12-24  4:56             ` Kirill Yukhin
  2013-12-24  4:59               ` Kirill Yukhin
       [not found]               ` <CAGs3RfukBMLGgmzR8+EFP2ok4mfXXz1DY88ghr-b_u=TbcSkdA@mail.gmail.com>
  0 siblings, 2 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-24  4:56 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson; +Cc: GCC Patches, Jakub Jelinek, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

Hello,
On 02 Dec 16:07, Kirill Yukhin wrote:
> Hello,
>
> On 19 Nov 12:05, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:03, Kirill Yukhin wrote:
> > > Ping?
> > Ping?
> Ping?
>
> --
> Thanks, K

It seems that we have no solution of how scalar instructions
with EVEX's masking should look like. It seems that nested
vec_merge is not very welcome.

So, in order not to miss 4.9 I've stripped out `mask_scalar`
subst from the patch. It now contains only AVX-512 scalar
instructions, intrinsics and tests.

Tested as mentioned in the top post.

Patch attached.

Ok for trunk?

--
Thanks, K

[-- Attachment #2: tmp --]
[-- Type: application/octet-stream, Size: 142449 bytes --]

commit 0de457ff5ac90e555229a7ef87804a62a376c5b4
Author: Kirill Yukhin <kirill.yukhin@intel.com>
Date:   Tue Dec 17 15:39:22 2013 +0300

    [AVX-512] Scalar instructions support. Without EVEX's masking feature.

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index f717d46..d6dd438 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -1279,6 +1279,57 @@ _mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+#else
+#define _mm_add_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_addsd_round(A, B, C)
+
+#define _mm_add_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_addss_round(A, B, C)
+
+#define _mm_sub_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_subsd_round(A, B, C)
+
+#define _mm_sub_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_subss_round(A, B, C)
+#endif
+
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C, const int imm)
@@ -1424,6 +1475,22 @@ _mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
 						  (__mmask16) __U);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp14_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __A,
+					   (__v2df) __B);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp14_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __A,
+					  (__v4sf) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_rsqrt14_pd (__m512d __A)
@@ -1482,6 +1549,22 @@ _mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
 						    (__mmask16) __U);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt14_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __A,
+					     (__v2df) __B);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt14_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __A,
+					    (__v4sf) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1542,6 +1625,23 @@ _mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
 						 (__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_sqrtsd_round ((__v2df) __B,
+						(__v2df) __A,
+						__R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_sqrtss_round ((__v4sf) __B,
+					       (__v4sf) __A,
+					       __R);
+}
 #else
 #define _mm512_sqrt_round_pd(A, C)            \
     (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -1560,6 +1660,12 @@ _mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
 
 #define _mm512_maskz_sqrt_round_ps(U, A, C)   \
     (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_sqrt_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_sqrtsd_round(A, B, C)
+
+#define _mm_sqrt_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_sqrtss_round(A, B, C)
 #endif
 
 extern __inline __m512i
@@ -2159,6 +2265,42 @@ _mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 						(__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
 #else
 #define _mm512_mul_round_pd(A, B, C)            \
     (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -2195,6 +2337,24 @@ _mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 
 #define _mm512_maskz_div_round_ps(U, A, B, C)   \
     (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_mul_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_mulsd_round(A, B, C)
+
+#define _mm_mul_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_mulss_round(A, B, C)
+
+#define _mm_div_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_divsd_round(A, B, C)
+
+#define _mm_div_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_divss_round(A, B, C)
+
+#define _mm_mask_div_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_divss_mask(A, B, W, U, C)
+
+#define _mm_maskz_div_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_divss_mask(A, B, (__v4sf)_mm_setzero_ps(), U, C)
 #endif
 
 #ifdef __OPTIMIZE__
@@ -2438,6 +2598,23 @@ _mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 						   (__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 __R);
+}
 #else
 #define _mm512_scalef_round_pd(A, B, C)            \
     (__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -2456,6 +2633,12 @@ _mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 
 #define _mm512_maskz_scalef_round_ps(U, A, B, C)   \
     (__m512)__builtin_ia32_scalefps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_scalef_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_scalefsd_round(A, B, C)
+
+#define _mm_scalef_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_scalefss_round(A, B, C)
 #endif
 
 #ifdef __OPTIMIZE__
@@ -7578,6 +7761,23 @@ _mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
 						   (__mmask8) __U, __R);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
+{
+  return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
+						 (__v2df) __B,
+						 __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
+						  (__v4sf) __B,
+						  __R);
+}
 #else
 #define _mm512_cvt_roundpd_ps(A, B)		 \
     (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), -1, B)
@@ -7587,6 +7787,12 @@ _mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
 
 #define _mm512_maskz_cvt_roundpd_ps(U, A, B)     \
     (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+
+#define _mm_cvt_roundsd_ss(A, B, C)		 \
+    (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
+
+#define _mm_cvt_roundss_sd(A, B, C)		 \
+    (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
 #endif
 
 extern __inline void
@@ -7611,6 +7817,24 @@ _mm512_stream_pd (double *__P, __m512d __A)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     __R);
+}
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_getexp_round_ps (__m512 __A, const int __R)
@@ -7759,6 +7983,30 @@ _mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
 						    __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_round_sd (__m128d __A, __m128d __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  (__D << 2) | __C,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_round_ss (__m128 __A, __m128 __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  __R);
+}
+
 #else
 #define _mm512_getmant_round_pd(X, B, C, R)                                                  \
   ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
@@ -7800,6 +8048,24 @@ _mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
                                              (__v16sf)(__m512)_mm512_setzero_ps(),  \
                                              (__mmask16)(U),\
 					     (R)))
+#define _mm_getmant_round_sd(X, Y, C, D, R)                                                  \
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
+					    (__v2df)(__m128d)(Y),	\
+					    (int)(((D)<<2) | (C)),	\
+					    (R)))
+
+#define _mm_getmant_round_ss(X, Y, C, D, R)                                                  \
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
+					   (__v4sf)(__m128)(Y),		\
+					   (int)(((D)<<2) | (C)),	\
+					   (R)))
+
+#define _mm_getexp_round_ss(A, B, R)						      \
+  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
+
+#define _mm_getexp_round_sd(A, B, R)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
+
 #define _mm512_getexp_round_ps(A, R)						\
   ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
   (__v16sf)_mm512_setzero_ps(), (__mmask16)-1, R))
@@ -7885,6 +8151,24 @@ _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
 						   _mm512_setzero_pd (),
 						   (__mmask8) __A, __R);
 }
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, const int __R)
+{
+  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
+						   (__v4sf) __B, __imm, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
+			 const int __R)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
+						    (__v2df) __B, __imm, __R);
+}
+
 #else
 #define _mm512_roundscale_round_ps(A, B, R) \
   ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -7912,6 +8196,12 @@ _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), R))
+#define _mm_roundscale_round_ss(A, B, C, R)					\
+  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), R))
+#define _mm_roundscale_round_sd(A, B, C, R)					\
+  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), R))
 #endif
 
 extern __inline __m512
@@ -9825,6 +10115,57 @@ _mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
 						   (__mmask16) __U);
 }
 
+#ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+#else
+#define _mm_max_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_addsd_round(A, B, C)
+
+#define _mm_max_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_addss_round(A, B, C)
+
+#define _mm_min_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_subsd_round(A, B, C)
+
+#define _mm_min_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_subss_round(A, B, C)
+#endif
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
@@ -9862,6 +10203,112 @@ _mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
+}
+#else
+#define _mm_fmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
+
+#define _mm_fmadd_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
+
+#define _mm_fmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
+
+#define _mm_fmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
+
+#define _mm_fnmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
+
+#define _mm_fnmadd_round_ss(A, B, C, R)            \
+   (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
+
+#define _mm_fnmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
+
+#define _mm_fnmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
+#endif
+
+#ifdef __OPTIMIZE__
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
@@ -10436,6 +10883,24 @@ _mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
@@ -11784,6 +12249,24 @@ _mm512_maskz_getexp_pd (__mmask8 __U, __m512d __A)
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_getmant_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
@@ -11856,6 +12339,28 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						   (__v2df) __B,
+						   (__D << 2) | __C,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
 #else
 #define _mm512_getmant_pd(X, B, C)                                                  \
   ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
@@ -11897,6 +12402,26 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
                                              (__v16sf)(__m512)_mm512_setzero_ps(),  \
                                              (__mmask16)(U),\
 					     _MM_FROUND_CUR_DIRECTION))
+#define _mm_getmant_sd(X, Y, C, D)                                                  \
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
+                                           (__v2df)(__m128d)(Y),                    \
+                                           (int)(((D)<<2) | (C)),                   \
+					   _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getmant_ss(X, Y, C, D)                                                  \
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
+                                          (__v4sf)(__m128)(Y),                      \
+                                          (int)(((D)<<2) | (C)),                    \
+					  _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getexp_ss(A, B)						      \
+  ((__m128)__builtin_ia32_getexpss128_mask((__v4sf)(__m128)(A), (__v4sf)(__m128)(B),  \
+					   _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getexp_sd(A, B)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_mask((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
+					    _MM_FROUND_CUR_DIRECTION))
+
 #define _mm512_getexp_ps(A)						\
   ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
   (__v16sf)_mm512_setzero_ps(), (__mmask16)-1, _MM_FROUND_CUR_DIRECTION))
@@ -11987,6 +12512,24 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
+{
+  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
+						   (__v4sf) __B, __imm,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
+						    (__v2df) __B, __imm,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
 #else
 #define _mm512_roundscale_ps(A, B) \
   ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -12014,6 +12557,12 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_ss(A, B, C)					\
+  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
+  (__v4sf)(__m128)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_sd(A, B, C)					\
+  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
 #endif
 
 #ifdef __OPTIMIZE__
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 86ad31e..d19ca84 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, INT)
 DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, V16QI)
 DEF_FUNCTION_TYPE (V1DI, V1DI, V1DI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, INT)
+DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, INT, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DI, INT)
 DEF_FUNCTION_TYPE (V2DI, V2DI, DI, INT)
@@ -531,6 +532,9 @@ DEF_FUNCTION_TYPE (V4DI, V4DI, V4DI, V4DI)
 DEF_FUNCTION_TYPE (V4HI, V4HI, HI, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, FLOAT, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, INT, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V2DF, INT)
+DEF_FUNCTION_TYPE (V2DF, V2DF, V4SF, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SI, INT)
 DEF_FUNCTION_TYPE (V4SI, V4SI, SI, INT)
@@ -678,6 +682,7 @@ DEF_FUNCTION_TYPE (V4SF, V4SF, V2DF, V4SF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V4SF, V2DF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, INT)
 
 DEF_FUNCTION_TYPE (V16SF, V16SF, INT, V16SF, HI, INT)
 DEF_FUNCTION_TYPE (V8DF, V8DF, INT, V8DF, QI, INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d780c8a..db768f6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -27960,6 +27960,8 @@ enum ix86_builtins
   /* AVX512F */
   IX86_BUILTIN_ADDPD512,
   IX86_BUILTIN_ADDPS512,
+  IX86_BUILTIN_ADDSD_ROUND,
+  IX86_BUILTIN_ADDSS_ROUND,
   IX86_BUILTIN_ALIGND512,
   IX86_BUILTIN_ALIGNQ512,
   IX86_BUILTIN_BLENDMD512,
@@ -27994,9 +27996,11 @@ enum ix86_builtins
   IX86_BUILTIN_CVTPS2PD512,
   IX86_BUILTIN_CVTPS2PH512,
   IX86_BUILTIN_CVTPS2UDQ512,
+  IX86_BUILTIN_CVTSD2SS_ROUND,
   IX86_BUILTIN_CVTSI2SD64,
   IX86_BUILTIN_CVTSI2SS32,
   IX86_BUILTIN_CVTSI2SS64,
+  IX86_BUILTIN_CVTSS2SD_ROUND,
   IX86_BUILTIN_CVTTPD2DQ512,
   IX86_BUILTIN_CVTTPD2UDQ512,
   IX86_BUILTIN_CVTTPS2DQ512,
@@ -28009,6 +28013,8 @@ enum ix86_builtins
   IX86_BUILTIN_CVTUSI2SS64,
   IX86_BUILTIN_DIVPD512,
   IX86_BUILTIN_DIVPS512,
+  IX86_BUILTIN_DIVSD_ROUND,
+  IX86_BUILTIN_DIVSS_ROUND,
   IX86_BUILTIN_EXPANDPD512,
   IX86_BUILTIN_EXPANDPD512Z,
   IX86_BUILTIN_EXPANDPDLOAD512,
@@ -28031,8 +28037,12 @@ enum ix86_builtins
   IX86_BUILTIN_FIXUPIMMSS128_MASKZ,
   IX86_BUILTIN_GETEXPPD512,
   IX86_BUILTIN_GETEXPPS512,
+  IX86_BUILTIN_GETEXPSD128,
+  IX86_BUILTIN_GETEXPSS128,
   IX86_BUILTIN_GETMANTPD512,
   IX86_BUILTIN_GETMANTPS512,
+  IX86_BUILTIN_GETMANTSD128,
+  IX86_BUILTIN_GETMANTSS128,
   IX86_BUILTIN_INSERTF32X4,
   IX86_BUILTIN_INSERTF64X4,
   IX86_BUILTIN_INSERTI32X4,
@@ -28045,8 +28055,12 @@ enum ix86_builtins
   IX86_BUILTIN_LOADUPS512,
   IX86_BUILTIN_MAXPD512,
   IX86_BUILTIN_MAXPS512,
+  IX86_BUILTIN_MAXSD_ROUND,
+  IX86_BUILTIN_MAXSS_ROUND,
   IX86_BUILTIN_MINPD512,
   IX86_BUILTIN_MINPS512,
+  IX86_BUILTIN_MINSD_ROUND,
+  IX86_BUILTIN_MINSS_ROUND,
   IX86_BUILTIN_MOVAPD512,
   IX86_BUILTIN_MOVAPS512,
   IX86_BUILTIN_MOVDDUP512,
@@ -28063,6 +28077,8 @@ enum ix86_builtins
   IX86_BUILTIN_MOVSLDUP512,
   IX86_BUILTIN_MULPD512,
   IX86_BUILTIN_MULPS512,
+  IX86_BUILTIN_MULSD_ROUND,
+  IX86_BUILTIN_MULSS_ROUND,
   IX86_BUILTIN_PABSD512,
   IX86_BUILTIN_PABSQ512,
   IX86_BUILTIN_PADDD512,
@@ -28173,12 +28189,20 @@ enum ix86_builtins
   IX86_BUILTIN_PXORQ512,
   IX86_BUILTIN_RCP14PD512,
   IX86_BUILTIN_RCP14PS512,
+  IX86_BUILTIN_RCP14SD,
+  IX86_BUILTIN_RCP14SS,
   IX86_BUILTIN_RNDSCALEPD,
   IX86_BUILTIN_RNDSCALEPS,
+  IX86_BUILTIN_RNDSCALESD,
+  IX86_BUILTIN_RNDSCALESS,
   IX86_BUILTIN_RSQRT14PD512,
   IX86_BUILTIN_RSQRT14PS512,
+  IX86_BUILTIN_RSQRT14SD,
+  IX86_BUILTIN_RSQRT14SS,
   IX86_BUILTIN_SCALEFPD512,
   IX86_BUILTIN_SCALEFPS512,
+  IX86_BUILTIN_SCALEFSD,
+  IX86_BUILTIN_SCALEFSS,
   IX86_BUILTIN_SHUFPD512,
   IX86_BUILTIN_SHUFPS512,
   IX86_BUILTIN_SHUF_F32x4,
@@ -28189,6 +28213,8 @@ enum ix86_builtins
   IX86_BUILTIN_SQRTPD512_MASK,
   IX86_BUILTIN_SQRTPS512_MASK,
   IX86_BUILTIN_SQRTPS_NR512,
+  IX86_BUILTIN_SQRTSD_ROUND,
+  IX86_BUILTIN_SQRTSS_ROUND,
   IX86_BUILTIN_STOREAPD512,
   IX86_BUILTIN_STOREAPS512,
   IX86_BUILTIN_STOREDQUDI512,
@@ -28197,6 +28223,8 @@ enum ix86_builtins
   IX86_BUILTIN_STOREUPS512,
   IX86_BUILTIN_SUBPD512,
   IX86_BUILTIN_SUBPS512,
+  IX86_BUILTIN_SUBSD_ROUND,
+  IX86_BUILTIN_SUBSS_ROUND,
   IX86_BUILTIN_UCMPD512,
   IX86_BUILTIN_UCMPQ512,
   IX86_BUILTIN_UNPCKHPD512,
@@ -28225,6 +28253,8 @@ enum ix86_builtins
   IX86_BUILTIN_VFMADDPS512_MASK,
   IX86_BUILTIN_VFMADDPS512_MASK3,
   IX86_BUILTIN_VFMADDPS512_MASKZ,
+  IX86_BUILTIN_VFMADDSD3_ROUND,
+  IX86_BUILTIN_VFMADDSS3_ROUND,
   IX86_BUILTIN_VFMADDSUBPD512_MASK,
   IX86_BUILTIN_VFMADDSUBPD512_MASK3,
   IX86_BUILTIN_VFMADDSUBPD512_MASKZ,
@@ -28235,6 +28265,8 @@ enum ix86_builtins
   IX86_BUILTIN_VFMSUBADDPS512_MASK3,
   IX86_BUILTIN_VFMSUBPD512_MASK3,
   IX86_BUILTIN_VFMSUBPS512_MASK3,
+  IX86_BUILTIN_VFMSUBSD3_MASK3,
+  IX86_BUILTIN_VFMSUBSS3_MASK3,
   IX86_BUILTIN_VFNMADDPD512_MASK,
   IX86_BUILTIN_VFNMADDPS512_MASK,
   IX86_BUILTIN_VFNMSUBPD512_MASK,
@@ -29913,8 +29945,12 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_srcp14v2df, "__builtin_ia32_rcp14sd", IX86_BUILTIN_RCP14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_srcp14v4sf, "__builtin_ia32_rcp14ss", IX86_BUILTIN_RCP14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v2df, "__builtin_ia32_rsqrt14sd", IX86_BUILTIN_RSQRT14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v4sf, "__builtin_ia32_rsqrt14ss", IX86_BUILTIN_RSQRT14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI },
@@ -29994,6 +30030,8 @@ static const struct builtin_description bdesc_round_args[] =
   /* AVX512F */
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmaddv2df3_round, "__builtin_ia32_addsd_round", IX86_BUILTIN_ADDSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmaddv4sf3_round, "__builtin_ia32_addss_round", IX86_BUILTIN_ADDSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) QI_FTYPE_V8DF_V8DF_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) HI_FTYPE_V16SF_V16SF_INT_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmcmpv2df3_mask_round, "__builtin_ia32_cmpsd_mask", IX86_BUILTIN_CMPSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_V2DF_INT_QI_INT },
@@ -30008,9 +30046,11 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_ufix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtsd2ss_round, "__builtin_ia32_cvtsd2ss_round", IX86_BUILTIN_CVTSD2SS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvtsi2sdq_round, "__builtin_ia32_cvtsi2sd64", IX86_BUILTIN_CVTSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT64_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_cvtsi2ss_round, "__builtin_ia32_cvtsi2ss32", IX86_BUILTIN_CVTSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse_cvtsi2ssq_round, "__builtin_ia32_cvtsi2ss64", IX86_BUILTIN_CVTSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT64_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtss2sd_round, "__builtin_ia32_cvtss2sd_round", IX86_BUILTIN_CVTSS2SD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_fix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2dq512_mask", IX86_BUILTIN_CVTTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2udq512_mask", IX86_BUILTIN_CVTTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_fix_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2dq512_mask", IX86_BUILTIN_CVTTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
@@ -30021,6 +30061,8 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmdivv2df3_round, "__builtin_ia32_divsd_round", IX86_BUILTIN_DIVSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmdivv4sf3_round, "__builtin_ia32_divss_round", IX86_BUILTIN_DIVSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT },
@@ -30031,22 +30073,40 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sfixupimmv4sf_maskz_round, "__builtin_ia32_fixupimmss_maskz", IX86_BUILTIN_FIXUPIMMSS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sgetexpv2df_round, "__builtin_ia32_getexpsd128_round", IX86_BUILTIN_GETEXPSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sgetexpv4sf_round, "__builtin_ia32_getexpss128_round", IX86_BUILTIN_GETEXPSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv2df_round, "__builtin_ia32_getmantsd_round", IX86_BUILTIN_GETMANTSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv4sf_round, "__builtin_ia32_getmantss_round", IX86_BUILTIN_GETMANTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsmaxv2df3_round, "__builtin_ia32_maxsd_round", IX86_BUILTIN_MAXSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsmaxv4sf3_round, "__builtin_ia32_maxss_round", IX86_BUILTIN_MAXSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsminv2df3_round, "__builtin_ia32_minsd_round", IX86_BUILTIN_MINSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsminv4sf3_round, "__builtin_ia32_minss_round", IX86_BUILTIN_MINSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmmulv2df3_round, "__builtin_ia32_mulsd_round", IX86_BUILTIN_MULSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia32_mulss_round", IX86_BUILTIN_MULSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev2df_round, "__builtin_ia32_rndscalesd_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev4sf_round, "__builtin_ia32_rndscaless_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmscalefv2df_round, "__builtin_ia32_scalefsd_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmscalefv4sf_round, "__builtin_ia32_scalefss_round", IX86_BUILTIN_SCALEFSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsqrtv2df2_round, "__builtin_ia32_sqrtsd_round", IX86_BUILTIN_SQRTSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsqrtv4sf2_round, "__builtin_ia32_sqrtss_round", IX86_BUILTIN_SQRTSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsubv2df3_round, "__builtin_ia32_subsd_round", IX86_BUILTIN_SUBSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsubv4sf3_round, "__builtin_ia32_subss_round", IX86_BUILTIN_SUBSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtsd2si_round, "__builtin_ia32_vcvtsd2si32", IX86_BUILTIN_VCVTSD2SI32, UNKNOWN, (int) INT_FTYPE_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvtsd2siq_round, "__builtin_ia32_vcvtsd2si64", IX86_BUILTIN_VCVTSD2SI64, UNKNOWN, (int) INT64_FTYPE_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vcvtsd2usi_round, "__builtin_ia32_vcvtsd2usi32", IX86_BUILTIN_VCVTSD2USI32, UNKNOWN, (int) UINT_FTYPE_V2DF_INT },
@@ -30069,6 +30129,8 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_fmai_vmfmadd_v2df_round, "__builtin_ia32_vfmaddsd3_round", IX86_BUILTIN_VFMADDSD3_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_fmai_vmfmadd_v4sf_round, "__builtin_ia32_vfmaddss3_round", IX86_BUILTIN_VFMADDSS3_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
@@ -34083,6 +34145,10 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V4SF_FTYPE_V4SF_INT_INT:
     case V4SF_FTYPE_V4SF_INT64_INT:
     case V2DF_FTYPE_V2DF_INT64_INT:
+    case V4SF_FTYPE_V4SF_V4SF_INT:
+    case V2DF_FTYPE_V2DF_V2DF_INT:
+    case V4SF_FTYPE_V4SF_V2DF_INT:
+    case V2DF_FTYPE_V2DF_V4SF_INT:
       nargs = 3;
       break;
     case V8SF_FTYPE_V8DF_V8SF_QI_INT:
@@ -34093,6 +34159,13 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V16SI_FTYPE_V16SF_V16SI_HI_INT:
     case V8DF_FTYPE_V8SF_V8DF_QI_INT:
     case V16SF_FTYPE_V16HI_V16SF_HI_INT:
+    case V2DF_FTYPE_V2DF_V2DF_V2DF_INT:
+    case V4SF_FTYPE_V4SF_V4SF_V4SF_INT:
+      nargs = 4;
+      break;
+    case V4SF_FTYPE_V4SF_V4SF_INT_INT:
+    case V2DF_FTYPE_V2DF_V2DF_INT_INT:
+      nargs_constant = 2;
       nargs = 4;
       break;
     case INT_FTYPE_V4SF_V4SF_INT_INT:
@@ -34156,6 +34229,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
 		{
 		case CODE_FOR_avx512f_getmantv8df_mask_round:
 		case CODE_FOR_avx512f_getmantv16sf_mask_round:
+		case CODE_FOR_avx512f_getmantv2df_round:
+		case CODE_FOR_avx512f_getmantv4sf_round:
 		  error ("the immediate argument must be 4-bit immediate.");
 		  return const0_rtx;
 		case CODE_FOR_avx512f_cmpv8df3_mask_round:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index de08033..efd00cf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1264,21 +1264,21 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %<iptr>2<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "mul<mode>3<mask_name><round_name>"
@@ -1304,21 +1304,21 @@
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %<iptr>2<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1404,7 +1404,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*srcp14<mode>"
+(define_insn "srcp14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1414,7 +1414,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1451,21 +1451,21 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2"
+(define_insn "<sse>_vmsqrt<mode>2<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_operand:VF_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{<round_op3>%1, %2, %0|%0, %2, %<iptr>1<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1500,7 +1500,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*rsqrt14<mode>"
+(define_insn "rsqrt14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1581,22 +1581,22 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3"
+(define_insn "<sse>_vm<code><mode>3<round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>"))
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_saeonly_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;; These versions of the min/max patterns implement exactly the operations
@@ -4061,34 +4061,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss"
+(define_insn "sse2_cvtsd2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
 	    (float_truncate:V2SF
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,vm")))
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,<round_constraint>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2"
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0|%0, %1, %q2}"
+   vcvtsd2ss\t{<round_op3>%2, %1, %0|%0, %1, %q2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<round_prefix>")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd"
+(define_insn "sse2_cvtss2sd<round_saeonly_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
 	    (vec_select:V2SF
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,vm")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,<round_saeonly_constraint>")
 	      (parallel [(const_int 0) (const_int 1)])))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
@@ -4096,14 +4096,14 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0|%0, %1, %k2}"
+   vcvtss2sd\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %k2<round_saeonly_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "athlon_decode" "direct,direct,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<round_saeonly_prefix>")
    (set_attr "mode" "DF")])
 
 (define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
@@ -6506,17 +6506,17 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "*avx512f_vmscalef<mode>"
+(define_insn "avx512f_vmscalef<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")]
 	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
@@ -6586,17 +6586,17 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode>"
+(define_insn "avx512f_sgetexp<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")]
 	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %2<round_saeonly_op3>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
@@ -6751,18 +6751,18 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_rndscale<mode>"
+(define_insn "avx512f_rndscale<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -15137,18 +15137,18 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_getmant<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_GETMANT)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 11f2e5f..87ff87e 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -101,7 +101,6 @@
 (define_subst_attr "round_name" "round" "" "_round")
 (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
 (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
-(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
 (define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
 (define_subst_attr "round_op2" "round" "" "%R2")
 (define_subst_attr "round_op3" "round" "" "%R3")
@@ -115,6 +114,7 @@
 (define_subst_attr "round_constraint" "round" "vm" "v")
 (define_subst_attr "round_constraint2" "round" "m" "v")
 (define_subst_attr "round_constraint3" "round" "rm" "r")
+(define_subst_attr "round_prefix" "round" "vex" "evex")
 (define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
 (define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
 (define_subst_attr "round_codefor" "round" "*" "")
@@ -137,9 +137,11 @@
 (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5")
 (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7")
 (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2")
+(define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%R3")
 (define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%R4")
 (define_subst_attr "round_saeonly_op5" "round_saeonly" "" "%R5")
 (define_subst_attr "round_saeonly_op6" "round_saeonly" "" "%R6")
+(define_subst_attr "round_saeonly_prefix" "round_saeonly" "vex" "evex")
 (define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "<round_saeonly_mask_operand2>")
 (define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "<round_saeonly_mask_operand3>")
 (define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "<round_saeonly_mask_scalar_operand3>")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0d38f30..7201592 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -169,6 +169,8 @@
 /* avx512fintrin.h */
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 1)
+#define __builtin_ia32_addss_round(A, B, C) __builtin_ia32_addss_round(A, B, 1)
 #define __builtin_ia32_alignd512_mask(A, B, F, D, E) __builtin_ia32_alignd512_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq512_mask(A, B, F, D, E) __builtin_ia32_alignq512_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpd512_mask(A, B, E, D) __builtin_ia32_cmpd512_mask(A, B, 1, D)
@@ -184,11 +186,11 @@
 #define __builtin_ia32_cvtps2dq512_mask(A, B, C, D) __builtin_ia32_cvtps2dq512_mask(A, B, C, 1)
 #define __builtin_ia32_cvtps2pd512_mask(A, B, C, D) __builtin_ia32_cvtps2pd512_mask(A, B, C, 5)
 #define __builtin_ia32_cvtps2udq512_mask(A, B, C, D) __builtin_ia32_cvtps2udq512_mask(A, B, C, 1)
-#define __builtin_ia32_cvtsd2ss_mask(A, B, C, D, E) __builtin_ia32_cvtsd2ss_mask(A, B, C, D, 1)
+#define __builtin_ia32_cvtsd2ss_round(A, B, C) __builtin_ia32_cvtsd2ss_round(A, B, 1)
+#define __builtin_ia32_cvtss2sd_round(A, B, C) __builtin_ia32_cvtss2sd_round(A, B, 4)
 #define __builtin_ia32_cvtsi2sd64(A, B, C) __builtin_ia32_cvtsi2sd64(A, B, 1)
 #define __builtin_ia32_cvtsi2ss32(A, B, C) __builtin_ia32_cvtsi2ss32(A, B, 1)
 #define __builtin_ia32_cvtsi2ss64(A, B, C) __builtin_ia32_cvtsi2ss64(A, B, 1)
-#define __builtin_ia32_cvtss2sd_mask(A, B, C, D, E) __builtin_ia32_cvtss2sd_mask(A, B, C, D, 5)
 #define __builtin_ia32_cvttpd2dq512_mask(A, B, C, D) __builtin_ia32_cvttpd2dq512_mask(A, B, C, 5)
 #define __builtin_ia32_cvttpd2udq512_mask(A, B, C, D) __builtin_ia32_cvttpd2udq512_mask(A, B, C, 5)
 #define __builtin_ia32_cvttps2dq512_mask(A, B, C, D) __builtin_ia32_cvttps2dq512_mask(A, B, C, 5)
@@ -199,6 +201,8 @@
 #define __builtin_ia32_cvtusi2ss64(A, B, C) __builtin_ia32_cvtusi2ss64(A, B, 1)
 #define __builtin_ia32_divpd512_mask(A, B, C, D, E) __builtin_ia32_divpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_divps512_mask(A, B, C, D, E) __builtin_ia32_divps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_divsd_round(A, B, C) __builtin_ia32_divsd_round(A, B, 1)
+#define __builtin_ia32_divss_round(A, B, C) __builtin_ia32_divss_round(A, B, 1)
 #define __builtin_ia32_extractf32x4_mask(A, E, C, D) __builtin_ia32_extractf32x4_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x4_mask(A, E, C, D) __builtin_ia32_extractf64x4_mask(A, 1, C, D)
 #define __builtin_ia32_extracti32x4_mask(A, E, C, D) __builtin_ia32_extracti32x4_mask(A, 1, C, D)
@@ -221,18 +225,28 @@
 #define __builtin_ia32_gathersiv8di(A, B, C, D, F) __builtin_ia32_gathersiv8di(A, B, C, D, 1)
 #define __builtin_ia32_getexppd512_mask(A, B, C, D) __builtin_ia32_getexppd512_mask(A, B, C, 5)
 #define __builtin_ia32_getexpps512_mask(A, B, C, D) __builtin_ia32_getexpps512_mask(A, B, C, 5)
+#define __builtin_ia32_getexpsd128_round(A, B, C) __builtin_ia32_getexpsd128_round(A, B, 4)
+#define __builtin_ia32_getexpss128_round(A, B, C) __builtin_ia32_getexpss128_round(A, B, 4)
 #define __builtin_ia32_getmantpd512_mask(A, F, C, D, E) __builtin_ia32_getmantpd512_mask(A, 1, C, D, 5)
 #define __builtin_ia32_getmantps512_mask(A, F, C, D, E) __builtin_ia32_getmantps512_mask(A, 1, C, D, 5)
+#define __builtin_ia32_getmantsd_round(A, B, C, D) __builtin_ia32_getmantsd_round(A, B, 1, 4)
+#define __builtin_ia32_getmantss_round(A, B, C, D) __builtin_ia32_getmantss_round(A, B, 1, 4)
 #define __builtin_ia32_insertf32x4_mask(A, B, F, D, E) __builtin_ia32_insertf32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x4_mask(A, B, F, D, E) __builtin_ia32_insertf64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti32x4_mask(A, B, F, D, E) __builtin_ia32_inserti32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x4_mask(A, B, F, D, E) __builtin_ia32_inserti64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_maxpd512_mask(A, B, C, D, E) __builtin_ia32_maxpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_maxps512_mask(A, B, C, D, E) __builtin_ia32_maxps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_maxsd_round(A, B, C) __builtin_ia32_maxsd_round(A, B, 4)
+#define __builtin_ia32_maxss_round(A, B, C) __builtin_ia32_maxss_round(A, B, 4)
 #define __builtin_ia32_minpd512_mask(A, B, C, D, E) __builtin_ia32_minpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_minps512_mask(A, B, C, D, E) __builtin_ia32_minps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_minsd_round(A, B, C) __builtin_ia32_minsd_round(A, B, 4)
+#define __builtin_ia32_minss_round(A, B, C) __builtin_ia32_minss_round(A, B, 4)
 #define __builtin_ia32_mulpd512_mask(A, B, C, D, E) __builtin_ia32_mulpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_mulps512_mask(A, B, C, D, E) __builtin_ia32_mulps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_mulsd_round(A, B, C) __builtin_ia32_mulsd_round(A, B, 1)
+#define __builtin_ia32_mulss_round(A, B, C) __builtin_ia32_mulss_round(A, B, 1)
 #define __builtin_ia32_permdf512_mask(A, E, C, D) __builtin_ia32_permdf512_mask(A, 1, C, D)
 #define __builtin_ia32_permdi512_mask(A, E, C, D) __builtin_ia32_permdi512_mask(A, 1, C, D)
 #define __builtin_ia32_prold512_mask(A, E, C, D) __builtin_ia32_prold512_mask(A, 1, C, D)
@@ -252,10 +266,12 @@
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 5)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 5)
-#define __builtin_ia32_rndscalesd_mask(A, B, I, D, E, F) __builtin_ia32_rndscalesd_mask(A, B, 1, D, E, 5)
-#define __builtin_ia32_rndscaless_mask(A, B, I, D, E, F) __builtin_ia32_rndscaless_mask(A, B, 1, D, E, 5)
+#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
+#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_scalefsd_round(A, B, C) __builtin_ia32_scalefsd_round(A, B, 1)
+#define __builtin_ia32_scalefss_round(A, B, C) __builtin_ia32_scalefss_round(A, B, 1)
 #define __builtin_ia32_scatterdiv8df(A, B, C, D, F) __builtin_ia32_scatterdiv8df(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv8di(A, B, C, D, F) __builtin_ia32_scatterdiv8di(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv16sf(A, B, C, D, F) __builtin_ia32_scatterdiv16sf(A, B, C, D, 1)
@@ -272,10 +288,12 @@
 #define __builtin_ia32_shufps512_mask(A, B, F, D, E) __builtin_ia32_shufps512_mask(A, B, 1, D, E)
 #define __builtin_ia32_sqrtpd512_mask(A, B, C, D) __builtin_ia32_sqrtpd512_mask(A, B, C, 1)
 #define __builtin_ia32_sqrtps512_mask(A, B, C, D) __builtin_ia32_sqrtps512_mask(A, B, C, 1)
-#define __builtin_ia32_sqrtsd_mask(A, B, C, D, E) __builtin_ia32_sqrtsd_mask(A, B, C, D, 1)
-#define __builtin_ia32_sqrtss_mask(A, B, C, D, E) __builtin_ia32_sqrtss_mask(A, B, C, D, 1)
+#define __builtin_ia32_sqrtss_round(A, B, C) __builtin_ia32_sqrtss_round(A, B, 1)
+#define __builtin_ia32_sqrtsd_round(A, B, C) __builtin_ia32_sqrtsd_round(A, B, 1)
 #define __builtin_ia32_subpd512_mask(A, B, C, D, E) __builtin_ia32_subpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_subps512_mask(A, B, C, D, E) __builtin_ia32_subps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_subsd_round(A, B, C) __builtin_ia32_subsd_round(A, B, 1)
+#define __builtin_ia32_subss_round(A, B, C) __builtin_ia32_subss_round(A, B, 1)
 #define __builtin_ia32_ucmpd512_mask(A, B, E, D) __builtin_ia32_ucmpd512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpq512_mask(A, B, E, D) __builtin_ia32_ucmpq512_mask(A, B, 1, D)
 #define __builtin_ia32_vcomisd(A, B, C, D) __builtin_ia32_vcomisd(A, B, 1, 5)
@@ -304,12 +322,8 @@
 #define __builtin_ia32_vfmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddps512_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddss3_maskz(A, B, C, D, 1)
+#define __builtin_ia32_vfmaddsd3_round(A, B, C, D) __builtin_ia32_vfmaddsd3_round(A, B, C, 1)
+#define __builtin_ia32_vfmaddss3_round(A, B, C, D) __builtin_ia32_vfmaddss3_round(A, B, C, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c
new file mode 100644
index 0000000..f0bc5ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vaddsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_add_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c
new file mode 100644
index 0000000..5a8491c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vaddss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_add_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c
new file mode 100644
index 0000000..8cb51c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsd2ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 s1, r;
+volatile __m128d s2;
+
+void extern
+avx512f_test (void)
+{
+  r = _mm_cvt_roundsd_ss (s1, s2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c
new file mode 100644
index 0000000..5b6a43f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vcvtss2sd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d s1, r;
+volatile __m128 s2;
+
+void extern
+avx512f_test (void)
+{
+  r = _mm_cvt_roundss_sd (s1, s2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c
new file mode 100644
index 0000000..95df56c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vdivsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_div_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c
new file mode 100644
index 0000000..5c6eb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vdivss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_div_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
new file mode 100644
index 0000000..26d7c3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -DAVX512F" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+#include "string.h"
+
+void
+CALC (UNION_TYPE (AVX512F_LEN,) s1, float *res_ref, int mask)
+{
+  memset (res_ref, 0, 16);
+  memcpy (res_ref, s1.a + mask * 4, 16);
+}
+
+void static
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN,) s1;
+  union128 res1, res2, res3;
+  float res_ref[4];
+  MASK_TYPE mask = MASK_VALUE;
+  int j;
+
+  for (j = 0; j < SIZE; j++)
+    {
+      s1.a[j] = j * j / 4.56;
+    }
+
+  for (j = 0; j < 4; j++)
+    {
+      res1.a[j] = DEFAULT_VALUE;
+      res2.a[j] = DEFAULT_VALUE;
+      res3.a[j] = DEFAULT_VALUE;
+    }
+
+  res1.x = INTRINSIC (_extractf32x4_ps) (s1.x, 1);
+  res2.x = INTRINSIC (_mask_extractf32x4_ps) (res2.x, mask, s1.x, 1);
+  res3.x = INTRINSIC (_maskz_extractf32x4_ps) (mask, s1.x, 1);
+  CALC (s1, res_ref, 1);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+
+  MASK_MERGE ()(res_ref, mask, 4);
+  if (check_union128 (res2, res_ref))
+    abort ();
+
+  MASK_ZERO ()(res_ref, mask, 4);
+  if (check_union128 (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c
new file mode 100644
index 0000000..c82858d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -DAVX512F" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+#include "string.h"
+
+void
+CALC (UNION_TYPE (AVX512F_LEN, i_d) s1, int *res_ref, int mask)
+{
+  memset (res_ref, 0, 16);
+  memcpy (res_ref, s1.a + mask * 4, 16);
+}
+
+void static
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN, i_d) s1;
+  union128i_d res1, res2, res3;
+  int res_ref[4];
+  MASK_TYPE mask = MASK_VALUE;
+  int j;
+
+  for (j = 0; j < SIZE; j++)
+    {
+      s1.a[j] = j * j / 4.56;
+    }
+
+  for (j = 0; j < 4; j++)
+    {
+      res1.a[j] = DEFAULT_VALUE;
+      res2.a[j] = DEFAULT_VALUE;
+      res3.a[j] = DEFAULT_VALUE;
+    }
+
+  res1.x = INTRINSIC (_extracti32x4_epi32) (s1.x, 1);
+  res2.x =
+    INTRINSIC (_mask_extracti32x4_epi32) (res2.x, mask, s1.x, 1);
+  res3.x = INTRINSIC (_maskz_extracti32x4_epi32) (mask, s1.x, 1);
+  CALC (s1, res_ref, 1);
+
+  if (check_union128i_d (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, 4);
+  if (check_union128i_d (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, 4);
+  if (check_union128i_d (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c
new file mode 100644
index 0000000..ea8b17c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmadd_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c
new file mode 100644
index 0000000..cd44fb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmadd_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c
new file mode 100644
index 0000000..2d78df6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmsub_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c
new file mode 100644
index 0000000..b7609f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmsub_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c
new file mode 100644
index 0000000..e938236
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c
new file mode 100644
index 0000000..f5752e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c
new file mode 100644
index 0000000..931b5d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c
new file mode 100644
index 0000000..f097f1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c
new file mode 100644
index 0000000..952ed54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetexpsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getexp_sd (x, x);
+  x = _mm_getexp_round_sd (x, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c
new file mode 100644
index 0000000..c1e5e5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 64)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vgetexpsd (double *s, double *r)
+{
+  r[0] = floor (log (s[0]) / log (2));
+}
+
+void static
+avx512f_test (void)
+{
+  int i;
+  union128d res1, s1;
+  double res_ref[SIZE];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 5.0 - i;
+      res_ref[i] = s1.a[i];
+    }
+
+  res1.x = _mm_getexp_sd (s1.x, s1.x);
+
+  compute_vgetexpsd (s1.a, res_ref);
+
+  if (check_fp_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c
new file mode 100644
index 0000000..d946a47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetexpss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getexp_ss (x, x);
+  x = _mm_getexp_round_ss (x, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c
new file mode 100644
index 0000000..39d77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 32)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vgetexpss (float *s, float *r)
+{
+  r[0] = floor (log (s[0]) / log (2));
+}
+
+void static
+avx512f_test (void)
+{
+  int i;
+  union128 res1, s1;
+  float res_ref[SIZE];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 5.0 - i;
+      res_ref[i] = s1.a[i];
+    }
+
+  res1.x = _mm_getexp_ss (s1.x, s1.x);
+
+  compute_vgetexpss (s1.a, res_ref);
+
+  if (check_fp_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c
new file mode 100644
index 0000000..4b252a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-final { scan-assembler-times "vgetmantsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetmantsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[\\n\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x, y, z;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getmant_sd (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_getmant_round_sd (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src,
+			    _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c
new file mode 100644
index 0000000..2d39066
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+#include <math.h>
+
+double
+get_norm_mant (double source, int signctrl, int interv)
+{
+  long long dest, src, sign, exp, fraction;
+  src = *(long long *) &source;
+  sign = (signctrl & 0x1) ? 0 : (src >> 63);
+  exp = (src & 0x7ff0000000000000) >> 52;
+  fraction = (src & 0xfffffffffffff);
+
+  if (isnan (source))
+    return signbit (source) ? -NAN : NAN;
+  if (source == 0.0 || source == -0.0 || isinf (source))
+    return sign ? -1.0 : 1.0;
+  if (signbit (source) && (signctrl & 0x2))
+    return -NAN;
+  if (!isnormal (source))
+    {
+      src = (src & 0xfff7ffffffffffff);
+      exp = 0x3ff;
+      while (!(src & 0x8000000000000))
+	{
+	  src += fraction & 0x8000000000000;
+	  fraction = fraction << 1;
+	  exp--;
+	}
+    }
+
+  switch (interv)
+    {
+    case 0:
+      exp = 0x3ff;
+      break;
+    case 1:
+      exp = ((exp - 0x3ff) & 0x1) ? 0x3fe : 0x3ff;
+      break;
+    case 2:
+      exp = 0x3fe;
+      break;
+    case 3:
+      exp = (fraction & 0x8000000000000) ? 0x3fe : 0x3ff;
+      break;
+    default:
+      abort ();
+    }
+
+  dest = (sign << 63) | (exp << 52) | fraction;
+  return *(double *) &dest;
+}
+
+static void
+compute_vgetmantsd (double *r, double *s1, double *s2, int interv,
+		    int signctrl)
+{
+  r[0] = get_norm_mant (s2[0], signctrl, interv);
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  int i, sign;
+  union128d res1, src1, src2;
+  double res_ref[2];
+  int interv = _MM_MANT_NORM_p5_1;
+  int signctrl = _MM_MANT_SIGN_src;
+
+  src1.x = _mm_set_pd (-3.0, 111.111);
+  src2.x = _mm_set_pd (222.222, -2.0);
+
+  res1.x = _mm_getmant_sd (src1.x, src2.x, interv, signctrl);
+
+  compute_vgetmantsd (res_ref, src1.a, src2.a, interv, signctrl);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c
new file mode 100644
index 0000000..30c837b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-final { scan-assembler-times "vgetmantss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetmantss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[\\n\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x, y, z;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getmant_ss (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_getmant_round_ss (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src,
+		      _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c
new file mode 100644
index 0000000..5f878b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c
@@ -0,0 +1,88 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+#include <math.h>
+
+float
+get_norm_mant (float source, int signctrl, int interv)
+{
+  int dest, src, sign, exp, fraction;
+  src = *(int *) &source;
+  sign = (signctrl & 0x1) ? 0 : (src >> 31);
+  exp = (src & 0x7f800000) >> 23;
+  fraction = (src & 0x7fffff);
+
+  if (isnan (source))
+    return signbit (source) ? -NAN : NAN;
+  if (source == 0.0 || source == -0.0 || isinf (source))
+    return sign ? -1.0 : 1.0;
+  if (signbit (source) && (signctrl & 0x2))
+    return -NAN;
+  if (!isnormal (source))
+    {
+      src = (src & 0xffbfffff);
+      exp = 0x7f;
+      while (!(src & 0x400000))
+	{
+	  src += fraction & 0x400000;
+	  fraction = fraction << 1;
+	  exp--;
+	}
+    }
+
+  switch (interv)
+    {
+    case 0:
+      exp = 0x7f;
+      break;
+    case 1:
+      exp = ((exp - 0x7f) & 0x1) ? 0x7e : 0x7f;
+      break;
+    case 2:
+      exp = 0x7e;
+      break;
+    case 3:
+      exp = (fraction & 0x400000) ? 0x7e : 0x7f;
+      break;
+    default:
+      abort ();
+    }
+
+  dest = (sign << 31) | (exp << 23) | fraction;
+  return *(float *) &dest;
+}
+
+static void
+compute_vgetmantss (float *r, float *s1, float *s2, int interv,
+		    int signctrl)
+{
+  int i;
+  r[0] = get_norm_mant (s2[0], signctrl, interv);
+  for (i = 1; i < 4; i++)
+    {
+      r[i] = s1[i];
+    }
+}
+
+static void
+avx512f_test (void)
+{
+  int i, sign;
+  union128 res1, src1, src2;
+  float res_ref[4];
+  int interv = _MM_MANT_NORM_p5_1;
+  int signctrl = _MM_MANT_SIGN_src;
+
+  src1.x = _mm_set_ps (-24.043, 68.346, -43.35, 546.46);
+  src2.x = _mm_set_ps (222.222, 333.333, 444.444, -2.0);
+
+  res1.x = _mm_getmant_ss (src1.x, src2.x, interv, signctrl);
+
+  compute_vgetmantss (res_ref, src1.a, src2.a, interv, signctrl);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c
new file mode 100644
index 0000000..8c24704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmaxsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_max_round_sd (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c
new file mode 100644
index 0000000..027445d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmaxss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_max_round_ss (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c
new file mode 100644
index 0000000..8f8488f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vminsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_min_round_sd (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c
new file mode 100644
index 0000000..0774b75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vminss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_min_round_ss (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c
new file mode 100644
index 0000000..c85832a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmulsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_mul_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c
new file mode 100644
index 0000000..cb4bf0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmulss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_mul_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c
new file mode 100644
index 0000000..c0c8d03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrcp14sd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rcp14_sd (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c
new file mode 100644
index 0000000..9ff3541
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrcp14sd (double *s1, double *s2, double *r)
+{
+  r[0] = 1.0 / s2[0];
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  union128d s1, s2, res1, res2, res3;
+  double res_ref[2];
+
+  s1.x = _mm_set_pd (-3.0, 111.111);
+  s2.x = _mm_set_pd (222.222, -2.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rcp14_sd (s1.x, s2.x);
+
+  compute_vrcp14sd (s1.a, s2.a, res_ref);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c
new file mode 100644
index 0000000..580dfd6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrcp14ss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rcp14_ss (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c
new file mode 100644
index 0000000..fe8989a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrcp14ss (float *s1, float *s2, float *r)
+{
+  r[0] = 1.0 / s2[0];
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 s1, s2, res1, res2, res3;
+  float res_ref[4];
+
+  s1.x = _mm_set_ps (-24.043, 68.346, -43.35, 546.46);
+  s2.x = _mm_set_ps (222.222, 333.333, 444.444, -2.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rcp14_ss (s1.x, s2.x);
+
+  compute_vrcp14ss (s1.a, s2.a, res_ref);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
new file mode 100644
index 0000000..2f370a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_roundscale_sd (x1, x2, 0x42);
+  x1 = _mm_roundscale_round_sd (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
new file mode 100644
index 0000000..5b4e842
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 64)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_rndscalesd (double *s1, double *s2, double *r, int imm)
+{
+  int rc, m;
+  rc = imm & 0xf;
+  m = imm >> 4;
+
+  switch (rc)
+    {
+    case _MM_FROUND_FLOOR:
+      r[0] = floor (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    case _MM_FROUND_CEIL:
+      r[0] = ceil (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    default:
+      abort ();
+      break;
+    }
+
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  int imm = _MM_FROUND_FLOOR | (7 << 4);
+  union128d s1, s2, res1;
+  double res_ref[SIZE];
+
+  s1.x = _mm_set_pd (4.05084, -1.23162);
+  s2.x = _mm_set_pd (-3.53222, 7.33527);
+
+  res1.x = _mm_roundscale_sd (s1.x, s2.x, imm);
+
+  compute_rndscalesd (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
new file mode 100644
index 0000000..c9f5a75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_roundscale_ss (x1, x2, 0x42);
+  x1 = _mm_roundscale_round_ss (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
new file mode 100644
index 0000000..7acfe4c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 32)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_rndscaless (float *s1, float *s2, float *r, int imm)
+{
+  int rc, m;
+  rc = imm & 0xf;
+  m = imm >> 4;
+
+  switch (rc)
+    {
+    case _MM_FROUND_FLOOR:
+      r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    case _MM_FROUND_CEIL:
+      r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    default:
+      abort ();
+      break;
+    }
+
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  int imm = _MM_FROUND_FLOOR | (7 << 4);
+  union128 s1, s2, res1;
+  float res_ref[SIZE];
+
+  s1.x = _mm_set_ps (4.05084, -1.23162, 2.00231, -6.22103);
+  s2.x = _mm_set_ps (-4.19319, -3.53222, 7.33527, 5.57655);
+
+  res1.x = _mm_roundscale_ss (s1.x, s2.x, imm);
+
+  compute_rndscaless (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c
new file mode 100644
index 0000000..bd8b7a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrt14sd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rsqrt14_sd (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c
new file mode 100644
index 0000000..ef4e407
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrsqrt14sd (double *s1, double *s2, double *r)
+{
+  r[0] = 1.0 / sqrt (s2[0]);
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  union128d s1, s2, res1, res2, res3;
+  double res_ref[2];
+
+  s1.x = _mm_set_pd (-3.0, 111.111);
+  s2.x = _mm_set_pd (222.222, 4.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rsqrt14_sd (s1.x, s2.x);
+
+  compute_vrsqrt14sd (s1.a, s2.a, res_ref);
+
+  if (check_fp_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c
new file mode 100644
index 0000000..d4d4eea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrt14ss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rsqrt14_ss (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c
new file mode 100644
index 0000000..b01420f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrsqrt14ss (float *s1, float *s2, float *r)
+{
+  r[0] = 1.0 / sqrt (s2[0]);
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 s1, s2, res1, res2, res3;
+  float res_ref[4];
+
+  s1.x = _mm_set_ps (-24.43, 68.346, -43.35, 546.46);
+  s2.x = _mm_set_ps (222.222, 333.333, 444.444, 4.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rsqrt14_ss (s1.x, s2.x);
+
+  compute_vrsqrt14ss (s1.a, s2.a, res_ref);
+
+  if (check_fp_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c
new file mode 100644
index 0000000..bbf238e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vscalefsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscalefsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_scalef_sd (x, x);
+  x = _mm_scalef_round_sd (x, x, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c
new file mode 100644
index 0000000..131fc67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 64)
+
+static void
+compute_scalefsd (double *s1, double *s2, double *r)
+{
+  r[0] = s1[0] * pow (2, floor (s2[0]));
+  r[1] = s1[1];
+}
+
+void static
+avx512f_test (void)
+{
+  union128d res1, s1, s2;
+  double res_ref[SIZE];
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 11.5 * (i + 1);
+      s2.a[i] = 10.5 * (i + 1);
+    }
+
+  res1.x = _mm_scalef_sd (s1.x, s2.x);
+
+  compute_scalefsd (s1.a, s2.a, res_ref);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c
new file mode 100644
index 0000000..d36b2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vscalefss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscalefss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_scalef_ss (x, x);
+  x = _mm_scalef_round_ss (x, x, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c
new file mode 100644
index 0000000..3e8f6d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 32)
+
+static void
+compute_scalefss (float *s1, float *s2, float *r)
+{
+  r[0] = s1[0] * (float) pow (2, floor (s2[0]));
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 res1, s1, s2;
+  float res_ref[SIZE];
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 11.5 * (i + 1);
+      s2.a[i] = 10.5 * (i + 1);
+    }
+
+  res1.x = _mm_scalef_ss (s1.x, s2.x);
+
+  compute_scalefss (s1.a, s2.a, res_ref);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c
new file mode 100644
index 0000000..5814e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sqrt_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c
new file mode 100644
index 0000000..81e8a0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sqrt_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c
new file mode 100644
index 0000000..511ceb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsubsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sub_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c
new file mode 100644
index 0000000..618662f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsubss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sub_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index e8cb533..c5d8876 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -199,6 +199,7 @@ test_1x (_mm512_getmant_pd, __m512d, __m512d, 1, 1)
 test_1x (_mm512_getmant_ps, __m512, __m512, 1, 1)
 test_1x (_mm512_roundscale_round_pd, __m512d, __m512d, 1, 5)
 test_1x (_mm512_roundscale_round_ps, __m512, __m512, 1, 5)
+test_1x (_mm_cvt_roundi32_ss, __m128, __m128, 1, 1)
 test_2 (_mm512_add_round_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_add_round_ps, __m512, __m512, __m512, 1)
 test_2 (_mm512_alignr_epi32, __m512i, __m512i, __m512i, 1)
@@ -278,16 +279,45 @@ test_2 (_mm512_shuffle_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_shuffle_ps, __m512, __m512, __m512, 1)
 test_2 (_mm512_sub_round_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_sub_round_ps, __m512, __m512, __m512, 1)
+test_2 (_mm_add_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_add_round_ss, __m128, __m128, __m128, 1)
 test_2 (_mm_cmp_sd_mask, __mmask8, __m128d, __m128d, 1)
 test_2 (_mm_cmp_ss_mask, __mmask8, __m128, __m128, 1)
 #ifdef __x86_64__
+test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 1)
+test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 1)
 #endif
+test_2 (_mm_cvt_roundsd_ss, __m128, __m128, __m128d, 1)
+test_2 (_mm_cvt_roundss_sd, __m128d, __m128d, __m128, 5)
+test_2 (_mm_cvt_roundu32_ss, __m128, __m128, unsigned, 1)
 #ifdef __x86_64__
+test_2 (_mm_cvt_roundu64_sd, __m128d, __m128d, unsigned long long, 1)
+test_2 (_mm_cvt_roundu64_ss, __m128, __m128, unsigned long long, 1)
 #endif
+test_2 (_mm_div_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_div_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_getexp_round_sd, __m128d, __m128d, __m128d, 5)
+test_2 (_mm_getexp_round_ss, __m128, __m128, __m128, 5)
+test_2y (_mm_getmant_round_sd, __m128d, __m128d, __m128d, 1, 1, 5)
+test_2y (_mm_getmant_round_ss, __m128, __m128, __m128, 1, 1, 5)
+test_2 (_mm_mul_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_mul_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_scalef_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_scalef_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_sqrt_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_sqrt_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_sub_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_sub_round_ss, __m128, __m128, __m128, 1)
 test_2x (_mm512_cmp_round_pd_mask, __mmask8, __m512d, __m512d, 1, 5)
 test_2x (_mm512_cmp_round_ps_mask, __mmask16, __m512, __m512, 1, 5)
 test_2x (_mm512_maskz_roundscale_round_pd, __m512d, __mmask8, __m512d, 1, 5)
 test_2x (_mm512_maskz_roundscale_round_ps, __m512, __mmask16, __m512, 1, 5)
+test_2x (_mm_cmp_round_sd_mask, __mmask8, __m128d, __m128d, 1, 5)
+test_2x (_mm_cmp_round_ss_mask, __mmask8, __m128, __m128, 1, 5)
+test_2x (_mm_comi_round_sd, int, __m128d, __m128d, 1, 5)
+test_2x (_mm_comi_round_ss, int, __m128, __m128, 1, 5)
+test_2x (_mm_roundscale_round_sd, __m128d, __m128d, __m128d, 1, 5)
+test_2x (_mm_roundscale_round_ss, __m128, __m128, __m128, 1, 5)
 test_3 (_mm512_fmadd_round_pd, __m512d, __m512d, __m512d, __m512d, 1)
 test_3 (_mm512_fmadd_round_ps, __m512, __m512, __m512, __m512, 1)
 test_3 (_mm512_fmaddsub_round_pd, __m512d, __m512d, __m512d, __m512d, 1)
@@ -373,6 +403,14 @@ test_3 (_mm512_maskz_sub_round_pd, __m512d, __mmask8, __m512d, __m512d, 1)
 test_3 (_mm512_maskz_sub_round_ps, __m512, __mmask16, __m512, __m512, 1)
 test_3 (_mm512_ternarylogic_epi32, __m512i, __m512i, __m512i, __m512i, 1)
 test_3 (_mm512_ternarylogic_epi64, __m512i, __m512i, __m512i, __m512i, 1)
+test_3 (_mm_fmadd_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fmadd_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fmsub_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fmsub_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fnmadd_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fnmadd_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fnmsub_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fnmsub_round_ss, __m128, __m128, __m128, __m128, 1)
 test_3 (_mm_mask_cmp_sd_mask, __mmask8, __mmask8, __m128d, __m128d, 1)
 test_3 (_mm_mask_cmp_ss_mask, __mmask8, __mmask8, __m128, __m128, 1)
 test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1)
@@ -385,6 +423,10 @@ test_3v (_mm512_i64scatter_pd, void *, __m512i, __m512d, 1)
 test_3v (_mm512_i64scatter_ps, void *, __m512i, __m256, 1)
 test_3x (_mm512_mask_roundscale_round_pd, __m512d, __m512d, __mmask8, __m512d, 1, 5)
 test_3x (_mm512_mask_roundscale_round_ps, __m512, __m512, __mmask16, __m512, 1, 5)
+test_3x (_mm_fixupimm_round_sd, __m128d, __m128d, __m128d, __m128i, 1, 5)
+test_3x (_mm_fixupimm_round_ss, __m128, __m128, __m128, __m128i, 1, 5)
+test_3x (_mm_mask_cmp_round_sd_mask, __mmask8, __mmask8, __m128d, __m128d, 1, 5)
+test_3x (_mm_mask_cmp_round_ss_mask, __mmask8, __mmask8, __m128, __m128, 1, 5)
 test_4 (_mm512_mask3_fmadd_round_pd, __m512d, __m512d, __m512d, __m512d, __mmask8, 1)
 test_4 (_mm512_mask3_fmadd_round_ps, __m512, __m512, __m512, __m512, __mmask16, 1)
 test_4 (_mm512_mask3_fmaddsub_round_pd, __m512d, __m512d, __m512d, __m512d, __mmask8, 1)
@@ -471,6 +513,10 @@ test_4x (_mm512_mask_fixupimm_round_pd, __m512d, __m512d, __mmask8, __m512d, __m
 test_4x (_mm512_mask_fixupimm_round_ps, __m512, __m512, __mmask16, __m512, __m512i, 1, 5)
 test_4x (_mm512_maskz_fixupimm_round_pd, __m512d, __mmask8, __m512d, __m512d, __m512i, 1, 5)
 test_4x (_mm512_maskz_fixupimm_round_ps, __m512, __mmask16, __m512, __m512, __m512i, 1, 5)
+test_4x (_mm_mask_fixupimm_round_sd, __m128d, __m128d, __mmask8, __m128d, __m128i, 1, 5)
+test_4x (_mm_mask_fixupimm_round_ss, __m128, __m128, __mmask8, __m128, __m128i, 1, 5)
+test_4x (_mm_maskz_fixupimm_round_sd, __m128d, __mmask8, __m128d, __m128d, __m128i, 1, 5)
+test_4x (_mm_maskz_fixupimm_round_ss, __m128, __mmask8, __m128, __m128, __m128i, 1, 5)
 
 /* avx512pfintrin.h */
 test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 0123538..a6a7b39 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -186,6 +186,8 @@
 /* avx512fintrin.h */
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 1)
+#define __builtin_ia32_addss_round(A, B, C) __builtin_ia32_addss_round(A, B, 1)
 #define __builtin_ia32_alignd512_mask(A, B, F, D, E) __builtin_ia32_alignd512_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq512_mask(A, B, F, D, E) __builtin_ia32_alignq512_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpd512_mask(A, B, E, D) __builtin_ia32_cmpd512_mask(A, B, 1, D)
@@ -201,6 +203,8 @@
 #define __builtin_ia32_cvtps2dq512_mask(A, B, C, D) __builtin_ia32_cvtps2dq512_mask(A, B, C, 1)
 #define __builtin_ia32_cvtps2pd512_mask(A, B, C, D) __builtin_ia32_cvtps2pd512_mask(A, B, C, 5)
 #define __builtin_ia32_cvtps2udq512_mask(A, B, C, D) __builtin_ia32_cvtps2udq512_mask(A, B, C, 1)
+#define __builtin_ia32_cvtsd2ss_round(A, B, C) __builtin_ia32_cvtsd2ss_round(A, B, 1)
+#define __builtin_ia32_cvtss2sd_round(A, B, C) __builtin_ia32_cvtss2sd_round(A, B, 4)
 #define __builtin_ia32_cvtsi2sd64(A, B, C) __builtin_ia32_cvtsi2sd64(A, B, 1)
 #define __builtin_ia32_cvtsi2ss32(A, B, C) __builtin_ia32_cvtsi2ss32(A, B, 1)
 #define __builtin_ia32_cvtsi2ss64(A, B, C) __builtin_ia32_cvtsi2ss64(A, B, 1)
@@ -214,6 +218,8 @@
 #define __builtin_ia32_cvtusi2ss64(A, B, C) __builtin_ia32_cvtusi2ss64(A, B, 1)
 #define __builtin_ia32_divpd512_mask(A, B, C, D, E) __builtin_ia32_divpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_divps512_mask(A, B, C, D, E) __builtin_ia32_divps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_divsd_round(A, B, C) __builtin_ia32_divsd_round(A, B, 1)
+#define __builtin_ia32_divss_round(A, B, C) __builtin_ia32_divss_round(A, B, 1)
 #define __builtin_ia32_extractf32x4_mask(A, E, C, D) __builtin_ia32_extractf32x4_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x4_mask(A, E, C, D) __builtin_ia32_extractf64x4_mask(A, 1, C, D)
 #define __builtin_ia32_extracti32x4_mask(A, E, C, D) __builtin_ia32_extracti32x4_mask(A, 1, C, D)
@@ -236,18 +242,28 @@
 #define __builtin_ia32_gathersiv8di(A, B, C, D, F) __builtin_ia32_gathersiv8di(A, B, C, D, 1)
 #define __builtin_ia32_getexppd512_mask(A, B, C, D) __builtin_ia32_getexppd512_mask(A, B, C, 5)
 #define __builtin_ia32_getexpps512_mask(A, B, C, D) __builtin_ia32_getexpps512_mask(A, B, C, 5)
+#define __builtin_ia32_getexpsd128_round(A, B, C) __builtin_ia32_getexpsd128_round(A, B, 4)
+#define __builtin_ia32_getexpss128_round(A, B, C) __builtin_ia32_getexpss128_round(A, B, 4)
 #define __builtin_ia32_getmantpd512_mask(A, F, C, D, E) __builtin_ia32_getmantpd512_mask(A, 1, C, D, 5)
 #define __builtin_ia32_getmantps512_mask(A, F, C, D, E) __builtin_ia32_getmantps512_mask(A, 1, C, D, 5)
+#define __builtin_ia32_getmantsd_round(A, B, C, D) __builtin_ia32_getmantsd_round(A, B, 1, 4)
+#define __builtin_ia32_getmantss_round(A, B, C, D) __builtin_ia32_getmantss_round(A, B, 1, 4)
 #define __builtin_ia32_insertf32x4_mask(A, B, F, D, E) __builtin_ia32_insertf32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x4_mask(A, B, F, D, E) __builtin_ia32_insertf64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti32x4_mask(A, B, F, D, E) __builtin_ia32_inserti32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x4_mask(A, B, F, D, E) __builtin_ia32_inserti64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_maxpd512_mask(A, B, C, D, E) __builtin_ia32_maxpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_maxps512_mask(A, B, C, D, E) __builtin_ia32_maxps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_maxsd_round(A, B, C) __builtin_ia32_maxsd_round(A, B, 4)
+#define __builtin_ia32_maxss_round(A, B, C) __builtin_ia32_maxss_round(A, B, 4)
 #define __builtin_ia32_minpd512_mask(A, B, C, D, E) __builtin_ia32_minpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_minps512_mask(A, B, C, D, E) __builtin_ia32_minps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_minsd_round(A, B, C) __builtin_ia32_minsd_round(A, B, 4)
+#define __builtin_ia32_minss_round(A, B, C) __builtin_ia32_minss_round(A, B, 4)
 #define __builtin_ia32_mulpd512_mask(A, B, C, D, E) __builtin_ia32_mulpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_mulps512_mask(A, B, C, D, E) __builtin_ia32_mulps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_mulsd_round(A, B, C) __builtin_ia32_mulsd_round(A, B, 1)
+#define __builtin_ia32_mulss_round(A, B, C) __builtin_ia32_mulss_round(A, B, 1)
 #define __builtin_ia32_permdf512_mask(A, E, C, D) __builtin_ia32_permdf512_mask(A, 1, C, D)
 #define __builtin_ia32_permdi512_mask(A, E, C, D) __builtin_ia32_permdi512_mask(A, 1, C, D)
 #define __builtin_ia32_prold512_mask(A, E, C, D) __builtin_ia32_prold512_mask(A, 1, C, D)
@@ -267,8 +283,12 @@
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 5)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 5)
+#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
+#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_scalefsd_round(A, B, C) __builtin_ia32_scalefsd_round(A, B, 1)
+#define __builtin_ia32_scalefss_round(A, B, C) __builtin_ia32_scalefss_round(A, B, 1)
 #define __builtin_ia32_scatterdiv8df(A, B, C, D, F) __builtin_ia32_scatterdiv8df(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv8di(A, B, C, D, F) __builtin_ia32_scatterdiv8di(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv16sf(A, B, C, D, F) __builtin_ia32_scatterdiv16sf(A, B, C, D, 1)
@@ -285,8 +305,12 @@
 #define __builtin_ia32_shufps512_mask(A, B, F, D, E) __builtin_ia32_shufps512_mask(A, B, 1, D, E)
 #define __builtin_ia32_sqrtpd512_mask(A, B, C, D) __builtin_ia32_sqrtpd512_mask(A, B, C, 1)
 #define __builtin_ia32_sqrtps512_mask(A, B, C, D) __builtin_ia32_sqrtps512_mask(A, B, C, 1)
+#define __builtin_ia32_sqrtss_round(A, B, C) __builtin_ia32_sqrtss_round(A, B, 1)
+#define __builtin_ia32_sqrtsd_round(A, B, C) __builtin_ia32_sqrtsd_round(A, B, 1)
 #define __builtin_ia32_subpd512_mask(A, B, C, D, E) __builtin_ia32_subpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_subps512_mask(A, B, C, D, E) __builtin_ia32_subps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_subsd_round(A, B, C) __builtin_ia32_subsd_round(A, B, 1)
+#define __builtin_ia32_subss_round(A, B, C) __builtin_ia32_subss_round(A, B, 1)
 #define __builtin_ia32_ucmpd512_mask(A, B, E, D) __builtin_ia32_ucmpd512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpq512_mask(A, B, E, D) __builtin_ia32_ucmpq512_mask(A, B, 1, D)
 #define __builtin_ia32_vcomisd(A, B, C, D) __builtin_ia32_vcomisd(A, B, 1, 5)
@@ -315,12 +339,8 @@
 #define __builtin_ia32_vfmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddps512_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddss3_maskz(A, B, C, D, 1)
+#define __builtin_ia32_vfmaddsd3_round(A, B, C, D) __builtin_ia32_vfmaddsd3_round(A, B, C, 1)
+#define __builtin_ia32_vfmaddss3_round(A, B, C, D) __builtin_ia32_vfmaddss3_round(A, B, C, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, 1)
@@ -331,8 +351,6 @@
 #define __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmsubss3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubss3_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfnmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, 1)
diff --git a/gcc/testsuite/gcc.target/i386/testimm-10.c b/gcc/testsuite/gcc.target/i386/testimm-10.c
index b73f7f0..58e5e52 100644
--- a/gcc/testsuite/gcc.target/i386/testimm-10.c
+++ b/gcc/testsuite/gcc.target/i386/testimm-10.c
@@ -77,7 +77,13 @@ test8bit (void)
   m512  = _mm512_mask_fixupimm_ps (m512, mmask16, m512, m512i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
   m512  = _mm512_maskz_fixupimm_ps (mmask16, m512, m512, m512i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
+  m128d = _mm_fixupimm_sd (m128d, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128d = _mm_mask_fixupimm_sd (m128d, mmask8, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128d = _mm_maskz_fixupimm_sd (mmask8, m128d, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
+  m128  = _mm_fixupimm_ss (m128, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128  = _mm_mask_fixupimm_ss (m128, mmask8, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128  = _mm_maskz_fixupimm_ss (mmask8, m128, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
   m512i = _mm512_rol_epi32 (m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
   m512i = _mm512_mask_rol_epi32 (m512i, mmask16, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
@@ -107,6 +113,8 @@ test8bit (void)
   m512  = _mm512_mask_roundscale_ps (m512, mmask16, m512, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
   m512  = _mm512_maskz_roundscale_ps (mmask16, m512, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
 
+  m128d = _mm_roundscale_sd (m128d, m128d, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
+  m128  = _mm_roundscale_ss (m128, m128, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
 
   m512i = _mm512_alignr_epi32 (m512i, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
   m512i = _mm512_mask_alignr_epi32 (m512i, mmask16, m512i, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
@@ -179,5 +187,6 @@ test4bit (void) {
   m512  = _mm512_mask_getmant_ps (m512, mmask16, m512, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
   m512  = _mm512_maskz_getmant_ps (mmask16, m512, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
 
-
+  m128d = _mm_getmant_sd (m128d, m128d, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
+  m128  = _mm_getmant_ss (m128, m128, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
 }
diff --git a/gcc/testsuite/gcc.target/i386/testround-1.c b/gcc/testsuite/gcc.target/i386/testround-1.c
index 20db176..114f386 100644
--- a/gcc/testsuite/gcc.target/i386/testround-1.c
+++ b/gcc/testsuite/gcc.target/i386/testround-1.c
@@ -19,12 +19,19 @@ __mmask16 mmask16;
 void
 test_round (void)
 {
+  m128d = _mm_add_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_add_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sub_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sub_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+
   m512d = _mm512_sqrt_round_pd (m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_sqrt_round_pd (m512d, mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_sqrt_round_pd (mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_sqrt_round_ps (m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_sqrt_round_ps (m512, mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_sqrt_round_ps (mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sqrt_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sqrt_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_add_round_pd (m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_add_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -51,6 +58,10 @@ test_round (void)
   m512 = _mm512_div_round_ps (m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_div_round_ps (m512, mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_div_round_ps (mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_mul_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_mul_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_div_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_div_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_scalef_round_pd(m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_scalef_round_pd(m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -58,6 +69,8 @@ test_round (void)
   m512 = _mm512_scalef_round_ps(m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_scalef_round_ps(m512, mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_scalef_round_ps(mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_scalef_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_scalef_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_fmadd_round_pd (m512d, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_fmadd_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -141,6 +154,16 @@ test_round (void)
   m256 = _mm512_cvt_roundpd_ps (m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_mask_cvt_roundpd_ps (m256, mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_maskz_cvt_roundpd_ps (mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_cvt_roundsd_ss (m128, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+
+  m128d = _mm_fmadd_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmadd_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fmsub_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmsub_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmadd_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmadd_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmsub_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmsub_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_max_round_pd (m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_max_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -195,6 +218,10 @@ test_round (void)
   m512 = _mm512_mask_cvt_roundph_ps (m512, mmask16, m256i, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_cvt_roundph_ps (mmask16, m256i, 7); /* { dg-error "incorrect rounding operand." } */
 
+  m128d = _mm_cvt_roundss_sd (m128d, m128, 7); /* { dg-error "incorrect rounding operand." } */
+
+  m128 = _mm_getexp_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getexp_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_getexp_round_ps (m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getexp_round_ps (m512, mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getexp_round_ps (mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
@@ -207,6 +234,8 @@ test_round (void)
   m512 = _mm512_getmant_round_ps (m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getmant_round_ps (m512, mmask16, m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getmant_round_ps (mmask16, m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getmant_round_sd (m128d, m128d, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_getmant_round_ss (m128, m128, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512 = _mm512_roundscale_round_ps (m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_roundscale_round_ps (m512, mmask16, m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
@@ -214,6 +243,8 @@ test_round (void)
   m512d = _mm512_roundscale_round_pd (m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_roundscale_round_pd (m512d, mmask8, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_roundscale_round_pd (mmask8, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_roundscale_round_ss (m128, m128, 4, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_roundscale_round_sd (m128d, m128d, 4, 7); /* { dg-error "incorrect rounding operand." } */
 
   mmask8 = _mm512_cmp_round_pd_mask (m512d, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   mmask16 = _mm512_cmp_round_ps_mask (m512, m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
@@ -231,12 +262,19 @@ test_round (void)
 void
 test_round_sae (void)
 {
+  m128d = _mm_add_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_add_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sub_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sub_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+
   m512d = _mm512_sqrt_round_pd (m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_sqrt_round_pd (m512d, mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_sqrt_round_pd (mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_sqrt_round_ps (m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_sqrt_round_ps (m512, mmask16, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_sqrt_round_ps (mmask16, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sqrt_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sqrt_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_add_round_pd (m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_add_round_pd (m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -263,6 +301,10 @@ test_round_sae (void)
   m512 = _mm512_div_round_ps (m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_div_round_ps (m512, mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_div_round_ps (mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_mul_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_mul_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_div_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_div_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_scalef_round_pd(m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_scalef_round_pd(m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -270,6 +312,8 @@ test_round_sae (void)
   m512 = _mm512_scalef_round_ps(m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_scalef_round_ps(m512, mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_scalef_round_ps(mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_scalef_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_scalef_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_fmadd_round_pd (m512d, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_fmadd_round_pd (m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -353,6 +397,16 @@ test_round_sae (void)
   m256 = _mm512_cvt_roundpd_ps (m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_mask_cvt_roundpd_ps (m256, mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_maskz_cvt_roundpd_ps (mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_cvt_roundsd_ss (m128, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+
+  m128d = _mm_fmadd_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmadd_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fmsub_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmsub_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmadd_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmadd_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmsub_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmsub_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 }
 
 void
@@ -411,6 +465,10 @@ test_sae_only (void)
   m512 = _mm512_mask_cvt_roundph_ps (m512, mmask16, m256i, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_cvt_roundph_ps (mmask16, m256i, 3); /* { dg-error "incorrect rounding operand." } */
 
+  m128d = _mm_cvt_roundss_sd (m128d, m128, 3); /* { dg-error "incorrect rounding operand." } */
+
+  m128 = _mm_getexp_round_ss (m128, m128, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getexp_round_sd (m128d, m128d, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_getexp_round_ps (m512, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getexp_round_ps (m512, mmask16, m512, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getexp_round_ps (mmask16, m512, 3); /* { dg-error "incorrect rounding operand." } */
@@ -423,12 +481,17 @@ test_sae_only (void)
   m512 = _mm512_getmant_round_ps (m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getmant_round_ps (m512, mmask16, m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getmant_round_ps (mmask16, m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getmant_round_sd (m128d, m128d, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_getmant_round_ss (m128, m128, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+
   m512 = _mm512_roundscale_round_ps (m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_roundscale_round_ps (m512, mmask16, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_roundscale_round_ps (mmask16, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_roundscale_round_pd (m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_roundscale_round_pd (m512d, mmask8, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_roundscale_round_pd (mmask8, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_roundscale_round_ss (m128, m128, 4, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_roundscale_round_sd (m128d, m128d, 4, 3); /* { dg-error "incorrect rounding operand." } */
 
   mmask8 = _mm512_cmp_round_pd_mask (m512d, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   mmask16 = _mm512_cmp_round_ps_mask (m512, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-12-24  4:56             ` Kirill Yukhin
@ 2013-12-24  4:59               ` Kirill Yukhin
       [not found]               ` <CAGs3RfukBMLGgmzR8+EFP2ok4mfXXz1DY88ghr-b_u=TbcSkdA@mail.gmail.com>
  1 sibling, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-24  4:59 UTC (permalink / raw)
  To: Uros Bizjak, Richard Henderson; +Cc: GCC Patches, Jakub Jelinek, Jeff Law

> Patch attached.
>
> Ok for trunk?
>

Just noticed Uros's input about predicates. So, ok with fix of predicate?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
       [not found]               ` <CAGs3RfukBMLGgmzR8+EFP2ok4mfXXz1DY88ghr-b_u=TbcSkdA@mail.gmail.com>
@ 2013-12-24  7:34                 ` Uros Bizjak
  2013-12-30 12:59                   ` Kirill Yukhin
  0 siblings, 1 reply; 63+ messages in thread
From: Uros Bizjak @ 2013-12-24  7:34 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Richard Henderson, GCC Patches, Jakub Jelinek, Jeff Law

On Tue, Dec 24, 2013 at 5:57 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
>> Patch attached.
>>
>> Ok for trunk?
>
>
> Just noticed Uros's input about predicates. So, ok with fix of predicate?

Please retest and repost the patch with the predicate fix.

Looks good otherwise, with a couple of minor changes below:

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
new file mode 100644
index 0000000..26d7c3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -DAVX512F" } */

Please move defines from options to source.

+(define_subst_attr "round_prefix" "round" "vex" "evex")
 (define_subst_attr "round_mode512bit_condition" "round" "1"
"(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) ==
V8DFmode)")
 (define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE
(operands[0]) == V4SFmode)")

While here, can you also change conditions to static checks
(<MODE>mode == ....) ?

Uros.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
  2013-12-23 16:46           ` Uros Bizjak
@ 2013-12-26  9:48             ` Kirill Yukhin
  0 siblings, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-26  9:48 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

Hello Uros,
On 23 Dec 17:46, Uros Bizjak wrote:
> This "round_expand_predicate" is the predicate substitution I was
> referred to in the review of 5/8. Please use it also in insn patterns,
> perhaps renamed as "round_predicate"

This is drawback of substs. We bind given subst attribute to given subst
strictly. So, this guy:

+(define_subst_attr "round_expand_predicate" "round_expand" "nonimmediate_operand" "register_operand")

is binded to "round_expand" (second argument of definition) subst and to it only.
That is way name is "round_expand...", it reflects subst it relates to.

For rest substs I'll introduce dedicated attributes.

--
Thanks, K

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
  2013-12-23 16:26             ` Uros Bizjak
@ 2013-12-26 13:10               ` Kirill Yukhin
  0 siblings, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-26 13:10 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, Jakub Jelinek, Jeff Law, GCC Patches

Hello,

On 23 Dec 17:26, Uros Bizjak wrote:
> On Mon, Dec 23, 2013 at 5:11 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > So, OK for mainline, but I would kindly ask you to please wait a
> > couple of days for possible Richard's comments
> 
> When substituting constraints, please also substitute corresponding
> operand predicate:
> 
> nonimmediate_operand -> register_operand in 1st and 3rd case
> memory_operand -> register_operand in 2nd case.

Thanks! I've introduced new subst attribute:
+(define_subst_attr "round_nimm_predicate" "round" "nonimmediate_operand" "register_operand")

which name reflect:
  1.  affilation to `round' subst (`round')
  2.  predicate it intended to affect (`nimm_predicate')

TESTING
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no regressions without -m512f option.

If no more inputs - I'll check it in after 24 hrs from now.

--
Thanks, K

---
 gcc/config/i386/i386.c   |  32 ++++
 gcc/config/i386/i386.md  |  10 +
 gcc/config/i386/sse.md   | 480 ++++++++++++++++++++++++-----------------------
 gcc/config/i386/subst.md |  42 +++++
 4 files changed, 331 insertions(+), 233 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ecf5e0b..a3dd307 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15041,6 +15041,38 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	    fputs ("{z}", file);
 	  return;
 
+	case 'R':
+	  gcc_assert (CONST_INT_P (x));
+
+	  if (ASSEMBLER_DIALECT == ASM_INTEL)
+	    fputs (", ", file);
+
+	  switch (INTVAL (x))
+	    {
+	    case ROUND_NEAREST_INT:
+	      fputs ("{rn-sae}", file);
+	      break;
+	    case ROUND_NEG_INF:
+	      fputs ("{rd-sae}", file);
+	      break;
+	    case ROUND_POS_INF:
+	      fputs ("{ru-sae}", file);
+	      break;
+	    case ROUND_ZERO:
+	      fputs ("{rz-sae}", file);
+	      break;
+	    case ROUND_SAE:
+	      fputs ("{sae}", file);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+
+	  if (ASSEMBLER_DIALECT == ASM_ATT)
+	    fputs (", ", file);
+
+	  return;
+
 	case '*':
 	  if (ASSEMBLER_DIALECT == ASM_ATT)
 	    putc ('*', file);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ab5b33f..30b8d74 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -241,6 +241,16 @@
    (ROUND_NO_EXC		0x8)
   ])
 
+;; Constants to represent AVX512F embeded rounding
+(define_constants
+  [(ROUND_NEAREST_INT			0)
+   (ROUND_NEG_INF			1)
+   (ROUND_POS_INF			2)
+   (ROUND_ZERO				3)
+   (NO_ROUND				4)
+   (ROUND_SAE				5)
+  ])
+
 ;; Constants to represent pcomtrue/pcomfalse variants
 (define_constants
   [(PCOM_FALSE			0)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index adedf44..119d0b0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1229,23 +1229,23 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "<plusminus_insn><mode>3<mask_name>"
+(define_expand "<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(plusminus:VF
-	  (match_operand:VF 1 "nonimmediate_operand")
-	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+	  (match_operand:VF 1 "<round_nimm_predicate>")
+	  (match_operand:VF 2 "<round_nimm_predicate>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3<mask_name>"
+(define_insn "*<plusminus_insn><mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(plusminus:VF
-	  (match_operand:VF 1 "nonimmediate_operand" "<comm>0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 1 "<round_nimm_predicate>" "<comm>0,v")
+	  (match_operand:VF 2 "<round_nimm_predicate>" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    <plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
-   v<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   v<plusminus_mnemonic><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1268,23 +1268,23 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_expand "mul<mode>3<mask_name>"
+(define_expand "mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand")
 	(mult:VF
-	  (match_operand:VF 1 "nonimmediate_operand")
-	  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+	  (match_operand:VF 1 "<round_nimm_predicate>")
+	  (match_operand:VF 2 "<round_nimm_predicate>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
-(define_insn "*mul<mode>3<mask_name>"
+(define_insn "*mul<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(mult:VF
-	  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition>"
+	  (match_operand:VF 1 "<round_nimm_predicate>" "%0,v")
+	  (match_operand:VF 2 "<round_nimm_predicate>" "xm,<round_constraint>")))]
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands) && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    mul<ssemodesuffix>\t{%2, %0|%0, %2}
-   vmul<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vmul<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssemul")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1335,15 +1335,15 @@
     }
 })
 
-(define_insn "<sse>_div<mode>3<mask_name>"
+(define_insn "<sse>_div<mode>3<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(div:VF
 	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
+	  (match_operand:VF 2 "<round_nimm_predicate>" "xm,<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
-   vdiv<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   vdiv<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "ssediv")
    (set_attr "prefix" "<mask_prefix3>")
@@ -1427,11 +1427,11 @@
     }
 })
 
-(define_insn "<sse>_sqrt<mode>2<mask_name>"
+(define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v")
-	(sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE && <mask_mode512bit_condition>"
-  "%vsqrt<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	(sqrt:VF (match_operand:VF 1 "<round_nimm_predicate>" "<round_constraint>")))]
+  "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vsqrt<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
    (set_attr "btver2_sse_attr" "sqrt")
@@ -2698,210 +2698,224 @@
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>"
+(define_expand "avx512f_fmadd_<mode>_maskz"
+  [(match_operand:VF_512 0 "register_operand")
+   (match_operand:VF_512 1 "nonimmediate_operand")
+   (match_operand:VF_512 2 "nonimmediate_operand")
+   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmadd_<mode>_maskz_1 (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]));
+  DONE;
+})
+
+(define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	  (match_operand:FMAMODE 1 "<round_nimm_predicate>" "%0,0,v,x,x")
+	  (match_operand:FMAMODE 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE 3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask"
+(define_insn "avx512f_fmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmadd_<mode>_mask3"
+(define_insn "avx512f_fmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=x")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "x")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
-	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  (match_operand:FMAMODE   1 "<round_nimm_predicate>" "%0, 0, v, x,x")
+	  (match_operand:FMAMODE   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask"
+(define_insn "avx512f_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "0,0")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsub_<mode>_mask3"
+(define_insn "avx512f_fmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (match_operand:VF_512 1 "register_operand" "v")
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
-  "<sd_mask_mode512bit_condition>"
-  "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+	    (match_operand:FMAMODE 1 "<round_nimm_predicate>" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
+	  (match_operand:FMAMODE   3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x")))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "@
+   vfnmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask"
+(define_insn "avx512f_fnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	    (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmadd_<mode>_mask3"
+(define_insn "avx512f_fnmadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	    (match_operand:VF_512 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	    (match_operand:FMAMODE 1 "<round_nimm_predicate>" "%0,0,v,x,x"))
+	  (match_operand:FMAMODE   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
 	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x"))))]
-  "<sd_mask_mode512bit_condition>"
+	    (match_operand:FMAMODE 3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x"))))]
+  "<sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfnmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfnmsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfnmsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfnmsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask"
+(define_insn "avx512f_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "0,0"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
 	    (neg:VF_512
-	      (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")))
+	      (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfnmsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfnmsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fnmsub_<mode>_mask3"
+(define_insn "avx512f_fnmsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (fma:VF_512
 	    (neg:VF_512
 	      (match_operand:VF_512 1 "register_operand" "v"))
-	    (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	    (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	    (neg:VF_512
 	      (match_operand:VF_512 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfnmsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfnmsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2940,109 +2954,109 @@
   DONE;
 })
 
-(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" "vm, v,vm, x,m")
-	   (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x")]
+	  [(match_operand:VF 1 "<round_nimm_predicate>" "%0,0,v,x,x")
+	   (match_operand:VF 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
+	   (match_operand:VF 3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x")]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmaddsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmaddsub213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmaddsub231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask"
+(define_insn "avx512f_fmaddsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
-	     (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")]
+	     (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	     (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmaddsub132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmaddsub213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmaddsub_<mode>_mask3"
+(define_insn "avx512f_fmaddsub_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	     (match_operand:VF_512 3 "register_operand" "0")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmaddsub231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmaddsub231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name>"
+(define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF
-	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0, v, x,x")
-	   (match_operand:VF   2 "nonimmediate_operand" "vm, v,vm, x,m")
+	  [(match_operand:VF   1 "<round_nimm_predicate>" "%0,0,v,x,x")
+	   (match_operand:VF   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>,x,m")
 	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" " v,vm, 0,xm,x"))]
+	     (match_operand:VF 3 "<round_nimm_predicate>" "v,<round_constraint>,0,xm,x"))]
 	  UNSPEC_FMADDSUB))]
-  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition>"
+  "(TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F) && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2}
+   vfmsubadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
+   vfmsubadd213<ssemodesuffix>\t{<round_sd_mask_op4>%3, %2, %0<sd_mask_op4>|%0<sd_mask_op4>, %2, %3<round_sd_mask_op4>}
+   vfmsubadd231<ssemodesuffix>\t{<round_sd_mask_op4>%2, %1, %0<sd_mask_op4>|%0<sd_mask_op4>, %1, %2<round_sd_mask_op4>}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
    vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f,fma_avx512f,fma4,fma4")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask"
+(define_insn "avx512f_fmsubadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v,v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "0,0")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm,v")
+	     (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>,v")
 	     (neg:VF_512
-	       (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))]
+	       (match_operand:VF_512 3 "<round_nimm_predicate>" "v,<round_constraint>"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k,k")))]
   "TARGET_AVX512F"
   "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
+   vfmsubadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
+   vfmsubadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
   [(set_attr "isa" "fma_avx512f,fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_fmsubadd_<mode>_mask3"
+(define_insn "avx512f_fmsubadd_<mode>_mask3<round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(vec_merge:VF_512
 	  (unspec:VF_512
 	    [(match_operand:VF_512 1 "register_operand" "v")
-	     (match_operand:VF_512 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")
 	     (neg:VF_512
 	       (match_operand:VF_512 3 "register_operand" "0"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "k")))]
   "TARGET_AVX512F"
-  "vfmsubadd231<ssemodesuffix>\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"
+  "vfmsubadd231<ssemodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%}, %1, %2<round_op5>}"
   [(set_attr "isa" "fma_avx512f")
    (set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -3050,13 +3064,13 @@
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
 
-(define_expand "fmai_vmfmadd_<mode>"
+(define_expand "fmai_vmfmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand")
-	    (match_operand:VF_128 2 "nonimmediate_operand")
-	    (match_operand:VF_128 3 "nonimmediate_operand"))
+	    (match_operand:VF_128 1 "<round_nimm_predicate>")
+	    (match_operand:VF_128 2 "<round_nimm_predicate>")
+	    (match_operand:VF_128 3 "<round_nimm_predicate>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
@@ -3065,15 +3079,15 @@
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "vm, v")
-	    (match_operand:VF_128 3 "nonimmediate_operand" " v,vm"))
+	    (match_operand:VF_128 1 "<round_nimm_predicate>" " 0, 0")
+	    (match_operand:VF_128 2 "<round_nimm_predicate>" "<round_constraint>, v")
+	    (match_operand:VF_128 3 "<round_nimm_predicate>" " v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -3081,51 +3095,51 @@
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "vm, v")
+	    (match_operand:VF_128   1 "<round_nimm_predicate>" "0,0")
+	    (match_operand:VF_128   2 "<round_nimm_predicate>" "<round_constraint>,v")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "<round_nimm_predicate>" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmadd_<mode>"
+(define_insn "*fmai_fnmadd_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " v,vm"))
+	      (match_operand:VF_128 2 "<round_nimm_predicate>" "<round_constraint>,v"))
+	    (match_operand:VF_128   1 "<round_nimm_predicate>" "0,0")
+	    (match_operand:VF_128   3 "<round_nimm_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmadd132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmadd213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fmai_fnmsub_<mode>"
+(define_insn "*fmai_fnmsub_<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v,v")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 2 "nonimmediate_operand" "vm, v"))
-	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+	      (match_operand:VF_128 2 "<round_nimm_predicate>" "<round_constraint>, v"))
+	    (match_operand:VF_128   1 "<round_nimm_predicate>" " 0, 0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " v,vm")))
+	      (match_operand:VF_128 3 "<round_nimm_predicate>" " v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
   "@
-   vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %<iptr>3, %<iptr>2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %<iptr>3}"
+   vfnmsub132<ssescalarmodesuffix>\t{<round_op4>%2, %3, %0|%0, %<iptr>3, %<iptr>2<round_op4>}
+   vfnmsub213<ssescalarmodesuffix>\t{<round_op4>%3, %2, %0|%0, %<iptr>2, %<iptr>3<round_op4>}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -3246,18 +3260,18 @@
    (set_attr "prefix_rep" "0")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ss"
+(define_insn "sse_cvtsi2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:SI 2 "<round_nimm_predicate>" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    cvtsi2ss\t{%2, %0|%0, %2}
    cvtsi2ss\t{%2, %0|%0, %2}
-   vcvtsi2ss\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ss\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3267,18 +3281,18 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtsi2ssq"
+(define_insn "sse_cvtsi2ssq<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (float:SF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:SF (match_operand:DI 2 "<round_nimm_predicate>" "r,m,<round_constraint3>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE && TARGET_64BIT"
   "@
    cvtsi2ssq\t{%2, %0|%0, %2}
    cvtsi2ssq\t{%2, %0|%0, %2}
-   vcvtsi2ssq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2ssq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "vector,double,*")
@@ -3290,15 +3304,15 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "sse_cvtss2si"
+(define_insn "sse_cvtss2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "<round_nimm_predicate>" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3320,15 +3334,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse_cvtss2siq"
+(define_insn "sse_cvtss2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V4SF 1 "<round_nimm_predicate>" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE && TARGET_64BIT"
-  "%vcvtss2si{q}\t{%1, %0|%0, %k1}"
+  "%vcvtss2si{q}\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3382,50 +3396,50 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "DI")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>32"
+(define_insn "cvtusi2<ssescalarmodesuffix>32<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:SI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:SI 2 "<round_nimm_predicate>" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
-  "TARGET_AVX512F"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "TARGET_AVX512F && <round_modev4sf_condition>"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "cvtusi2<ssescalarmodesuffix>64"
+(define_insn "cvtusi2<ssescalarmodesuffix>64<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (unsigned_float:<ssescalarmode>
-	      (match_operand:DI 2 "nonimmediate_operand" "rm")))
+	      (match_operand:DI 2 "<round_nimm_predicate>" "<round_constraint3>")))
 	  (match_operand:VF_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtusi2<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vcvtusi2<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
-(define_insn "float<sseintvecmodelower><mode>2<mask_name>"
+(define_insn "float<sseintvecmodelower><mode>2<mask_name><round_name>"
   [(set (match_operand:VF1 0 "register_operand" "=v")
 	(float:VF1
-	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "vm")))]
-  "TARGET_SSE2 && <mask_mode512bit_condition>"
-  "%vcvtdq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+	  (match_operand:<sseintvecmode> 1 "<round_nimm_predicate>" "<round_constraint>")))]
+  "TARGET_SSE2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "%vcvtdq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "ufloatv16siv16sf2<mask_name>"
+(define_insn "ufloatv16siv16sf2<mask_name><round_name>"
   [(set (match_operand:V16SF 0 "register_operand" "=v")
 	(unsigned_float:V16SF
-	  (match_operand:V16SI 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V16SI 1 "<round_nimm_predicate>" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtudq2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtudq2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])
@@ -3460,24 +3474,24 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_fix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name>"
+(define_insn "<mask_codefor>avx512f_ufix_notruncv16sfv16si<mask_name><round_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(unspec:V16SI
-	  [(match_operand:V16SF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V16SF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtps2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtps2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
@@ -3595,18 +3609,18 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "sse2_cvtsi2sdq"
+(define_insn "sse2_cvtsi2sdq<round_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (vec_duplicate:V2DF
-	    (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm")))
+	    (float:DF (match_operand:DI 2 "<round_nimm_predicate>" "r,m,<round_constraint3>")))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
    cvtsi2sdq\t{%2, %0|%0, %2}
    cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   vcvtsi2sdq\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,direct,*")
@@ -3617,28 +3631,28 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "DF")])
 
-(define_insn "avx512f_vcvtss2usi"
+(define_insn "avx512f_vcvtss2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "<round_nimm_predicate>" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtss2usiq"
+(define_insn "avx512f_vcvtss2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:SF
-	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V4SF 1 "<round_nimm_predicate>" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtss2usi\t{%1, %0|%0, %1}"
+  "vcvtss2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3667,28 +3681,28 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "avx512f_vcvtsd2usi"
+(define_insn "avx512f_vcvtsd2usi<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "<round_nimm_predicate>" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "SI")])
 
-(define_insn "avx512f_vcvtsd2usiq"
+(define_insn "avx512f_vcvtsd2usiq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "vm")
+	     (match_operand:V2DF 1 "<round_nimm_predicate>" "<round_constraint>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F && TARGET_64BIT"
-  "vcvtsd2usi\t{%1, %0|%0, %1}"
+  "vcvtsd2usi\t{<round_op2>%1, %0|%0, %1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
@@ -3717,15 +3731,15 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "DI")])
 
-(define_insn "sse2_cvtsd2si"
+(define_insn "sse2_cvtsd2si<round_name>"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(unspec:SI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "<round_nimm_predicate>" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
-  "%vcvtsd2si\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3748,15 +3762,15 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SI")])
 
-(define_insn "sse2_cvtsd2siq"
+(define_insn "sse2_cvtsd2siq<round_name>"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(unspec:DI
 	  [(vec_select:DF
-	     (match_operand:V2DF 1 "nonimmediate_operand" "v,m")
+	     (match_operand:V2DF 1 "<round_nimm_predicate>" "v,<round_constraint2>")
 	     (parallel [(const_int 0)]))]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2 && TARGET_64BIT"
-  "%vcvtsd2si{q}\t{%1, %0|%0, %q1}"
+  "%vcvtsd2si{q}\t{<round_op2>%1, %0|%0, %q1<round_op2>}"
   [(set_attr "type" "sseicvt")
    (set_attr "athlon_decode" "double,vector")
    (set_attr "bdver1_decode" "double,double")
@@ -3877,13 +3891,13 @@
    (set_attr "ssememalign" "64")
    (set_attr "mode" "V2DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2dq512<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2dq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -3951,13 +3965,13 @@
    (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")])
 
-(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name>"
+(define_insn "avx512f_ufix_notruncv8dfv8si<mask_name><round_name>"
   [(set (match_operand:V8SI 0 "register_operand" "=v")
 	(unspec:V8SI
-	  [(match_operand:V8DF 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
   "TARGET_AVX512F"
-  "vcvtpd2udq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2udq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "OI")])
@@ -4073,12 +4087,12 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "DF")])
 
-(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name>"
+(define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
   [(set (match_operand:V8SF 0 "register_operand" "=v")
 	(float_truncate:V8SF
-	  (match_operand:V8DF 1 "nonimmediate_operand" "vm")))]
+	  (match_operand:V8DF 1 "<round_nimm_predicate>" "<round_constraint>")))]
   "TARGET_AVX512F"
-  "vcvtpd2ps\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vcvtpd2ps\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "V8SF")])
@@ -6446,14 +6460,14 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
-(define_insn "avx512f_scalef<mode><mask_name>"
+(define_insn "avx512f_scalef<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
 	  [(match_operand:VF_512 1 "register_operand" "v")
-	   (match_operand:VF_512 2 "nonimmediate_operand" "vm")]
+	   (match_operand:VF_512 2 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
-  "%vscalef<ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+  "%vscalef<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<MODE>")])
 
@@ -8187,22 +8201,22 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "<code><mode>3<mask_name>"
+(define_expand "<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand")
 	(maxmin:VI124_256_48_512
-	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand")
-	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))]
-  "TARGET_AVX2 && <mask_mode512bit_condition>"
+	  (match_operand:VI124_256_48_512 1 "<round_nimm_predicate>")
+	  (match_operand:VI124_256_48_512 2 "<round_nimm_predicate>")))]
+  "TARGET_AVX2 && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*avx2_<code><mode>3<mask_name>"
+(define_insn "*avx2_<code><mode>3<mask_name><round_name>"
   [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v")
 	(maxmin:VI124_256_48_512
-	  (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v")
-	  (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))]
+	  (match_operand:VI124_256_48_512 1 "<round_nimm_predicate>" "%v")
+	  (match_operand:VI124_256_48_512 2 "<round_nimm_predicate>" "<round_constraint>")))]
   "TARGET_AVX2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
-   && <mask_mode512bit_condition>"
-  "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
+   && <mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "vp<maxmin_int><ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
   [(set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
@@ -12500,33 +12514,33 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "avx512er_exp2<mode><mask_name>"
+(define_insn "avx512er_exp2<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_EXP2))]
   "TARGET_AVX512ER"
-  "vexp2<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vexp2<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rcp28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_RCP28))]
   "TARGET_AVX512ER"
-  "vrcp28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrcp28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name>"
+(define_insn "<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
-	  [(match_operand:VF_512 1 "nonimmediate_operand" "vm")]
+	  [(match_operand:VF_512 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_RSQRT28))]
   "TARGET_AVX512ER"
-  "vrsqrt28<ssemodesuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "vrsqrt28<ssemodesuffix>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 594dc43..b20cf20 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -30,6 +30,16 @@
 (define_mode_iterator SUBST_S
   [QI HI SI DI])
 
+(define_mode_iterator SUBST_A
+  [V16QI
+   V16HI V8HI
+   V16SI V8SI  V4SI
+   V8DI  V4DI  V2DI
+   V16SF V8SF  V4SF
+   V8DF  V4DF  V2DF
+   QI HI SI DI SF DF
+   CCFP CCFPU])
+
 (define_subst_attr "mask_name" "mask" "" "_mask")
 (define_subst_attr "mask_applied" "mask" "false" "true")
 (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
@@ -87,3 +97,35 @@
 	 (match_operand:SUBST_V 2 "const0_operand" "C")
 	 (match_operand:<avx512fmaskmode> 3 "register_operand" "k")))
 ])
+
+(define_subst_attr "round_name" "round" "" "_round")
+(define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
+(define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
+(define_subst_attr "round_op2" "round" "" "%R2")
+(define_subst_attr "round_op3" "round" "" "%R3")
+(define_subst_attr "round_op4" "round" "" "%R4")
+(define_subst_attr "round_op5" "round" "" "%R5")
+(define_subst_attr "round_op6" "round" "" "%R6")
+(define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
+(define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
+(define_subst_attr "round_mask_scalar_op3" "round" "" "<round_mask_scalar_operand3>")
+(define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
+(define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_constraint2" "round" "m" "v")
+(define_subst_attr "round_constraint3" "round" "rm" "r")
+(define_subst_attr "round_nimm_predicate" "round" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
+(define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
+(define_subst_attr "round_codefor" "round" "*" "")
+(define_subst_attr "round_opnum" "round" "5" "6")
+
+(define_subst "round"
+  [(set (match_operand:SUBST_A 0)
+        (match_operand:SUBST_A 1))]
+  "TARGET_AVX512F"
+  [(parallel[
+     (set (match_dup 0)
+          (match_dup 1))
+     (unspec [(match_operand:SI 2 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])])

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
  2013-12-24  7:34                 ` Uros Bizjak
@ 2013-12-30 12:59                   ` Kirill Yukhin
  0 siblings, 0 replies; 63+ messages in thread
From: Kirill Yukhin @ 2013-12-30 12:59 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches, Jakub Jelinek, Jeff Law

Hello,
On 24 Dec 08:34, Uros Bizjak wrote:
> Please retest and repost the patch with the predicate fix.
> 
> Looks good otherwise, with a couple of minor changes below:

I am testing patch in the bottom. If no more inputs and testing pass - 
I'll commit it tomorrow.

--
Thanks, K

---
 gcc/config/i386/avx512fintrin.h                    | 549 +++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def             |   5 +
 gcc/config/i386/i386.c                             |  77 ++-
 gcc/config/i386/sse.md                             |  78 +--
 gcc/config/i386/subst.md                           |  12 +-
 gcc/testsuite/gcc.target/i386/avx-1.c              |  38 +-
 gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c   |  13 +
 .../gcc.target/i386/avx512f-vcvtsd2ss-1.c          |  14 +
 .../gcc.target/i386/avx512f-vcvtss2sd-1.c          |  14 +
 gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c   |  14 +
 gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c   |  13 +
 .../gcc.target/i386/avx512f-vextractf32x4-2.c      |  56 +++
 .../gcc.target/i386/avx512f-vextracti32x4-2.c      |  57 +++
 .../gcc.target/i386/avx512f-vfmaddXXXsd-1.c        |  13 +
 .../gcc.target/i386/avx512f-vfmaddXXXss-1.c        |  13 +
 .../gcc.target/i386/avx512f-vfmsubXXXsd-1.c        |  13 +
 .../gcc.target/i386/avx512f-vfmsubXXXss-1.c        |  13 +
 .../gcc.target/i386/avx512f-vfnmaddXXXsd-1.c       |  13 +
 .../gcc.target/i386/avx512f-vfnmaddXXXss-1.c       |  13 +
 .../gcc.target/i386/avx512f-vfnmsubXXXsd-1.c       |  13 +
 .../gcc.target/i386/avx512f-vfnmsubXXXss-1.c       |  13 +
 .../gcc.target/i386/avx512f-vgetexpsd-1.c          |  15 +
 .../gcc.target/i386/avx512f-vgetexpsd-2.c          |  36 ++
 .../gcc.target/i386/avx512f-vgetexpss-1.c          |  15 +
 .../gcc.target/i386/avx512f-vgetexpss-2.c          |  36 ++
 .../gcc.target/i386/avx512f-vgetmantsd-1.c         |  16 +
 .../gcc.target/i386/avx512f-vgetmantsd-2.c         |  94 ++++
 .../gcc.target/i386/avx512f-vgetmantss-1.c         |  16 +
 .../gcc.target/i386/avx512f-vgetmantss-2.c         |  99 ++++
 gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c |  31 ++
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c |  33 ++
 .../gcc.target/i386/avx512f-vrndscalesd-1.c        |  15 +
 .../gcc.target/i386/avx512f-vrndscalesd-2.c        |  50 ++
 .../gcc.target/i386/avx512f-vrndscaless-1.c        |  15 +
 .../gcc.target/i386/avx512f-vrndscaless-2.c        |  52 ++
 .../gcc.target/i386/avx512f-vrsqrt14sd-1.c         |  14 +
 .../gcc.target/i386/avx512f-vrsqrt14sd-2.c         |  32 ++
 .../gcc.target/i386/avx512f-vrsqrt14ss-1.c         |  13 +
 .../gcc.target/i386/avx512f-vrsqrt14ss-2.c         |  34 ++
 .../gcc.target/i386/avx512f-vscalefsd-1.c          |  15 +
 .../gcc.target/i386/avx512f-vscalefsd-2.c          |  37 ++
 .../gcc.target/i386/avx512f-vscalefss-1.c          |  15 +
 .../gcc.target/i386/avx512f-vscalefss-2.c          |  39 ++
 gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c  |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c  |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c   |  13 +
 gcc/testsuite/gcc.target/i386/sse-14.c             |  46 ++
 gcc/testsuite/gcc.target/i386/sse-23.c             |  34 +-
 gcc/testsuite/gcc.target/i386/testimm-10.c         |  11 +-
 gcc/testsuite/gcc.target/i386/testround-1.c        |  63 +++
 60 files changed, 2023 insertions(+), 66 deletions(-)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index f717d46..d6dd438 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -1279,6 +1279,57 @@ _mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_addss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_subss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+#else
+#define _mm_add_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_addsd_round(A, B, C)
+
+#define _mm_add_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_addss_round(A, B, C)
+
+#define _mm_sub_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_subsd_round(A, B, C)
+
+#define _mm_sub_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_subss_round(A, B, C)
+#endif
+
+#ifdef __OPTIMIZE__
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C, const int imm)
@@ -1424,6 +1475,22 @@ _mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A)
 						  (__mmask16) __U);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp14_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __A,
+					   (__v2df) __B);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp14_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __A,
+					  (__v4sf) __B);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_rsqrt14_pd (__m512d __A)
@@ -1482,6 +1549,22 @@ _mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A)
 						    (__mmask16) __U);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt14_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __A,
+					     (__v2df) __B);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt14_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __A,
+					    (__v4sf) __B);
+}
+
 #ifdef __OPTIMIZE__
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1542,6 +1625,23 @@ _mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
 						 (__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_sqrtsd_round ((__v2df) __B,
+						(__v2df) __A,
+						__R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_sqrtss_round ((__v4sf) __B,
+					       (__v4sf) __A,
+					       __R);
+}
 #else
 #define _mm512_sqrt_round_pd(A, C)            \
     (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -1560,6 +1660,12 @@ _mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R)
 
 #define _mm512_maskz_sqrt_round_ps(U, A, C)   \
     (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_sqrt_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_sqrtsd_round(A, B, C)
+
+#define _mm_sqrt_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_sqrtss_round(A, B, C)
 #endif
 
 extern __inline __m512i
@@ -2159,6 +2265,42 @@ _mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 						(__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_divss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
 #else
 #define _mm512_mul_round_pd(A, B, C)            \
     (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -2195,6 +2337,24 @@ _mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R)
 
 #define _mm512_maskz_div_round_ps(U, A, B, C)   \
     (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_mul_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_mulsd_round(A, B, C)
+
+#define _mm_mul_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_mulss_round(A, B, C)
+
+#define _mm_div_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_divsd_round(A, B, C)
+
+#define _mm_div_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_divss_round(A, B, C)
+
+#define _mm_mask_div_round_ss(W, U, A, B, C) \
+    (__m128)__builtin_ia32_divss_mask(A, B, W, U, C)
+
+#define _mm_maskz_div_round_ss(U, A, B, C)   \
+    (__m128)__builtin_ia32_divss_mask(A, B, (__v4sf)_mm_setzero_ps(), U, C)
 #endif
 
 #ifdef __OPTIMIZE__
@@ -2438,6 +2598,23 @@ _mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 						   (__mmask16) __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 __R);
+}
 #else
 #define _mm512_scalef_round_pd(A, B, C)            \
     (__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), -1, C)
@@ -2456,6 +2633,12 @@ _mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B,
 
 #define _mm512_maskz_scalef_round_ps(U, A, B, C)   \
     (__m512)__builtin_ia32_scalefps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C)
+
+#define _mm_scalef_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_scalefsd_round(A, B, C)
+
+#define _mm_scalef_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_scalefss_round(A, B, C)
 #endif
 
 #ifdef __OPTIMIZE__
@@ -7578,6 +7761,23 @@ _mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
 						   (__mmask8) __U, __R);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R)
+{
+  return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A,
+						 (__v2df) __B,
+						 __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A,
+						  (__v4sf) __B,
+						  __R);
+}
 #else
 #define _mm512_cvt_roundpd_ps(A, B)		 \
     (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), -1, B)
@@ -7587,6 +7787,12 @@ _mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R)
 
 #define _mm512_maskz_cvt_roundpd_ps(U, A, B)     \
     (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B)
+
+#define _mm_cvt_roundsd_ss(A, B, C)		 \
+    (__m128)__builtin_ia32_cvtsd2ss_round(A, B, C)
+
+#define _mm_cvt_roundss_sd(A, B, C)		 \
+    (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C)
 #endif
 
 extern __inline void
@@ -7611,6 +7817,24 @@ _mm512_stream_pd (double *__P, __m512d __A)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     __R);
+}
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_getexp_round_ps (__m512 __A, const int __R)
@@ -7759,6 +7983,30 @@ _mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
 						    __U, __R);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_round_sd (__m128d __A, __m128d __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  (__D << 2) | __C,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_round_ss (__m128 __A, __m128 __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  __R);
+}
+
 #else
 #define _mm512_getmant_round_pd(X, B, C, R)                                                  \
   ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
@@ -7800,6 +8048,24 @@ _mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A,
                                              (__v16sf)(__m512)_mm512_setzero_ps(),  \
                                              (__mmask16)(U),\
 					     (R)))
+#define _mm_getmant_round_sd(X, Y, C, D, R)                                                  \
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
+					    (__v2df)(__m128d)(Y),	\
+					    (int)(((D)<<2) | (C)),	\
+					    (R)))
+
+#define _mm_getmant_round_ss(X, Y, C, D, R)                                                  \
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
+					   (__v4sf)(__m128)(Y),		\
+					   (int)(((D)<<2) | (C)),	\
+					   (R)))
+
+#define _mm_getexp_round_ss(A, B, R)						      \
+  ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R))
+
+#define _mm_getexp_round_sd(A, B, R)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R))
+
 #define _mm512_getexp_round_ps(A, R)						\
   ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
   (__v16sf)_mm512_setzero_ps(), (__mmask16)-1, R))
@@ -7885,6 +8151,24 @@ _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
 						   _mm512_setzero_pd (),
 						   (__mmask8) __A, __R);
 }
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, const int __R)
+{
+  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
+						   (__v4sf) __B, __imm, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm,
+			 const int __R)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
+						    (__v2df) __B, __imm, __R);
+}
+
 #else
 #define _mm512_roundscale_round_ps(A, B, R) \
   ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -7912,6 +8196,12 @@ _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B,
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), R))
+#define _mm_roundscale_round_ss(A, B, C, R)					\
+  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), R))
+#define _mm_roundscale_round_sd(A, B, C, R)					\
+  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), R))
 #endif
 
 extern __inline __m512
@@ -9825,6 +10115,57 @@ _mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
 						   (__mmask16) __U);
 }
 
+#ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
+					       (__v2df) __B,
+					       __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
+					      (__v4sf) __B,
+					      __R);
+}
+
+#else
+#define _mm_max_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_addsd_round(A, B, C)
+
+#define _mm_max_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_addss_round(A, B, C)
+
+#define _mm_min_round_sd(A, B, C)            \
+    (__m128d)__builtin_ia32_subsd_round(A, B, C)
+
+#define _mm_min_round_ss(A, B, C)            \
+    (__m128)__builtin_ia32_subss_round(A, B, C)
+#endif
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W)
@@ -9862,6 +10203,112 @@ _mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W)
 }
 
 #ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   (__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   (__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  (__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   (__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  (__v4sf) __B,
+						  __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W,
+						   -(__v2df) __A,
+						   -(__v2df) __B,
+						   __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R)
+{
+  return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W,
+						  -(__v4sf) __A,
+						  -(__v4sf) __B,
+						  __R);
+}
+#else
+#define _mm_fmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R)
+
+#define _mm_fmadd_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R)
+
+#define _mm_fmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, -(C), R)
+
+#define _mm_fmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R)
+
+#define _mm_fnmadd_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R)
+
+#define _mm_fnmadd_round_ss(A, B, C, R)            \
+   (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R)
+
+#define _mm_fnmsub_round_sd(A, B, C, R)            \
+    (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R)
+
+#define _mm_fnmsub_round_ss(A, B, C, R)            \
+    (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R)
+#endif
+
+#ifdef __OPTIMIZE__
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R)
@@ -10436,6 +10883,24 @@ _mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B)
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A,
+						  (__v2df) __B,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A,
+						 (__v4sf) __B,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C)
@@ -11784,6 +12249,24 @@ _mm512_maskz_getexp_pd (__mmask8 __U, __m512d __A)
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A,
+						    (__v4sf) __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A,
+						     (__v2df) __B,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_getmant_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B,
@@ -11856,6 +12339,28 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
 						    _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A,
+						   (__v2df) __B,
+						   (__D << 2) | __C,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A,
+						  (__v4sf) __B,
+						  (__D << 2) | __C,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
 #else
 #define _mm512_getmant_pd(X, B, C)                                                  \
   ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X),                 \
@@ -11897,6 +12402,26 @@ _mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A,
                                              (__v16sf)(__m512)_mm512_setzero_ps(),  \
                                              (__mmask16)(U),\
 					     _MM_FROUND_CUR_DIRECTION))
+#define _mm_getmant_sd(X, Y, C, D)                                                  \
+  ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X),                    \
+                                           (__v2df)(__m128d)(Y),                    \
+                                           (int)(((D)<<2) | (C)),                   \
+					   _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getmant_ss(X, Y, C, D)                                                  \
+  ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X),                      \
+                                          (__v4sf)(__m128)(Y),                      \
+                                          (int)(((D)<<2) | (C)),                    \
+					  _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getexp_ss(A, B)						      \
+  ((__m128)__builtin_ia32_getexpss128_mask((__v4sf)(__m128)(A), (__v4sf)(__m128)(B),  \
+					   _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getexp_sd(A, B)						       \
+  ((__m128d)__builtin_ia32_getexpsd128_mask((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\
+					    _MM_FROUND_CUR_DIRECTION))
+
 #define _mm512_getexp_ps(A)						\
   ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A),		\
   (__v16sf)_mm512_setzero_ps(), (__mmask16)-1, _MM_FROUND_CUR_DIRECTION))
@@ -11987,6 +12512,24 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm)
+{
+  return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A,
+						   (__v4sf) __B, __imm,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm)
+{
+  return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A,
+						    (__v2df) __B, __imm,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
 #else
 #define _mm512_roundscale_ps(A, B) \
   ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\
@@ -12014,6 +12557,12 @@ _mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm)
 					     (int)(C),			\
 					     (__v8df)_mm512_setzero_pd(),\
 					     (__mmask8)(A), _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_ss(A, B, C)					\
+  ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A),	\
+  (__v4sf)(__m128)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
+#define _mm_roundscale_sd(A, B, C)					\
+  ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A),	\
+    (__v2df)(__m128d)(B), (int)(C), _MM_FROUND_CUR_DIRECTION))
 #endif
 
 #ifdef __OPTIMIZE__
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 86ad31e..d19ca84 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, INT)
 DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, V16QI)
 DEF_FUNCTION_TYPE (V1DI, V1DI, V1DI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, INT)
+DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, INT, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DI, INT)
 DEF_FUNCTION_TYPE (V2DI, V2DI, DI, INT)
@@ -531,6 +532,9 @@ DEF_FUNCTION_TYPE (V4DI, V4DI, V4DI, V4DI)
 DEF_FUNCTION_TYPE (V4HI, V4HI, HI, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, FLOAT, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, INT, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V2DF, INT)
+DEF_FUNCTION_TYPE (V2DF, V2DF, V4SF, INT)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF)
 DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SI, INT)
 DEF_FUNCTION_TYPE (V4SI, V4SI, SI, INT)
@@ -678,6 +682,7 @@ DEF_FUNCTION_TYPE (V4SF, V4SF, V2DF, V4SF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V4SF, V2DF, QI, INT)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DF, INT)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SF, INT)
 
 DEF_FUNCTION_TYPE (V16SF, V16SF, INT, V16SF, HI, INT)
 DEF_FUNCTION_TYPE (V8DF, V8DF, INT, V8DF, QI, INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 90473b3..43cc08a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -27905,6 +27905,8 @@ enum ix86_builtins
   /* AVX512F */
   IX86_BUILTIN_ADDPD512,
   IX86_BUILTIN_ADDPS512,
+  IX86_BUILTIN_ADDSD_ROUND,
+  IX86_BUILTIN_ADDSS_ROUND,
   IX86_BUILTIN_ALIGND512,
   IX86_BUILTIN_ALIGNQ512,
   IX86_BUILTIN_BLENDMD512,
@@ -27939,9 +27941,11 @@ enum ix86_builtins
   IX86_BUILTIN_CVTPS2PD512,
   IX86_BUILTIN_CVTPS2PH512,
   IX86_BUILTIN_CVTPS2UDQ512,
+  IX86_BUILTIN_CVTSD2SS_ROUND,
   IX86_BUILTIN_CVTSI2SD64,
   IX86_BUILTIN_CVTSI2SS32,
   IX86_BUILTIN_CVTSI2SS64,
+  IX86_BUILTIN_CVTSS2SD_ROUND,
   IX86_BUILTIN_CVTTPD2DQ512,
   IX86_BUILTIN_CVTTPD2UDQ512,
   IX86_BUILTIN_CVTTPS2DQ512,
@@ -27954,6 +27958,8 @@ enum ix86_builtins
   IX86_BUILTIN_CVTUSI2SS64,
   IX86_BUILTIN_DIVPD512,
   IX86_BUILTIN_DIVPS512,
+  IX86_BUILTIN_DIVSD_ROUND,
+  IX86_BUILTIN_DIVSS_ROUND,
   IX86_BUILTIN_EXPANDPD512,
   IX86_BUILTIN_EXPANDPD512Z,
   IX86_BUILTIN_EXPANDPDLOAD512,
@@ -27976,8 +27982,12 @@ enum ix86_builtins
   IX86_BUILTIN_FIXUPIMMSS128_MASKZ,
   IX86_BUILTIN_GETEXPPD512,
   IX86_BUILTIN_GETEXPPS512,
+  IX86_BUILTIN_GETEXPSD128,
+  IX86_BUILTIN_GETEXPSS128,
   IX86_BUILTIN_GETMANTPD512,
   IX86_BUILTIN_GETMANTPS512,
+  IX86_BUILTIN_GETMANTSD128,
+  IX86_BUILTIN_GETMANTSS128,
   IX86_BUILTIN_INSERTF32X4,
   IX86_BUILTIN_INSERTF64X4,
   IX86_BUILTIN_INSERTI32X4,
@@ -27990,8 +28000,12 @@ enum ix86_builtins
   IX86_BUILTIN_LOADUPS512,
   IX86_BUILTIN_MAXPD512,
   IX86_BUILTIN_MAXPS512,
+  IX86_BUILTIN_MAXSD_ROUND,
+  IX86_BUILTIN_MAXSS_ROUND,
   IX86_BUILTIN_MINPD512,
   IX86_BUILTIN_MINPS512,
+  IX86_BUILTIN_MINSD_ROUND,
+  IX86_BUILTIN_MINSS_ROUND,
   IX86_BUILTIN_MOVAPD512,
   IX86_BUILTIN_MOVAPS512,
   IX86_BUILTIN_MOVDDUP512,
@@ -28008,6 +28022,8 @@ enum ix86_builtins
   IX86_BUILTIN_MOVSLDUP512,
   IX86_BUILTIN_MULPD512,
   IX86_BUILTIN_MULPS512,
+  IX86_BUILTIN_MULSD_ROUND,
+  IX86_BUILTIN_MULSS_ROUND,
   IX86_BUILTIN_PABSD512,
   IX86_BUILTIN_PABSQ512,
   IX86_BUILTIN_PADDD512,
@@ -28118,12 +28134,20 @@ enum ix86_builtins
   IX86_BUILTIN_PXORQ512,
   IX86_BUILTIN_RCP14PD512,
   IX86_BUILTIN_RCP14PS512,
+  IX86_BUILTIN_RCP14SD,
+  IX86_BUILTIN_RCP14SS,
   IX86_BUILTIN_RNDSCALEPD,
   IX86_BUILTIN_RNDSCALEPS,
+  IX86_BUILTIN_RNDSCALESD,
+  IX86_BUILTIN_RNDSCALESS,
   IX86_BUILTIN_RSQRT14PD512,
   IX86_BUILTIN_RSQRT14PS512,
+  IX86_BUILTIN_RSQRT14SD,
+  IX86_BUILTIN_RSQRT14SS,
   IX86_BUILTIN_SCALEFPD512,
   IX86_BUILTIN_SCALEFPS512,
+  IX86_BUILTIN_SCALEFSD,
+  IX86_BUILTIN_SCALEFSS,
   IX86_BUILTIN_SHUFPD512,
   IX86_BUILTIN_SHUFPS512,
   IX86_BUILTIN_SHUF_F32x4,
@@ -28134,6 +28158,8 @@ enum ix86_builtins
   IX86_BUILTIN_SQRTPD512_MASK,
   IX86_BUILTIN_SQRTPS512_MASK,
   IX86_BUILTIN_SQRTPS_NR512,
+  IX86_BUILTIN_SQRTSD_ROUND,
+  IX86_BUILTIN_SQRTSS_ROUND,
   IX86_BUILTIN_STOREAPD512,
   IX86_BUILTIN_STOREAPS512,
   IX86_BUILTIN_STOREDQUDI512,
@@ -28142,6 +28168,8 @@ enum ix86_builtins
   IX86_BUILTIN_STOREUPS512,
   IX86_BUILTIN_SUBPD512,
   IX86_BUILTIN_SUBPS512,
+  IX86_BUILTIN_SUBSD_ROUND,
+  IX86_BUILTIN_SUBSS_ROUND,
   IX86_BUILTIN_UCMPD512,
   IX86_BUILTIN_UCMPQ512,
   IX86_BUILTIN_UNPCKHPD512,
@@ -28170,6 +28198,8 @@ enum ix86_builtins
   IX86_BUILTIN_VFMADDPS512_MASK,
   IX86_BUILTIN_VFMADDPS512_MASK3,
   IX86_BUILTIN_VFMADDPS512_MASKZ,
+  IX86_BUILTIN_VFMADDSD3_ROUND,
+  IX86_BUILTIN_VFMADDSS3_ROUND,
   IX86_BUILTIN_VFMADDSUBPD512_MASK,
   IX86_BUILTIN_VFMADDSUBPD512_MASK3,
   IX86_BUILTIN_VFMADDSUBPD512_MASKZ,
@@ -28180,6 +28210,8 @@ enum ix86_builtins
   IX86_BUILTIN_VFMSUBADDPS512_MASK3,
   IX86_BUILTIN_VFMSUBPD512_MASK3,
   IX86_BUILTIN_VFMSUBPS512_MASK3,
+  IX86_BUILTIN_VFMSUBSD3_MASK3,
+  IX86_BUILTIN_VFMSUBSS3_MASK3,
   IX86_BUILTIN_VFNMADDPD512_MASK,
   IX86_BUILTIN_VFNMADDPS512_MASK,
   IX86_BUILTIN_VFNMSUBPD512_MASK,
@@ -29859,8 +29891,12 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_xorv8di3_mask, "__builtin_ia32_pxorq512_mask", IX86_BUILTIN_PXORQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rcp14v8df_mask, "__builtin_ia32_rcp14pd512_mask", IX86_BUILTIN_RCP14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rcp14v16sf_mask, "__builtin_ia32_rcp14ps512_mask", IX86_BUILTIN_RCP14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_srcp14v2df, "__builtin_ia32_rcp14sd", IX86_BUILTIN_RCP14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_srcp14v4sf, "__builtin_ia32_rcp14ss", IX86_BUILTIN_RCP14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v8df_mask, "__builtin_ia32_rsqrt14pd512_mask", IX86_BUILTIN_RSQRT14PD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v16sf_mask, "__builtin_ia32_rsqrt14ps512_mask", IX86_BUILTIN_RSQRT14PS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v2df, "__builtin_ia32_rsqrt14sd", IX86_BUILTIN_RSQRT14SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_rsqrt14v4sf, "__builtin_ia32_rsqrt14ss", IX86_BUILTIN_RSQRT14SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shufpd512_mask, "__builtin_ia32_shufpd512_mask", IX86_BUILTIN_SHUFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shufps512_mask, "__builtin_ia32_shufps512_mask", IX86_BUILTIN_SHUFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_shuf_f32x4_mask, "__builtin_ia32_shuf_f32x4_mask", IX86_BUILTIN_SHUF_F32x4, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI },
@@ -29940,6 +29976,8 @@ static const struct builtin_description bdesc_round_args[] =
   /* AVX512F */
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_addv8df3_mask_round, "__builtin_ia32_addpd512_mask", IX86_BUILTIN_ADDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_addv16sf3_mask_round, "__builtin_ia32_addps512_mask", IX86_BUILTIN_ADDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmaddv2df3_round, "__builtin_ia32_addsd_round", IX86_BUILTIN_ADDSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmaddv4sf3_round, "__builtin_ia32_addss_round", IX86_BUILTIN_ADDSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cmpv8df3_mask_round, "__builtin_ia32_cmppd512_mask", IX86_BUILTIN_CMPPD512, UNKNOWN, (int) QI_FTYPE_V8DF_V8DF_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cmpv16sf3_mask_round, "__builtin_ia32_cmpps512_mask", IX86_BUILTIN_CMPPS512, UNKNOWN, (int) HI_FTYPE_V16SF_V16SF_INT_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmcmpv2df3_mask_round, "__builtin_ia32_cmpsd_mask", IX86_BUILTIN_CMPSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_V2DF_INT_QI_INT },
@@ -29954,9 +29992,11 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2dq512_mask", IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cvtps2pd512_mask_round, "__builtin_ia32_cvtps2pd512_mask", IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_ufix_notruncv16sfv16si_mask_round, "__builtin_ia32_cvtps2udq512_mask", IX86_BUILTIN_CVTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtsd2ss_round, "__builtin_ia32_cvtsd2ss_round", IX86_BUILTIN_CVTSD2SS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvtsi2sdq_round, "__builtin_ia32_cvtsi2sd64", IX86_BUILTIN_CVTSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT64_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_cvtsi2ss_round, "__builtin_ia32_cvtsi2ss32", IX86_BUILTIN_CVTSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse_cvtsi2ssq_round, "__builtin_ia32_cvtsi2ss64", IX86_BUILTIN_CVTSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT64_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtss2sd_round, "__builtin_ia32_cvtss2sd_round", IX86_BUILTIN_CVTSS2SD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_fix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2dq512_mask", IX86_BUILTIN_CVTTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2udq512_mask", IX86_BUILTIN_CVTTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_fix_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2dq512_mask", IX86_BUILTIN_CVTTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
@@ -29967,6 +30007,8 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_divv8df3_mask_round, "__builtin_ia32_divpd512_mask", IX86_BUILTIN_DIVPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_divv16sf3_mask_round, "__builtin_ia32_divps512_mask", IX86_BUILTIN_DIVPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmdivv2df3_round, "__builtin_ia32_divsd_round", IX86_BUILTIN_DIVSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmdivv4sf3_round, "__builtin_ia32_divss_round", IX86_BUILTIN_DIVSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv8df_mask_round, "__builtin_ia32_fixupimmpd512_mask", IX86_BUILTIN_FIXUPIMMPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv8df_maskz_round, "__builtin_ia32_fixupimmpd512_maskz", IX86_BUILTIN_FIXUPIMMPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fixupimmv16sf_mask_round, "__builtin_ia32_fixupimmps512_mask", IX86_BUILTIN_FIXUPIMMPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI_INT },
@@ -29977,22 +30019,40 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sfixupimmv4sf_maskz_round, "__builtin_ia32_fixupimmss_maskz", IX86_BUILTIN_FIXUPIMMSS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI_INT_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getexpv8df_mask_round, "__builtin_ia32_getexppd512_mask", IX86_BUILTIN_GETEXPPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getexpv16sf_mask_round, "__builtin_ia32_getexpps512_mask", IX86_BUILTIN_GETEXPPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sgetexpv2df_round, "__builtin_ia32_getexpsd128_round", IX86_BUILTIN_GETEXPSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sgetexpv4sf_round, "__builtin_ia32_getexpss128_round", IX86_BUILTIN_GETEXPSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv8df_mask_round, "__builtin_ia32_getmantpd512_mask", IX86_BUILTIN_GETMANTPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv16sf_mask_round, "__builtin_ia32_getmantps512_mask", IX86_BUILTIN_GETMANTPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv2df_round, "__builtin_ia32_getmantsd_round", IX86_BUILTIN_GETMANTSD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_getmantv4sf_round, "__builtin_ia32_getmantss_round", IX86_BUILTIN_GETMANTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_smaxv8df3_mask_round, "__builtin_ia32_maxpd512_mask", IX86_BUILTIN_MAXPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_smaxv16sf3_mask_round, "__builtin_ia32_maxps512_mask", IX86_BUILTIN_MAXPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsmaxv2df3_round, "__builtin_ia32_maxsd_round", IX86_BUILTIN_MAXSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsmaxv4sf3_round, "__builtin_ia32_maxss_round", IX86_BUILTIN_MAXSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sminv8df3_mask_round, "__builtin_ia32_minpd512_mask", IX86_BUILTIN_MINPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sminv16sf3_mask_round, "__builtin_ia32_minps512_mask", IX86_BUILTIN_MINPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsminv2df3_round, "__builtin_ia32_minsd_round", IX86_BUILTIN_MINSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsminv4sf3_round, "__builtin_ia32_minss_round", IX86_BUILTIN_MINSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_mulv8df3_mask_round, "__builtin_ia32_mulpd512_mask", IX86_BUILTIN_MULPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_mulv16sf3_mask_round, "__builtin_ia32_mulps512_mask", IX86_BUILTIN_MULPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmmulv2df3_round, "__builtin_ia32_mulsd_round", IX86_BUILTIN_MULSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia32_mulss_round", IX86_BUILTIN_MULSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev2df_round, "__builtin_ia32_rndscalesd_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_rndscalev4sf_round, "__builtin_ia32_rndscaless_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmscalefv2df_round, "__builtin_ia32_scalefsd_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vmscalefv4sf_round, "__builtin_ia32_scalefss_round", IX86_BUILTIN_SCALEFSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sqrtv8df2_mask_round, "__builtin_ia32_sqrtpd512_mask", IX86_BUILTIN_SQRTPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_sqrtv16sf2_mask_round, "__builtin_ia32_sqrtps512_mask", IX86_BUILTIN_SQRTPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsqrtv2df2_round, "__builtin_ia32_sqrtsd_round", IX86_BUILTIN_SQRTSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsqrtv4sf2_round, "__builtin_ia32_sqrtss_round", IX86_BUILTIN_SQRTSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_subv8df3_mask_round, "__builtin_ia32_subpd512_mask", IX86_BUILTIN_SUBPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_subv16sf3_mask_round, "__builtin_ia32_subps512_mask", IX86_BUILTIN_SUBPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_vmsubv2df3_round, "__builtin_ia32_subsd_round", IX86_BUILTIN_SUBSD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse_vmsubv4sf3_round, "__builtin_ia32_subss_round", IX86_BUILTIN_SUBSS_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_sse2_cvtsd2si_round, "__builtin_ia32_vcvtsd2si32", IX86_BUILTIN_VCVTSD2SI32, UNKNOWN, (int) INT_FTYPE_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvtsd2siq_round, "__builtin_ia32_vcvtsd2si64", IX86_BUILTIN_VCVTSD2SI64, UNKNOWN, (int) INT64_FTYPE_V2DF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vcvtsd2usi_round, "__builtin_ia32_vcvtsd2usi32", IX86_BUILTIN_VCVTSD2USI32, UNKNOWN, (int) UINT_FTYPE_V2DF_INT },
@@ -30015,6 +30075,8 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_mask_round, "__builtin_ia32_vfmaddps512_mask", IX86_BUILTIN_VFMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_mask3_round, "__builtin_ia32_vfmaddps512_mask3", IX86_BUILTIN_VFMADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmadd_v16sf_maskz_round, "__builtin_ia32_vfmaddps512_maskz", IX86_BUILTIN_VFMADDPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_fmai_vmfmadd_v2df_round, "__builtin_ia32_vfmaddsd3_round", IX86_BUILTIN_VFMADDSD3_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_fmai_vmfmadd_v4sf_round, "__builtin_ia32_vfmaddss3_round", IX86_BUILTIN_VFMADDSS3_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_mask_round, "__builtin_ia32_vfmaddsubpd512_mask", IX86_BUILTIN_VFMADDSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_mask3_round, "__builtin_ia32_vfmaddsubpd512_mask3", IX86_BUILTIN_VFMADDSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v8df_maskz_round, "__builtin_ia32_vfmaddsubpd512_maskz", IX86_BUILTIN_VFMADDSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_QI_INT },
@@ -34044,6 +34106,10 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V4SF_FTYPE_V4SF_INT_INT:
     case V4SF_FTYPE_V4SF_INT64_INT:
     case V2DF_FTYPE_V2DF_INT64_INT:
+    case V4SF_FTYPE_V4SF_V4SF_INT:
+    case V2DF_FTYPE_V2DF_V2DF_INT:
+    case V4SF_FTYPE_V4SF_V2DF_INT:
+    case V2DF_FTYPE_V2DF_V4SF_INT:
       nargs = 3;
       break;
     case V8SF_FTYPE_V8DF_V8SF_QI_INT:
@@ -34054,6 +34120,13 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V16SI_FTYPE_V16SF_V16SI_HI_INT:
     case V8DF_FTYPE_V8SF_V8DF_QI_INT:
     case V16SF_FTYPE_V16HI_V16SF_HI_INT:
+    case V2DF_FTYPE_V2DF_V2DF_V2DF_INT:
+    case V4SF_FTYPE_V4SF_V4SF_V4SF_INT:
+      nargs = 4;
+      break;
+    case V4SF_FTYPE_V4SF_V4SF_INT_INT:
+    case V2DF_FTYPE_V2DF_V2DF_INT_INT:
+      nargs_constant = 2;
       nargs = 4;
       break;
     case INT_FTYPE_V4SF_V4SF_INT_INT:
@@ -34117,7 +34190,9 @@ ix86_expand_round_builtin (const struct builtin_description *d,
 		{
 		case CODE_FOR_avx512f_getmantv8df_mask_round:
 		case CODE_FOR_avx512f_getmantv16sf_mask_round:
-		  error ("the immediate argument must be an 4-bit immediate");
+		case CODE_FOR_avx512f_getmantv2df_round:
+		case CODE_FOR_avx512f_getmantv4sf_round:
+		  error ("the immediate argument must be 4-bit immediate");
 		  return const0_rtx;
 		case CODE_FOR_avx512f_cmpv8df3_mask_round:
 		case CODE_FOR_avx512f_cmpv16sf3_mask_round:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5005a47..d75edb7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1307,21 +1307,21 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<plusminus_insn><mode>3"
+(define_insn "<sse>_vm<plusminus_insn><mode>3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<plusminus_mnemonic><ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %<iptr>2<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "mul<mode>3<mask_name><round_name>"
@@ -1347,21 +1347,21 @@
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<multdiv_mnemonic><mode>3"
+(define_insn "<sse>_vm<multdiv_mnemonic><mode>3<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (multdiv:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    <multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<multdiv_mnemonic><ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %<iptr>2<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1447,7 +1447,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*srcp14<mode>"
+(define_insn "srcp14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1457,7 +1457,7 @@
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "vrcp14<ssescalarmodesuffix>\t{%2, %1, %0|, %1, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -1494,21 +1494,21 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vmsqrt<mode>2"
+(define_insn "<sse>_vmsqrt<mode>2<round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (sqrt:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_constraint>"))
 	  (match_operand:VF_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
    sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vsqrt<ssescalarmodesuffix>\t{%1, %2, %0|%0, %2, %<iptr>1}"
+   vsqrt<ssescalarmodesuffix>\t{<round_op3>%1, %2, %0|%0, %2, %<iptr>1<round_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "sqrt")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_prefix>")
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -1543,7 +1543,7 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*rsqrt14<mode>"
+(define_insn "rsqrt14<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
@@ -1624,22 +1624,22 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "<sse>_vm<code><mode>3"
+(define_insn "<sse>_vm<code><mode>3<round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
 	(vec_merge:VF_128
 	  (smaxmin:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm"))
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_saeonly_constraint>"))
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
   "@
    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
-   v<maxmin_float><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
+   v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op3>}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "<round_saeonly_prefix>")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;; These versions of the min/max patterns implement exactly the operations
@@ -4108,34 +4108,34 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
-(define_insn "sse2_cvtsd2ss"
+(define_insn "sse2_cvtsd2ss<round_name>"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
 	    (float_truncate:V2SF
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,vm")))
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m,<round_constraint>")))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
   "TARGET_SSE2"
   "@
    cvtsd2ss\t{%2, %0|%0, %2}
    cvtsd2ss\t{%2, %0|%0, %q2}
-   vcvtsd2ss\t{%2, %1, %0|%0, %1, %q2}"
+   vcvtsd2ss\t{<round_op3>%2, %1, %0|%0, %1, %q2<round_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "athlon_decode" "vector,double,*")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<round_prefix>")
    (set_attr "mode" "SF")])
 
-(define_insn "sse2_cvtss2sd"
+(define_insn "sse2_cvtss2sd<round_saeonly_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
 	(vec_merge:V2DF
 	  (float_extend:V2DF
 	    (vec_select:V2SF
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,vm")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m,<round_saeonly_constraint>")
 	      (parallel [(const_int 0) (const_int 1)])))
 	  (match_operand:V2DF 1 "register_operand" "0,0,v")
 	  (const_int 1)))]
@@ -4143,14 +4143,14 @@
   "@
    cvtss2sd\t{%2, %0|%0, %2}
    cvtss2sd\t{%2, %0|%0, %k2}
-   vcvtss2sd\t{%2, %1, %0|%0, %1, %k2}"
+   vcvtss2sd\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %k2<round_saeonly_op3>}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "vector,double,*")
    (set_attr "athlon_decode" "direct,direct,*")
    (set_attr "bdver1_decode" "direct,direct,*")
    (set_attr "btver2_decode" "double,double,double")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,<round_saeonly_prefix>")
    (set_attr "mode" "DF")])
 
 (define_insn "<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>"
@@ -6553,17 +6553,17 @@
   operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8);
 })
 
-(define_insn "*avx512f_vmscalef<mode>"
+(define_insn "avx512f_vmscalef<mode><round_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_constraint>")]
 	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "%vscalef<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  "%vscalef<ssescalarmodesuffix>\t{<round_op3>%2, %1, %0|%0, %1, %2<round_op3>}"
   [(set_attr "prefix" "evex")
    (set_attr "mode"  "<ssescalarmode>")])
 
@@ -6633,17 +6633,17 @@
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_sgetexp<mode>"
+(define_insn "avx512f_sgetexp<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")]
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")]
 	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetexp<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+   "vgetexp<ssescalarmodesuffix>\t{<round_saeonly_op3>%2, %1, %0|%0, %1, %2<round_saeonly_op3>}";
     [(set_attr "prefix" "evex")
      (set_attr "mode" "<ssescalarmode>")])
 
@@ -6798,18 +6798,18 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_rndscale<mode>"
+(define_insn "avx512f_rndscale<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512F"
-  "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
   [(set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
@@ -15184,18 +15184,18 @@
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512f_getmant<mode>"
+(define_insn "avx512f_getmant<mode><round_saeonly_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "vm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_GETMANT)
 	  (match_dup 1)
 	  (const_int 1)))]
    "TARGET_AVX512F"
-   "vgetmant<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+   "vgetmant<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}";
    [(set_attr "prefix" "evex")
    (set_attr "mode" "<ssescalarmode>")])
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 4a6d477..487b749 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -51,7 +51,7 @@
 (define_subst_attr "mask_operand18" "mask" "" "%{%19%}%N18")
 (define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19")
 (define_subst_attr "mask_codefor" "mask" "*" "")
-(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (<MODE>mode) == 64)")
 (define_subst_attr "store_mask_constraint" "mask" "vm" "v")
 (define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand")
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
@@ -85,7 +85,7 @@
 (define_subst_attr "sd_mask_op4" "sd" "" "%{%5%}%N4")
 (define_subst_attr "sd_mask_op5" "sd" "" "%{%6%}%N5")
 (define_subst_attr "sd_mask_codefor" "sd" "*" "")
-(define_subst_attr "sd_mask_mode512bit_condition" "sd" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)")
+(define_subst_attr "sd_mask_mode512bit_condition" "sd" "1" "(GET_MODE_SIZE (<MODE>mode) == 64)")
 
 (define_subst "sd"
  [(set (match_operand:SUBST_V 0)
@@ -101,7 +101,6 @@
 (define_subst_attr "round_name" "round" "" "_round")
 (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
 (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
-(define_subst_attr "round_mask_scalar_operand3" "mask_scalar" "%R3" "%R5")
 (define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
 (define_subst_attr "round_op2" "round" "" "%R2")
 (define_subst_attr "round_op3" "round" "" "%R3")
@@ -116,8 +115,9 @@
 (define_subst_attr "round_constraint2" "round" "m" "v")
 (define_subst_attr "round_constraint3" "round" "rm" "r")
 (define_subst_attr "round_nimm_predicate" "round" "nonimmediate_operand" "register_operand")
-(define_subst_attr "round_mode512bit_condition" "round" "1" "(GET_MODE (operands[0]) == V16SFmode || GET_MODE (operands[0]) == V8DFmode)")
-(define_subst_attr "round_modev4sf_condition" "round" "1" "(GET_MODE (operands[0]) == V4SFmode)")
+(define_subst_attr "round_prefix" "round" "vex" "evex")
+(define_subst_attr "round_mode512bit_condition" "round" "1" "(<MODE>mode == V16SFmode || <MODE>mode == V8DFmode)")
+(define_subst_attr "round_modev4sf_condition" "round" "1" "(<MODE>mode == V4SFmode)")
 (define_subst_attr "round_codefor" "round" "*" "")
 (define_subst_attr "round_opnum" "round" "5" "6")
 
@@ -138,9 +138,11 @@
 (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5")
 (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7")
 (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2")
+(define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%R3")
 (define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%R4")
 (define_subst_attr "round_saeonly_op5" "round_saeonly" "" "%R5")
 (define_subst_attr "round_saeonly_op6" "round_saeonly" "" "%R6")
+(define_subst_attr "round_saeonly_prefix" "round_saeonly" "vex" "evex")
 (define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "<round_saeonly_mask_operand2>")
 (define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "<round_saeonly_mask_operand3>")
 (define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "<round_saeonly_mask_scalar_operand3>")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0d38f30..7201592 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -169,6 +169,8 @@
 /* avx512fintrin.h */
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 1)
+#define __builtin_ia32_addss_round(A, B, C) __builtin_ia32_addss_round(A, B, 1)
 #define __builtin_ia32_alignd512_mask(A, B, F, D, E) __builtin_ia32_alignd512_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq512_mask(A, B, F, D, E) __builtin_ia32_alignq512_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpd512_mask(A, B, E, D) __builtin_ia32_cmpd512_mask(A, B, 1, D)
@@ -184,11 +186,11 @@
 #define __builtin_ia32_cvtps2dq512_mask(A, B, C, D) __builtin_ia32_cvtps2dq512_mask(A, B, C, 1)
 #define __builtin_ia32_cvtps2pd512_mask(A, B, C, D) __builtin_ia32_cvtps2pd512_mask(A, B, C, 5)
 #define __builtin_ia32_cvtps2udq512_mask(A, B, C, D) __builtin_ia32_cvtps2udq512_mask(A, B, C, 1)
-#define __builtin_ia32_cvtsd2ss_mask(A, B, C, D, E) __builtin_ia32_cvtsd2ss_mask(A, B, C, D, 1)
+#define __builtin_ia32_cvtsd2ss_round(A, B, C) __builtin_ia32_cvtsd2ss_round(A, B, 1)
+#define __builtin_ia32_cvtss2sd_round(A, B, C) __builtin_ia32_cvtss2sd_round(A, B, 4)
 #define __builtin_ia32_cvtsi2sd64(A, B, C) __builtin_ia32_cvtsi2sd64(A, B, 1)
 #define __builtin_ia32_cvtsi2ss32(A, B, C) __builtin_ia32_cvtsi2ss32(A, B, 1)
 #define __builtin_ia32_cvtsi2ss64(A, B, C) __builtin_ia32_cvtsi2ss64(A, B, 1)
-#define __builtin_ia32_cvtss2sd_mask(A, B, C, D, E) __builtin_ia32_cvtss2sd_mask(A, B, C, D, 5)
 #define __builtin_ia32_cvttpd2dq512_mask(A, B, C, D) __builtin_ia32_cvttpd2dq512_mask(A, B, C, 5)
 #define __builtin_ia32_cvttpd2udq512_mask(A, B, C, D) __builtin_ia32_cvttpd2udq512_mask(A, B, C, 5)
 #define __builtin_ia32_cvttps2dq512_mask(A, B, C, D) __builtin_ia32_cvttps2dq512_mask(A, B, C, 5)
@@ -199,6 +201,8 @@
 #define __builtin_ia32_cvtusi2ss64(A, B, C) __builtin_ia32_cvtusi2ss64(A, B, 1)
 #define __builtin_ia32_divpd512_mask(A, B, C, D, E) __builtin_ia32_divpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_divps512_mask(A, B, C, D, E) __builtin_ia32_divps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_divsd_round(A, B, C) __builtin_ia32_divsd_round(A, B, 1)
+#define __builtin_ia32_divss_round(A, B, C) __builtin_ia32_divss_round(A, B, 1)
 #define __builtin_ia32_extractf32x4_mask(A, E, C, D) __builtin_ia32_extractf32x4_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x4_mask(A, E, C, D) __builtin_ia32_extractf64x4_mask(A, 1, C, D)
 #define __builtin_ia32_extracti32x4_mask(A, E, C, D) __builtin_ia32_extracti32x4_mask(A, 1, C, D)
@@ -221,18 +225,28 @@
 #define __builtin_ia32_gathersiv8di(A, B, C, D, F) __builtin_ia32_gathersiv8di(A, B, C, D, 1)
 #define __builtin_ia32_getexppd512_mask(A, B, C, D) __builtin_ia32_getexppd512_mask(A, B, C, 5)
 #define __builtin_ia32_getexpps512_mask(A, B, C, D) __builtin_ia32_getexpps512_mask(A, B, C, 5)
+#define __builtin_ia32_getexpsd128_round(A, B, C) __builtin_ia32_getexpsd128_round(A, B, 4)
+#define __builtin_ia32_getexpss128_round(A, B, C) __builtin_ia32_getexpss128_round(A, B, 4)
 #define __builtin_ia32_getmantpd512_mask(A, F, C, D, E) __builtin_ia32_getmantpd512_mask(A, 1, C, D, 5)
 #define __builtin_ia32_getmantps512_mask(A, F, C, D, E) __builtin_ia32_getmantps512_mask(A, 1, C, D, 5)
+#define __builtin_ia32_getmantsd_round(A, B, C, D) __builtin_ia32_getmantsd_round(A, B, 1, 4)
+#define __builtin_ia32_getmantss_round(A, B, C, D) __builtin_ia32_getmantss_round(A, B, 1, 4)
 #define __builtin_ia32_insertf32x4_mask(A, B, F, D, E) __builtin_ia32_insertf32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x4_mask(A, B, F, D, E) __builtin_ia32_insertf64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti32x4_mask(A, B, F, D, E) __builtin_ia32_inserti32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x4_mask(A, B, F, D, E) __builtin_ia32_inserti64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_maxpd512_mask(A, B, C, D, E) __builtin_ia32_maxpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_maxps512_mask(A, B, C, D, E) __builtin_ia32_maxps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_maxsd_round(A, B, C) __builtin_ia32_maxsd_round(A, B, 4)
+#define __builtin_ia32_maxss_round(A, B, C) __builtin_ia32_maxss_round(A, B, 4)
 #define __builtin_ia32_minpd512_mask(A, B, C, D, E) __builtin_ia32_minpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_minps512_mask(A, B, C, D, E) __builtin_ia32_minps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_minsd_round(A, B, C) __builtin_ia32_minsd_round(A, B, 4)
+#define __builtin_ia32_minss_round(A, B, C) __builtin_ia32_minss_round(A, B, 4)
 #define __builtin_ia32_mulpd512_mask(A, B, C, D, E) __builtin_ia32_mulpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_mulps512_mask(A, B, C, D, E) __builtin_ia32_mulps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_mulsd_round(A, B, C) __builtin_ia32_mulsd_round(A, B, 1)
+#define __builtin_ia32_mulss_round(A, B, C) __builtin_ia32_mulss_round(A, B, 1)
 #define __builtin_ia32_permdf512_mask(A, E, C, D) __builtin_ia32_permdf512_mask(A, 1, C, D)
 #define __builtin_ia32_permdi512_mask(A, E, C, D) __builtin_ia32_permdi512_mask(A, 1, C, D)
 #define __builtin_ia32_prold512_mask(A, E, C, D) __builtin_ia32_prold512_mask(A, 1, C, D)
@@ -252,10 +266,12 @@
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 5)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 5)
-#define __builtin_ia32_rndscalesd_mask(A, B, I, D, E, F) __builtin_ia32_rndscalesd_mask(A, B, 1, D, E, 5)
-#define __builtin_ia32_rndscaless_mask(A, B, I, D, E, F) __builtin_ia32_rndscaless_mask(A, B, 1, D, E, 5)
+#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
+#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_scalefsd_round(A, B, C) __builtin_ia32_scalefsd_round(A, B, 1)
+#define __builtin_ia32_scalefss_round(A, B, C) __builtin_ia32_scalefss_round(A, B, 1)
 #define __builtin_ia32_scatterdiv8df(A, B, C, D, F) __builtin_ia32_scatterdiv8df(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv8di(A, B, C, D, F) __builtin_ia32_scatterdiv8di(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv16sf(A, B, C, D, F) __builtin_ia32_scatterdiv16sf(A, B, C, D, 1)
@@ -272,10 +288,12 @@
 #define __builtin_ia32_shufps512_mask(A, B, F, D, E) __builtin_ia32_shufps512_mask(A, B, 1, D, E)
 #define __builtin_ia32_sqrtpd512_mask(A, B, C, D) __builtin_ia32_sqrtpd512_mask(A, B, C, 1)
 #define __builtin_ia32_sqrtps512_mask(A, B, C, D) __builtin_ia32_sqrtps512_mask(A, B, C, 1)
-#define __builtin_ia32_sqrtsd_mask(A, B, C, D, E) __builtin_ia32_sqrtsd_mask(A, B, C, D, 1)
-#define __builtin_ia32_sqrtss_mask(A, B, C, D, E) __builtin_ia32_sqrtss_mask(A, B, C, D, 1)
+#define __builtin_ia32_sqrtss_round(A, B, C) __builtin_ia32_sqrtss_round(A, B, 1)
+#define __builtin_ia32_sqrtsd_round(A, B, C) __builtin_ia32_sqrtsd_round(A, B, 1)
 #define __builtin_ia32_subpd512_mask(A, B, C, D, E) __builtin_ia32_subpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_subps512_mask(A, B, C, D, E) __builtin_ia32_subps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_subsd_round(A, B, C) __builtin_ia32_subsd_round(A, B, 1)
+#define __builtin_ia32_subss_round(A, B, C) __builtin_ia32_subss_round(A, B, 1)
 #define __builtin_ia32_ucmpd512_mask(A, B, E, D) __builtin_ia32_ucmpd512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpq512_mask(A, B, E, D) __builtin_ia32_ucmpq512_mask(A, B, 1, D)
 #define __builtin_ia32_vcomisd(A, B, C, D) __builtin_ia32_vcomisd(A, B, 1, 5)
@@ -304,12 +322,8 @@
 #define __builtin_ia32_vfmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddps512_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddss3_maskz(A, B, C, D, 1)
+#define __builtin_ia32_vfmaddsd3_round(A, B, C, D) __builtin_ia32_vfmaddsd3_round(A, B, C, 1)
+#define __builtin_ia32_vfmaddss3_round(A, B, C, D) __builtin_ia32_vfmaddss3_round(A, B, C, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c
new file mode 100644
index 0000000..f0bc5ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vaddsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vaddsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_add_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c
new file mode 100644
index 0000000..5a8491c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vaddss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vaddss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_add_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c
new file mode 100644
index 0000000..8cb51c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2ss-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsd2ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 s1, r;
+volatile __m128d s2;
+
+void extern
+avx512f_test (void)
+{
+  r = _mm_cvt_roundsd_ss (s1, s2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c
new file mode 100644
index 0000000..5b6a43f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2sd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vcvtss2sd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d s1, r;
+volatile __m128 s2;
+
+void extern
+avx512f_test (void)
+{
+  r = _mm_cvt_roundss_sd (s1, s2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c
new file mode 100644
index 0000000..95df56c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivsd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vdivsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_div_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c
new file mode 100644
index 0000000..5c6eb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vdivss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_div_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
new file mode 100644
index 0000000..35377b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vextractf32x4-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+#include "string.h"
+
+void
+CALC (UNION_TYPE (AVX512F_LEN,) s1, float *res_ref, int mask)
+{
+  memset (res_ref, 0, 16);
+  memcpy (res_ref, s1.a + mask * 4, 16);
+}
+
+void static
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN,) s1;
+  union128 res1, res2, res3;
+  float res_ref[4];
+  MASK_TYPE mask = MASK_VALUE;
+  int j;
+
+  for (j = 0; j < SIZE; j++)
+    {
+      s1.a[j] = j * j / 4.56;
+    }
+
+  for (j = 0; j < 4; j++)
+    {
+      res1.a[j] = DEFAULT_VALUE;
+      res2.a[j] = DEFAULT_VALUE;
+      res3.a[j] = DEFAULT_VALUE;
+    }
+
+  res1.x = INTRINSIC (_extractf32x4_ps) (s1.x, 1);
+  res2.x = INTRINSIC (_mask_extractf32x4_ps) (res2.x, mask, s1.x, 1);
+  res3.x = INTRINSIC (_maskz_extractf32x4_ps) (mask, s1.x, 1);
+  CALC (s1, res_ref, 1);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+
+  MASK_MERGE ()(res_ref, mask, 4);
+  if (check_union128 (res2, res_ref))
+    abort ();
+
+  MASK_ZERO ()(res_ref, mask, 4);
+  if (check_union128 (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c
new file mode 100644
index 0000000..1ea77b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vextracti32x4-2.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+#include "string.h"
+
+void
+CALC (UNION_TYPE (AVX512F_LEN, i_d) s1, int *res_ref, int mask)
+{
+  memset (res_ref, 0, 16);
+  memcpy (res_ref, s1.a + mask * 4, 16);
+}
+
+void static
+TEST (void)
+{
+  UNION_TYPE (AVX512F_LEN, i_d) s1;
+  union128i_d res1, res2, res3;
+  int res_ref[4];
+  MASK_TYPE mask = MASK_VALUE;
+  int j;
+
+  for (j = 0; j < SIZE; j++)
+    {
+      s1.a[j] = j * j / 4.56;
+    }
+
+  for (j = 0; j < 4; j++)
+    {
+      res1.a[j] = DEFAULT_VALUE;
+      res2.a[j] = DEFAULT_VALUE;
+      res3.a[j] = DEFAULT_VALUE;
+    }
+
+  res1.x = INTRINSIC (_extracti32x4_epi32) (s1.x, 1);
+  res2.x =
+    INTRINSIC (_mask_extracti32x4_epi32) (res2.x, mask, s1.x, 1);
+  res3.x = INTRINSIC (_maskz_extracti32x4_epi32) (mask, s1.x, 1);
+  CALC (s1, res_ref, 1);
+
+  if (check_union128i_d (res1, res_ref))
+    abort ();
+
+  MASK_MERGE (i_d) (res_ref, mask, 4);
+  if (check_union128i_d (res2, res_ref))
+    abort ();
+
+  MASK_ZERO (i_d) (res_ref, mask, 4);
+  if (check_union128i_d (res3, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c
new file mode 100644
index 0000000..ea8b17c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmadd_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c
new file mode 100644
index 0000000..cd44fb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmaddXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmadd_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c
new file mode 100644
index 0000000..2d78df6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmsub_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c
new file mode 100644
index 0000000..b7609f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfmsubXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmsub_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c
new file mode 100644
index 0000000..e938236
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c
new file mode 100644
index 0000000..f5752e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmaddXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c
new file mode 100644
index 0000000..931b5d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...sd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_round_sd (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c
new file mode 100644
index 0000000..f097f1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vfnmsubXXXss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 a, b, c;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_round_ss (a, b, c, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c
new file mode 100644
index 0000000..952ed54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetexpsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getexp_sd (x, x);
+  x = _mm_getexp_round_sd (x, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c
new file mode 100644
index 0000000..c1e5e5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpsd-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 64)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vgetexpsd (double *s, double *r)
+{
+  r[0] = floor (log (s[0]) / log (2));
+}
+
+void static
+avx512f_test (void)
+{
+  int i;
+  union128d res1, s1;
+  double res_ref[SIZE];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 5.0 - i;
+      res_ref[i] = s1.a[i];
+    }
+
+  res1.x = _mm_getexp_sd (s1.x, s1.x);
+
+  compute_vgetexpsd (s1.a, res_ref);
+
+  if (check_fp_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c
new file mode 100644
index 0000000..d946a47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetexpss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\, %xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getexp_ss (x, x);
+  x = _mm_getexp_round_ss (x, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c
new file mode 100644
index 0000000..39d77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetexpss-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 32)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vgetexpss (float *s, float *r)
+{
+  r[0] = floor (log (s[0]) / log (2));
+}
+
+void static
+avx512f_test (void)
+{
+  int i;
+  union128 res1, s1;
+  float res_ref[SIZE];
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 5.0 - i;
+      res_ref[i] = s1.a[i];
+    }
+
+  res1.x = _mm_getexp_ss (s1.x, s1.x);
+
+  compute_vgetexpss (s1.a, res_ref);
+
+  if (check_fp_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c
new file mode 100644
index 0000000..4b252a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-final { scan-assembler-times "vgetmantsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetmantsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[\\n\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x, y, z;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getmant_sd (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_getmant_round_sd (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src,
+			    _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c
new file mode 100644
index 0000000..50d98a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantsd-2.c
@@ -0,0 +1,94 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+#include <math.h>
+
+union fp_int_t
+{
+  long long int int_val;
+  double fp_val;
+};
+
+double
+get_norm_mant (double source, int signctrl, int interv)
+{
+  long long src, sign, exp, fraction;
+
+  union fp_int_t bin_conv;
+
+  bin_conv.fp_val = source;
+  src = bin_conv.int_val;
+  sign = (signctrl & 0x1) ? 0 : (src >> 63);
+  exp = (src & 0x7ff0000000000000) >> 52;
+  fraction = (src & 0xfffffffffffff);
+
+  if (isnan (source))
+    return signbit (source) ? -NAN : NAN;
+  if (source == 0.0 || source == -0.0 || isinf (source))
+    return sign ? -1.0 : 1.0;
+  if (signbit (source) && (signctrl & 0x2))
+    return -NAN;
+  if (!isnormal (source))
+    {
+      src = (src & 0xfff7ffffffffffff);
+      exp = 0x3ff;
+      while (!(src & 0x8000000000000))
+	{
+	  src += fraction & 0x8000000000000;
+	  fraction = fraction << 1;
+	  exp--;
+	}
+    }
+
+  switch (interv)
+    {
+    case 0:
+      exp = 0x3ff;
+      break;
+    case 1:
+      exp = ((exp - 0x3ff) & 0x1) ? 0x3fe : 0x3ff;
+      break;
+    case 2:
+      exp = 0x3fe;
+      break;
+    case 3:
+      exp = (fraction & 0x8000000000000) ? 0x3fe : 0x3ff;
+      break;
+    default:
+      abort ();
+    }
+
+  bin_conv.int_val = (sign << 63) | (exp << 52) | fraction;
+  return bin_conv.fp_val;
+}
+
+static void
+compute_vgetmantsd (double *r, double *s1, double *s2, int interv,
+		    int signctrl)
+{
+  r[0] = get_norm_mant (s2[0], signctrl, interv);
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  int i, sign;
+  union128d res1, src1, src2;
+  double res_ref[2];
+  int interv = _MM_MANT_NORM_p5_1;
+  int signctrl = _MM_MANT_SIGN_src;
+
+  src1.x = _mm_set_pd (-3.0, 111.111);
+  src2.x = _mm_set_pd (222.222, -2.0);
+
+  res1.x = _mm_getmant_sd (src1.x, src2.x, interv, signctrl);
+
+  compute_vgetmantsd (res_ref, src1.a, src2.a, interv, signctrl);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c
new file mode 100644
index 0000000..30c837b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-final { scan-assembler-times "vgetmantss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */
+/* { dg-final { scan-assembler-times "vgetmantss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[\\n\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x, y, z;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getmant_ss (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_getmant_round_ss (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src,
+		      _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c
new file mode 100644
index 0000000..291c0df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantss-2.c
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+#include <math.h>
+
+union fp_int_t
+{
+  int int_val;
+  float fp_val;
+};
+
+float
+get_norm_mant (float source, int signctrl, int interv)
+{
+  int src, sign, exp, fraction;
+  union fp_int_t bin_conv;
+
+  bin_conv.fp_val = source;
+  src = bin_conv.int_val;
+  sign = (signctrl & 0x1) ? 0 : (src >> 31);
+  exp = (src & 0x7f800000) >> 23;
+  fraction = (src & 0x7fffff);
+
+  if (isnan (source))
+    return signbit (source) ? -NAN : NAN;
+  if (source == 0.0 || source == -0.0 || isinf (source))
+    return sign ? -1.0 : 1.0;
+  if (signbit (source) && (signctrl & 0x2))
+    return -NAN;
+  if (!isnormal (source))
+    {
+      src = (src & 0xffbfffff);
+      exp = 0x7f;
+      while (!(src & 0x400000))
+	{
+	  src += fraction & 0x400000;
+	  fraction = fraction << 1;
+	  exp--;
+	}
+    }
+
+  switch (interv)
+    {
+    case 0:
+      exp = 0x7f;
+      break;
+    case 1:
+      exp = ((exp - 0x7f) & 0x1) ? 0x7e : 0x7f;
+      break;
+    case 2:
+      exp = 0x7e;
+      break;
+    case 3:
+      exp = (fraction & 0x400000) ? 0x7e : 0x7f;
+      break;
+    default:
+      abort ();
+    }
+
+  bin_conv.int_val = (sign << 31) | (exp << 23) | fraction;
+
+  return bin_conv.fp_val;
+
+}
+
+static void
+compute_vgetmantss (float *r, float *s1, float *s2, int interv,
+		    int signctrl)
+{
+  int i;
+  r[0] = get_norm_mant (s2[0], signctrl, interv);
+  for (i = 1; i < 4; i++)
+    {
+      r[i] = s1[i];
+    }
+}
+
+static void
+avx512f_test (void)
+{
+  int i, sign;
+  union128 res1, src1, src2;
+  float res_ref[4];
+  int interv = _MM_MANT_NORM_p5_1;
+  int signctrl = _MM_MANT_SIGN_src;
+
+  src1.x = _mm_set_ps (-24.043, 68.346, -43.35, 546.46);
+  src2.x = _mm_set_ps (222.222, 333.333, 444.444, -2.0);
+
+  res1.x = _mm_getmant_ss (src1.x, src2.x, interv, signctrl);
+
+  compute_vgetmantss (res_ref, src1.a, src2.a, interv, signctrl);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c
new file mode 100644
index 0000000..8c24704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmaxsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmaxsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_max_round_sd (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c
new file mode 100644
index 0000000..027445d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmaxss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmaxss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_max_round_ss (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c
new file mode 100644
index 0000000..8f8488f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vminsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vminsd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_min_round_sd (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c
new file mode 100644
index 0000000..0774b75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vminss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vminss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_min_round_ss (x1, x2, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c
new file mode 100644
index 0000000..c85832a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmulsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmulsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_mul_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c
new file mode 100644
index 0000000..cb4bf0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vmulss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vmulss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_mul_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c
new file mode 100644
index 0000000..c0c8d03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrcp14sd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rcp14_sd (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c
new file mode 100644
index 0000000..9ff3541
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrcp14sd (double *s1, double *s2, double *r)
+{
+  r[0] = 1.0 / s2[0];
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  union128d s1, s2, res1, res2, res3;
+  double res_ref[2];
+
+  s1.x = _mm_set_pd (-3.0, 111.111);
+  s2.x = _mm_set_pd (222.222, -2.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rcp14_sd (s1.x, s2.x);
+
+  compute_vrcp14sd (s1.a, s2.a, res_ref);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c
new file mode 100644
index 0000000..580dfd6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrcp14ss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rcp14_ss (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c
new file mode 100644
index 0000000..fe8989a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrcp14ss (float *s1, float *s2, float *r)
+{
+  r[0] = 1.0 / s2[0];
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 s1, s2, res1, res2, res3;
+  float res_ref[4];
+
+  s1.x = _mm_set_ps (-24.043, 68.346, -43.35, 546.46);
+  s2.x = _mm_set_ps (222.222, 333.333, 444.444, -2.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rcp14_ss (s1.x, s2.x);
+
+  compute_vrcp14ss (s1.a, s2.a, res_ref);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
new file mode 100644
index 0000000..2f370a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_roundscale_sd (x1, x2, 0x42);
+  x1 = _mm_roundscale_round_sd (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
new file mode 100644
index 0000000..5b4e842
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 64)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_rndscalesd (double *s1, double *s2, double *r, int imm)
+{
+  int rc, m;
+  rc = imm & 0xf;
+  m = imm >> 4;
+
+  switch (rc)
+    {
+    case _MM_FROUND_FLOOR:
+      r[0] = floor (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    case _MM_FROUND_CEIL:
+      r[0] = ceil (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    default:
+      abort ();
+      break;
+    }
+
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  int imm = _MM_FROUND_FLOOR | (7 << 4);
+  union128d s1, s2, res1;
+  double res_ref[SIZE];
+
+  s1.x = _mm_set_pd (4.05084, -1.23162);
+  s2.x = _mm_set_pd (-3.53222, 7.33527);
+
+  res1.x = _mm_roundscale_sd (s1.x, s2.x, imm);
+
+  compute_rndscalesd (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
new file mode 100644
index 0000000..c9f5a75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_roundscale_ss (x1, x2, 0x42);
+  x1 = _mm_roundscale_round_ss (x1, x2, 0x42, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
new file mode 100644
index 0000000..7acfe4c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#define SIZE (128 / 32)
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_rndscaless (float *s1, float *s2, float *r, int imm)
+{
+  int rc, m;
+  rc = imm & 0xf;
+  m = imm >> 4;
+
+  switch (rc)
+    {
+    case _MM_FROUND_FLOOR:
+      r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    case _MM_FROUND_CEIL:
+      r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m);
+      break;
+    default:
+      abort ();
+      break;
+    }
+
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  int imm = _MM_FROUND_FLOOR | (7 << 4);
+  union128 s1, s2, res1;
+  float res_ref[SIZE];
+
+  s1.x = _mm_set_ps (4.05084, -1.23162, 2.00231, -6.22103);
+  s2.x = _mm_set_ps (-4.19319, -3.53222, 7.33527, 5.57655);
+
+  res1.x = _mm_roundscale_ss (s1.x, s2.x, imm);
+
+  compute_rndscaless (s1.a, s2.a, res_ref, imm);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c
new file mode 100644
index 0000000..bd8b7a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrt14sd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rsqrt14_sd (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c
new file mode 100644
index 0000000..ef4e407
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14sd-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrsqrt14sd (double *s1, double *s2, double *r)
+{
+  r[0] = 1.0 / sqrt (s2[0]);
+  r[1] = s1[1];
+}
+
+static void
+avx512f_test (void)
+{
+  union128d s1, s2, res1, res2, res3;
+  double res_ref[2];
+
+  s1.x = _mm_set_pd (-3.0, 111.111);
+  s2.x = _mm_set_pd (222.222, 4.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rsqrt14_sd (s1.x, s2.x);
+
+  compute_vrsqrt14sd (s1.a, s2.a, res_ref);
+
+  if (check_fp_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c
new file mode 100644
index 0000000..d4d4eea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrt14ss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_rsqrt14_ss (x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c
new file mode 100644
index 0000000..b01420f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrsqrt14ss-2.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+static void
+compute_vrsqrt14ss (float *s1, float *s2, float *r)
+{
+  r[0] = 1.0 / sqrt (s2[0]);
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 s1, s2, res1, res2, res3;
+  float res_ref[4];
+
+  s1.x = _mm_set_ps (-24.43, 68.346, -43.35, 546.46);
+  s2.x = _mm_set_ps (222.222, 333.333, 444.444, 4.0);
+  res2.a[0] = DEFAULT_VALUE;
+
+  res1.x = _mm_rsqrt14_ss (s1.x, s2.x);
+
+  compute_vrsqrt14ss (s1.a, s2.a, res_ref);
+
+  if (check_fp_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c
new file mode 100644
index 0000000..bbf238e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vscalefsd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscalefsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_scalef_sd (x, x);
+  x = _mm_scalef_round_sd (x, x, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c
new file mode 100644
index 0000000..131fc67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefsd-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 64)
+
+static void
+compute_scalefsd (double *s1, double *s2, double *r)
+{
+  r[0] = s1[0] * pow (2, floor (s2[0]));
+  r[1] = s1[1];
+}
+
+void static
+avx512f_test (void)
+{
+  union128d res1, s1, s2;
+  double res_ref[SIZE];
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 11.5 * (i + 1);
+      s2.a[i] = 10.5 * (i + 1);
+    }
+
+  res1.x = _mm_scalef_sd (s1.x, s2.x);
+
+  compute_scalefsd (s1.a, s2.a, res_ref);
+
+  if (check_union128d (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c
new file mode 100644
index 0000000..d36b2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vscalefss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[^\{\]" 2 } } */
+/* { dg-final { scan-assembler-times "vscalefss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_scalef_ss (x, x);
+  x = _mm_scalef_round_ss (x, x, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c
new file mode 100644
index 0000000..3e8f6d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vscalefss-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <math.h>
+#include "avx512f-check.h"
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 32)
+
+static void
+compute_scalefss (float *s1, float *s2, float *r)
+{
+  r[0] = s1[0] * (float) pow (2, floor (s2[0]));
+  r[1] = s1[1];
+  r[2] = s1[2];
+  r[3] = s1[3];
+}
+
+static void
+avx512f_test (void)
+{
+  union128 res1, s1, s2;
+  float res_ref[SIZE];
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 11.5 * (i + 1);
+      s2.a[i] = 10.5 * (i + 1);
+    }
+
+  res1.x = _mm_scalef_ss (s1.x, s2.x);
+
+  compute_scalefss (s1.a, s2.a, res_ref);
+
+  if (check_union128 (res1, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c
new file mode 100644
index 0000000..5814e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sqrt_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c
new file mode 100644
index 0000000..81e8a0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsqrtss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sqrt_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c
new file mode 100644
index 0000000..511ceb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsubsd-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsubsd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128d x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sub_round_sd (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c
new file mode 100644
index 0000000..618662f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vsubss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vsubss\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128 x1, x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_sub_round_ss (x1, x2, _MM_FROUND_TO_NEAREST_INT);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index e8cb533..c5d8876 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -199,6 +199,7 @@ test_1x (_mm512_getmant_pd, __m512d, __m512d, 1, 1)
 test_1x (_mm512_getmant_ps, __m512, __m512, 1, 1)
 test_1x (_mm512_roundscale_round_pd, __m512d, __m512d, 1, 5)
 test_1x (_mm512_roundscale_round_ps, __m512, __m512, 1, 5)
+test_1x (_mm_cvt_roundi32_ss, __m128, __m128, 1, 1)
 test_2 (_mm512_add_round_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_add_round_ps, __m512, __m512, __m512, 1)
 test_2 (_mm512_alignr_epi32, __m512i, __m512i, __m512i, 1)
@@ -278,16 +279,45 @@ test_2 (_mm512_shuffle_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_shuffle_ps, __m512, __m512, __m512, 1)
 test_2 (_mm512_sub_round_pd, __m512d, __m512d, __m512d, 1)
 test_2 (_mm512_sub_round_ps, __m512, __m512, __m512, 1)
+test_2 (_mm_add_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_add_round_ss, __m128, __m128, __m128, 1)
 test_2 (_mm_cmp_sd_mask, __mmask8, __m128d, __m128d, 1)
 test_2 (_mm_cmp_ss_mask, __mmask8, __m128, __m128, 1)
 #ifdef __x86_64__
+test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 1)
+test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 1)
 #endif
+test_2 (_mm_cvt_roundsd_ss, __m128, __m128, __m128d, 1)
+test_2 (_mm_cvt_roundss_sd, __m128d, __m128d, __m128, 5)
+test_2 (_mm_cvt_roundu32_ss, __m128, __m128, unsigned, 1)
 #ifdef __x86_64__
+test_2 (_mm_cvt_roundu64_sd, __m128d, __m128d, unsigned long long, 1)
+test_2 (_mm_cvt_roundu64_ss, __m128, __m128, unsigned long long, 1)
 #endif
+test_2 (_mm_div_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_div_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_getexp_round_sd, __m128d, __m128d, __m128d, 5)
+test_2 (_mm_getexp_round_ss, __m128, __m128, __m128, 5)
+test_2y (_mm_getmant_round_sd, __m128d, __m128d, __m128d, 1, 1, 5)
+test_2y (_mm_getmant_round_ss, __m128, __m128, __m128, 1, 1, 5)
+test_2 (_mm_mul_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_mul_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_scalef_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_scalef_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_sqrt_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_sqrt_round_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_sub_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_sub_round_ss, __m128, __m128, __m128, 1)
 test_2x (_mm512_cmp_round_pd_mask, __mmask8, __m512d, __m512d, 1, 5)
 test_2x (_mm512_cmp_round_ps_mask, __mmask16, __m512, __m512, 1, 5)
 test_2x (_mm512_maskz_roundscale_round_pd, __m512d, __mmask8, __m512d, 1, 5)
 test_2x (_mm512_maskz_roundscale_round_ps, __m512, __mmask16, __m512, 1, 5)
+test_2x (_mm_cmp_round_sd_mask, __mmask8, __m128d, __m128d, 1, 5)
+test_2x (_mm_cmp_round_ss_mask, __mmask8, __m128, __m128, 1, 5)
+test_2x (_mm_comi_round_sd, int, __m128d, __m128d, 1, 5)
+test_2x (_mm_comi_round_ss, int, __m128, __m128, 1, 5)
+test_2x (_mm_roundscale_round_sd, __m128d, __m128d, __m128d, 1, 5)
+test_2x (_mm_roundscale_round_ss, __m128, __m128, __m128, 1, 5)
 test_3 (_mm512_fmadd_round_pd, __m512d, __m512d, __m512d, __m512d, 1)
 test_3 (_mm512_fmadd_round_ps, __m512, __m512, __m512, __m512, 1)
 test_3 (_mm512_fmaddsub_round_pd, __m512d, __m512d, __m512d, __m512d, 1)
@@ -373,6 +403,14 @@ test_3 (_mm512_maskz_sub_round_pd, __m512d, __mmask8, __m512d, __m512d, 1)
 test_3 (_mm512_maskz_sub_round_ps, __m512, __mmask16, __m512, __m512, 1)
 test_3 (_mm512_ternarylogic_epi32, __m512i, __m512i, __m512i, __m512i, 1)
 test_3 (_mm512_ternarylogic_epi64, __m512i, __m512i, __m512i, __m512i, 1)
+test_3 (_mm_fmadd_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fmadd_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fmsub_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fmsub_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fnmadd_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fnmadd_round_ss, __m128, __m128, __m128, __m128, 1)
+test_3 (_mm_fnmsub_round_sd, __m128d, __m128d, __m128d, __m128d, 1)
+test_3 (_mm_fnmsub_round_ss, __m128, __m128, __m128, __m128, 1)
 test_3 (_mm_mask_cmp_sd_mask, __mmask8, __mmask8, __m128d, __m128d, 1)
 test_3 (_mm_mask_cmp_ss_mask, __mmask8, __mmask8, __m128, __m128, 1)
 test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1)
@@ -385,6 +423,10 @@ test_3v (_mm512_i64scatter_pd, void *, __m512i, __m512d, 1)
 test_3v (_mm512_i64scatter_ps, void *, __m512i, __m256, 1)
 test_3x (_mm512_mask_roundscale_round_pd, __m512d, __m512d, __mmask8, __m512d, 1, 5)
 test_3x (_mm512_mask_roundscale_round_ps, __m512, __m512, __mmask16, __m512, 1, 5)
+test_3x (_mm_fixupimm_round_sd, __m128d, __m128d, __m128d, __m128i, 1, 5)
+test_3x (_mm_fixupimm_round_ss, __m128, __m128, __m128, __m128i, 1, 5)
+test_3x (_mm_mask_cmp_round_sd_mask, __mmask8, __mmask8, __m128d, __m128d, 1, 5)
+test_3x (_mm_mask_cmp_round_ss_mask, __mmask8, __mmask8, __m128, __m128, 1, 5)
 test_4 (_mm512_mask3_fmadd_round_pd, __m512d, __m512d, __m512d, __m512d, __mmask8, 1)
 test_4 (_mm512_mask3_fmadd_round_ps, __m512, __m512, __m512, __m512, __mmask16, 1)
 test_4 (_mm512_mask3_fmaddsub_round_pd, __m512d, __m512d, __m512d, __m512d, __mmask8, 1)
@@ -471,6 +513,10 @@ test_4x (_mm512_mask_fixupimm_round_pd, __m512d, __m512d, __mmask8, __m512d, __m
 test_4x (_mm512_mask_fixupimm_round_ps, __m512, __m512, __mmask16, __m512, __m512i, 1, 5)
 test_4x (_mm512_maskz_fixupimm_round_pd, __m512d, __mmask8, __m512d, __m512d, __m512i, 1, 5)
 test_4x (_mm512_maskz_fixupimm_round_ps, __m512, __mmask16, __m512, __m512, __m512i, 1, 5)
+test_4x (_mm_mask_fixupimm_round_sd, __m128d, __m128d, __mmask8, __m128d, __m128i, 1, 5)
+test_4x (_mm_mask_fixupimm_round_ss, __m128, __m128, __mmask8, __m128, __m128i, 1, 5)
+test_4x (_mm_maskz_fixupimm_round_sd, __m128d, __mmask8, __m128d, __m128d, __m128i, 1, 5)
+test_4x (_mm_maskz_fixupimm_round_ss, __m128, __mmask8, __m128, __m128, __m128i, 1, 5)
 
 /* avx512pfintrin.h */
 test_3vx (_mm512_mask_prefetch_i32gather_ps, __m512i, __mmask16, void const *, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 0123538..a6a7b39 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -186,6 +186,8 @@
 /* avx512fintrin.h */
 #define __builtin_ia32_addpd512_mask(A, B, C, D, E) __builtin_ia32_addpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_addps512_mask(A, B, C, D, E) __builtin_ia32_addps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_addsd_round(A, B, C) __builtin_ia32_addsd_round(A, B, 1)
+#define __builtin_ia32_addss_round(A, B, C) __builtin_ia32_addss_round(A, B, 1)
 #define __builtin_ia32_alignd512_mask(A, B, F, D, E) __builtin_ia32_alignd512_mask(A, B, 1, D, E)
 #define __builtin_ia32_alignq512_mask(A, B, F, D, E) __builtin_ia32_alignq512_mask(A, B, 1, D, E)
 #define __builtin_ia32_cmpd512_mask(A, B, E, D) __builtin_ia32_cmpd512_mask(A, B, 1, D)
@@ -201,6 +203,8 @@
 #define __builtin_ia32_cvtps2dq512_mask(A, B, C, D) __builtin_ia32_cvtps2dq512_mask(A, B, C, 1)
 #define __builtin_ia32_cvtps2pd512_mask(A, B, C, D) __builtin_ia32_cvtps2pd512_mask(A, B, C, 5)
 #define __builtin_ia32_cvtps2udq512_mask(A, B, C, D) __builtin_ia32_cvtps2udq512_mask(A, B, C, 1)
+#define __builtin_ia32_cvtsd2ss_round(A, B, C) __builtin_ia32_cvtsd2ss_round(A, B, 1)
+#define __builtin_ia32_cvtss2sd_round(A, B, C) __builtin_ia32_cvtss2sd_round(A, B, 4)
 #define __builtin_ia32_cvtsi2sd64(A, B, C) __builtin_ia32_cvtsi2sd64(A, B, 1)
 #define __builtin_ia32_cvtsi2ss32(A, B, C) __builtin_ia32_cvtsi2ss32(A, B, 1)
 #define __builtin_ia32_cvtsi2ss64(A, B, C) __builtin_ia32_cvtsi2ss64(A, B, 1)
@@ -214,6 +218,8 @@
 #define __builtin_ia32_cvtusi2ss64(A, B, C) __builtin_ia32_cvtusi2ss64(A, B, 1)
 #define __builtin_ia32_divpd512_mask(A, B, C, D, E) __builtin_ia32_divpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_divps512_mask(A, B, C, D, E) __builtin_ia32_divps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_divsd_round(A, B, C) __builtin_ia32_divsd_round(A, B, 1)
+#define __builtin_ia32_divss_round(A, B, C) __builtin_ia32_divss_round(A, B, 1)
 #define __builtin_ia32_extractf32x4_mask(A, E, C, D) __builtin_ia32_extractf32x4_mask(A, 1, C, D)
 #define __builtin_ia32_extractf64x4_mask(A, E, C, D) __builtin_ia32_extractf64x4_mask(A, 1, C, D)
 #define __builtin_ia32_extracti32x4_mask(A, E, C, D) __builtin_ia32_extracti32x4_mask(A, 1, C, D)
@@ -236,18 +242,28 @@
 #define __builtin_ia32_gathersiv8di(A, B, C, D, F) __builtin_ia32_gathersiv8di(A, B, C, D, 1)
 #define __builtin_ia32_getexppd512_mask(A, B, C, D) __builtin_ia32_getexppd512_mask(A, B, C, 5)
 #define __builtin_ia32_getexpps512_mask(A, B, C, D) __builtin_ia32_getexpps512_mask(A, B, C, 5)
+#define __builtin_ia32_getexpsd128_round(A, B, C) __builtin_ia32_getexpsd128_round(A, B, 4)
+#define __builtin_ia32_getexpss128_round(A, B, C) __builtin_ia32_getexpss128_round(A, B, 4)
 #define __builtin_ia32_getmantpd512_mask(A, F, C, D, E) __builtin_ia32_getmantpd512_mask(A, 1, C, D, 5)
 #define __builtin_ia32_getmantps512_mask(A, F, C, D, E) __builtin_ia32_getmantps512_mask(A, 1, C, D, 5)
+#define __builtin_ia32_getmantsd_round(A, B, C, D) __builtin_ia32_getmantsd_round(A, B, 1, 4)
+#define __builtin_ia32_getmantss_round(A, B, C, D) __builtin_ia32_getmantss_round(A, B, 1, 4)
 #define __builtin_ia32_insertf32x4_mask(A, B, F, D, E) __builtin_ia32_insertf32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_insertf64x4_mask(A, B, F, D, E) __builtin_ia32_insertf64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti32x4_mask(A, B, F, D, E) __builtin_ia32_inserti32x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_inserti64x4_mask(A, B, F, D, E) __builtin_ia32_inserti64x4_mask(A, B, 1, D, E)
 #define __builtin_ia32_maxpd512_mask(A, B, C, D, E) __builtin_ia32_maxpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_maxps512_mask(A, B, C, D, E) __builtin_ia32_maxps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_maxsd_round(A, B, C) __builtin_ia32_maxsd_round(A, B, 4)
+#define __builtin_ia32_maxss_round(A, B, C) __builtin_ia32_maxss_round(A, B, 4)
 #define __builtin_ia32_minpd512_mask(A, B, C, D, E) __builtin_ia32_minpd512_mask(A, B, C, D, 5)
 #define __builtin_ia32_minps512_mask(A, B, C, D, E) __builtin_ia32_minps512_mask(A, B, C, D, 5)
+#define __builtin_ia32_minsd_round(A, B, C) __builtin_ia32_minsd_round(A, B, 4)
+#define __builtin_ia32_minss_round(A, B, C) __builtin_ia32_minss_round(A, B, 4)
 #define __builtin_ia32_mulpd512_mask(A, B, C, D, E) __builtin_ia32_mulpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_mulps512_mask(A, B, C, D, E) __builtin_ia32_mulps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_mulsd_round(A, B, C) __builtin_ia32_mulsd_round(A, B, 1)
+#define __builtin_ia32_mulss_round(A, B, C) __builtin_ia32_mulss_round(A, B, 1)
 #define __builtin_ia32_permdf512_mask(A, E, C, D) __builtin_ia32_permdf512_mask(A, 1, C, D)
 #define __builtin_ia32_permdi512_mask(A, E, C, D) __builtin_ia32_permdi512_mask(A, 1, C, D)
 #define __builtin_ia32_prold512_mask(A, E, C, D) __builtin_ia32_prold512_mask(A, 1, C, D)
@@ -267,8 +283,12 @@
 #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E)
 #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 5)
 #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 5)
+#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4)
+#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4)
 #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_scalefsd_round(A, B, C) __builtin_ia32_scalefsd_round(A, B, 1)
+#define __builtin_ia32_scalefss_round(A, B, C) __builtin_ia32_scalefss_round(A, B, 1)
 #define __builtin_ia32_scatterdiv8df(A, B, C, D, F) __builtin_ia32_scatterdiv8df(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv8di(A, B, C, D, F) __builtin_ia32_scatterdiv8di(A, B, C, D, 1)
 #define __builtin_ia32_scatterdiv16sf(A, B, C, D, F) __builtin_ia32_scatterdiv16sf(A, B, C, D, 1)
@@ -285,8 +305,12 @@
 #define __builtin_ia32_shufps512_mask(A, B, F, D, E) __builtin_ia32_shufps512_mask(A, B, 1, D, E)
 #define __builtin_ia32_sqrtpd512_mask(A, B, C, D) __builtin_ia32_sqrtpd512_mask(A, B, C, 1)
 #define __builtin_ia32_sqrtps512_mask(A, B, C, D) __builtin_ia32_sqrtps512_mask(A, B, C, 1)
+#define __builtin_ia32_sqrtss_round(A, B, C) __builtin_ia32_sqrtss_round(A, B, 1)
+#define __builtin_ia32_sqrtsd_round(A, B, C) __builtin_ia32_sqrtsd_round(A, B, 1)
 #define __builtin_ia32_subpd512_mask(A, B, C, D, E) __builtin_ia32_subpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_subps512_mask(A, B, C, D, E) __builtin_ia32_subps512_mask(A, B, C, D, 1)
+#define __builtin_ia32_subsd_round(A, B, C) __builtin_ia32_subsd_round(A, B, 1)
+#define __builtin_ia32_subss_round(A, B, C) __builtin_ia32_subss_round(A, B, 1)
 #define __builtin_ia32_ucmpd512_mask(A, B, E, D) __builtin_ia32_ucmpd512_mask(A, B, 1, D)
 #define __builtin_ia32_ucmpq512_mask(A, B, E, D) __builtin_ia32_ucmpq512_mask(A, B, 1, D)
 #define __builtin_ia32_vcomisd(A, B, C, D) __builtin_ia32_vcomisd(A, B, 1, 5)
@@ -315,12 +339,8 @@
 #define __builtin_ia32_vfmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddps512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddps512_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsd3_maskz(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddss3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmaddss3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddss3_maskz(A, B, C, D, 1)
+#define __builtin_ia32_vfmaddsd3_round(A, B, C, D) __builtin_ia32_vfmaddsd3_round(A, B, C, 1)
+#define __builtin_ia32_vfmaddss3_round(A, B, C, D) __builtin_ia32_vfmaddss3_round(A, B, C, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubpd512_maskz(A, B, C, D, 1)
@@ -331,8 +351,6 @@
 #define __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddps512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubpd512_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubps512_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsd3_mask3(A, B, C, D, 1)
-#define __builtin_ia32_vfmsubss3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubss3_mask3(A, B, C, D, 1)
 #define __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddpd512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfnmaddps512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddps512_mask(A, B, C, D, 1)
 #define __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubpd512_mask(A, B, C, D, 1)
diff --git a/gcc/testsuite/gcc.target/i386/testimm-10.c b/gcc/testsuite/gcc.target/i386/testimm-10.c
index b73f7f0..58e5e52 100644
--- a/gcc/testsuite/gcc.target/i386/testimm-10.c
+++ b/gcc/testsuite/gcc.target/i386/testimm-10.c
@@ -77,7 +77,13 @@ test8bit (void)
   m512  = _mm512_mask_fixupimm_ps (m512, mmask16, m512, m512i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
   m512  = _mm512_maskz_fixupimm_ps (mmask16, m512, m512, m512i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
+  m128d = _mm_fixupimm_sd (m128d, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128d = _mm_mask_fixupimm_sd (m128d, mmask8, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128d = _mm_maskz_fixupimm_sd (mmask8, m128d, m128d, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
+  m128  = _mm_fixupimm_ss (m128, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128  = _mm_mask_fixupimm_ss (m128, mmask8, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
+  m128  = _mm_maskz_fixupimm_ss (mmask8, m128, m128, m128i, 256); /* { dg-error "the immediate argument must be 8-bit immediate." } */
 
   m512i = _mm512_rol_epi32 (m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
   m512i = _mm512_mask_rol_epi32 (m512i, mmask16, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
@@ -107,6 +113,8 @@ test8bit (void)
   m512  = _mm512_mask_roundscale_ps (m512, mmask16, m512, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
   m512  = _mm512_maskz_roundscale_ps (mmask16, m512, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
 
+  m128d = _mm_roundscale_sd (m128d, m128d, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
+  m128  = _mm_roundscale_ss (m128, m128, 256); /* { dg-error "the immediate argument must be 8-bit immediate" } */
 
   m512i = _mm512_alignr_epi32 (m512i, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
   m512i = _mm512_mask_alignr_epi32 (m512i, mmask16, m512i, m512i, 256); /* { dg-error "the last argument must be an 8-bit immediate" } */
@@ -179,5 +187,6 @@ test4bit (void) {
   m512  = _mm512_mask_getmant_ps (m512, mmask16, m512, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
   m512  = _mm512_maskz_getmant_ps (mmask16, m512, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
 
-
+  m128d = _mm_getmant_sd (m128d, m128d, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
+  m128  = _mm_getmant_ss (m128, m128, 1, 64); /* { dg-error "the immediate argument must be 4-bit immediate." } */
 }
diff --git a/gcc/testsuite/gcc.target/i386/testround-1.c b/gcc/testsuite/gcc.target/i386/testround-1.c
index 20db176..114f386 100644
--- a/gcc/testsuite/gcc.target/i386/testround-1.c
+++ b/gcc/testsuite/gcc.target/i386/testround-1.c
@@ -19,12 +19,19 @@ __mmask16 mmask16;
 void
 test_round (void)
 {
+  m128d = _mm_add_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_add_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sub_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sub_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+
   m512d = _mm512_sqrt_round_pd (m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_sqrt_round_pd (m512d, mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_sqrt_round_pd (mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_sqrt_round_ps (m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_sqrt_round_ps (m512, mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_sqrt_round_ps (mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sqrt_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sqrt_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_add_round_pd (m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_add_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -51,6 +58,10 @@ test_round (void)
   m512 = _mm512_div_round_ps (m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_div_round_ps (m512, mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_div_round_ps (mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_mul_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_mul_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_div_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_div_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_scalef_round_pd(m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_scalef_round_pd(m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -58,6 +69,8 @@ test_round (void)
   m512 = _mm512_scalef_round_ps(m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_scalef_round_ps(m512, mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_scalef_round_ps(mmask16, m512, m512, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_scalef_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_scalef_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_fmadd_round_pd (m512d, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_fmadd_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -141,6 +154,16 @@ test_round (void)
   m256 = _mm512_cvt_roundpd_ps (m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_mask_cvt_roundpd_ps (m256, mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_maskz_cvt_roundpd_ps (mmask8, m512d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_cvt_roundsd_ss (m128, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+
+  m128d = _mm_fmadd_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmadd_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fmsub_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmsub_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmadd_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmadd_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmsub_round_sd (m128d, m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmsub_round_ss (m128, m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_max_round_pd (m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_max_round_pd (m512d, mmask8, m512d, m512d, 7); /* { dg-error "incorrect rounding operand." } */
@@ -195,6 +218,10 @@ test_round (void)
   m512 = _mm512_mask_cvt_roundph_ps (m512, mmask16, m256i, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_cvt_roundph_ps (mmask16, m256i, 7); /* { dg-error "incorrect rounding operand." } */
 
+  m128d = _mm_cvt_roundss_sd (m128d, m128, 7); /* { dg-error "incorrect rounding operand." } */
+
+  m128 = _mm_getexp_round_ss (m128, m128, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getexp_round_sd (m128d, m128d, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_getexp_round_ps (m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getexp_round_ps (m512, mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getexp_round_ps (mmask16, m512, 7); /* { dg-error "incorrect rounding operand." } */
@@ -207,6 +234,8 @@ test_round (void)
   m512 = _mm512_getmant_round_ps (m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getmant_round_ps (m512, mmask16, m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getmant_round_ps (mmask16, m512, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getmant_round_sd (m128d, m128d, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_getmant_round_ss (m128, m128, 0, 0, 7); /* { dg-error "incorrect rounding operand." } */
 
   m512 = _mm512_roundscale_round_ps (m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_roundscale_round_ps (m512, mmask16, m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
@@ -214,6 +243,8 @@ test_round (void)
   m512d = _mm512_roundscale_round_pd (m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_roundscale_round_pd (m512d, mmask8, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_roundscale_round_pd (mmask8, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_roundscale_round_ss (m128, m128, 4, 7); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_roundscale_round_sd (m128d, m128d, 4, 7); /* { dg-error "incorrect rounding operand." } */
 
   mmask8 = _mm512_cmp_round_pd_mask (m512d, m512d, 4, 7); /* { dg-error "incorrect rounding operand." } */
   mmask16 = _mm512_cmp_round_ps_mask (m512, m512, 4, 7); /* { dg-error "incorrect rounding operand." } */
@@ -231,12 +262,19 @@ test_round (void)
 void
 test_round_sae (void)
 {
+  m128d = _mm_add_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_add_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sub_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sub_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+
   m512d = _mm512_sqrt_round_pd (m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_sqrt_round_pd (m512d, mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_sqrt_round_pd (mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_sqrt_round_ps (m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_sqrt_round_ps (m512, mmask16, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_sqrt_round_ps (mmask16, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_sqrt_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_sqrt_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_add_round_pd (m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_add_round_pd (m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -263,6 +301,10 @@ test_round_sae (void)
   m512 = _mm512_div_round_ps (m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_div_round_ps (m512, mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_div_round_ps (mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_mul_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_mul_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_div_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_div_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_scalef_round_pd(m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_scalef_round_pd(m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -270,6 +312,8 @@ test_round_sae (void)
   m512 = _mm512_scalef_round_ps(m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_scalef_round_ps(m512, mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_scalef_round_ps(mmask16, m512, m512, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_scalef_round_sd (m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_scalef_round_ss (m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 
   m512d = _mm512_fmadd_round_pd (m512d, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_fmadd_round_pd (m512d, mmask8, m512d, m512d, 5); /* { dg-error "incorrect rounding operand." } */
@@ -353,6 +397,16 @@ test_round_sae (void)
   m256 = _mm512_cvt_roundpd_ps (m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_mask_cvt_roundpd_ps (m256, mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
   m256 = _mm512_maskz_cvt_roundpd_ps (mmask8, m512d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_cvt_roundsd_ss (m128, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+
+  m128d = _mm_fmadd_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmadd_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fmsub_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fmsub_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmadd_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmadd_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_fnmsub_round_sd (m128d, m128d, m128d, 5); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_fnmsub_round_ss (m128, m128, m128, 5); /* { dg-error "incorrect rounding operand." } */
 }
 
 void
@@ -411,6 +465,10 @@ test_sae_only (void)
   m512 = _mm512_mask_cvt_roundph_ps (m512, mmask16, m256i, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_cvt_roundph_ps (mmask16, m256i, 3); /* { dg-error "incorrect rounding operand." } */
 
+  m128d = _mm_cvt_roundss_sd (m128d, m128, 3); /* { dg-error "incorrect rounding operand." } */
+
+  m128 = _mm_getexp_round_ss (m128, m128, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getexp_round_sd (m128d, m128d, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_getexp_round_ps (m512, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getexp_round_ps (m512, mmask16, m512, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getexp_round_ps (mmask16, m512, 3); /* { dg-error "incorrect rounding operand." } */
@@ -423,12 +481,17 @@ test_sae_only (void)
   m512 = _mm512_getmant_round_ps (m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_getmant_round_ps (m512, mmask16, m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_getmant_round_ps (mmask16, m512, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_getmant_round_sd (m128d, m128d, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_getmant_round_ss (m128, m128, 0, 0, 3); /* { dg-error "incorrect rounding operand." } */
+
   m512 = _mm512_roundscale_round_ps (m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_mask_roundscale_round_ps (m512, mmask16, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512 = _mm512_maskz_roundscale_round_ps (mmask16, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_roundscale_round_pd (m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_mask_roundscale_round_pd (m512d, mmask8, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   m512d = _mm512_maskz_roundscale_round_pd (mmask8, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
+  m128 = _mm_roundscale_round_ss (m128, m128, 4, 3); /* { dg-error "incorrect rounding operand." } */
+  m128d = _mm_roundscale_round_sd (m128d, m128d, 4, 3); /* { dg-error "incorrect rounding operand." } */
 
   mmask8 = _mm512_cmp_round_pd_mask (m512d, m512d, 4, 3); /* { dg-error "incorrect rounding operand." } */
   mmask16 = _mm512_cmp_round_ps_mask (m512, m512, 4, 3); /* { dg-error "incorrect rounding operand." } */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
  2013-12-18 13:02         ` Kirill Yukhin
  2013-12-23 16:41           ` Uros Bizjak
@ 2014-01-09 15:35           ` H.J. Lu
  1 sibling, 0 replies; 63+ messages in thread
From: H.J. Lu @ 2014-01-09 15:35 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Richard Henderson, Uros Bizjak, Jakub Jelinek, Jeffrey Law, GCC Patches

On Wed, Dec 18, 2013 at 5:02 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
>
> On 02 Dec 16:10, Kirill Yukhin wrote:
>> Hello,
>> On 19 Nov 12:11, Kirill Yukhin wrote:
>> > Hello,
>> > On 15 Nov 20:07, Kirill Yukhin wrote:
>> > > > Is it ok for trunk?
>> > > Ping.
>> > Ping.
>> Ping.
> Ping.
>
> Rebased patch in the bottom.
>

This patch caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59733

Now bootstrap-asan failed with 24GB RAM and is almost unusable.

H.J.

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2014-01-09 15:35 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-14  7:44 [PATCH i386 4/8] [AVX512] Add substed patterns Kirill Yukhin
2013-08-22 14:15 ` Kirill Yukhin
2013-10-17 14:17   ` [PATCH i386 4/8] [AVX512] [1/n] " Kirill Yukhin
2013-10-22  1:43     ` Richard Henderson
2013-10-22 15:58       ` Kirill Yukhin
2013-10-22 15:59         ` Richard Henderson
2013-10-28 13:10           ` Kirill Yukhin
2013-10-28 16:37             ` Richard Henderson
2013-10-28 21:19               ` Kirill Yukhin
2013-10-28 22:05                 ` Richard Henderson
2013-10-29 10:22                   ` Kirill Yukhin
2013-10-29 16:46                     ` Richard Henderson
2013-11-01 12:20       ` Kirill Yukhin
2013-11-05  9:19         ` Kirill Yukhin
2013-11-05 13:06         ` Kirill Yukhin
2013-11-11 13:52           ` Kirill Yukhin
2013-11-11 23:34           ` Richard Henderson
2013-11-26  7:01         ` Richard Henderson
2013-10-28 15:13   ` [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst Kirill Yukhin
2013-11-13 14:44     ` Kirill Yukhin
2013-11-15 17:59       ` Kirill Yukhin
2013-11-19  9:37         ` Kirill Yukhin
2013-12-02 13:10           ` Kirill Yukhin
2013-12-24  4:56             ` Kirill Yukhin
2013-12-24  4:59               ` Kirill Yukhin
     [not found]               ` <CAGs3RfukBMLGgmzR8+EFP2ok4mfXXz1DY88ghr-b_u=TbcSkdA@mail.gmail.com>
2013-12-24  7:34                 ` Uros Bizjak
2013-12-30 12:59                   ` Kirill Yukhin
2013-10-28 15:20   ` [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst Kirill Yukhin
2013-10-28 20:15     ` Richard Henderson
2013-11-06  7:22 ` [PATCH i386 4/8] [AVX512] [4/n] Add substed patterns: `sd' subst Kirill Yukhin
2013-11-15 18:04   ` Kirill Yukhin
2013-11-19  9:42     ` Kirill Yukhin
2013-11-19  9:48       ` Kirill Yukhin
2013-11-26  7:21   ` Richard Henderson
2013-11-06  7:24 ` [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst Kirill Yukhin
2013-11-15 18:07   ` Kirill Yukhin
2013-11-19  9:45     ` Kirill Yukhin
2013-12-02 13:11       ` Kirill Yukhin
2013-12-18 13:00         ` Kirill Yukhin
2013-12-23 16:11           ` Uros Bizjak
2013-12-23 16:26             ` Uros Bizjak
2013-12-26 13:10               ` Kirill Yukhin
2013-11-06  7:28 ` [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst Kirill Yukhin
2013-11-15 18:18   ` Kirill Yukhin
2013-11-19  9:58     ` Kirill Yukhin
2013-12-02 13:13       ` Kirill Yukhin
2013-12-18 13:02         ` Kirill Yukhin
2013-12-23 16:41           ` Uros Bizjak
2014-01-09 15:35           ` H.J. Lu
2013-11-06  7:39 ` [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst Kirill Yukhin
2013-11-15 18:19   ` Kirill Yukhin
2013-11-19  9:53     ` Kirill Yukhin
2013-12-02 13:13       ` Kirill Yukhin
2013-12-18 13:04         ` Kirill Yukhin
2013-12-23 16:46           ` Uros Bizjak
2013-12-26  9:48             ` Kirill Yukhin
2013-11-06  7:40 ` [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only " Kirill Yukhin
2013-11-15 18:20   ` Kirill Yukhin
2013-11-19 10:06     ` Kirill Yukhin
2013-12-02 13:14       ` Kirill Yukhin
2013-12-18 13:06         ` Kirill Yukhin
2013-12-18 13:17           ` Kirill Yukhin
2013-12-23 16:52             ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).