public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/62] Support all AVX512FP16 intrinsics.
@ 2021-07-01  6:15 liuhongt
  2021-07-01  6:15 ` [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16 liuhongt
                   ` (61 more replies)
  0 siblings, 62 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

Hi:
  This is the second part of the AVX512FP16 support; I've squashed it from 65 commits down to 62.
  This part adds all the AVX512FP16 intrinsics and also includes some optimizations.
  Most AVX512FP16 instructions are just "extensions" of the corresponding float/double instructions, except for the _Float16 complex instructions, which are completely new. [1]

[1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html 

H.J. Lu (6):
  AVX512FP16: Support vector init/broadcast for FP16.
  AVX512FP16: Fix HF vector passing in variable arguments.
  AVX512FP16: Add ABI tests for xmm.
  AVX512FP16: Enable _Float16 autovectorization
  AVX512FP16: Add scalar/vector bitwise operations, including
  AVX512FP16: Enable FP16 mask load/store.

Liu, Hongtao (1):
  AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.

dianhong xu (4):
  AVX512FP16: Support load/store/abs intrinsics.
  AVX512FP16: Add reduce operators(add/mul/min/max).
  AVX512FP16: Add complex conjugation intrinsic instructions.
  AVX512FP16: Add permutation and mask blend intrinsics.

liuhongt (51):
  AVX512FP16: Add testcase for vector init and broadcast intrinsics.
  AVX512FP16: Add ABI test for ymm.
  AVX512FP16: Add abi test for zmm
  AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.
  AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.
  AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.
  AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.
  AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
  AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.
  AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.
  AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.
  AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh.
  AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh.
  AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh.
  AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh.
  AVX512FP16: Add testcase for
    vreduceph/vreducesh/vrndscaleph/vrndscalesh.
  AVX512FP16: Add fpclass/getexp/getmant instructions.
  AVX512FP16: Add testcase for fpclass/getmant/getexp instructions.
  AVX512FP16: Add vmovw/vmovsh.
  AVX512FP16: Add testcase for vmovsh/vmovw.
  AVX512FP16: Add
    vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq
  AVX512FP16: Add testcase for
    vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq.
  AVX512FP16: Add
    vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph
  AVX512FP16: Add testcase for
    vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph.
  AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
  AVX512FP16: Add testcase for
    vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
  AVX512FP16: Add
    vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq
  AVX512FP16: Add testcase for
    vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq.
  AVX512FP16: Add vcvttsh2si/vcvttsh2usi.
  AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
  AVX512FP16: Add testcase for
    vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
  AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.
  AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh.
  AVX512FP16: Add intrinsics for casting between vector float16 and
    vector float32/float64/integer.
  AVX512FP16: Add vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph.
  AVX512FP16: Add testcase for
    vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph.
  AVX512FP16: Add FP16 fma instructions.
  AVX512FP16: Add testcase for fma instructions
  AVX512FP16: Add testcase for fp16 bitwise operations.
  AVX512FP16: Add scalar fma instructions.
  AVX512FP16: Add testcase for scalar FMA instructions.
  AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
  AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph.
  AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
  AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
  AVX512FP16: Add expander for sqrthf2.
  AVX512FP16: Add expander for ceil/floor/trunc/roundeven.
  AVX512FP16: Add expander for cstorehf4.
  AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  AVX512FP16: Add expander for fmahf4
  AVX512FP16: Optimize for code like (_Float16) __builtin_ceilf ((float)
    f16).

 gcc/config.gcc                                |    2 +-
 gcc/config/i386/avx512fp16intrin.h            | 7136 +++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 3316 ++++++++
 gcc/config/i386/i386-builtin-types.def        |   78 +
 gcc/config/i386/i386-builtin.def              |  269 +
 gcc/config/i386/i386-expand.c                 |  188 +-
 gcc/config/i386/i386-features.c               |   15 +-
 gcc/config/i386/i386-modes.def                |   14 +-
 gcc/config/i386/i386.c                        |   53 +-
 gcc/config/i386/i386.md                       |  130 +-
 gcc/config/i386/immintrin.h                   |    2 +
 gcc/config/i386/sse.md                        | 2311 ++++--
 gcc/config/i386/subst.md                      |  114 +-
 gcc/testsuite/gcc.target/i386/avx-1.c         |  133 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |    2 +-
 .../gcc.target/i386/avx512fp16-10a.c          |   14 +
 .../gcc.target/i386/avx512fp16-10b.c          |   25 +
 .../gcc.target/i386/avx512fp16-11a.c          |   36 +
 .../gcc.target/i386/avx512fp16-11b.c          |   75 +
 gcc/testsuite/gcc.target/i386/avx512fp16-13.c |  143 +
 gcc/testsuite/gcc.target/i386/avx512fp16-14.c |   91 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 +
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 +
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
 .../i386/avx512fp16-builtin-fpcompare-1.c     |   40 +
 .../i386/avx512fp16-builtin-fpcompare-2.c     |   29 +
 .../i386/avx512fp16-builtin-round-1.c         |   31 +
 .../i386/avx512fp16-builtin-round-2.c         |   29 +
 .../i386/avx512fp16-builtin-sqrt-1.c          |   18 +
 .../i386/avx512fp16-builtin-sqrt-2.c          |   18 +
 .../i386/avx512fp16-conjugation-1.c           |   34 +
 .../gcc.target/i386/avx512fp16-fma-1.c        |   69 +
 .../gcc.target/i386/avx512fp16-helper.h       |  284 +
 .../gcc.target/i386/avx512fp16-neg-1a.c       |   19 +
 .../gcc.target/i386/avx512fp16-neg-1b.c       |   33 +
 .../gcc.target/i386/avx512fp16-reduce-op-1.c  |  132 +
 .../i386/avx512fp16-scalar-bitwise-1a.c       |   31 +
 .../i386/avx512fp16-scalar-bitwise-1b.c       |   82 +
 .../gcc.target/i386/avx512fp16-typecast-1.c   |   44 +
 .../gcc.target/i386/avx512fp16-typecast-2.c   |   43 +
 .../gcc.target/i386/avx512fp16-vaddph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vaddph-1b.c    |   92 +
 .../gcc.target/i386/avx512fp16-vaddsh-1a.c    |   27 +
 .../gcc.target/i386/avx512fp16-vaddsh-1b.c    |  104 +
 .../gcc.target/i386/avx512fp16-vararg-1.c     |  122 +
 .../gcc.target/i386/avx512fp16-vararg-2.c     |  107 +
 .../gcc.target/i386/avx512fp16-vararg-3.c     |  114 +
 .../gcc.target/i386/avx512fp16-vararg-4.c     |  115 +
 .../gcc.target/i386/avx512fp16-vcmpph-1a.c    |   22 +
 .../gcc.target/i386/avx512fp16-vcmpph-1b.c    |   70 +
 .../gcc.target/i386/avx512fp16-vcmpsh-1a.c    |   21 +
 .../gcc.target/i386/avx512fp16-vcmpsh-1b.c    |   45 +
 .../gcc.target/i386/avx512fp16-vcomish-1a.c   |   41 +
 .../gcc.target/i386/avx512fp16-vcomish-1b.c   |   66 +
 .../gcc.target/i386/avx512fp16-vcomish-1c.c   |   66 +
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c |   79 +
 .../gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c |   82 +
 .../gcc.target/i386/avx512fp16-vcvtph2dq-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtph2dq-1b.c |   79 +
 .../gcc.target/i386/avx512fp16-vcvtph2pd-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtph2pd-1b.c |   78 +
 .../i386/avx512fp16-vcvtph2psx-1a.c           |   24 +
 .../i386/avx512fp16-vcvtph2psx-1b.c           |   81 +
 .../gcc.target/i386/avx512fp16-vcvtph2qq-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtph2qq-1b.c |   78 +
 .../i386/avx512fp16-vcvtph2udq-1a.c           |   24 +
 .../i386/avx512fp16-vcvtph2udq-1b.c           |   79 +
 .../i386/avx512fp16-vcvtph2uqq-1a.c           |   24 +
 .../i386/avx512fp16-vcvtph2uqq-1b.c           |   78 +
 .../gcc.target/i386/avx512fp16-vcvtph2uw-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtph2uw-1b.c |   84 +
 .../gcc.target/i386/avx512fp16-vcvtph2w-1a.c  |   24 +
 .../gcc.target/i386/avx512fp16-vcvtph2w-1b.c  |   83 +
 .../gcc.target/i386/avx512fp16-vcvtps2ph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtps2ph-1b.c |   84 +
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c |   84 +
 .../gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c |   60 +
 .../gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c |   57 +
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1a.c |   17 +
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1b.c |   54 +
 .../i386/avx512fp16-vcvtsh2si64-1a.c          |   17 +
 .../i386/avx512fp16-vcvtsh2si64-1b.c          |   52 +
 .../gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c |   59 +
 .../i386/avx512fp16-vcvtsh2usi-1a.c           |   17 +
 .../i386/avx512fp16-vcvtsh2usi-1b.c           |   54 +
 .../i386/avx512fp16-vcvtsh2usi64-1a.c         |   16 +
 .../i386/avx512fp16-vcvtsh2usi64-1b.c         |   53 +
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c |   16 +
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c |   41 +
 .../i386/avx512fp16-vcvtsi2sh64-1a.c          |   16 +
 .../i386/avx512fp16-vcvtsi2sh64-1b.c          |   41 +
 .../gcc.target/i386/avx512fp16-vcvtss2sh-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vcvtss2sh-1b.c |   60 +
 .../i386/avx512fp16-vcvttph2dq-1a.c           |   24 +
 .../i386/avx512fp16-vcvttph2dq-1b.c           |   79 +
 .../i386/avx512fp16-vcvttph2qq-1a.c           |   24 +
 .../i386/avx512fp16-vcvttph2qq-1b.c           |   78 +
 .../i386/avx512fp16-vcvttph2udq-1a.c          |   24 +
 .../i386/avx512fp16-vcvttph2udq-1b.c          |   79 +
 .../i386/avx512fp16-vcvttph2uqq-1a.c          |   24 +
 .../i386/avx512fp16-vcvttph2uqq-1b.c          |   78 +
 .../i386/avx512fp16-vcvttph2uw-1a.c           |   24 +
 .../i386/avx512fp16-vcvttph2uw-1b.c           |   84 +
 .../gcc.target/i386/avx512fp16-vcvttph2w-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvttph2w-1b.c |   83 +
 .../i386/avx512fp16-vcvttsh2si-1a.c           |   16 +
 .../i386/avx512fp16-vcvttsh2si-1b.c           |   54 +
 .../i386/avx512fp16-vcvttsh2si64-1a.c         |   16 +
 .../i386/avx512fp16-vcvttsh2si64-1b.c         |   52 +
 .../i386/avx512fp16-vcvttsh2usi-1a.c          |   16 +
 .../i386/avx512fp16-vcvttsh2usi-1b.c          |   54 +
 .../i386/avx512fp16-vcvttsh2usi64-1a.c        |   16 +
 .../i386/avx512fp16-vcvttsh2usi64-1b.c        |   53 +
 .../i386/avx512fp16-vcvtudq2ph-1a.c           |   24 +
 .../i386/avx512fp16-vcvtudq2ph-1b.c           |   79 +
 .../i386/avx512fp16-vcvtuqq2ph-1a.c           |   24 +
 .../i386/avx512fp16-vcvtuqq2ph-1b.c           |   83 +
 .../i386/avx512fp16-vcvtusi2sh-1a.c           |   16 +
 .../i386/avx512fp16-vcvtusi2sh-1b.c           |   41 +
 .../i386/avx512fp16-vcvtusi2sh64-1a.c         |   16 +
 .../i386/avx512fp16-vcvtusi2sh64-1b.c         |   41 +
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c |   93 +
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1a.c  |   24 +
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1b.c  |   92 +
 .../gcc.target/i386/avx512fp16-vdivph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vdivph-1b.c    |   97 +
 .../gcc.target/i386/avx512fp16-vdivsh-1a.c    |   27 +
 .../gcc.target/i386/avx512fp16-vdivsh-1b.c    |   76 +
 .../i386/avx512fp16-vector-bitwise-1a.c       |  124 +
 .../i386/avx512fp16-vector-bitwise-1b.c       |  119 +
 .../i386/avx512fp16-vfcmaddcph-1a.c           |   27 +
 .../i386/avx512fp16-vfcmaddcph-1b.c           |  133 +
 .../i386/avx512fp16-vfcmaddcsh-1a.c           |   27 +
 .../i386/avx512fp16-vfcmaddcsh-1b.c           |   78 +
 .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vfcmulcph-1b.c |  111 +
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1b.c |   71 +
 .../i386/avx512fp16-vfmaddXXXph-1a.c          |   28 +
 .../i386/avx512fp16-vfmaddXXXph-1b.c          |  160 +
 .../i386/avx512fp16-vfmaddXXXsh-1a.c          |   28 +
 .../i386/avx512fp16-vfmaddXXXsh-1b.c          |   90 +
 .../gcc.target/i386/avx512fp16-vfmaddcph-1a.c |   27 +
 .../gcc.target/i386/avx512fp16-vfmaddcph-1b.c |  131 +
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |   27 +
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1b.c |   77 +
 .../i386/avx512fp16-vfmaddsubXXXph-1a.c       |   28 +
 .../i386/avx512fp16-vfmaddsubXXXph-1b.c       |  171 +
 .../i386/avx512fp16-vfmsubXXXph-1a.c          |   32 +
 .../i386/avx512fp16-vfmsubXXXph-1b.c          |  155 +
 .../i386/avx512fp16-vfmsubXXXsh-1a.c          |   28 +
 .../i386/avx512fp16-vfmsubXXXsh-1b.c          |   89 +
 .../i386/avx512fp16-vfmsubaddXXXph-1a.c       |   28 +
 .../i386/avx512fp16-vfmsubaddXXXph-1b.c       |  175 +
 .../gcc.target/i386/avx512fp16-vfmulcph-1a.c  |   25 +
 .../gcc.target/i386/avx512fp16-vfmulcph-1b.c  |  115 +
 .../gcc.target/i386/avx512fp16-vfmulcsh-1a.c  |   25 +
 .../gcc.target/i386/avx512fp16-vfmulcsh-1b.c  |   71 +
 .../i386/avx512fp16-vfnmaddXXXph-1a.c         |   28 +
 .../i386/avx512fp16-vfnmaddXXXph-1b.c         |  159 +
 .../i386/avx512fp16-vfnmaddXXXsh-1a.c         |   32 +
 .../i386/avx512fp16-vfnmaddXXXsh-1b.c         |   90 +
 .../i386/avx512fp16-vfnmsubXXXph-1a.c         |   32 +
 .../i386/avx512fp16-vfnmsubXXXph-1b.c         |  157 +
 .../i386/avx512fp16-vfnmsubXXXsh-1a.c         |   28 +
 .../i386/avx512fp16-vfnmsubXXXsh-1b.c         |   90 +
 .../i386/avx512fp16-vfpclassph-1a.c           |   16 +
 .../i386/avx512fp16-vfpclassph-1b.c           |   77 +
 .../i386/avx512fp16-vfpclasssh-1a.c           |   16 +
 .../i386/avx512fp16-vfpclasssh-1b.c           |   76 +
 .../gcc.target/i386/avx512fp16-vgetexpph-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vgetexpph-1b.c |   99 +
 .../gcc.target/i386/avx512fp16-vgetexpsh-1a.c |   24 +
 .../gcc.target/i386/avx512fp16-vgetexpsh-1b.c |   61 +
 .../i386/avx512fp16-vgetmantph-1a.c           |   24 +
 .../i386/avx512fp16-vgetmantph-1b.c           |  102 +
 .../i386/avx512fp16-vgetmantsh-1a.c           |   24 +
 .../i386/avx512fp16-vgetmantsh-1b.c           |   62 +
 .../gcc.target/i386/avx512fp16-vmaxph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vmaxph-1b.c    |   94 +
 .../gcc.target/i386/avx512fp16-vmaxsh-1.c     |   27 +
 .../gcc.target/i386/avx512fp16-vmaxsh-1b.c    |   72 +
 .../gcc.target/i386/avx512fp16-vminph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vminph-1b.c    |   93 +
 .../gcc.target/i386/avx512fp16-vminsh-1.c     |   27 +
 .../gcc.target/i386/avx512fp16-vminsh-1b.c    |   72 +
 .../gcc.target/i386/avx512fp16-vmovsh-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vmovsh-1b.c    |  115 +
 .../gcc.target/i386/avx512fp16-vmovw-1a.c     |   15 +
 .../gcc.target/i386/avx512fp16-vmovw-1b.c     |   27 +
 .../gcc.target/i386/avx512fp16-vmovw-2a.c     |   21 +
 .../gcc.target/i386/avx512fp16-vmovw-2b.c     |   53 +
 .../gcc.target/i386/avx512fp16-vmovw-3a.c     |   23 +
 .../gcc.target/i386/avx512fp16-vmovw-3b.c     |   52 +
 .../gcc.target/i386/avx512fp16-vmovw-4a.c     |   27 +
 .../gcc.target/i386/avx512fp16-vmovw-4b.c     |   52 +
 .../gcc.target/i386/avx512fp16-vmulph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vmulph-1b.c    |   92 +
 .../gcc.target/i386/avx512fp16-vmulsh-1a.c    |   27 +
 .../gcc.target/i386/avx512fp16-vmulsh-1b.c    |   77 +
 .../gcc.target/i386/avx512fp16-vrcpph-1a.c    |   19 +
 .../gcc.target/i386/avx512fp16-vrcpph-1b.c    |   79 +
 .../gcc.target/i386/avx512fp16-vrcpsh-1a.c    |   18 +
 .../gcc.target/i386/avx512fp16-vrcpsh-1b.c    |   57 +
 .../gcc.target/i386/avx512fp16-vreduceph-1a.c |   26 +
 .../gcc.target/i386/avx512fp16-vreduceph-1b.c |  116 +
 .../gcc.target/i386/avx512fp16-vreducesh-1a.c |   26 +
 .../gcc.target/i386/avx512fp16-vreducesh-1b.c |   78 +
 .../i386/avx512fp16-vrndscaleph-1a.c          |   26 +
 .../i386/avx512fp16-vrndscaleph-1b.c          |  101 +
 .../i386/avx512fp16-vrndscalesh-1a.c          |   25 +
 .../i386/avx512fp16-vrndscalesh-1b.c          |   62 +
 .../gcc.target/i386/avx512fp16-vrsqrtph-1a.c  |   19 +
 .../gcc.target/i386/avx512fp16-vrsqrtph-1b.c  |   77 +
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1a.c  |   18 +
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1b.c  |   59 +
 .../gcc.target/i386/avx512fp16-vscalefph-1a.c |   25 +
 .../gcc.target/i386/avx512fp16-vscalefph-1b.c |   94 +
 .../gcc.target/i386/avx512fp16-vscalefsh-1a.c |   23 +
 .../gcc.target/i386/avx512fp16-vscalefsh-1b.c |   58 +
 .../gcc.target/i386/avx512fp16-vsqrtph-1a.c   |   24 +
 .../gcc.target/i386/avx512fp16-vsqrtph-1b.c   |   92 +
 .../gcc.target/i386/avx512fp16-vsqrtsh-1a.c   |   23 +
 .../gcc.target/i386/avx512fp16-vsqrtsh-1b.c   |   60 +
 .../gcc.target/i386/avx512fp16-vsubph-1a.c    |   26 +
 .../gcc.target/i386/avx512fp16-vsubph-1b.c    |   93 +
 .../gcc.target/i386/avx512fp16-vsubsh-1a.c    |   27 +
 .../gcc.target/i386/avx512fp16-vsubsh-1b.c    |   76 +
 .../gcc.target/i386/avx512fp16-xorsign-1.c    |   41 +
 .../i386/avx512fp16vl-builtin-sqrt-1.c        |   19 +
 .../i386/avx512fp16vl-conjugation-1.c         |   65 +
 .../gcc.target/i386/avx512fp16vl-fma-1.c      |   70 +
 .../i386/avx512fp16vl-fma-vectorize-1.c       |   45 +
 .../gcc.target/i386/avx512fp16vl-neg-1a.c     |   18 +
 .../gcc.target/i386/avx512fp16vl-neg-1b.c     |   33 +
 .../i386/avx512fp16vl-reduce-op-1.c           |  244 +
 .../gcc.target/i386/avx512fp16vl-typecast-1.c |   55 +
 .../gcc.target/i386/avx512fp16vl-typecast-2.c |   37 +
 .../gcc.target/i386/avx512fp16vl-vaddph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vaddph-1b.c  |   16 +
 .../gcc.target/i386/avx512fp16vl-vcmpph-1a.c  |   24 +
 .../gcc.target/i386/avx512fp16vl-vcmpph-1b.c  |   16 +
 .../i386/avx512fp16vl-vcvtdq2ph-1a.c          |   27 +
 .../i386/avx512fp16vl-vcvtdq2ph-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtpd2ph-1a.c          |   28 +
 .../i386/avx512fp16vl-vcvtpd2ph-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtph2dq-1a.c          |   27 +
 .../i386/avx512fp16vl-vcvtph2dq-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtph2pd-1a.c          |   27 +
 .../i386/avx512fp16vl-vcvtph2pd-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtph2psx-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvtph2psx-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvtph2qq-1a.c          |   27 +
 .../i386/avx512fp16vl-vcvtph2qq-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtph2udq-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvtph2udq-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvtph2uqq-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvtph2uqq-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvtph2uw-1a.c          |   29 +
 .../i386/avx512fp16vl-vcvtph2uw-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtph2w-1a.c           |   29 +
 .../i386/avx512fp16vl-vcvtph2w-1b.c           |   15 +
 .../i386/avx512fp16vl-vcvtps2ph-1a.c          |   27 +
 .../i386/avx512fp16vl-vcvtps2ph-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtqq2ph-1a.c          |   28 +
 .../i386/avx512fp16vl-vcvtqq2ph-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvttph2dq-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvttph2dq-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvttph2qq-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvttph2qq-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvttph2udq-1a.c        |   27 +
 .../i386/avx512fp16vl-vcvttph2udq-1b.c        |   15 +
 .../i386/avx512fp16vl-vcvttph2uqq-1a.c        |   27 +
 .../i386/avx512fp16vl-vcvttph2uqq-1b.c        |   15 +
 .../i386/avx512fp16vl-vcvttph2uw-1a.c         |   29 +
 .../i386/avx512fp16vl-vcvttph2uw-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvttph2w-1a.c          |   29 +
 .../i386/avx512fp16vl-vcvttph2w-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtudq2ph-1a.c         |   27 +
 .../i386/avx512fp16vl-vcvtudq2ph-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvtuqq2ph-1a.c         |   28 +
 .../i386/avx512fp16vl-vcvtuqq2ph-1b.c         |   15 +
 .../i386/avx512fp16vl-vcvtuw2ph-1a.c          |   29 +
 .../i386/avx512fp16vl-vcvtuw2ph-1b.c          |   15 +
 .../i386/avx512fp16vl-vcvtw2ph-1a.c           |   29 +
 .../i386/avx512fp16vl-vcvtw2ph-1b.c           |   15 +
 .../gcc.target/i386/avx512fp16vl-vdivph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vdivph-1b.c  |   16 +
 .../i386/avx512fp16vl-vfcmaddcph-1a.c         |   30 +
 .../i386/avx512fp16vl-vfcmaddcph-1b.c         |   15 +
 .../i386/avx512fp16vl-vfcmulcph-1a.c          |   28 +
 .../i386/avx512fp16vl-vfcmulcph-1b.c          |   15 +
 .../i386/avx512fp16vl-vfmaddXXXph-1a.c        |   28 +
 .../i386/avx512fp16vl-vfmaddXXXph-1b.c        |   15 +
 .../i386/avx512fp16vl-vfmaddcph-1a.c          |   30 +
 .../i386/avx512fp16vl-vfmaddcph-1b.c          |   15 +
 .../i386/avx512fp16vl-vfmaddsubXXXph-1a.c     |   28 +
 .../i386/avx512fp16vl-vfmaddsubXXXph-1b.c     |   15 +
 .../i386/avx512fp16vl-vfmsubXXXph-1a.c        |   28 +
 .../i386/avx512fp16vl-vfmsubXXXph-1b.c        |   15 +
 .../i386/avx512fp16vl-vfmsubaddXXXph-1a.c     |   28 +
 .../i386/avx512fp16vl-vfmsubaddXXXph-1b.c     |   15 +
 .../i386/avx512fp16vl-vfmulcph-1a.c           |   28 +
 .../i386/avx512fp16vl-vfmulcph-1b.c           |   15 +
 .../i386/avx512fp16vl-vfnmaddXXXph-1a.c       |   28 +
 .../i386/avx512fp16vl-vfnmaddXXXph-1b.c       |   15 +
 .../i386/avx512fp16vl-vfnmsubXXXph-1a.c       |   28 +
 .../i386/avx512fp16vl-vfnmsubXXXph-1b.c       |   15 +
 .../i386/avx512fp16vl-vfpclassph-1a.c         |   22 +
 .../i386/avx512fp16vl-vfpclassph-1b.c         |   16 +
 .../i386/avx512fp16vl-vgetexpph-1a.c          |   26 +
 .../i386/avx512fp16vl-vgetexpph-1b.c          |   16 +
 .../i386/avx512fp16vl-vgetmantph-1a.c         |   30 +
 .../i386/avx512fp16vl-vgetmantph-1b.c         |   16 +
 .../gcc.target/i386/avx512fp16vl-vmaxph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vmaxph-1b.c  |   16 +
 .../gcc.target/i386/avx512fp16vl-vminph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vminph-1b.c  |   16 +
 .../gcc.target/i386/avx512fp16vl-vmulph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vmulph-1b.c  |   16 +
 .../gcc.target/i386/avx512fp16vl-vrcpph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vrcpph-1b.c  |   16 +
 .../i386/avx512fp16vl-vreduceph-1a.c          |   30 +
 .../i386/avx512fp16vl-vreduceph-1b.c          |   16 +
 .../i386/avx512fp16vl-vrndscaleph-1a.c        |   30 +
 .../i386/avx512fp16vl-vrndscaleph-1b.c        |   16 +
 .../i386/avx512fp16vl-vrsqrtph-1a.c           |   29 +
 .../i386/avx512fp16vl-vrsqrtph-1b.c           |   16 +
 .../i386/avx512fp16vl-vscalefph-1a.c          |   29 +
 .../i386/avx512fp16vl-vscalefph-1b.c          |   16 +
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1a.c |   29 +
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1b.c |   16 +
 .../gcc.target/i386/avx512fp16vl-vsubph-1a.c  |   29 +
 .../gcc.target/i386/avx512fp16vl-vsubph-1b.c  |   16 +
 .../gcc.target/i386/avx512vlfp16-11a.c        |   68 +
 .../gcc.target/i386/avx512vlfp16-11b.c        |   96 +
 gcc/testsuite/gcc.target/i386/m512-check.h    |   38 +-
 gcc/testsuite/gcc.target/i386/pr54855-11.c    |   16 +
 gcc/testsuite/gcc.target/i386/sse-13.c        |  131 +
 gcc/testsuite/gcc.target/i386/sse-14.c        |  280 +
 gcc/testsuite/gcc.target/i386/sse-22.c        |  277 +
 gcc/testsuite/gcc.target/i386/sse-23.c        |  131 +
 .../gcc.target/i386/vect-float16-1.c          |   14 +
 .../gcc.target/i386/vect-float16-10.c         |   14 +
 .../gcc.target/i386/vect-float16-11.c         |   14 +
 .../gcc.target/i386/vect-float16-12.c         |   14 +
 .../gcc.target/i386/vect-float16-2.c          |   14 +
 .../gcc.target/i386/vect-float16-3.c          |   14 +
 .../gcc.target/i386/vect-float16-4.c          |   14 +
 .../gcc.target/i386/vect-float16-5.c          |   14 +
 .../gcc.target/i386/vect-float16-6.c          |   14 +
 .../gcc.target/i386/vect-float16-7.c          |   14 +
 .../gcc.target/i386/vect-float16-8.c          |   14 +
 .../gcc.target/i386/vect-float16-9.c          |   14 +
 .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
 .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +
 .../x86_64/abi/avx512fp16/asm-support.S       |   81 +
 .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 +
 .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
 .../x86_64/abi/avx512fp16/defines.h           |  150 +
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |   45 +
 .../x86_64/abi/avx512fp16/m256h/args.h        |  182 +
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |   81 +
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |    3 +
 .../avx512fp16/m256h/test_m256_returning.c    |   54 +
 .../abi/avx512fp16/m256h/test_passing_m256.c  |  370 +
 .../avx512fp16/m256h/test_passing_structs.c   |  113 +
 .../avx512fp16/m256h/test_passing_unions.c    |  337 +
 .../abi/avx512fp16/m256h/test_varargs-m256.c  |  160 +
 .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |   48 +
 .../x86_64/abi/avx512fp16/m512h/args.h        |  186 +
 .../x86_64/abi/avx512fp16/m512h/asm-support.S |   97 +
 .../avx512fp16/m512h/avx512fp16-zmm-check.h   |    4 +
 .../avx512fp16/m512h/test_m512_returning.c    |   62 +
 .../abi/avx512fp16/m512h/test_passing_m512.c  |  380 +
 .../avx512fp16/m512h/test_passing_structs.c   |  123 +
 .../avx512fp16/m512h/test_passing_unions.c    |  415 +
 .../abi/avx512fp16/m512h/test_varargs-m512.c  |  164 +
 .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
 .../test_3_element_struct_and_unions.c        |  692 ++
 .../abi/avx512fp16/test_basic_alignment.c     |   45 +
 .../test_basic_array_size_and_align.c         |   43 +
 .../abi/avx512fp16/test_basic_returning.c     |   87 +
 .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
 .../test_basic_struct_size_and_align.c        |   42 +
 .../test_basic_union_size_and_align.c         |   40 +
 .../abi/avx512fp16/test_complex_returning.c   |  104 +
 .../abi/avx512fp16/test_m64m128_returning.c   |   73 +
 .../abi/avx512fp16/test_passing_floats.c      | 1066 +++
 .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++
 .../abi/avx512fp16/test_passing_structs.c     |  332 +
 .../abi/avx512fp16/test_passing_unions.c      |  335 +
 .../abi/avx512fp16/test_struct_returning.c    |  274 +
 .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +
 416 files changed, 37029 insertions(+), 707 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16vlintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c

-- 
2.18.1



* [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
                   ` (60 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
	(_mm256_set_ph): Likewise.
	(_mm512_set_ph): Likewise.
	(_mm_setr_ph): Likewise.
	(_mm256_setr_ph): Likewise.
	(_mm512_setr_ph): Likewise.
	(_mm_set1_ph): Likewise.
	(_mm256_set1_ph): Likewise.
	(_mm512_set1_ph): Likewise.
	(_mm_setzero_ph): Likewise.
	(_mm256_setzero_ph): Likewise.
	(_mm512_setzero_ph): Likewise.
	(_mm_set_sh): Likewise.
	(_mm_load_sh): Likewise.
	(_mm_store_sh): Likewise.
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	Support vector HFmodes.
	(ix86_expand_vector_init_one_nonzero): Likewise.
	(ix86_expand_vector_init_one_var): Likewise.
	(ix86_expand_vector_init_interleave): Likewise.
	(ix86_expand_vector_init_general): Likewise.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_vector_extract): Likewise.
	* config/i386/i386-modes.def: Add HF vector modes in comment.
	* config/i386/i386.c (classify_argument): Add HF vector modes.
	(inline_secondary_memory_needed): Enable 16bit move.
	(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
	(ix86_vector_mode_supported_p): Likewise.
	* config/i386/i386.md (mode): Add HF vector modes.
	(MODE_SIZE): Likewise.
	(ssemodesuffix): Add ph suffix for HF vector modes.
	* config/i386/sse.md (VMOVE): Adjust for HF vector modes.
	(V): Likewise.
	(V_256_512): Likewise.
	(avx512): Likewise.
	(shuffletype): Likewise.
	(sseinsnmode): Likewise.
	(ssedoublevecmode): Likewise.
	(ssehalfvecmode): Likewise.
	(ssehalfvecmodelower): Likewise.
	(ssePScmode): Likewise.
	(ssescalarmode): Likewise.
	(ssescalarmodelower): Likewise.
	(sseintprefix): Likewise.
	(i128): Likewise.
	(bcstscalarsuff): Likewise.
	(xtg_mode): Likewise.
	(VI12HF_AVX512VL): New mode_iterator.
	(VF_AVX512FP16): Likewise.
	(VIHF): Likewise.
	(VIHF_256): Likewise.
	(VIHF_AVX512BW): Likewise.
	(V16_256): Likewise.
	(V32_512): Likewise.
	(sseintmodesuffix): New mode_attr.
	(vec_set<mode>_0): New define_insn for HF vector set.
	(*avx512fp16_movsh): Likewise.
	(avx512fp16_movsh): Likewise.
	(vec_extract_lo_v32hi): Rename to ...
	(vec_extract_lo_<mode>): ... this, and adjust to allow HF
	vector modes.
	(vec_extract_hi_v32hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_<mode>): Likewise.
	(vec_extract_hi_v16hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(*vec_extract<mode>_0): New define_insn_and_split for HF
	vector extract.
	(*vec_extracthf): New define_insn.
	(VEC_EXTRACT_MODE): Add HF vector modes.
	(PINSR_MODE): Add V8HF.
	(sse2p4_1): Likewise.
	(pinsr_evex_isa): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
	insert for V8HFmode.
	(pbroadcast_evex_isa): Add HF vector modes.
	(AVX2_VEC_DUP_MODE): Likewise.
	(VEC_INIT_MODE): Likewise.
	(VEC_INIT_HALF_MODE): Likewise.
	(avx2_pbroadcast<mode>): Adjust to support HF vector mode
	broadcast.
	(avx2_pbroadcast<mode>_1): Likewise.
	(<avx512>_vec_dup<mode>_1): Likewise.
	(<avx512>_vec_dup<mode><mask_name>): Likewise.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>): Likewise.
---
 gcc/config/i386/avx512fp16intrin.h | 172 +++++++++++++++++++
 gcc/config/i386/i386-expand.c      |  79 ++++++++-
 gcc/config/i386/i386-modes.def     |  12 +-
 gcc/config/i386/i386.c             |  19 ++-
 gcc/config/i386/i386.md            |  13 +-
 gcc/config/i386/sse.md             | 266 ++++++++++++++++++++++-------
 6 files changed, 480 insertions(+), 81 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38d63161ba6..3fc0770986e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -45,6 +45,178 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
 typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
+	    _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	    _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3,
+					  __A4, __A5, __A6, __A7 };
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
+	       _Float16 __A12, _Float16 __A11, _Float16 __A10,
+	       _Float16 __A9, _Float16 __A8, _Float16 __A7,
+	       _Float16 __A6, _Float16 __A5, _Float16 __A4,
+	       _Float16 __A3, _Float16 __A2, _Float16 __A1,
+	       _Float16 __A0)
+{
+  return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15 };
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
+	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
+	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
+	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
+	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
+	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
+	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
+	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
+	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	       _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15,
+					   __A16, __A17, __A18, __A19,
+					   __A20, __A21, __A22, __A23,
+					   __A24, __A25, __A26, __A27,
+					   __A28, __A29, __A30, __A31 };
+}
+
+/* Create vectors of elements in the reversed order from _mm_set_ph,
+   _mm256_set_ph and _mm512_set_ph functions.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+	     _Float16 __A3, _Float16 __A4, _Float16 __A5,
+	     _Float16 __A6, _Float16 __A7)
+{
+  return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15)
+{
+  return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9,
+			__A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1,
+			__A0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15, _Float16 __A16, _Float16 __A17,
+		_Float16 __A18, _Float16 __A19, _Float16 __A20,
+		_Float16 __A21, _Float16 __A22, _Float16 __A23,
+		_Float16 __A24, _Float16 __A25, _Float16 __A26,
+		_Float16 __A27, _Float16 __A28, _Float16 __A29,
+		_Float16 __A30, _Float16 __A31)
+
+{
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
+			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
+			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+			__A2, __A1, __A0);
+}
+
+/* Broadcast _Float16 to vector.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set1_ph (_Float16 __A)
+{
+  return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set1_ph (_Float16 __A)
+{
+  return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ph (_Float16 __A)
+{
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+/* Create a vector with all zeros.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setzero_ph (void)
+{
+  return _mm_set1_ph (0.0f);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setzero_ph (void)
+{
+  return _mm256_set1_ph (0.0f);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ph (void)
+{
+  return _mm512_set1_ph (0.0f);
+}
+
+/* Create a vector with element 0 as F and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_sh (_Float16 __F)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F);
+}
+
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_load_sh (void const *__P)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     *(_Float16 const *) __P);
+}
+
+/* Stores the lower _Float16 value.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_sh (void *__P, __m128h __A)
+{
+  *(_Float16 *) __P = ((__v8hf)__A)[0];
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
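A note on the argument ordering the new intrinsics follow: `_mm_set_ph` takes its arguments from the highest element down to element 0 (so `__A0`, the last parameter, becomes element 0), while `_mm_setr_ph` takes them in memory order and simply forwards to `_mm_set_ph` with the arguments reversed. Since an AVX512FP16-capable compiler and CPU may not be at hand, here is a plain-float sketch (model function names are mine, not part of the patch) that only illustrates that ordering convention:

```c
#include <assert.h>

/* Model of the FP16 "set" intrinsics' argument ordering, using float so it
   compiles without AVX512FP16 support.  set_ph_model mirrors _mm_set_ph:
   arguments run from the highest element (a7) down to the lowest (a0),
   matching the initializer { __A0, __A1, ..., __A7 } in the header.  */
static void set_ph_model (float *dst, float a7, float a6, float a5, float a4,
                          float a3, float a2, float a1, float a0)
{
  float tmp[8] = { a0, a1, a2, a3, a4, a5, a6, a7 };
  for (int i = 0; i < 8; i++)
    dst[i] = tmp[i];
}

/* setr_ph_model mirrors _mm_setr_ph: memory order, implemented by
   forwarding to the "set" form with reversed arguments.  */
static void setr_ph_model (float *dst, float a0, float a1, float a2, float a3,
                           float a4, float a5, float a6, float a7)
{
  set_ph_model (dst, a7, a6, a5, a4, a3, a2, a1, a0);
}
```

Both calls below therefore produce the identity vector {0, 1, ..., 7}.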
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index ab5f5b284c8..5ce7163b241 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -13914,6 +13914,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 	}
       return true;
 
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
+      return ix86_vector_duplicate_value (mode, target, val);
+
     default:
       return false;
     }
@@ -13998,6 +14003,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0;
       gen_vec_set_0 = gen_vec_setv8di_0;
       break;
+    case E_V8HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv8hf_0;
+      break;
+    case E_V16HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv16hf_0;
+      break;
+    case E_V32HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32hf_0;
+      break;
     default:
       break;
     }
@@ -14147,6 +14164,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
       if (!TARGET_64BIT)
 	return false;
       /* FALLTHRU */
+    case E_V8HFmode:
     case E_V4DFmode:
     case E_V8SFmode:
     case E_V8SImode:
@@ -14381,13 +14399,22 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 {
   machine_mode first_imode, second_imode, third_imode, inner_mode;
   int i, j;
-  rtx op0, op1;
+  rtx op, op0, op1;
   rtx (*gen_load_even) (rtx, rtx, rtx);
   rtx (*gen_interleave_first_low) (rtx, rtx, rtx);
   rtx (*gen_interleave_second_low) (rtx, rtx, rtx);
 
   switch (mode)
     {
+    case E_V8HFmode:
+      gen_load_even = gen_vec_setv8hf;
+      gen_interleave_first_low = gen_vec_interleave_lowv4si;
+      gen_interleave_second_low = gen_vec_interleave_lowv2di;
+      inner_mode = HFmode;
+      first_imode = V4SImode;
+      second_imode = V2DImode;
+      third_imode = VOIDmode;
+      break;
     case E_V8HImode:
       gen_load_even = gen_vec_setv8hi;
       gen_interleave_first_low = gen_vec_interleave_lowv4si;
@@ -14412,9 +14439,19 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 
   for (i = 0; i < n; i++)
     {
+      op = ops [i + i];
+      if (inner_mode == HFmode)
+	{
+	  /* Convert HFmode to HImode.  */
+	  op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
+	  op = gen_reg_rtx (HImode);
+	  emit_move_insn (op, op1);
+	}
+
      /* Extend the odd element to SImode using a paradoxical SUBREG.  */
       op0 = gen_reg_rtx (SImode);
-      emit_move_insn (op0, gen_lowpart (SImode, ops [i + i]));
+      emit_move_insn (op0, gen_lowpart (SImode, op));
 
       /* Insert the SImode value as low element of V4SImode vector. */
       op1 = gen_reg_rtx (V4SImode);
@@ -14551,6 +14588,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
       half_mode = V8HImode;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      goto half;
+
 half:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14574,6 +14615,11 @@ half:
       half_mode = V16HImode;
       goto quarter;
 
+    case E_V32HFmode:
+      quarter_mode = V8HFmode;
+      half_mode = V16HFmode;
+      goto quarter;
+
 quarter:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14610,6 +14656,9 @@ quarter:
 	 move from GPR to SSE register directly.  */
       if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
 	break;
+      /* FALLTHRU */
+
+    case E_V8HFmode:
 
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15076,6 +15125,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 	}
       return;
 
+    case E_V8HFmode:
+      use_vec_merge = true;
+      break;
+
     case E_V8HImode:
     case E_V2HImode:
       use_vec_merge = TARGET_SSE2;
@@ -15550,6 +15603,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       ix86_expand_vector_extract (false, target, tmp, elt & 3);
       return;
 
+    case E_V32HFmode:
+      tmp = gen_reg_rtx (V16HFmode);
+      if (elt < 16)
+	emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 15);
+      return;
+
+    case E_V16HFmode:
+      tmp = gen_reg_rtx (V8HFmode);
+      if (elt < 8)
+	emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 7);
+      return;
+
+    case E_V8HFmode:
+      use_vec_extr = true;
+      break;
+
     case E_V8QImode:
       use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
       /* ??? Could extract the appropriate HImode element and shift.  */
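For the V8HF case added to ix86_expand_vector_init_interleave above, each HFmode element is first moved through an HImode subreg, the odd element is placed above the even one via the vec_set pattern, and the resulting 32-bit lanes are glued together by successive interleaves at V4SI and V2DI. A minimal sketch of the lane-pairing step (the helper name is mine; the real code works on RTL, not integers):

```c
#include <assert.h>
#include <stdint.h>

/* Model of how two adjacent 16-bit elements end up in one 32-bit lane:
   the even element occupies the low half, the odd element the high half.
   In the patch this is done with gen_vec_setv8hf plus a paradoxical
   SImode SUBREG rather than shifts.  */
static uint32_t pair_lanes (uint16_t even_elt, uint16_t odd_elt)
{
  return (uint32_t) even_elt | ((uint32_t) odd_elt << 16);
}
```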
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 9232f59a925..fcadfcd4c94 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
 VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
 VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
 VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
-VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
-VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
-VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
-VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
-VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
+VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
+VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*   V16HF V8SF V4DF V2TF */
+VECTOR_MODES (FLOAT, 64);     /*  V32HF V16SF V8DF V4TF */
+VECTOR_MODES (FLOAT, 128);    /* V64HF V32SF V16DF V8TF */
+VECTOR_MODES (FLOAT, 256);    /* V128HF V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9ca31e934ab..021283e6f39 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2404,6 +2404,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
+    case E_V16HFmode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
@@ -2414,6 +2415,7 @@ classify_argument (machine_mode mode, const_tree type,
       return 4;
     case E_V8DFmode:
     case E_V16SFmode:
+    case E_V32HFmode:
     case E_V8DImode:
     case E_V16SImode:
     case E_V32HImode:
@@ -2431,6 +2433,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V4SImode:
     case E_V16QImode:
     case E_V8HImode:
+    case E_V8HFmode:
     case E_V2DFmode:
     case E_V2DImode:
       classes[0] = X86_64_SSE_CLASS;
@@ -19102,9 +19105,11 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
       if (!TARGET_SSE2)
 	return true;
 
-      /* Between SSE and general, we have moves no larger than word size.  */
+      /* Between SSE and general, we have moves no larger than word size
+	 except that with AVX512FP16, VMOVW enables 16-bit moves.  */
       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
-	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
+	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (TARGET_AVX512FP16
+						   ? HImode : SImode)
 	  || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
 	return true;
 
@@ -19552,6 +19557,14 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
+      /* Allow HF vector modes for AVX512FP16.  NB: Since HF vector
+	 moves are implemented as integer vector moves, we allow
+	 V8HFmode and V16HFmode without AVX512VL in xmm0-xmm15.  */
+      if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+	return (mode == V32HFmode
+		|| TARGET_AVX512VL
+		|| !EXT_REX_SSE_REGNO_P (regno));
+
       /* For AVX-5124FMAPS or AVX-5124VNNIW
 	 allow V64SF and V64SI modes for special regnos.  */
       if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
@@ -21663,6 +21676,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
   if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE (mode))
     return true;
+  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+    return true;
   if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE_3DNOW (mode))
     return true;
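The ix86_hard_regno_mode_ok change above encodes an asymmetry worth spelling out: V32HF is always allowed once AVX512FP16 is on, but because HF vector moves are implemented as integer vector moves, V8HF and V16HF need AVX512VL to live in the extended registers xmm16-xmm31. A boolean sketch of that rule (enum and function names are mine, simplified from the macro-based GCC code):

```c
#include <assert.h>
#include <stdbool.h>

enum hf_vec_mode { MODE_V8HF, MODE_V16HF, MODE_V32HF };

/* Simplified model of the new HF-vector clause of
   ix86_hard_regno_mode_ok: 512-bit HF vectors are always OK with
   AVX512FP16; the narrower ones additionally need AVX512VL when the
   register is one of the EXT_REX registers xmm16-xmm31.  */
static bool hf_regno_mode_ok (int xmm_regno, enum hf_vec_mode mode,
                              bool avx512fp16, bool avx512vl)
{
  bool ext_rex = xmm_regno >= 16 && xmm_regno < 32;
  if (!avx512fp16)
    return false;
  return mode == MODE_V32HF || avx512vl || !ext_rex;
}
```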
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ee5660e8161..25cee502f97 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,8 +496,8 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
-  V2DF,V2SF,V1DF,V8DF"
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
+   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
@@ -1098,7 +1098,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
 			     (V2DI "16") (V4DI "32") (V8DI "64")
 			     (V1TI "16") (V2TI "32") (V4TI "64")
 			     (V2DF "16") (V4DF "32") (V8DF "64")
-			     (V4SF "16") (V8SF "32") (V16SF "64")])
+			     (V4SF "16") (V8SF "32") (V16SF "64")
+			     (V8HF "16") (V16HF "32") (V32HF "64")])
 
 ;; Double word integer modes as mode attribute.
 (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
@@ -1239,9 +1240,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
   [(HF "sh") (SF "ss") (DF "sd")
-   (V16SF "ps") (V8DF "pd")
-   (V8SF "ps") (V4DF "pd")
-   (V4SF "ps") (V2DF "pd")
+   (V32HF "ph") (V16SF "ps") (V8DF "pd")
+   (V16HF "ph") (V8SF "ps") (V4DF "pd")
+   (V8HF "ph") (V4SF "ps") (V2DF "pd")
    (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
    (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
    (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 446f9ba552f..1009d656cbb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -225,6 +225,8 @@ (define_mode_iterator VMOVE
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
@@ -240,6 +242,13 @@ (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
+(define_mode_iterator VI12HF_AVX512VL
+  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
+
 ;; Same iterator, but without supposed TARGET_AVX512BW
 (define_mode_iterator VI12_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
@@ -255,6 +264,7 @@ (define_mode_iterator V
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
@@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF])
 (define_mode_iterator V_256_512
   [V32QI V16HI V8SI V4DI V8SF V4DF
    (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")
-   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+   (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")])
 
 ;; All vector float modes
 (define_mode_iterator VF
@@ -352,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF_AVX512FP16
+  [V32HF V16HF V8HF])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
@@ -360,6 +374,16 @@ (define_mode_iterator VI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
+;; All vector integer and HF modes
+(define_mode_iterator VIHF
+  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+   (V8SI "TARGET_AVX") V4SI
+   (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")])
+
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
@@ -562,6 +586,7 @@ (define_mode_attr avx512
    (V8HI  "avx512vl") (V16HI  "avx512vl") (V32HI "avx512bw")
    (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
    (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
+   (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
    (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
    (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
 
@@ -622,12 +647,13 @@ (define_mode_attr avx2_avx512
    (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")])
 
 (define_mode_attr shuffletype
-  [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
-  (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
-  (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
-  (V32HI "i") (V16HI "i") (V8HI "i")
-  (V64QI "i") (V32QI "i") (V16QI "i")
-  (V4TI "i") (V2TI "i") (V1TI "i")])
+  [(V32HF "f") (V16HF "f") (V8HF "f")
+   (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
+   (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
+   (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
+   (V32HI "i") (V16HI "i") (V8HI "i")
+   (V64QI "i") (V32QI "i") (V16QI "i")
+   (V4TI "i") (V2TI "i") (V1TI "i")])
 
 (define_mode_attr ssequartermode
   [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")])
@@ -664,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
 ;; All 128 and 256bit vector integer modes
 (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+;; All 256bit vector integer and HF modes
+(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
 
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
@@ -685,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI])
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+(define_mode_iterator VIHF_AVX512BW
+  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
+  (V32HF "TARGET_AVX512FP16")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -725,6 +756,9 @@ (define_mode_iterator VF_AVX512
    (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    V16SF V8DF])
 
+(define_mode_iterator V16_256 [V16HI V16HF])
+(define_mode_iterator V32_512 [V32HI V32HF])
+
 (define_mode_attr avx512bcst
   [(V4SI "%{1to4%}") (V2DI "%{1to2%}")
    (V8SI "%{1to8%}") (V4DI "%{1to4%}")
@@ -774,8 +808,16 @@ (define_mode_attr sseinsnmode
    (V16SF "V16SF") (V8DF "V8DF")
    (V8SF "V8SF") (V4DF "V4DF")
    (V4SF "V4SF") (V2DF "V2DF")
+   (V8HF "TI") (V16HF "OI") (V32HF "XI")
    (TI "TI")])
 
+;; SSE integer instruction suffix for various modes
+(define_mode_attr sseintmodesuffix
+  [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
+   (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
+   (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
+   (V8HF "w") (V16HF "w") (V32HF "w")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
   [(V64QI "DI") (V32QI "SI") (V16QI "HI")
@@ -835,7 +877,8 @@ (define_mode_attr ssedoublevecmode
    (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI")
    (V16SF "V32SF") (V8DF "V16DF")
    (V8SF "V16SF") (V4DF "V8DF")
-   (V4SF "V8SF") (V2DF "V4DF")])
+   (V4SF "V8SF") (V2DF "V4DF")
+   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
 
 ;; Mapping of vector modes to a vector mode of half size
 ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
@@ -845,7 +888,8 @@ (define_mode_attr ssehalfvecmode
    (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI") (V2DI "DI")
    (V16SF "V8SF") (V8DF "V4DF")
    (V8SF  "V4SF") (V4DF "V2DF")
-   (V4SF  "V2SF") (V2DF "DF")])
+   (V4SF  "V2SF") (V2DF "DF")
+   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
 
 (define_mode_attr ssehalfvecmodelower
   [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
@@ -853,9 +897,10 @@ (define_mode_attr ssehalfvecmodelower
    (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
    (V16SF "v8sf") (V8DF "v4df")
    (V8SF  "v4sf") (V4DF "v2df")
-   (V4SF  "v2sf")])
+   (V4SF  "v2sf")
+   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
 
-;; Mapping of vector modes ti packed single mode of the same size
+;; Mapping of vector modes to packed single mode of the same size
 (define_mode_attr ssePSmode
   [(V16SI "V16SF") (V8DF "V16SF")
    (V16SF "V16SF") (V8DI "V16SF")
@@ -865,7 +910,8 @@ (define_mode_attr ssePSmode
    (V4DI "V8SF") (V2DI "V4SF")
    (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF")
    (V8SF "V8SF") (V4SF "V4SF")
-   (V4DF "V8SF") (V2DF "V4SF")])
+   (V4DF "V8SF") (V2DF "V4SF")
+   (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")])
 
 (define_mode_attr ssePSmode2
   [(V8DI "V8SF") (V4DI "V4SF")])
@@ -887,6 +933,7 @@ (define_mode_attr ssescalarmodelower
    (V32HI "hi") (V16HI "hi") (V8HI "hi")
    (V16SI "si") (V8SI "si")  (V4SI "si")
    (V8DI "di")  (V4DI "di")  (V2DI "di")
+   (V32HF "hf") (V16HF "hf")  (V8HF "hf")
    (V16SF "sf") (V8SF "sf")  (V4SF "sf")
    (V8DF "df")  (V4DF "df")  (V2DF "df")
    (V4TI "ti")  (V2TI "ti")])
@@ -897,6 +944,7 @@ (define_mode_attr ssexmmmode
    (V32HI "V8HI")  (V16HI "V8HI") (V8HI "V8HI")
    (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
    (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
+   (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
    (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
    (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
 
@@ -939,10 +987,11 @@ (define_mode_attr ssescalarsize
    (V64QI "8") (V32QI "8") (V16QI "8")
    (V32HI "16") (V16HI "16") (V8HI "16")
    (V16SI "32") (V8SI "32") (V4SI "32")
+   (V32HF "16") (V16HF "16") (V8HF "16")
    (V16SF "32") (V8SF "32") (V4SF "32")
    (V8DF "64") (V4DF "64") (V2DF "64")])
 
-;; SSE prefix for integer vector modes
+;; SSE prefix for integer and HF vector modes
 (define_mode_attr sseintprefix
   [(V2DI  "p") (V2DF  "")
    (V4DI  "p") (V4DF  "")
@@ -950,9 +999,9 @@ (define_mode_attr sseintprefix
    (V4SI  "p") (V4SF  "")
    (V8SI  "p") (V8SF  "")
    (V16SI "p") (V16SF "")
-   (V16QI "p") (V8HI "p")
-   (V32QI "p") (V16HI "p")
-   (V64QI "p") (V32HI "p")])
+   (V16QI "p") (V8HI "p") (V8HF "p")
+   (V32QI "p") (V16HI "p") (V16HF "p")
+   (V64QI "p") (V32HI "p") (V32HF "p")])
 
 ;; SSE scalar suffix for vector modes
 (define_mode_attr ssescalarmodesuffix
@@ -987,7 +1036,8 @@ (define_mode_attr castmode
 ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
 ;; i64x4 or f64x4 for 512bit modes.
 (define_mode_attr i128
-  [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128")
+  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
+   (V8DF "f64x4") (V4DF "f128")
    (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
    (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
 
@@ -1011,14 +1061,18 @@ (define_mode_attr bcstscalarsuff
    (V32HI "w")  (V16HI "w") (V8HI "w")
    (V16SI "d")  (V8SI "d")  (V4SI "d")
    (V8DI "q")   (V4DI "q")  (V2DI "q")
+   (V32HF "w")  (V16HF "w") (V8HF "w")
    (V16SF "ss") (V8SF "ss") (V4SF "ss")
    (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
 
 ;; Tie mode of assembler operand to mode iterator
 (define_mode_attr xtg_mode
-  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x")
-   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
-   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
+  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
+   (V8HF "x") (V4SF "x") (V2DF "x")
+   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
+   (V16HF "t") (V8SF "t") (V4DF "t")
+   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
+   (V32HF "g") (V16SF "g") (V8DF "g")])
 
 ;; Half mask mode for unpacks
 (define_mode_attr HALFMASKMODE
@@ -8353,6 +8407,45 @@ (define_insn "vec_set<mode>_0"
 	   ]
 	   (symbol_ref "true")))])
 
+;; vmovw also clears the higher bits.
+(define_insn "vec_set<mode>_0"
+  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v")
+	(vec_merge:VF_AVX512FP16
+	  (vec_duplicate:VF_AVX512FP16
+	    (match_operand:HF 2 "nonimmediate_operand" "rm"))
+	  (match_operand:VF_AVX512FP16 1 "const0_operand" "C")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovw\t{%2, %x0|%x0, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "*avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (match_operand:HF 2 "register_operand" "v"))
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+          (match_operand:V8HF 2 "register_operand" "v")
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
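The two scalar moves defined above behave differently on the upper elements: vec_set<mode>_0 emits VMOVW, which writes element 0 and zeroes everything above it, while VMOVSH merges element 0 from one source into elements 1..7 of the other. A plain-C behavioral sketch (no FP16 types; function names are mine):

```c
#include <assert.h>
#include <string.h>

/* Model of vec_set<mode>_0 / VMOVW: element 0 is written, all higher
   elements of the destination are cleared.  */
static void vmovw_model (unsigned short *dst, unsigned short val, int nelts)
{
  memset (dst, 0, nelts * sizeof *dst);
  dst[0] = val;
}

/* Model of VMOVSH (register form): elements 1..7 come from the merge
   operand, element 0 from the scalar source.  */
static void vmovsh_model (unsigned short *dst, const unsigned short *merge,
                          unsigned short val)
{
  memcpy (dst, merge, 8 * sizeof *dst);
  dst[0] = val;
}
```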
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
   [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
@@ -9189,10 +9282,10 @@ (define_insn "vec_extract_hi_<mode>"
    (set_attr "length_immediate" "1")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn_and_split "vec_extract_lo_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9219,9 +9312,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
   if (!TARGET_AVX512VL
       && REG_P (operands[0])
       && EXT_REX_SSE_REG_P (operands[1]))
-    operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
+    operands[0] = lowpart_subreg (<MODE>mode, operands[0],
+				  <ssehalfvecmode>mode);
   else
-    operands[1] = gen_lowpart (V16HImode, operands[1]);
+    operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -9230,10 +9324,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_extract_hi_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "register_operand" "v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "register_operand" "v")
 	  (parallel [(const_int 16) (const_int 17)
 		     (const_int 18) (const_int 19)
 		     (const_int 20) (const_int 21)
@@ -9250,10 +9344,10 @@ (define_insn "vec_extract_hi_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn_and_split "vec_extract_lo_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9262,12 +9356,12 @@ (define_insn_and_split "vec_extract_lo_v16hi"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V8HImode, operands[1]);")
+  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
 
-(define_insn "vec_extract_hi_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "register_operand" "x,v,v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "register_operand" "x,v,v")
 	  (parallel [(const_int 8) (const_int 9)
 		     (const_int 10) (const_int 11)
 		     (const_int 12) (const_int 13)
@@ -9403,12 +9497,41 @@ (define_insn "vec_extract_hi_v32qi"
    (set_attr "prefix" "vex,evex,evex")
    (set_attr "mode" "OI")])
 
+;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
+;; Otherwise, it will be ignored.
+(define_insn_and_split "*vec_extract<mode>_0"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
+	(vec_select:HF
+	  (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
+	  (parallel [(const_int 0)])))]
+  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+  "operands[1] = gen_lowpart (HFmode, operands[1]);")
+
+(define_insn "*vec_extracthf"
+  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m")
+	(vec_select:HF
+	  (match_operand:V8HF 1 "register_operand" "v,v")
+	  (parallel
+	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
+  "TARGET_AVX512FP16"
+  "@
+   vpextrw\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; Modes handled by vec_extract patterns.
 (define_mode_iterator VEC_EXTRACT_MODE
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -14639,16 +14762,16 @@ (define_expand "vec_interleave_low<mode>"
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI
+  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
 (define_mode_attr sse2p4_1
-  [(V16QI "sse4_1") (V8HI "sse2")
+  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
    (V4SI "sse4_1") (V2DI "sse4_1")])
 
 (define_mode_attr pinsr_evex_isa
-  [(V16QI "avx512bw") (V8HI "avx512bw")
+  [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
    (V4SI "avx512dq") (V2DI "avx512dq")])
 
 ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
@@ -14676,11 +14799,19 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
     case 2:
     case 4:
       if (GET_MODE_SIZE (<ssescalarmode>mode) < GET_MODE_SIZE (SImode))
-	return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	{
+	  if (<MODE>mode == V8HFmode)
+	    return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	  else
+	    return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	}
       /* FALLTHRU */
     case 3:
     case 5:
-      return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      if (<MODE>mode == V8HFmode)
+	return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      else
+	return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
@@ -21095,16 +21226,17 @@ (define_mode_attr pbroadcast_evex_isa
   [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
    (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
    (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
-   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")])
+   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
+   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
 
 (define_insn "avx2_pbroadcast<mode>"
-  [(set (match_operand:VI 0 "register_operand" "=x,v")
-	(vec_duplicate:VI
+  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
+	(vec_duplicate:VIHF
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
-  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}"
   [(set_attr "isa" "*,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21112,17 +21244,17 @@ (define_insn "avx2_pbroadcast<mode>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_pbroadcast<mode>_1"
-  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
-	(vec_duplicate:VI_256
+  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
+	(vec_duplicate:VIHF_256
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
+	    (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
   "@
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}"
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}"
   [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21476,15 +21608,15 @@ (define_insn "avx2_vec_dupv4df"
    (set_attr "mode" "V4DF")])
 
 (define_insn "<avx512>_vec_dup<mode>_1"
-  [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
-	(vec_duplicate:VI_AVX512BW
+  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
+	(vec_duplicate:VIHF_AVX512BW
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
+	    (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
   "@
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}"
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %<iptr>1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -21509,8 +21641,8 @@ (define_insn "<avx512>_vec_dup<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
@@ -21545,8 +21677,8 @@ (define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512BW"
   "@
@@ -21641,7 +21773,7 @@ (define_mode_attr vecdupssescalarmodesuffix
   [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
 ;; Modes handled by AVX2 vec_dup patterns.
 (define_mode_iterator AVX2_VEC_DUP_MODE
-  [V32QI V16QI V16HI V8HI V8SI V4SI])
+  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
 
 (define_insn "*vec_dup<mode>"
   [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
@@ -22403,6 +22535,8 @@ (define_mode_iterator VEC_INIT_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -22414,6 +22548,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
-- 
2.18.1

* [PATCH 02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
  2021-07-01  6:15 ` [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16 liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 03/62] AVX512FP16: Fix HF vector passing in variable arguments liuhongt
                   ` (59 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
	* gcc.target/i386/avx512fp16-10a.c: New test.
	* gcc.target/i386/avx512fp16-10b.c: Ditto.
	* gcc.target/i386/avx512fp16-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-1c.c: Ditto.
	* gcc.target/i386/avx512fp16-1d.c: Ditto.
	* gcc.target/i386/avx512fp16-1e.c: Ditto.
	* gcc.target/i386/avx512fp16-2a.c: Ditto.
	* gcc.target/i386/avx512fp16-2b.c: Ditto.
	* gcc.target/i386/avx512fp16-2c.c: Ditto.
	* gcc.target/i386/avx512fp16-3a.c: Ditto.
	* gcc.target/i386/avx512fp16-3b.c: Ditto.
	* gcc.target/i386/avx512fp16-3c.c: Ditto.
	* gcc.target/i386/avx512fp16-4.c: Ditto.
	* gcc.target/i386/avx512fp16-5.c: Ditto.
	* gcc.target/i386/avx512fp16-6.c: Ditto.
	* gcc.target/i386/avx512fp16-7.c: Ditto.
	* gcc.target/i386/avx512fp16-8.c: Ditto.
	* gcc.target/i386/avx512fp16-9a.c: Ditto.
	* gcc.target/i386/avx512fp16-9b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-10a.c          |  14 ++
 .../gcc.target/i386/avx512fp16-10b.c          |  25 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |  24 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |  32 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |  26 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |  33 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |  30 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |  28 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |  33 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |  36 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |  36 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |  35 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |  40 ++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |  31 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  | 133 ++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |  57 ++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |  86 +++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |  53 +++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |  27 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |  49 +++++++
 gcc/testsuite/gcc.target/i386/m512-check.h    |  38 ++++-
 21 files changed, 865 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
new file mode 100644
index 00000000000..f06ffffa822
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+
+__m128h
+__attribute__ ((noinline, noclone))
+set_128 (_Float16 x)
+{
+  return _mm_set_sh (x);
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 1 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 2 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
new file mode 100644
index 00000000000..055edd7aaf5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-10a.c"
+
+union128h u128 = { ESP_FLOAT16, 0.0f, 0.0f, 0.0f,
+		   0.0f, 0.0f, 0.0f, 0.0f };
+
+static void
+do_test (void)
+{
+  __m128h v128 = set_128 (ESP_FLOAT16);
+  union128h a128;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
new file mode 100644
index 00000000000..45c7bddeba5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f,
+                                           0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m128h)(__v8hf) { *x, 0.0f, 0.0f, 0.0f,
+                                           0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
new file mode 100644
index 00000000000..7560c625e25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
@@ -0,0 +1,32 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m128h v;
+  union128h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union128h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union128h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
new file mode 100644
index 00000000000..9814e9c0363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmovsh" 2 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "vpinsrw" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpinsrw" 2 { target { ia32 } } } } */
+
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (__m128h a, _Float16 f)
+{
+  __v8hf x = (__v8hf) a;
+  x[2] = f;
+  return (__m128h) x;
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (__m128h a, _Float16 f)
+{
+  __v8hf x = (__v8hf) a;
+  x[0] = f;
+  return (__m128h) x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
new file mode 100644
index 00000000000..cdaf656eb48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1c.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u = { -1.2f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
+  __m128h v;
+  union128h a, b;
+  v = foo1 (u.x, x);
+  a.x = v;
+  b = u;
+  b.a[2] = x;
+  if (check_union128h (a, b.a))
+    abort ();
+  x = 33.3;
+  b = u;
+  b.a[0] = x;
+  v = foo2 (u.x, x);
+  a.x = v;
+  if (check_union128h (a, b.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
new file mode 100644
index 00000000000..04d33cfcf2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1a.c"
+
+__m128h
+__attribute__ ((noinline,noclone))
+foo3 (__m128h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  union128h u = { -1.2f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
+  union128h a, b = { -1.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
+  __m128h v;
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union128h (a, b.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
new file mode 100644
index 00000000000..c03138fb13d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
new file mode 100644
index 00000000000..100afd0f49c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-2a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union256h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m256h v;
+  union256h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union256h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union256h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
new file mode 100644
index 00000000000..cf4b42a4021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
@@ -0,0 +1,36 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-2a.c"
+
+__m256h
+__attribute__ ((noinline,noclone))
+foo3 (__m256h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union256h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 7.7f, 0.0f,
+		  4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f };
+
+  union256h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m256h v;
+  union256h a;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union256h (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
new file mode 100644
index 00000000000..126e7d9ee36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m512h)(__v32hf) { *x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
new file mode 100644
index 00000000000..291db066bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
@@ -0,0 +1,35 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-3a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union512h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m512h v;
+  union512h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union512h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union512h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
new file mode 100644
index 00000000000..21f9e16434a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-3a.c"
+
+__m512h
+__attribute__ ((noinline,noclone))
+foo3 (__m512h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union512h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  2.0f, -2.3f, 0.0f, 0.0f, 10.4f, 0.0f, 0.0f, 0.0f,
+		  3.0f, -3.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f };
+
+  union512h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m512h v;
+  union512h a;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union512h (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c
new file mode 100644
index 00000000000..1329a0434a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c
@@ -0,0 +1,31 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+extern __m128h x128, y128;
+extern __m256h x256, y256;
+extern __m512h x512, y512;
+
+__m128h
+foo1 (float f1, __m128h f2)
+{
+  x128 = y128;
+  return f2;
+}
+
+__m256h
+foo2 (float f1, __m256h f2)
+{
+  x256 = y256;
+  return f2;
+}
+
+__m512h
+foo3 (float f1, __m512h f2)
+{
+  x512 = y512;
+  return f2;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-5.c b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c
new file mode 100644
index 00000000000..d28b9651b8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c
@@ -0,0 +1,133 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f,
+                                           1.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, y,
+                                           3.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo3 (_Float16 x)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            1.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo4 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, y,
+                                            3.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo5 (_Float16 x)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            1.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo6 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, y,
+                                            3.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  _Float16 y = -35.7;
+  union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m128h v128;
+  __m256h v256;
+  __m512h v512;
+  union128h a128;
+  union256h a256;
+  union512h a512;
+
+  memset (&v128, -1, sizeof (v128));
+  v128 = foo1 (x);
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+  memset (&v128, -1, sizeof (v128));
+  u128.a[3] = y;
+  u128.a[4] = 3.0f;
+  v128 = foo2 (x, y);
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  memset (&v256, -1, sizeof (v256));
+  v256 = foo3 (x);
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+  memset (&v256, -1, sizeof (v256));
+  u256.a[7] = y;
+  u256.a[8] = 3.0f;
+  v256 = foo4 (x, y);
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  memset (&v512, -1, sizeof (v512));
+  v512 = foo5 (x);
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+  memset (&v512, -1, sizeof (v512));
+  u512.a[15] = y;
+  u512.a[16] = 3.0f;
+  v512 = foo6 (x, y);
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-6.c b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c
new file mode 100644
index 00000000000..d85a6c40603
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__ ((noinline, noclone))
+foo128 (_Float16 *p, __m128h x)
+{
+  *p = ((__v8hf)x)[0];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo256 (_Float16 *p, __m256h x)
+{
+  *p = ((__v16hf)x)[0];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo512 (_Float16 *p, __m512h x)
+{
+  *p = ((__v32hf)x)[0];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  _Float16 y;
+
+  foo128 (&y, u128.x);
+  if (x != y)
+    abort ();
+
+  foo256 (&y, u256.x);
+  if (x != y)
+    abort ();
+
+  foo512 (&y, u512.x);
+  if (x != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-7.c b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c
new file mode 100644
index 00000000000..26ae25fc0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c
@@ -0,0 +1,86 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__ ((noinline, noclone))
+foo128 (_Float16 *p, __m128h x)
+{
+  *p = ((__v8hf)x)[4];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo256 (_Float16 *p, __m256h x)
+{
+  *p = ((__v16hf)x)[10];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo512 (_Float16 *p, __m512h x)
+{
+  *p = ((__v32hf)x)[30];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f, x };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f };
+  __m128h v128 = _mm_setr_ph (0.0f, x, 0.0f, 0.0f,
+			      x, 0.0f, 0.0f, x);
+  __m256h v256 = _mm256_setr_ph (x, 0.0f, 0.0f, 0.0f,
+				 x, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, x, 0.0f,
+				 0.0f, x, 0.0f, 0.0f);
+  __m512h v512 = _mm512_setr_ph (x, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, x, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, x, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, x,
+				 0.0f, 0.0f, x, 0.0f);
+  union128h a128;
+  union256h a256;
+  union512h a512;
+  _Float16 y;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+
+  foo128 (&y, u128.x);
+  if (x != y)
+    abort ();
+
+  foo256 (&y, u256.x);
+  if (x != y)
+    abort ();
+
+  foo512 (&y, u512.x);
+  if (x != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-8.c b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c
new file mode 100644
index 00000000000..8f103751c2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c
@@ -0,0 +1,53 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo128 (__m128h x)
+{
+  return ((__v8hf)x)[4];
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo256 (__m256h x)
+{
+  return ((__v16hf)x)[10];
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo512 (__m512h x)
+{
+  return ((__v32hf)x)[30];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f };
+
+  if (foo128 (u128.x) != x)
+    abort ();
+
+  if (foo256 (u256.x) != x)
+    abort ();
+
+  if (foo512 (u512.x) != x)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
new file mode 100644
index 00000000000..580ffb51e45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+
+__m128h
+__attribute__ ((noinline, noclone))
+set1_128 (_Float16 x)
+{
+  return _mm_set1_ph (x);
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+set1_256 (_Float16 x)
+{
+  return _mm256_set1_ph (x);
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+set1_512 (_Float16 x)
+{
+  return _mm512_set1_ph (x);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[ \t]\+\[^\n\r]*\[xyz\]mm0" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
new file mode 100644
index 00000000000..198b23e64b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-9a.c"
+
+union128h u128 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+union256h u256 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+union512h u512 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+
+static void
+do_test (void)
+{
+  __m128h v128 = set1_128 (ESP_FLOAT16);
+  __m256h v256 = set1_256 (ESP_FLOAT16);
+  __m512h v512 = set1_512 (ESP_FLOAT16);
+  union128h a128;
+  union256h a256;
+  union512h a512;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
index 6befaf0a9ba..68e74fce68d 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -60,7 +60,24 @@ typedef union
  __m512i x;
  unsigned long long a[8];
 } union512i_uq;
-                                    
+
+typedef union
+{
+  __m128h x;
+  _Float16 a[8];
+} union128h;
+
+typedef union
+{
+  __m256h x;
+  _Float16 a[16];
+} union256h;
+
+typedef union
+{
+  __m512h x;
+  _Float16 a[32];
+} union512h;
 
 CHECK_EXP (union512i_b, char, "%d")
 CHECK_EXP (union512i_w, short, "%d")
@@ -115,3 +132,22 @@ CHECK_ROUGH_EXP (union256, float, "%f")
 CHECK_ROUGH_EXP (union256d, double, "%f")
 CHECK_ROUGH_EXP (union128, float, "%f")
 CHECK_ROUGH_EXP (union128d, double, "%f")
+
+#ifdef AVX512FP16
+
+CHECK_EXP (union128h, _Float16, "%f")
+CHECK_EXP (union256h, _Float16, "%f")
+CHECK_EXP (union512h, _Float16, "%f")
+
+#ifndef ESP_FLOAT16
+#define ESP_FLOAT16 0.27
+#endif
+
+CHECK_FP_EXP (union128h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union512h, _Float16, ESP_FLOAT16, "%f")
+
+CHECK_ROUGH_EXP (union128h, _Float16, "%f")
+CHECK_ROUGH_EXP (union256h, _Float16, "%f")
+CHECK_ROUGH_EXP (union512h, _Float16, "%f")
+#endif
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 03/62] AVX512FP16: Fix HF vector passing in variable arguments.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
  2021-07-01  6:15 ` [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16 liuhongt
  2021-07-01  6:15 ` [PATCH 02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 04/62] AVX512FP16: Add ABI tests for xmm liuhongt
                   ` (58 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

gcc/ChangeLog:

	* config/i386/i386.c (function_arg_advance_64): Allow
	V16HFmode and V32HFmode.
	(function_arg_64): Likewise.
	(ix86_gimplify_va_arg): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vararg-1.c: New test.
	* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
---
 gcc/config/i386/i386.c                        |   8 +-
 .../gcc.target/i386/avx512fp16-vararg-1.c     | 122 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-2.c     | 107 +++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-3.c     | 114 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-4.c     | 115 +++++++++++++++++
 5 files changed, 465 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 021283e6f39..79e6880d9dd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2908,7 +2908,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode,
 
   /* Unnamed 512 and 256bit vector mode parameters are passed on stack.  */
   if (!named && (VALID_AVX512F_REG_MODE (mode)
-		 || VALID_AVX256_REG_MODE (mode)))
+		 || VALID_AVX256_REG_MODE (mode)
+		 || mode == V16HFmode
+		 || mode == V32HFmode))
     return 0;
 
   if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
@@ -3167,6 +3169,8 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
     case E_V32HImode:
     case E_V8DFmode:
     case E_V8DImode:
+    case E_V16HFmode:
+    case E_V32HFmode:
       /* Unnamed 256 and 512bit vector mode parameters are passed on stack.  */
       if (!named)
 	return NULL;
@@ -4658,6 +4662,8 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
     case E_V32HImode:
     case E_V8DFmode:
     case E_V8DImode:
+    case E_V16HFmode:
+    case E_V32HFmode:
       /* Unnamed 256 and 512bit vector mode parameters are passed on stack.  */
       if (!TARGET_64BIT_MS_ABI)
 	{
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
new file mode 100644
index 00000000000..9bd366838b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
@@ -0,0 +1,122 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+struct m256h
+{
+  __m256h  v;
+};
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+struct m256h e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+struct m256h e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, struct m256h);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+__attribute__((noinline))
+test (__m128 a1, struct m256h a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
new file mode 100644
index 00000000000..043f1c75d00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
@@ -0,0 +1,107 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m256d e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+test (__m128 a1, __m256d a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
new file mode 100644
index 00000000000..cb414a97753
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
@@ -0,0 +1,114 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+struct m256h
+{
+  __m256h  v;
+};
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+struct m256h e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+struct m256h e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+test (__m128 a1, struct m256h a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, struct m256h);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
new file mode 100644
index 00000000000..962c2bf031d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
@@ -0,0 +1,115 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m256d e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+__attribute__((noinline))
+test (__m128 a1, __m256d a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
-- 
2.18.1



* [PATCH 04/62] AVX512FP16: Add ABI tests for xmm.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (2 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 03/62] AVX512FP16: Fix HF vector passing in variable arguments liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 05/62] AVX512FP16: Add ABI test for ymm liuhongt
                   ` (57 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

These tests are copied from the regular XMM ABI tests.  The AVX512FP16 ABI
tests are run only for ELF targets.

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
	file for the ABI tests.
	* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for the ABI tests.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm file for the ABI checks.
	* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.
---
 .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
 .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +++
 .../x86_64/abi/avx512fp16/asm-support.S       |   81 ++
 .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 ++
 .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
 .../x86_64/abi/avx512fp16/defines.h           |  150 +++
 .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
 .../test_3_element_struct_and_unions.c        |  692 +++++++++++
 .../abi/avx512fp16/test_basic_alignment.c     |   45 +
 .../test_basic_array_size_and_align.c         |   43 +
 .../abi/avx512fp16/test_basic_returning.c     |   87 ++
 .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
 .../test_basic_struct_size_and_align.c        |   42 +
 .../test_basic_union_size_and_align.c         |   40 +
 .../abi/avx512fp16/test_complex_returning.c   |  104 ++
 .../abi/avx512fp16/test_m64m128_returning.c   |   73 ++
 .../abi/avx512fp16/test_passing_floats.c      | 1066 +++++++++++++++++
 .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++++++++
 .../abi/avx512fp16/test_passing_structs.c     |  332 +++++
 .../abi/avx512fp16/test_passing_unions.c      |  335 ++++++
 .../abi/avx512fp16/test_struct_returning.c    |  274 +++++
 .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +++
 22 files changed, 4449 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
new file mode 100644
index 00000000000..33d24762788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
@@ -0,0 +1,48 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
new file mode 100644
index 00000000000..4a7b9a90fbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
@@ -0,0 +1,190 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <string.h>
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 xmm0
+#define F1 xmm1
+#define F2 xmm2
+#define F3 xmm3
+#define F4 xmm4
+#define F5 xmm5
+#define F6 xmm6
+#define F7 xmm7
+
+typedef union {
+  _Float16 __Float16[8];
+  float _float[4];
+  double _double[2];
+  long _long[2];
+  int _int[4];
+  unsigned long _ulong[2];
+#ifdef CHECK_M64_M128
+  __m64 _m64[2];
+  __m128 _m128[1];
+  __m128h _m128h[1];
+#endif
+} XMM_T;
+
+typedef union {
+  _Float16 __Float16;
+  float _float;
+  double _double;
+  ldouble _ldouble;
+  ulong _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+XMM_T xmm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  ldouble st0, st1, st2, st3, st4, st5, st6, st7;
+  XMM_T xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9,
+        xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (xmm_regs, 0, sizeof (xmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* Check the low element of each used xmm argument register against the snapshot.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.xmm0._ ## T [0] == xmm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.xmm1._ ## T [0] == xmm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.xmm2._ ## T [0] == xmm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.xmm3._ ## T [0] == xmm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.xmm4._ ## T [0] == xmm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.xmm5._ ## T [0] == xmm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.xmm6._ ## T [0] == xmm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.xmm7._ ## T [0] == xmm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float16_arguments check_f_arguments(_Float16)
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.xmm0) + (O), \
+		     &xmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.xmm1) + (O), \
+		     &xmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.xmm2) + (O), \
+		     &xmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.xmm3) + (O), \
+		     &xmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.xmm4) + (O), \
+		     &xmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.xmm5) + (O), \
+		     &xmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.xmm6) + (O), \
+		     &xmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.xmm7) + (O), \
+		     &xmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+
+/* ldoubles are not passed in registers.  */
+#define check_ldouble_arguments
+
+/* TODO: Do the clearing.  */
+#define clear_float_hardware_registers
+#define clear_x87_hardware_registers
+
+#define clear_float_registers \
+  clear_struct_registers \
+  clear_float_hardware_registers
+
+#define clear_x87_registers \
+  clear_struct_registers \
+  clear_x87_hardware_registers
+
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
new file mode 100644
index 00000000000..7849acd2649
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
@@ -0,0 +1,81 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	vmovdqu	%xmm2, xmm_regs+32(%rip)
+	vmovdqu	%xmm3, xmm_regs+48(%rip)
+	vmovdqu	%xmm4, xmm_regs+64(%rip)
+	vmovdqu	%xmm5, xmm_regs+80(%rip)
+	vmovdqu	%xmm6, xmm_regs+96(%rip)
+	vmovdqu	%xmm7, xmm_regs+112(%rip)
+	vmovdqu	%xmm8, xmm_regs+128(%rip)
+	vmovdqu	%xmm9, xmm_regs+144(%rip)
+	vmovdqu	%xmm10, xmm_regs+160(%rip)
+	vmovdqu	%xmm11, xmm_regs+176(%rip)
+	vmovdqu	%xmm12, xmm_regs+192(%rip)
+	vmovdqu	%xmm13, xmm_regs+208(%rip)
+	vmovdqu	%xmm14, xmm_regs+224(%rip)
+	vmovdqu	%xmm15, xmm_regs+240(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	xmm_regs,256,32
+	.comm	x87_regs,128,32
+	.comm	volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
new file mode 100644
index 00000000000..9fbec9d03ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
@@ -0,0 +1,74 @@
+#include <stdlib.h>
+#include <cpuid.h>
+
+/* Check if the OS supports executing AVX512FP16 instructions.  */
+
+#define XCR_XFEATURE_ENABLED_MASK	0x0
+
+#define XSTATE_FP	0x1
+#define XSTATE_SSE	0x2
+#define XSTATE_YMM	0x4
+#define XSTATE_OPMASK	0x20
+#define XSTATE_ZMM	0x40
+#define XSTATE_HI_ZMM	0x80
+
+static int
+check_osxsave (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  return (ecx & bit_OSXSAVE) != 0;
+}
+
+static int
+avx512fp16_os_support (void)
+{
+  unsigned int eax, edx;
+  unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
+  unsigned int mask = XSTATE_MASK;
+
+  if (!check_osxsave ())
+    return 0;
+
+  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
+
+  return ((eax & mask) == mask);
+}
+
+static void do_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!avx512fp16_os_support ())
+    return 0;
+
+  if (__get_cpuid_max (0, NULL) < 7)
+    return 0;
+
+  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+  /* Run the AVX512FP16 test only if the host has ISA support.  */
+  if (((ebx & (bit_AVX512F | bit_AVX512BW))
+       == (bit_AVX512F | bit_AVX512BW))
+      && (edx & bit_AVX512FP16)
+      && AVX512VL (ebx))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+      return 0;
+    }
+
+#ifdef DEBUG
+  printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
new file mode 100644
index 00000000000..0abe09f1166
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
@@ -0,0 +1,3 @@
+#define AVX512VL(ebx) (ebx & bit_AVX512VL)
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_OPMASK)
+#include "avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
new file mode 100644
index 00000000000..17f2c27edc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
@@ -0,0 +1,150 @@
+#ifndef DEFINED_DEFINES_H
+#define DEFINED_DEFINES_H
+
+/* Get __m64 and __m128. */
+#include <immintrin.h>
+
+typedef unsigned long ulong;
+typedef long double ldouble;
+
+/* These defines determine which parts of the test should be run.  When
+   GCC implements these parts, the defines should be uncommented to
+   enable testing.  */
+
+/* Scalar type __int128.  */
+/* #define CHECK_INT128 */
+
+/* Scalar type long double.  */
+#define CHECK_LONG_DOUBLE
+
+/* Scalar type __float128.  */
+/* #define CHECK_FLOAT128 */
+
+/* Scalar types __m64 and __m128.  */
+#define CHECK_M64_M128
+
+/* Returning of complex type.  */
+#define CHECK_COMPLEX
+
+/* Structs with size >= 16.  */
+#define CHECK_LARGER_STRUCTS
+
+/* Checks for passing floats and doubles.  */
+#define CHECK_FLOAT_DOUBLE_PASSING
+
+/* Union passing with not-extremely-simple unions.  */
+#define CHECK_LARGER_UNION_PASSING
+
+/* Variable args.  */
+#define CHECK_VARARGS
+
+/* Check argument passing and returning for scalar types with sizeof = 16.  */
+/* TODO: Implement these tests. Don't activate them for now.  */
+#define CHECK_LARGE_SCALAR_PASSING
+
+/* Defines for sizing and alignment.  */
+
+#define TYPE_SIZE_CHAR         1
+#define TYPE_SIZE_SHORT        2
+#define TYPE_SIZE_INT          4
+#define TYPE_SIZE_LONG         8
+#define TYPE_SIZE_LONG_LONG    8
+#define TYPE_SIZE_INT128       16
+#define TYPE_SIZE_FLOAT16      2
+#define TYPE_SIZE_FLOAT        4
+#define TYPE_SIZE_DOUBLE       8
+#define TYPE_SIZE_LONG_DOUBLE  16
+#define TYPE_SIZE_FLOAT128     16
+#define TYPE_SIZE_M64          8
+#define TYPE_SIZE_M128         16
+#define TYPE_SIZE_ENUM         4
+#define TYPE_SIZE_POINTER      8
+
+#define TYPE_ALIGN_CHAR        1
+#define TYPE_ALIGN_SHORT       2
+#define TYPE_ALIGN_INT         4
+#define TYPE_ALIGN_LONG        8
+#define TYPE_ALIGN_LONG_LONG   8
+#define TYPE_ALIGN_INT128      16
+#define TYPE_ALIGN_FLOAT16     2
+#define TYPE_ALIGN_FLOAT       4
+#define TYPE_ALIGN_DOUBLE      8
+#define TYPE_ALIGN_LONG_DOUBLE 16
+#define TYPE_ALIGN_FLOAT128    16
+#define TYPE_ALIGN_M64         8
+#define TYPE_ALIGN_M128        16
+#define TYPE_ALIGN_ENUM        4
+#define TYPE_ALIGN_POINTER     8
+
+/* These defines control the building of the list of types to check. There
+   is a string identifying the type (with a comma after), a size of the type
+   (also with a comma and an integer for adding to the total amount of types)
+   and an alignment of the type (which is currently not really needed since
+   the ABI specifies that alignof == sizeof for all scalar types).  */
+#ifdef CHECK_INT128
+#define CI128_STR "__int128",
+#define CI128_SIZ TYPE_SIZE_INT128,
+#define CI128_ALI TYPE_ALIGN_INT128,
+#define CI128_RET "???",
+#else
+#define CI128_STR
+#define CI128_SIZ
+#define CI128_ALI
+#define CI128_RET
+#endif
+#ifdef CHECK_LONG_DOUBLE
+#define CLD_STR "long double",
+#define CLD_SIZ TYPE_SIZE_LONG_DOUBLE,
+#define CLD_ALI TYPE_ALIGN_LONG_DOUBLE,
+#define CLD_RET "x87_regs[0]._ldouble",
+#else
+#define CLD_STR
+#define CLD_SIZ
+#define CLD_ALI
+#define CLD_RET
+#endif
+#ifdef CHECK_FLOAT128
+#define CF128_STR "__float128",
+#define CF128_SIZ TYPE_SIZE_FLOAT128,
+#define CF128_ALI TYPE_ALIGN_FLOAT128,
+#define CF128_RET "???",
+#else
+#define CF128_STR
+#define CF128_SIZ
+#define CF128_ALI
+#define CF128_RET
+#endif
+#ifdef CHECK_M64_M128
+#define CMM_STR "__m64", "__m128",
+#define CMM_SIZ TYPE_SIZE_M64, TYPE_SIZE_M128,
+#define CMM_ALI TYPE_ALIGN_M64, TYPE_ALIGN_M128,
+#define CMM_RET "???", "???",
+#else
+#define CMM_STR
+#define CMM_SIZ
+#define CMM_ALI
+#define CMM_RET
+#endif
+
+/* Used in size and alignment tests.  */
+enum dummytype { enumtype };
+
+extern void abort (void);
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+#ifdef __GNUC__
+#define PACKED __attribute__((__packed__))
+#else
+#warning Some tests will fail due to missing __packed__ support
+#define PACKED
+#endif
+
+#endif /* DEFINED_DEFINES_H */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
new file mode 100644
index 00000000000..98fbc660f27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
@@ -0,0 +1,53 @@
+#ifndef MACROS_H
+#define MACROS_H
+#define check_size(_t, _size) assert(sizeof(_t) == (_size))
+
+#define check_align(_t, _align) assert(__alignof__(_t) == (_align))
+
+#define check_align_lv(_t, _align) assert(__alignof__(_t) == (_align) \
+					  && (((unsigned long)&(_t)) & ((_align) - 1) ) == 0)
+
+#define check_basic_struct_size_and_align(_type, _size, _align) { \
+  struct _str { _type dummy; } _t; \
+  check_size(_t, _size); \
+  check_align_lv(_t, _align); \
+}
+
+#define check_array_size_and_align(_type, _size, _align) { \
+  _type _a[1]; _type _b[2]; _type _c[16]; \
+  struct _str { _type _a[1]; } _s; \
+  check_align_lv(_a[0], _align); \
+  check_size(_a, _size); \
+  check_size(_b, (_size*2)); \
+  check_size(_c, (_size*16)); \
+  check_size(_s, _size); \
+  check_align_lv(_s._a[0], _align); \
+}
+
+#define check_basic_union_size_and_align(_type, _size, _align) { \
+  union _union { _type dummy; } _u; \
+  check_size(_u, _size); \
+  check_align_lv(_u, _align); \
+}
+
+#define run_signed_tests2(_function, _arg1, _arg2) \
+  _function(_arg1, _arg2); \
+  _function(signed _arg1, _arg2); \
+  _function(unsigned _arg1, _arg2);
+
+#define run_signed_tests3(_function, _arg1, _arg2, _arg3) \
+  _function(_arg1, _arg2, _arg3); \
+  _function(signed _arg1, _arg2, _arg3); \
+  _function(unsigned _arg1, _arg2, _arg3);
+
+/* Check the size of a struct and a union of three types.  For these
+   types the union size equals its alignment, hence the parameter name.  */
+#define check_struct_and_union3(type1, type2, type3, struct_size, align_size) \
+{ \
+  struct _str { type1 t1; type2 t2; type3 t3; } _t; \
+  union _uni { type1 t1; type2 t2; type3 t3; } _u; \
+  check_size(_t, struct_size); \
+  check_size(_u, align_size); \
+}
+
+#endif /* MACROS_H */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
new file mode 100644
index 00000000000..cc94e0fe0e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
@@ -0,0 +1,692 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "defines.h"
+#include "macros.h"
+
+/* Check structs and unions of all permutations of 3 basic types.  */
+int
+main (void)
+{
+  check_struct_and_union3(char, char, char, 3, 1);
+  check_struct_and_union3(char, char, short, 4, 2);
+  check_struct_and_union3(char, char, int, 8, 4);
+  check_struct_and_union3(char, char, long, 16, 8);
+  check_struct_and_union3(char, char, long long, 16, 8);
+  check_struct_and_union3(char, char, float, 8, 4);
+  check_struct_and_union3(char, char, double, 16, 8);
+  check_struct_and_union3(char, char, long double, 32, 16);
+  check_struct_and_union3(char, short, char, 6, 2);
+  check_struct_and_union3(char, short, short, 6, 2);
+  check_struct_and_union3(char, short, int, 8, 4);
+  check_struct_and_union3(char, short, long, 16, 8);
+  check_struct_and_union3(char, short, long long, 16, 8);
+  check_struct_and_union3(char, short, float, 8, 4);
+  check_struct_and_union3(char, short, double, 16, 8);
+  check_struct_and_union3(char, short, long double, 32, 16);
+  check_struct_and_union3(char, int, char, 12, 4);
+  check_struct_and_union3(char, int, short, 12, 4);
+  check_struct_and_union3(char, int, int, 12, 4);
+  check_struct_and_union3(char, int, long, 16, 8);
+  check_struct_and_union3(char, int, long long, 16, 8);
+  check_struct_and_union3(char, int, float, 12, 4);
+  check_struct_and_union3(char, int, double, 16, 8);
+  check_struct_and_union3(char, int, long double, 32, 16);
+  check_struct_and_union3(char, long, char, 24, 8);
+  check_struct_and_union3(char, long, short, 24, 8);
+  check_struct_and_union3(char, long, int, 24, 8);
+  check_struct_and_union3(char, long, long, 24, 8);
+  check_struct_and_union3(char, long, long long, 24, 8);
+  check_struct_and_union3(char, long, float, 24, 8);
+  check_struct_and_union3(char, long, double, 24, 8);
+  check_struct_and_union3(char, long, long double, 32, 16);
+  check_struct_and_union3(char, long long, char, 24, 8);
+  check_struct_and_union3(char, long long, short, 24, 8);
+  check_struct_and_union3(char, long long, int, 24, 8);
+  check_struct_and_union3(char, long long, long, 24, 8);
+  check_struct_and_union3(char, long long, long long, 24, 8);
+  check_struct_and_union3(char, long long, float, 24, 8);
+  check_struct_and_union3(char, long long, double, 24, 8);
+  check_struct_and_union3(char, long long, long double, 32, 16);
+  check_struct_and_union3(char, float, char, 12, 4);
+  check_struct_and_union3(char, float, short, 12, 4);
+  check_struct_and_union3(char, float, int, 12, 4);
+  check_struct_and_union3(char, float, long, 16, 8);
+  check_struct_and_union3(char, float, long long, 16, 8);
+  check_struct_and_union3(char, float, float, 12, 4);
+  check_struct_and_union3(char, float, double, 16, 8);
+  check_struct_and_union3(char, float, long double, 32, 16);
+  check_struct_and_union3(char, double, char, 24, 8);
+  check_struct_and_union3(char, double, short, 24, 8);
+  check_struct_and_union3(char, double, int, 24, 8);
+  check_struct_and_union3(char, double, long, 24, 8);
+  check_struct_and_union3(char, double, long long, 24, 8);
+  check_struct_and_union3(char, double, float, 24, 8);
+  check_struct_and_union3(char, double, double, 24, 8);
+  check_struct_and_union3(char, double, long double, 32, 16);
+  check_struct_and_union3(char, long double, char, 48, 16);
+  check_struct_and_union3(char, long double, short, 48, 16);
+  check_struct_and_union3(char, long double, int, 48, 16);
+  check_struct_and_union3(char, long double, long, 48, 16);
+  check_struct_and_union3(char, long double, long long, 48, 16);
+  check_struct_and_union3(char, long double, float, 48, 16);
+  check_struct_and_union3(char, long double, double, 48, 16);
+  check_struct_and_union3(char, long double, long double, 48, 16);
+  check_struct_and_union3(short, char, char, 4, 2);
+  check_struct_and_union3(short, char, short, 6, 2);
+  check_struct_and_union3(short, char, int, 8, 4);
+  check_struct_and_union3(short, char, long, 16, 8);
+  check_struct_and_union3(short, char, long long, 16, 8);
+  check_struct_and_union3(short, char, float, 8, 4);
+  check_struct_and_union3(short, char, double, 16, 8);
+  check_struct_and_union3(short, char, long double, 32, 16);
+  check_struct_and_union3(short, short, char, 6, 2);
+  check_struct_and_union3(short, short, short, 6, 2);
+  check_struct_and_union3(short, short, int, 8, 4);
+  check_struct_and_union3(short, short, long, 16, 8);
+  check_struct_and_union3(short, short, long long, 16, 8);
+  check_struct_and_union3(short, short, float, 8, 4);
+  check_struct_and_union3(short, short, double, 16, 8);
+  check_struct_and_union3(short, short, long double, 32, 16);
+  check_struct_and_union3(short, int, char, 12, 4);
+  check_struct_and_union3(short, int, short, 12, 4);
+  check_struct_and_union3(short, int, int, 12, 4);
+  check_struct_and_union3(short, int, long, 16, 8);
+  check_struct_and_union3(short, int, long long, 16, 8);
+  check_struct_and_union3(short, int, float, 12, 4);
+  check_struct_and_union3(short, int, double, 16, 8);
+  check_struct_and_union3(short, int, long double, 32, 16);
+  check_struct_and_union3(short, long, char, 24, 8);
+  check_struct_and_union3(short, long, short, 24, 8);
+  check_struct_and_union3(short, long, int, 24, 8);
+  check_struct_and_union3(short, long, long, 24, 8);
+  check_struct_and_union3(short, long, long long, 24, 8);
+  check_struct_and_union3(short, long, float, 24, 8);
+  check_struct_and_union3(short, long, double, 24, 8);
+  check_struct_and_union3(short, long, long double, 32, 16);
+  check_struct_and_union3(short, long long, char, 24, 8);
+  check_struct_and_union3(short, long long, short, 24, 8);
+  check_struct_and_union3(short, long long, int, 24, 8);
+  check_struct_and_union3(short, long long, long, 24, 8);
+  check_struct_and_union3(short, long long, long long, 24, 8);
+  check_struct_and_union3(short, long long, float, 24, 8);
+  check_struct_and_union3(short, long long, double, 24, 8);
+  check_struct_and_union3(short, long long, long double, 32, 16);
+  check_struct_and_union3(short, float, char, 12, 4);
+  check_struct_and_union3(short, float, short, 12, 4);
+  check_struct_and_union3(short, float, int, 12, 4);
+  check_struct_and_union3(short, float, long, 16, 8);
+  check_struct_and_union3(short, float, long long, 16, 8);
+  check_struct_and_union3(short, float, float, 12, 4);
+  check_struct_and_union3(short, float, double, 16, 8);
+  check_struct_and_union3(short, float, long double, 32, 16);
+  check_struct_and_union3(short, double, char, 24, 8);
+  check_struct_and_union3(short, double, short, 24, 8);
+  check_struct_and_union3(short, double, int, 24, 8);
+  check_struct_and_union3(short, double, long, 24, 8);
+  check_struct_and_union3(short, double, long long, 24, 8);
+  check_struct_and_union3(short, double, float, 24, 8);
+  check_struct_and_union3(short, double, double, 24, 8);
+  check_struct_and_union3(short, double, long double, 32, 16);
+  check_struct_and_union3(short, long double, char, 48, 16);
+  check_struct_and_union3(short, long double, short, 48, 16);
+  check_struct_and_union3(short, long double, int, 48, 16);
+  check_struct_and_union3(short, long double, long, 48, 16);
+  check_struct_and_union3(short, long double, long long, 48, 16);
+  check_struct_and_union3(short, long double, float, 48, 16);
+  check_struct_and_union3(short, long double, double, 48, 16);
+  check_struct_and_union3(short, long double, long double, 48, 16);
+  check_struct_and_union3(int, char, char, 8, 4);
+  check_struct_and_union3(int, char, short, 8, 4);
+  check_struct_and_union3(int, char, int, 12, 4);
+  check_struct_and_union3(int, char, long, 16, 8);
+  check_struct_and_union3(int, char, long long, 16, 8);
+  check_struct_and_union3(int, char, float, 12, 4);
+  check_struct_and_union3(int, char, double, 16, 8);
+  check_struct_and_union3(int, char, long double, 32, 16);
+  check_struct_and_union3(int, short, char, 8, 4);
+  check_struct_and_union3(int, short, short, 8, 4);
+  check_struct_and_union3(int, short, int, 12, 4);
+  check_struct_and_union3(int, short, long, 16, 8);
+  check_struct_and_union3(int, short, long long, 16, 8);
+  check_struct_and_union3(int, short, float, 12, 4);
+  check_struct_and_union3(int, short, double, 16, 8);
+  check_struct_and_union3(int, short, long double, 32, 16);
+  check_struct_and_union3(int, int, char, 12, 4);
+  check_struct_and_union3(int, int, short, 12, 4);
+  check_struct_and_union3(int, int, int, 12, 4);
+  check_struct_and_union3(int, int, long, 16, 8);
+  check_struct_and_union3(int, int, long long, 16, 8);
+  check_struct_and_union3(int, int, float, 12, 4);
+  check_struct_and_union3(int, int, double, 16, 8);
+  check_struct_and_union3(int, int, long double, 32, 16);
+  check_struct_and_union3(int, long, char, 24, 8);
+  check_struct_and_union3(int, long, short, 24, 8);
+  check_struct_and_union3(int, long, int, 24, 8);
+  check_struct_and_union3(int, long, long, 24, 8);
+  check_struct_and_union3(int, long, long long, 24, 8);
+  check_struct_and_union3(int, long, float, 24, 8);
+  check_struct_and_union3(int, long, double, 24, 8);
+  check_struct_and_union3(int, long, long double, 32, 16);
+  check_struct_and_union3(int, long long, char, 24, 8);
+  check_struct_and_union3(int, long long, short, 24, 8);
+  check_struct_and_union3(int, long long, int, 24, 8);
+  check_struct_and_union3(int, long long, long, 24, 8);
+  check_struct_and_union3(int, long long, long long, 24, 8);
+  check_struct_and_union3(int, long long, float, 24, 8);
+  check_struct_and_union3(int, long long, double, 24, 8);
+  check_struct_and_union3(int, long long, long double, 32, 16);
+  check_struct_and_union3(int, float, char, 12, 4);
+  check_struct_and_union3(int, float, short, 12, 4);
+  check_struct_and_union3(int, float, int, 12, 4);
+  check_struct_and_union3(int, float, long, 16, 8);
+  check_struct_and_union3(int, float, long long, 16, 8);
+  check_struct_and_union3(int, float, float, 12, 4);
+  check_struct_and_union3(int, float, double, 16, 8);
+  check_struct_and_union3(int, float, long double, 32, 16);
+  check_struct_and_union3(int, double, char, 24, 8);
+  check_struct_and_union3(int, double, short, 24, 8);
+  check_struct_and_union3(int, double, int, 24, 8);
+  check_struct_and_union3(int, double, long, 24, 8);
+  check_struct_and_union3(int, double, long long, 24, 8);
+  check_struct_and_union3(int, double, float, 24, 8);
+  check_struct_and_union3(int, double, double, 24, 8);
+  check_struct_and_union3(int, double, long double, 32, 16);
+  check_struct_and_union3(int, long double, char, 48, 16);
+  check_struct_and_union3(int, long double, short, 48, 16);
+  check_struct_and_union3(int, long double, int, 48, 16);
+  check_struct_and_union3(int, long double, long, 48, 16);
+  check_struct_and_union3(int, long double, long long, 48, 16);
+  check_struct_and_union3(int, long double, float, 48, 16);
+  check_struct_and_union3(int, long double, double, 48, 16);
+  check_struct_and_union3(int, long double, long double, 48, 16);
+  check_struct_and_union3(long, char, char, 16, 8);
+  check_struct_and_union3(long, char, short, 16, 8);
+  check_struct_and_union3(long, char, int, 16, 8);
+  check_struct_and_union3(long, char, long, 24, 8);
+  check_struct_and_union3(long, char, long long, 24, 8);
+  check_struct_and_union3(long, char, float, 16, 8);
+  check_struct_and_union3(long, char, double, 24, 8);
+  check_struct_and_union3(long, char, long double, 32, 16);
+  check_struct_and_union3(long, short, char, 16, 8);
+  check_struct_and_union3(long, short, short, 16, 8);
+  check_struct_and_union3(long, short, int, 16, 8);
+  check_struct_and_union3(long, short, long, 24, 8);
+  check_struct_and_union3(long, short, long long, 24, 8);
+  check_struct_and_union3(long, short, float, 16, 8);
+  check_struct_and_union3(long, short, double, 24, 8);
+  check_struct_and_union3(long, short, long double, 32, 16);
+  check_struct_and_union3(long, int, char, 16, 8);
+  check_struct_and_union3(long, int, short, 16, 8);
+  check_struct_and_union3(long, int, int, 16, 8);
+  check_struct_and_union3(long, int, long, 24, 8);
+  check_struct_and_union3(long, int, long long, 24, 8);
+  check_struct_and_union3(long, int, float, 16, 8);
+  check_struct_and_union3(long, int, double, 24, 8);
+  check_struct_and_union3(long, int, long double, 32, 16);
+  check_struct_and_union3(long, long, char, 24, 8);
+  check_struct_and_union3(long, long, short, 24, 8);
+  check_struct_and_union3(long, long, int, 24, 8);
+  check_struct_and_union3(long, long, long, 24, 8);
+  check_struct_and_union3(long, long, long long, 24, 8);
+  check_struct_and_union3(long, long, float, 24, 8);
+  check_struct_and_union3(long, long, double, 24, 8);
+  check_struct_and_union3(long, long, long double, 32, 16);
+  check_struct_and_union3(long, long long, char, 24, 8);
+  check_struct_and_union3(long, long long, short, 24, 8);
+  check_struct_and_union3(long, long long, int, 24, 8);
+  check_struct_and_union3(long, long long, long, 24, 8);
+  check_struct_and_union3(long, long long, long long, 24, 8);
+  check_struct_and_union3(long, long long, float, 24, 8);
+  check_struct_and_union3(long, long long, double, 24, 8);
+  check_struct_and_union3(long, long long, long double, 32, 16);
+  check_struct_and_union3(long, float, char, 16, 8);
+  check_struct_and_union3(long, float, short, 16, 8);
+  check_struct_and_union3(long, float, int, 16, 8);
+  check_struct_and_union3(long, float, long, 24, 8);
+  check_struct_and_union3(long, float, long long, 24, 8);
+  check_struct_and_union3(long, float, float, 16, 8);
+  check_struct_and_union3(long, float, double, 24, 8);
+  check_struct_and_union3(long, float, long double, 32, 16);
+  check_struct_and_union3(long, double, char, 24, 8);
+  check_struct_and_union3(long, double, short, 24, 8);
+  check_struct_and_union3(long, double, int, 24, 8);
+  check_struct_and_union3(long, double, long, 24, 8);
+  check_struct_and_union3(long, double, long long, 24, 8);
+  check_struct_and_union3(long, double, float, 24, 8);
+  check_struct_and_union3(long, double, double, 24, 8);
+  check_struct_and_union3(long, double, long double, 32, 16);
+  check_struct_and_union3(long, long double, char, 48, 16);
+  check_struct_and_union3(long, long double, short, 48, 16);
+  check_struct_and_union3(long, long double, int, 48, 16);
+  check_struct_and_union3(long, long double, long, 48, 16);
+  check_struct_and_union3(long, long double, long long, 48, 16);
+  check_struct_and_union3(long, long double, float, 48, 16);
+  check_struct_and_union3(long, long double, double, 48, 16);
+  check_struct_and_union3(long, long double, long double, 48, 16);
+  check_struct_and_union3(long long, char, char, 16, 8);
+  check_struct_and_union3(long long, char, short, 16, 8);
+  check_struct_and_union3(long long, char, int, 16, 8);
+  check_struct_and_union3(long long, char, long, 24, 8);
+  check_struct_and_union3(long long, char, long long, 24, 8);
+  check_struct_and_union3(long long, char, float, 16, 8);
+  check_struct_and_union3(long long, char, double, 24, 8);
+  check_struct_and_union3(long long, char, long double, 32, 16);
+  check_struct_and_union3(long long, short, char, 16, 8);
+  check_struct_and_union3(long long, short, short, 16, 8);
+  check_struct_and_union3(long long, short, int, 16, 8);
+  check_struct_and_union3(long long, short, long, 24, 8);
+  check_struct_and_union3(long long, short, long long, 24, 8);
+  check_struct_and_union3(long long, short, float, 16, 8);
+  check_struct_and_union3(long long, short, double, 24, 8);
+  check_struct_and_union3(long long, short, long double, 32, 16);
+  check_struct_and_union3(long long, int, char, 16, 8);
+  check_struct_and_union3(long long, int, short, 16, 8);
+  check_struct_and_union3(long long, int, int, 16, 8);
+  check_struct_and_union3(long long, int, long, 24, 8);
+  check_struct_and_union3(long long, int, long long, 24, 8);
+  check_struct_and_union3(long long, int, float, 16, 8);
+  check_struct_and_union3(long long, int, double, 24, 8);
+  check_struct_and_union3(long long, int, long double, 32, 16);
+  check_struct_and_union3(long long, long, char, 24, 8);
+  check_struct_and_union3(long long, long, short, 24, 8);
+  check_struct_and_union3(long long, long, int, 24, 8);
+  check_struct_and_union3(long long, long, long, 24, 8);
+  check_struct_and_union3(long long, long, long long, 24, 8);
+  check_struct_and_union3(long long, long, float, 24, 8);
+  check_struct_and_union3(long long, long, double, 24, 8);
+  check_struct_and_union3(long long, long, long double, 32, 16);
+  check_struct_and_union3(long long, long long, char, 24, 8);
+  check_struct_and_union3(long long, long long, short, 24, 8);
+  check_struct_and_union3(long long, long long, int, 24, 8);
+  check_struct_and_union3(long long, long long, long, 24, 8);
+  check_struct_and_union3(long long, long long, long long, 24, 8);
+  check_struct_and_union3(long long, long long, float, 24, 8);
+  check_struct_and_union3(long long, long long, double, 24, 8);
+  check_struct_and_union3(long long, long long, long double, 32, 16);
+  check_struct_and_union3(long long, float, char, 16, 8);
+  check_struct_and_union3(long long, float, short, 16, 8);
+  check_struct_and_union3(long long, float, int, 16, 8);
+  check_struct_and_union3(long long, float, long, 24, 8);
+  check_struct_and_union3(long long, float, long long, 24, 8);
+  check_struct_and_union3(long long, float, float, 16, 8);
+  check_struct_and_union3(long long, float, double, 24, 8);
+  check_struct_and_union3(long long, float, long double, 32, 16);
+  check_struct_and_union3(long long, double, char, 24, 8);
+  check_struct_and_union3(long long, double, short, 24, 8);
+  check_struct_and_union3(long long, double, int, 24, 8);
+  check_struct_and_union3(long long, double, long, 24, 8);
+  check_struct_and_union3(long long, double, long long, 24, 8);
+  check_struct_and_union3(long long, double, float, 24, 8);
+  check_struct_and_union3(long long, double, double, 24, 8);
+  check_struct_and_union3(long long, double, long double, 32, 16);
+  check_struct_and_union3(long long, long double, char, 48, 16);
+  check_struct_and_union3(long long, long double, short, 48, 16);
+  check_struct_and_union3(long long, long double, int, 48, 16);
+  check_struct_and_union3(long long, long double, long, 48, 16);
+  check_struct_and_union3(long long, long double, long long, 48, 16);
+  check_struct_and_union3(long long, long double, float, 48, 16);
+  check_struct_and_union3(long long, long double, double, 48, 16);
+  check_struct_and_union3(long long, long double, long double, 48, 16);
+  check_struct_and_union3(float, char, char, 8, 4);
+  check_struct_and_union3(float, char, short, 8, 4);
+  check_struct_and_union3(float, char, int, 12, 4);
+  check_struct_and_union3(float, char, long, 16, 8);
+  check_struct_and_union3(float, char, long long, 16, 8);
+  check_struct_and_union3(float, char, float, 12, 4);
+  check_struct_and_union3(float, char, double, 16, 8);
+  check_struct_and_union3(float, char, long double, 32, 16);
+  check_struct_and_union3(float, short, char, 8, 4);
+  check_struct_and_union3(float, short, short, 8, 4);
+  check_struct_and_union3(float, short, int, 12, 4);
+  check_struct_and_union3(float, short, long, 16, 8);
+  check_struct_and_union3(float, short, long long, 16, 8);
+  check_struct_and_union3(float, short, float, 12, 4);
+  check_struct_and_union3(float, short, double, 16, 8);
+  check_struct_and_union3(float, short, long double, 32, 16);
+  check_struct_and_union3(float, int, char, 12, 4);
+  check_struct_and_union3(float, int, short, 12, 4);
+  check_struct_and_union3(float, int, int, 12, 4);
+  check_struct_and_union3(float, int, long, 16, 8);
+  check_struct_and_union3(float, int, long long, 16, 8);
+  check_struct_and_union3(float, int, float, 12, 4);
+  check_struct_and_union3(float, int, double, 16, 8);
+  check_struct_and_union3(float, int, long double, 32, 16);
+  check_struct_and_union3(float, long, char, 24, 8);
+  check_struct_and_union3(float, long, short, 24, 8);
+  check_struct_and_union3(float, long, int, 24, 8);
+  check_struct_and_union3(float, long, long, 24, 8);
+  check_struct_and_union3(float, long, long long, 24, 8);
+  check_struct_and_union3(float, long, float, 24, 8);
+  check_struct_and_union3(float, long, double, 24, 8);
+  check_struct_and_union3(float, long, long double, 32, 16);
+  check_struct_and_union3(float, long long, char, 24, 8);
+  check_struct_and_union3(float, long long, short, 24, 8);
+  check_struct_and_union3(float, long long, int, 24, 8);
+  check_struct_and_union3(float, long long, long, 24, 8);
+  check_struct_and_union3(float, long long, long long, 24, 8);
+  check_struct_and_union3(float, long long, float, 24, 8);
+  check_struct_and_union3(float, long long, double, 24, 8);
+  check_struct_and_union3(float, long long, long double, 32, 16);
+  check_struct_and_union3(float, float, char, 12, 4);
+  check_struct_and_union3(float, float, short, 12, 4);
+  check_struct_and_union3(float, float, int, 12, 4);
+  check_struct_and_union3(float, float, long, 16, 8);
+  check_struct_and_union3(float, float, long long, 16, 8);
+  check_struct_and_union3(float, float, float, 12, 4);
+  check_struct_and_union3(float, float, double, 16, 8);
+  check_struct_and_union3(float, float, long double, 32, 16);
+  check_struct_and_union3(float, double, char, 24, 8);
+  check_struct_and_union3(float, double, short, 24, 8);
+  check_struct_and_union3(float, double, int, 24, 8);
+  check_struct_and_union3(float, double, long, 24, 8);
+  check_struct_and_union3(float, double, long long, 24, 8);
+  check_struct_and_union3(float, double, float, 24, 8);
+  check_struct_and_union3(float, double, double, 24, 8);
+  check_struct_and_union3(float, double, long double, 32, 16);
+  check_struct_and_union3(float, long double, char, 48, 16);
+  check_struct_and_union3(float, long double, short, 48, 16);
+  check_struct_and_union3(float, long double, int, 48, 16);
+  check_struct_and_union3(float, long double, long, 48, 16);
+  check_struct_and_union3(float, long double, long long, 48, 16);
+  check_struct_and_union3(float, long double, float, 48, 16);
+  check_struct_and_union3(float, long double, double, 48, 16);
+  check_struct_and_union3(float, long double, long double, 48, 16);
+  check_struct_and_union3(double, char, char, 16, 8);
+  check_struct_and_union3(double, char, short, 16, 8);
+  check_struct_and_union3(double, char, int, 16, 8);
+  check_struct_and_union3(double, char, long, 24, 8);
+  check_struct_and_union3(double, char, long long, 24, 8);
+  check_struct_and_union3(double, char, float, 16, 8);
+  check_struct_and_union3(double, char, double, 24, 8);
+  check_struct_and_union3(double, char, long double, 32, 16);
+  check_struct_and_union3(double, short, char, 16, 8);
+  check_struct_and_union3(double, short, short, 16, 8);
+  check_struct_and_union3(double, short, int, 16, 8);
+  check_struct_and_union3(double, short, long, 24, 8);
+  check_struct_and_union3(double, short, long long, 24, 8);
+  check_struct_and_union3(double, short, float, 16, 8);
+  check_struct_and_union3(double, short, double, 24, 8);
+  check_struct_and_union3(double, short, long double, 32, 16);
+  check_struct_and_union3(double, int, char, 16, 8);
+  check_struct_and_union3(double, int, short, 16, 8);
+  check_struct_and_union3(double, int, int, 16, 8);
+  check_struct_and_union3(double, int, long, 24, 8);
+  check_struct_and_union3(double, int, long long, 24, 8);
+  check_struct_and_union3(double, int, float, 16, 8);
+  check_struct_and_union3(double, int, double, 24, 8);
+  check_struct_and_union3(double, int, long double, 32, 16);
+  check_struct_and_union3(double, long, char, 24, 8);
+  check_struct_and_union3(double, long, short, 24, 8);
+  check_struct_and_union3(double, long, int, 24, 8);
+  check_struct_and_union3(double, long, long, 24, 8);
+  check_struct_and_union3(double, long, long long, 24, 8);
+  check_struct_and_union3(double, long, float, 24, 8);
+  check_struct_and_union3(double, long, double, 24, 8);
+  check_struct_and_union3(double, long, long double, 32, 16);
+  check_struct_and_union3(double, long long, char, 24, 8);
+  check_struct_and_union3(double, long long, short, 24, 8);
+  check_struct_and_union3(double, long long, int, 24, 8);
+  check_struct_and_union3(double, long long, long, 24, 8);
+  check_struct_and_union3(double, long long, long long, 24, 8);
+  check_struct_and_union3(double, long long, float, 24, 8);
+  check_struct_and_union3(double, long long, double, 24, 8);
+  check_struct_and_union3(double, long long, long double, 32, 16);
+  check_struct_and_union3(double, float, char, 16, 8);
+  check_struct_and_union3(double, float, short, 16, 8);
+  check_struct_and_union3(double, float, int, 16, 8);
+  check_struct_and_union3(double, float, long, 24, 8);
+  check_struct_and_union3(double, float, long long, 24, 8);
+  check_struct_and_union3(double, float, float, 16, 8);
+  check_struct_and_union3(double, float, double, 24, 8);
+  check_struct_and_union3(double, float, long double, 32, 16);
+  check_struct_and_union3(double, double, char, 24, 8);
+  check_struct_and_union3(double, double, short, 24, 8);
+  check_struct_and_union3(double, double, int, 24, 8);
+  check_struct_and_union3(double, double, long, 24, 8);
+  check_struct_and_union3(double, double, long long, 24, 8);
+  check_struct_and_union3(double, double, float, 24, 8);
+  check_struct_and_union3(double, double, double, 24, 8);
+  check_struct_and_union3(double, double, long double, 32, 16);
+  check_struct_and_union3(double, long double, char, 48, 16);
+  check_struct_and_union3(double, long double, short, 48, 16);
+  check_struct_and_union3(double, long double, int, 48, 16);
+  check_struct_and_union3(double, long double, long, 48, 16);
+  check_struct_and_union3(double, long double, long long, 48, 16);
+  check_struct_and_union3(double, long double, float, 48, 16);
+  check_struct_and_union3(double, long double, double, 48, 16);
+  check_struct_and_union3(double, long double, long double, 48, 16);
+  check_struct_and_union3(long double, char, char, 32, 16);
+  check_struct_and_union3(long double, char, short, 32, 16);
+  check_struct_and_union3(long double, char, int, 32, 16);
+  check_struct_and_union3(long double, char, long, 32, 16);
+  check_struct_and_union3(long double, char, long long, 32, 16);
+  check_struct_and_union3(long double, char, float, 32, 16);
+  check_struct_and_union3(long double, char, double, 32, 16);
+  check_struct_and_union3(long double, char, long double, 48, 16);
+  check_struct_and_union3(long double, short, char, 32, 16);
+  check_struct_and_union3(long double, short, short, 32, 16);
+  check_struct_and_union3(long double, short, int, 32, 16);
+  check_struct_and_union3(long double, short, long, 32, 16);
+  check_struct_and_union3(long double, short, long long, 32, 16);
+  check_struct_and_union3(long double, short, float, 32, 16);
+  check_struct_and_union3(long double, short, double, 32, 16);
+  check_struct_and_union3(long double, short, long double, 48, 16);
+  check_struct_and_union3(long double, int, char, 32, 16);
+  check_struct_and_union3(long double, int, short, 32, 16);
+  check_struct_and_union3(long double, int, int, 32, 16);
+  check_struct_and_union3(long double, int, long, 32, 16);
+  check_struct_and_union3(long double, int, long long, 32, 16);
+  check_struct_and_union3(long double, int, float, 32, 16);
+  check_struct_and_union3(long double, int, double, 32, 16);
+  check_struct_and_union3(long double, int, long double, 48, 16);
+  check_struct_and_union3(long double, long, char, 32, 16);
+  check_struct_and_union3(long double, long, short, 32, 16);
+  check_struct_and_union3(long double, long, int, 32, 16);
+  check_struct_and_union3(long double, long, long, 32, 16);
+  check_struct_and_union3(long double, long, long long, 32, 16);
+  check_struct_and_union3(long double, long, float, 32, 16);
+  check_struct_and_union3(long double, long, double, 32, 16);
+  check_struct_and_union3(long double, long, long double, 48, 16);
+  check_struct_and_union3(long double, long long, char, 32, 16);
+  check_struct_and_union3(long double, long long, short, 32, 16);
+  check_struct_and_union3(long double, long long, int, 32, 16);
+  check_struct_and_union3(long double, long long, long, 32, 16);
+  check_struct_and_union3(long double, long long, long long, 32, 16);
+  check_struct_and_union3(long double, long long, float, 32, 16);
+  check_struct_and_union3(long double, long long, double, 32, 16);
+  check_struct_and_union3(long double, long long, long double, 48, 16);
+  check_struct_and_union3(long double, float, char, 32, 16);
+  check_struct_and_union3(long double, float, short, 32, 16);
+  check_struct_and_union3(long double, float, int, 32, 16);
+  check_struct_and_union3(long double, float, long, 32, 16);
+  check_struct_and_union3(long double, float, long long, 32, 16);
+  check_struct_and_union3(long double, float, float, 32, 16);
+  check_struct_and_union3(long double, float, double, 32, 16);
+  check_struct_and_union3(long double, float, long double, 48, 16);
+  check_struct_and_union3(long double, double, char, 32, 16);
+  check_struct_and_union3(long double, double, short, 32, 16);
+  check_struct_and_union3(long double, double, int, 32, 16);
+  check_struct_and_union3(long double, double, long, 32, 16);
+  check_struct_and_union3(long double, double, long long, 32, 16);
+  check_struct_and_union3(long double, double, float, 32, 16);
+  check_struct_and_union3(long double, double, double, 32, 16);
+  check_struct_and_union3(long double, double, long double, 48, 16);
+  check_struct_and_union3(long double, long double, char, 48, 16);
+  check_struct_and_union3(long double, long double, short, 48, 16);
+  check_struct_and_union3(long double, long double, int, 48, 16);
+  check_struct_and_union3(long double, long double, long, 48, 16);
+  check_struct_and_union3(long double, long double, long long, 48, 16);
+  check_struct_and_union3(long double, long double, float, 48, 16);
+  check_struct_and_union3(long double, long double, double, 48, 16);
+  check_struct_and_union3(long double, long double, long double, 48, 16);
+  check_struct_and_union3(char, char, _Float16, 4, 2);
+  check_struct_and_union3(char, _Float16, char, 6, 2);
+  check_struct_and_union3(char, _Float16, _Float16, 6, 2);
+  check_struct_and_union3(char, _Float16, int, 8, 4);
+  check_struct_and_union3(char, _Float16, long, 16, 8);
+  check_struct_and_union3(char, _Float16, long long, 16, 8);
+  check_struct_and_union3(char, _Float16, float, 8, 4);
+  check_struct_and_union3(char, _Float16, double, 16, 8);
+  check_struct_and_union3(char, _Float16, long double, 32, 16);
+  check_struct_and_union3(char, int, _Float16, 12, 4);
+  check_struct_and_union3(char, long, _Float16, 24, 8);
+  check_struct_and_union3(char, long long, _Float16, 24, 8);
+  check_struct_and_union3(char, float, _Float16, 12, 4);
+  check_struct_and_union3(char, double, _Float16, 24, 8);
+  check_struct_and_union3(char, long double, _Float16, 48, 16);
+  check_struct_and_union3(_Float16, char, char, 4, 2);
+  check_struct_and_union3(_Float16, char, _Float16, 6, 2);
+  check_struct_and_union3(_Float16, char, int, 8, 4);
+  check_struct_and_union3(_Float16, char, long, 16, 8);
+  check_struct_and_union3(_Float16, char, long long, 16, 8);
+  check_struct_and_union3(_Float16, char, float, 8, 4);
+  check_struct_and_union3(_Float16, char, double, 16, 8);
+  check_struct_and_union3(_Float16, char, long double, 32, 16);
+  check_struct_and_union3(_Float16, _Float16, char, 6, 2);
+  check_struct_and_union3(_Float16, _Float16, _Float16, 6, 2);
+  check_struct_and_union3(_Float16, _Float16, int, 8, 4);
+  check_struct_and_union3(_Float16, _Float16, long, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, long long, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, float, 8, 4);
+  check_struct_and_union3(_Float16, _Float16, double, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, long double, 32, 16);
+  check_struct_and_union3(_Float16, int, char, 12, 4);
+  check_struct_and_union3(_Float16, int, _Float16, 12, 4);
+  check_struct_and_union3(_Float16, int, int, 12, 4);
+  check_struct_and_union3(_Float16, int, long, 16, 8);
+  check_struct_and_union3(_Float16, int, long long, 16, 8);
+  check_struct_and_union3(_Float16, int, float, 12, 4);
+  check_struct_and_union3(_Float16, int, double, 16, 8);
+  check_struct_and_union3(_Float16, int, long double, 32, 16);
+  check_struct_and_union3(_Float16, long, char, 24, 8);
+  check_struct_and_union3(_Float16, long, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, long, int, 24, 8);
+  check_struct_and_union3(_Float16, long, long, 24, 8);
+  check_struct_and_union3(_Float16, long, long long, 24, 8);
+  check_struct_and_union3(_Float16, long, float, 24, 8);
+  check_struct_and_union3(_Float16, long, double, 24, 8);
+  check_struct_and_union3(_Float16, long, long double, 32, 16);
+  check_struct_and_union3(_Float16, long long, char, 24, 8);
+  check_struct_and_union3(_Float16, long long, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, long long, int, 24, 8);
+  check_struct_and_union3(_Float16, long long, long, 24, 8);
+  check_struct_and_union3(_Float16, long long, long long, 24, 8);
+  check_struct_and_union3(_Float16, long long, float, 24, 8);
+  check_struct_and_union3(_Float16, long long, double, 24, 8);
+  check_struct_and_union3(_Float16, long long, long double, 32, 16);
+  check_struct_and_union3(_Float16, float, char, 12, 4);
+  check_struct_and_union3(_Float16, float, _Float16, 12, 4);
+  check_struct_and_union3(_Float16, float, int, 12, 4);
+  check_struct_and_union3(_Float16, float, long, 16, 8);
+  check_struct_and_union3(_Float16, float, long long, 16, 8);
+  check_struct_and_union3(_Float16, float, float, 12, 4);
+  check_struct_and_union3(_Float16, float, double, 16, 8);
+  check_struct_and_union3(_Float16, float, long double, 32, 16);
+  check_struct_and_union3(_Float16, double, char, 24, 8);
+  check_struct_and_union3(_Float16, double, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, double, int, 24, 8);
+  check_struct_and_union3(_Float16, double, long, 24, 8);
+  check_struct_and_union3(_Float16, double, long long, 24, 8);
+  check_struct_and_union3(_Float16, double, float, 24, 8);
+  check_struct_and_union3(_Float16, double, double, 24, 8);
+  check_struct_and_union3(_Float16, double, long double, 32, 16);
+  check_struct_and_union3(_Float16, long double, char, 48, 16);
+  check_struct_and_union3(_Float16, long double, _Float16, 48, 16);
+  check_struct_and_union3(_Float16, long double, int, 48, 16);
+  check_struct_and_union3(_Float16, long double, long, 48, 16);
+  check_struct_and_union3(_Float16, long double, long long, 48, 16);
+  check_struct_and_union3(_Float16, long double, float, 48, 16);
+  check_struct_and_union3(_Float16, long double, double, 48, 16);
+  check_struct_and_union3(_Float16, long double, long double, 48, 16);
+  check_struct_and_union3(int, char, _Float16, 8, 4);
+  check_struct_and_union3(int, _Float16, char, 8, 4);
+  check_struct_and_union3(int, _Float16, _Float16, 8, 4);
+  check_struct_and_union3(int, _Float16, int, 12, 4);
+  check_struct_and_union3(int, _Float16, long, 16, 8);
+  check_struct_and_union3(int, _Float16, long long, 16, 8);
+  check_struct_and_union3(int, _Float16, float, 12, 4);
+  check_struct_and_union3(int, _Float16, double, 16, 8);
+  check_struct_and_union3(int, _Float16, long double, 32, 16);
+  check_struct_and_union3(int, int, _Float16, 12, 4);
+  check_struct_and_union3(int, long, _Float16, 24, 8);
+  check_struct_and_union3(int, long long, _Float16, 24, 8);
+  check_struct_and_union3(int, float, _Float16, 12, 4);
+  check_struct_and_union3(int, double, _Float16, 24, 8);
+  check_struct_and_union3(int, long double, _Float16, 48, 16);
+  check_struct_and_union3(long, char, _Float16, 16, 8);
+  check_struct_and_union3(long, _Float16, char, 16, 8);
+  check_struct_and_union3(long, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(long, _Float16, int, 16, 8);
+  check_struct_and_union3(long, _Float16, long, 24, 8);
+  check_struct_and_union3(long, _Float16, long long, 24, 8);
+  check_struct_and_union3(long, _Float16, float, 16, 8);
+  check_struct_and_union3(long, _Float16, double, 24, 8);
+  check_struct_and_union3(long, _Float16, long double, 32, 16);
+  check_struct_and_union3(long, int, _Float16, 16, 8);
+  check_struct_and_union3(long, long, _Float16, 24, 8);
+  check_struct_and_union3(long, long long, _Float16, 24, 8);
+  check_struct_and_union3(long, float, _Float16, 16, 8);
+  check_struct_and_union3(long, double, _Float16, 24, 8);
+  check_struct_and_union3(long, long double, _Float16, 48, 16);
+  check_struct_and_union3(long long, char, _Float16, 16, 8);
+  check_struct_and_union3(long long, _Float16, char, 16, 8);
+  check_struct_and_union3(long long, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(long long, _Float16, int, 16, 8);
+  check_struct_and_union3(long long, _Float16, long, 24, 8);
+  check_struct_and_union3(long long, _Float16, long long, 24, 8);
+  check_struct_and_union3(long long, _Float16, float, 16, 8);
+  check_struct_and_union3(long long, _Float16, double, 24, 8);
+  check_struct_and_union3(long long, _Float16, long double, 32, 16);
+  check_struct_and_union3(long long, int, _Float16, 16, 8);
+  check_struct_and_union3(long long, long, _Float16, 24, 8);
+  check_struct_and_union3(long long, long long, _Float16, 24, 8);
+  check_struct_and_union3(long long, float, _Float16, 16, 8);
+  check_struct_and_union3(long long, double, _Float16, 24, 8);
+  check_struct_and_union3(long long, long double, _Float16, 48, 16);
+  check_struct_and_union3(float, char, _Float16, 8, 4);
+  check_struct_and_union3(float, _Float16, char, 8, 4);
+  check_struct_and_union3(float, _Float16, _Float16, 8, 4);
+  check_struct_and_union3(float, _Float16, int, 12, 4);
+  check_struct_and_union3(float, _Float16, long, 16, 8);
+  check_struct_and_union3(float, _Float16, long long, 16, 8);
+  check_struct_and_union3(float, _Float16, float, 12, 4);
+  check_struct_and_union3(float, _Float16, double, 16, 8);
+  check_struct_and_union3(float, _Float16, long double, 32, 16);
+  check_struct_and_union3(float, int, _Float16, 12, 4);
+  check_struct_and_union3(float, long, _Float16, 24, 8);
+  check_struct_and_union3(float, long long, _Float16, 24, 8);
+  check_struct_and_union3(float, float, _Float16, 12, 4);
+  check_struct_and_union3(float, double, _Float16, 24, 8);
+  check_struct_and_union3(float, long double, _Float16, 48, 16);
+  check_struct_and_union3(double, char, _Float16, 16, 8);
+  check_struct_and_union3(double, _Float16, char, 16, 8);
+  check_struct_and_union3(double, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(double, _Float16, int, 16, 8);
+  check_struct_and_union3(double, _Float16, long, 24, 8);
+  check_struct_and_union3(double, _Float16, long long, 24, 8);
+  check_struct_and_union3(double, _Float16, float, 16, 8);
+  check_struct_and_union3(double, _Float16, double, 24, 8);
+  check_struct_and_union3(double, _Float16, long double, 32, 16);
+  check_struct_and_union3(double, int, _Float16, 16, 8);
+  check_struct_and_union3(double, long, _Float16, 24, 8);
+  check_struct_and_union3(double, long long, _Float16, 24, 8);
+  check_struct_and_union3(double, float, _Float16, 16, 8);
+  check_struct_and_union3(double, double, _Float16, 24, 8);
+  check_struct_and_union3(double, long double, _Float16, 48, 16);
+  check_struct_and_union3(long double, char, _Float16, 32, 16);
+  check_struct_and_union3(long double, _Float16, char, 32, 16);
+  check_struct_and_union3(long double, _Float16, _Float16, 32, 16);
+  check_struct_and_union3(long double, _Float16, int, 32, 16);
+  check_struct_and_union3(long double, _Float16, long, 32, 16);
+  check_struct_and_union3(long double, _Float16, long long, 32, 16);
+  check_struct_and_union3(long double, _Float16, float, 32, 16);
+  check_struct_and_union3(long double, _Float16, double, 32, 16);
+  check_struct_and_union3(long double, _Float16, long double, 48, 16);
+  check_struct_and_union3(long double, int, _Float16, 32, 16);
+  check_struct_and_union3(long double, long, _Float16, 32, 16);
+  check_struct_and_union3(long double, long long, _Float16, 32, 16);
+  check_struct_and_union3(long double, float, _Float16, 32, 16);
+  check_struct_and_union3(long double, double, _Float16, 32, 16);
+  check_struct_and_union3(long double, long double, _Float16, 48, 16);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
new file mode 100644
index 00000000000..2a72b5c9e18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
@@ -0,0 +1,45 @@
+/* This checks alignment of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests2(check_align, char, TYPE_ALIGN_CHAR);
+  run_signed_tests2(check_align, short, TYPE_ALIGN_SHORT);
+  run_signed_tests2(check_align, int, TYPE_ALIGN_INT);
+  run_signed_tests2(check_align, long, TYPE_ALIGN_LONG);
+  run_signed_tests2(check_align, long long, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests2(check_align, __int128, TYPE_ALIGN_INT128);
+#endif
+  check_align(enumtype, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_align(float, TYPE_ALIGN_FLOAT);
+  check_align(double, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_align(long double, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_align(__float128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_align(__m64, TYPE_ALIGN_M64);
+  check_align(__m128, TYPE_ALIGN_M128);
+#endif
+
+  /* _Float16 floating point type.  */
+  check_align(_Float16, TYPE_ALIGN_FLOAT16);
+
+  /* Pointer types.  */
+  check_align(void *, TYPE_ALIGN_POINTER);
+  check_align(void (*)(), TYPE_ALIGN_POINTER);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
new file mode 100644
index 00000000000..d58b9d1c43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
@@ -0,0 +1,43 @@
+/* This checks size and alignment of arrays of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_array_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_array_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_array_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_array_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_array_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_array_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_array_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_array_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_array_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_array_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_array_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_array_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_array_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_array_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+
+  check_array_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+
+  return 0;
+}
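The `check_array_size_and_align` macro comes from the testsuite's macros.h; a minimal standalone sketch of what it presumably verifies (the macro names `ARRAY_SIZE_OK`/`ARRAY_ALIGN_OK` here are illustrative, not from the testsuite) is that an array of N elements occupies exactly N times the element size and inherits the element type's alignment:

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-ins for the checks performed by
   check_array_size_and_align in macros.h: an array type must be
   n * sizeof (type) bytes and share the element's alignment.  */
#define ARRAY_SIZE_OK(type, n)  (sizeof (type[n]) == (n) * sizeof (type))
#define ARRAY_ALIGN_OK(type, n) (alignof (type[n]) == alignof (type))
```

Under the x86-64 psABI, `_Float16` would be expected to satisfy these with a 2-byte size and 2-byte alignment.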
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
new file mode 100644
index 00000000000..36fb24e6250
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
@@ -0,0 +1,87 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+char
+fun_test_returning_char (void)
+{
+  volatile_var++;
+  return 64;
+}
+
+short
+fun_test_returning_short (void)
+{
+  volatile_var++;
+  return 65;
+}
+
+int
+fun_test_returning_int (void)
+{
+  volatile_var++;
+  return 66;
+}
+
+long
+fun_test_returning_long (void)
+{
+  volatile_var++;
+  return 67;
+}
+
+long long
+fun_test_returning_long_long (void)
+{
+  volatile_var++;
+  return 68;
+}
+
+float
+fun_test_returning_float (void)
+{
+  volatile_var++;
+  return 69;
+}
+
+double
+fun_test_returning_double (void)
+{
+  volatile_var++;
+  return 70;
+}
+
+long double
+fun_test_returning_long_double (void)
+{
+  volatile_var++;
+  return 71;
+}
+
+_Float16
+fun_test_returning_float16 (void)
+{
+  volatile_var++;
+  return 72;
+}
+
+#define def_test_returning_type_xmm(fun, type, ret, reg) \
+  { type var = WRAP_RET (fun) (); \
+  assert (ret == (type) reg && ret == var); }
+
+static void
+do_test (void)
+{
+  def_test_returning_type_xmm(fun_test_returning_char, char, 64, rax);
+  def_test_returning_type_xmm(fun_test_returning_short, short, 65, rax);
+  def_test_returning_type_xmm(fun_test_returning_int, int, 66, rax);
+  def_test_returning_type_xmm(fun_test_returning_long, long, 67, rax);
+  def_test_returning_type_xmm(fun_test_returning_long_long, long long, 68, rax);
+  def_test_returning_type_xmm(fun_test_returning_float, float, 69, xmm_regs[0]._float[0]);
+  def_test_returning_type_xmm(fun_test_returning_double, double, 70, xmm_regs[0]._double[0]);
+  def_test_returning_type_xmm(fun_test_returning_long_double, long double, 71, x87_regs[0]._ldouble);
+  def_test_returning_type_xmm(fun_test_returning_float16, _Float16, 72, xmm_regs[0].__Float16[0]);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
new file mode 100644
index 00000000000..47f3a5e87ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
@@ -0,0 +1,43 @@
+/* This checks sizes of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests2(check_size, char, TYPE_SIZE_CHAR);
+  run_signed_tests2(check_size, short, TYPE_SIZE_SHORT);
+  run_signed_tests2(check_size, int, TYPE_SIZE_INT);
+  run_signed_tests2(check_size, long, TYPE_SIZE_LONG);
+  run_signed_tests2(check_size, long long, TYPE_SIZE_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests2(check_size, __int128, TYPE_SIZE_INT128);
+#endif
+  check_size(enumtype, TYPE_SIZE_ENUM);
+
+  /* Floating point types.  */
+  check_size(_Float16, TYPE_SIZE_FLOAT16);
+  check_size(float, TYPE_SIZE_FLOAT);
+  check_size(double, TYPE_SIZE_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_size(long double, TYPE_SIZE_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_size(__float128, TYPE_SIZE_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_size(__m64, TYPE_SIZE_M64);
+  check_size(__m128, TYPE_SIZE_M128);
+#endif
+
+  /* Pointer types.  */
+  check_size(void *, TYPE_SIZE_POINTER);
+  check_size(void (*)(), TYPE_SIZE_POINTER);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
new file mode 100644
index 00000000000..3d1add464a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
@@ -0,0 +1,42 @@
+/* This checks size and alignment of structs with a single basic type
+   element. All basic types are checked.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+
+
+static void
+do_test (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_basic_struct_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_basic_struct_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_basic_struct_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_basic_struct_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_basic_struct_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_basic_struct_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_basic_struct_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_basic_struct_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+  check_basic_struct_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_basic_struct_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_basic_struct_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_basic_struct_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_basic_struct_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_basic_struct_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_basic_struct_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
new file mode 100644
index 00000000000..632feebe920
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
@@ -0,0 +1,40 @@
+/* Test of simple unions, size and alignment.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+
+static void
+do_test (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_basic_union_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_basic_union_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_basic_union_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_basic_union_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_basic_union_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_basic_union_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_basic_union_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_basic_union_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+  check_basic_union_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_basic_union_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_basic_union_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_basic_union_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_basic_union_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_basic_union_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_basic_union_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
new file mode 100644
index 00000000000..829d86e9ee7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
@@ -0,0 +1,104 @@
+/* This is a small test case for returning a complex number. Written by
+   Andreas Jaeger.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+
+#define BUILD_F16_COMPLEX(real, imag) \
+  ({ __complex__ _Float16 __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+__complex__ _Float16
+aj_f16_times2 (__complex__ _Float16 x)
+{
+  __complex__ _Float16 res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+#define BUILD_F_COMPLEX(real, imag) \
+  ({ __complex__ float __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+#define BUILD_D_COMPLEX(real, imag) \
+  ({ __complex__ double __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+#define BUILD_LD_COMPLEX(real, imag) \
+  ({ __complex__ long double __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+__complex__ float
+aj_f_times2 (__complex__ float x)
+{
+  __complex__ float res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+__complex__ double
+aj_d_times2 (__complex__ double x)
+{
+  __complex__ double res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+__complex__ long double
+aj_ld_times2 (__complex__ long double x)
+{
+  __complex__ long double res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+static void
+do_test (void)
+{
+#ifdef CHECK_COMPLEX
+  _Complex _Float16 f16c, f16d;
+  _Complex float fc, fd;
+  _Complex double dc, dd;
+  _Complex long double ldc, ldd;
+
+  f16c = BUILD_F16_COMPLEX (2.0, 3.0);
+  f16d = aj_f16_times2 (f16c);
+
+  assert (__real__ f16d == 4.0f16 && __imag__ f16d == 6.0f16);
+
+  fc = BUILD_F_COMPLEX (2.0f, 3.0f);
+  fd = aj_f_times2 (fc);
+
+  assert (__real__ fd == 4.0f && __imag__ fd == 6.0f);
+
+  dc = BUILD_D_COMPLEX (2.0, 3.0);
+  dd = aj_d_times2 (dc);
+
+  assert (__real__ dd == 4.0 && __imag__ dd == 6.0);
+
+  ldc = BUILD_LD_COMPLEX (2.0L, 3.0L);
+  ldd = aj_ld_times2 (ldc);
+
+  assert (__real__ ldd == 4.0L && __imag__ ldd == 6.0L);
+#endif
+}
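The `aj_*_times2` functions above use the GNU `__real__`/`__imag__` extensions; the same doubling can be sketched portably with C99 `<complex.h>` (this `f_times2` helper is an illustration, not part of the patch), since multiplying a complex value by a real scalar doubles both components:

```c
#include <complex.h>

/* Portable sketch of aj_f_times2: scalar multiplication scales the
   real and imaginary parts together.  */
static inline float complex
f_times2 (float complex x)
{
  return 2.0f * x;
}
```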
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
new file mode 100644
index 00000000000..34afee66586
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
@@ -0,0 +1,73 @@
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m64
+fun_test_returning___m64 (void)
+{
+  volatile_var++;
+  return (__m64){72,0};
+}
+
+__m128
+fun_test_returning___m128 (void)
+{
+  volatile_var++;
+  return (__m128){73,0,0,0};
+}
+
+__m128h
+fun_test_returning___m128h (void)
+{
+  volatile_var++;
+  return (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+                   6.6f16, 7.7f16, 8.8f16};
+}
+
+__m64 test_64;
+__m128 test_128;
+__m128h test_128h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  XMM_T xmmt1, xmmt2;
+
+  /* We jump through hoops to compare the results, as gcc 3.3 throws
+     an ICE when trying to generate a compare for a == b when a and b
+     are of __m64 or __m128 type :-(  */
+  clear_struct_registers;
+  test_64 = (__m64){72,0};
+  xmmt1._m64[0] = test_64;
+  xmmt2._m64[0] = WRAP_RET (fun_test_returning___m64)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m64\n"), failed++;
+
+  clear_struct_registers;
+  test_128 = (__m128){73,0,0,0};
+  xmmt1._m128[0] = test_128;
+  xmmt2._m128[0] = WRAP_RET (fun_test_returning___m128)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m128\n"), failed++;
+
+  clear_struct_registers;
+  test_128h = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+                        6.6f16, 7.7f16, 8.8f16};
+  xmmt1._m128h[0] = test_128h;
+  xmmt2._m128h[0] = WRAP_RET (fun_test_returning___m128h)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m128h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
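The comparisons above read the results back through the `_long` members of `XMM_T`, comparing vector contents bitwise via overlapping integer members instead of a direct vector `==`. A standalone sketch of that union-punning trick (the `xmm_bits` union here is a simplified stand-in for the testsuite's `XMM_T`, with an assumed layout):

```c
#include <string.h>

/* Simplified stand-in for XMM_T: 16 bytes viewed as floats or as
   two 64-bit integers.  */
typedef union
{
  float f32[4];
  long long i64[2];
} xmm_bits;

/* Nonzero when the low 8 bytes of two float quads match bitwise,
   mirroring the xmmt1._long[0] != xmmt2._long[0] checks above.  */
static inline int
low_qword_equal (const float a[4], const float b[4])
{
  xmm_bits ua, ub;
  memcpy (ua.f32, a, sizeof ua.f32);
  memcpy (ub.f32, b, sizeof ub.f32);
  return ua.i64[0] == ub.i64[0];
}
```

Comparing through integers this way also sidesteps the usual floating-point `==` pitfalls for exact register-content checks.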
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
new file mode 100644
index 00000000000..678b25c14d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
@@ -0,0 +1,1066 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  _Float16 f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
+    f15, f16, f17, f18, f19, f20, f21, f22, f23;
+} values__Float16;
+
+struct
+{
+  float f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15,
+    f16, f17, f18, f19, f20, f21, f22, f23;
+} values_float;
+
+struct
+{
+  double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15,
+    f16, f17, f18, f19, f20, f21, f22, f23;
+} values_double;
+
+struct
+{
+  ldouble f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
+    f15, f16, f17, f18, f19, f20, f21, f22, f23;
+} values_ldouble;
+
+void
+fun_check_float16_passing_8_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				    _Float16 f1 ATTRIBUTE_UNUSED,
+				    _Float16 f2 ATTRIBUTE_UNUSED,
+				    _Float16 f3 ATTRIBUTE_UNUSED,
+				    _Float16 f4 ATTRIBUTE_UNUSED,
+				    _Float16 f5 ATTRIBUTE_UNUSED,
+				    _Float16 f6 ATTRIBUTE_UNUSED,
+				    _Float16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+}
+
+void
+fun_check_float16_passing_8_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				  _Float16 f1 ATTRIBUTE_UNUSED,
+				  _Float16 f2 ATTRIBUTE_UNUSED,
+				  _Float16 f3 ATTRIBUTE_UNUSED,
+				  _Float16 f4 ATTRIBUTE_UNUSED,
+				  _Float16 f5 ATTRIBUTE_UNUSED,
+				  _Float16 f6 ATTRIBUTE_UNUSED,
+				  _Float16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float16_passing_16_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				     _Float16 f1 ATTRIBUTE_UNUSED,
+				     _Float16 f2 ATTRIBUTE_UNUSED,
+				     _Float16 f3 ATTRIBUTE_UNUSED,
+				     _Float16 f4 ATTRIBUTE_UNUSED,
+				     _Float16 f5 ATTRIBUTE_UNUSED,
+				     _Float16 f6 ATTRIBUTE_UNUSED,
+				     _Float16 f7 ATTRIBUTE_UNUSED,
+				     _Float16 f8 ATTRIBUTE_UNUSED,
+				     _Float16 f9 ATTRIBUTE_UNUSED,
+				     _Float16 f10 ATTRIBUTE_UNUSED,
+				     _Float16 f11 ATTRIBUTE_UNUSED,
+				     _Float16 f12 ATTRIBUTE_UNUSED,
+				     _Float16 f13 ATTRIBUTE_UNUSED,
+				     _Float16 f14 ATTRIBUTE_UNUSED,
+				     _Float16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+  assert (values__Float16.f8 == f8);
+  assert (values__Float16.f9 == f9);
+  assert (values__Float16.f10 == f10);
+  assert (values__Float16.f11 == f11);
+  assert (values__Float16.f12 == f12);
+  assert (values__Float16.f13 == f13);
+  assert (values__Float16.f14 == f14);
+  assert (values__Float16.f15 == f15);
+}
+
+void
+fun_check_float16_passing_16_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				   _Float16 f1 ATTRIBUTE_UNUSED,
+				   _Float16 f2 ATTRIBUTE_UNUSED,
+				   _Float16 f3 ATTRIBUTE_UNUSED,
+				   _Float16 f4 ATTRIBUTE_UNUSED,
+				   _Float16 f5 ATTRIBUTE_UNUSED,
+				   _Float16 f6 ATTRIBUTE_UNUSED,
+				   _Float16 f7 ATTRIBUTE_UNUSED,
+				   _Float16 f8 ATTRIBUTE_UNUSED,
+				   _Float16 f9 ATTRIBUTE_UNUSED,
+				   _Float16 f10 ATTRIBUTE_UNUSED,
+				   _Float16 f11 ATTRIBUTE_UNUSED,
+				   _Float16 f12 ATTRIBUTE_UNUSED,
+				   _Float16 f13 ATTRIBUTE_UNUSED,
+				   _Float16 f14 ATTRIBUTE_UNUSED,
+				   _Float16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float16_passing_20_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				     _Float16 f1 ATTRIBUTE_UNUSED,
+				     _Float16 f2 ATTRIBUTE_UNUSED,
+				     _Float16 f3 ATTRIBUTE_UNUSED,
+				     _Float16 f4 ATTRIBUTE_UNUSED,
+				     _Float16 f5 ATTRIBUTE_UNUSED,
+				     _Float16 f6 ATTRIBUTE_UNUSED,
+				     _Float16 f7 ATTRIBUTE_UNUSED,
+				     _Float16 f8 ATTRIBUTE_UNUSED,
+				     _Float16 f9 ATTRIBUTE_UNUSED,
+				     _Float16 f10 ATTRIBUTE_UNUSED,
+				     _Float16 f11 ATTRIBUTE_UNUSED,
+				     _Float16 f12 ATTRIBUTE_UNUSED,
+				     _Float16 f13 ATTRIBUTE_UNUSED,
+				     _Float16 f14 ATTRIBUTE_UNUSED,
+				     _Float16 f15 ATTRIBUTE_UNUSED,
+				     _Float16 f16 ATTRIBUTE_UNUSED,
+				     _Float16 f17 ATTRIBUTE_UNUSED,
+				     _Float16 f18 ATTRIBUTE_UNUSED,
+				     _Float16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+  assert (values__Float16.f8 == f8);
+  assert (values__Float16.f9 == f9);
+  assert (values__Float16.f10 == f10);
+  assert (values__Float16.f11 == f11);
+  assert (values__Float16.f12 == f12);
+  assert (values__Float16.f13 == f13);
+  assert (values__Float16.f14 == f14);
+  assert (values__Float16.f15 == f15);
+  assert (values__Float16.f16 == f16);
+  assert (values__Float16.f17 == f17);
+  assert (values__Float16.f18 == f18);
+  assert (values__Float16.f19 == f19);
+}
+
+void
+fun_check_float16_passing_20_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				   _Float16 f1 ATTRIBUTE_UNUSED,
+				   _Float16 f2 ATTRIBUTE_UNUSED,
+				   _Float16 f3 ATTRIBUTE_UNUSED,
+				   _Float16 f4 ATTRIBUTE_UNUSED,
+				   _Float16 f5 ATTRIBUTE_UNUSED,
+				   _Float16 f6 ATTRIBUTE_UNUSED,
+				   _Float16 f7 ATTRIBUTE_UNUSED,
+				   _Float16 f8 ATTRIBUTE_UNUSED,
+				   _Float16 f9 ATTRIBUTE_UNUSED,
+				   _Float16 f10 ATTRIBUTE_UNUSED,
+				   _Float16 f11 ATTRIBUTE_UNUSED,
+				   _Float16 f12 ATTRIBUTE_UNUSED,
+				   _Float16 f13 ATTRIBUTE_UNUSED,
+				   _Float16 f14 ATTRIBUTE_UNUSED,
+				   _Float16 f15 ATTRIBUTE_UNUSED,
+				   _Float16 f16 ATTRIBUTE_UNUSED,
+				   _Float16 f17 ATTRIBUTE_UNUSED,
+				   _Float16 f18 ATTRIBUTE_UNUSED,
+				   _Float16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float_passing_float8_values (float f0 ATTRIBUTE_UNUSED,
+				       float f1 ATTRIBUTE_UNUSED,
+				       float f2 ATTRIBUTE_UNUSED,
+				       float f3 ATTRIBUTE_UNUSED,
+				       float f4 ATTRIBUTE_UNUSED,
+				       float f5 ATTRIBUTE_UNUSED,
+				       float f6 ATTRIBUTE_UNUSED,
+				       float f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+
+}
+
+void
+fun_check_float_passing_float8_regs (float f0 ATTRIBUTE_UNUSED,
+				     float f1 ATTRIBUTE_UNUSED,
+				     float f2 ATTRIBUTE_UNUSED,
+				     float f3 ATTRIBUTE_UNUSED,
+				     float f4 ATTRIBUTE_UNUSED,
+				     float f5 ATTRIBUTE_UNUSED,
+				     float f6 ATTRIBUTE_UNUSED,
+				     float f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_float16_values (float f0 ATTRIBUTE_UNUSED,
+					float f1 ATTRIBUTE_UNUSED,
+					float f2 ATTRIBUTE_UNUSED,
+					float f3 ATTRIBUTE_UNUSED,
+					float f4 ATTRIBUTE_UNUSED,
+					float f5 ATTRIBUTE_UNUSED,
+					float f6 ATTRIBUTE_UNUSED,
+					float f7 ATTRIBUTE_UNUSED,
+					float f8 ATTRIBUTE_UNUSED,
+					float f9 ATTRIBUTE_UNUSED,
+					float f10 ATTRIBUTE_UNUSED,
+					float f11 ATTRIBUTE_UNUSED,
+					float f12 ATTRIBUTE_UNUSED,
+					float f13 ATTRIBUTE_UNUSED,
+					float f14 ATTRIBUTE_UNUSED,
+					float f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+  assert (values_float.f8 == f8);
+  assert (values_float.f9 == f9);
+  assert (values_float.f10 == f10);
+  assert (values_float.f11 == f11);
+  assert (values_float.f12 == f12);
+  assert (values_float.f13 == f13);
+  assert (values_float.f14 == f14);
+  assert (values_float.f15 == f15);
+
+}
+
+void
+fun_check_float_passing_float16_regs (float f0 ATTRIBUTE_UNUSED,
+				      float f1 ATTRIBUTE_UNUSED,
+				      float f2 ATTRIBUTE_UNUSED,
+				      float f3 ATTRIBUTE_UNUSED,
+				      float f4 ATTRIBUTE_UNUSED,
+				      float f5 ATTRIBUTE_UNUSED,
+				      float f6 ATTRIBUTE_UNUSED,
+				      float f7 ATTRIBUTE_UNUSED,
+				      float f8 ATTRIBUTE_UNUSED,
+				      float f9 ATTRIBUTE_UNUSED,
+				      float f10 ATTRIBUTE_UNUSED,
+				      float f11 ATTRIBUTE_UNUSED,
+				      float f12 ATTRIBUTE_UNUSED,
+				      float f13 ATTRIBUTE_UNUSED,
+				      float f14 ATTRIBUTE_UNUSED,
+				      float f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_float20_values (float f0 ATTRIBUTE_UNUSED,
+					float f1 ATTRIBUTE_UNUSED,
+					float f2 ATTRIBUTE_UNUSED,
+					float f3 ATTRIBUTE_UNUSED,
+					float f4 ATTRIBUTE_UNUSED,
+					float f5 ATTRIBUTE_UNUSED,
+					float f6 ATTRIBUTE_UNUSED,
+					float f7 ATTRIBUTE_UNUSED,
+					float f8 ATTRIBUTE_UNUSED,
+					float f9 ATTRIBUTE_UNUSED,
+					float f10 ATTRIBUTE_UNUSED,
+					float f11 ATTRIBUTE_UNUSED,
+					float f12 ATTRIBUTE_UNUSED,
+					float f13 ATTRIBUTE_UNUSED,
+					float f14 ATTRIBUTE_UNUSED,
+					float f15 ATTRIBUTE_UNUSED,
+					float f16 ATTRIBUTE_UNUSED,
+					float f17 ATTRIBUTE_UNUSED,
+					float f18 ATTRIBUTE_UNUSED,
+					float f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+  assert (values_float.f8 == f8);
+  assert (values_float.f9 == f9);
+  assert (values_float.f10 == f10);
+  assert (values_float.f11 == f11);
+  assert (values_float.f12 == f12);
+  assert (values_float.f13 == f13);
+  assert (values_float.f14 == f14);
+  assert (values_float.f15 == f15);
+  assert (values_float.f16 == f16);
+  assert (values_float.f17 == f17);
+  assert (values_float.f18 == f18);
+  assert (values_float.f19 == f19);
+
+}
+
+void
+fun_check_float_passing_float20_regs (float f0 ATTRIBUTE_UNUSED,
+				      float f1 ATTRIBUTE_UNUSED,
+				      float f2 ATTRIBUTE_UNUSED,
+				      float f3 ATTRIBUTE_UNUSED,
+				      float f4 ATTRIBUTE_UNUSED,
+				      float f5 ATTRIBUTE_UNUSED,
+				      float f6 ATTRIBUTE_UNUSED,
+				      float f7 ATTRIBUTE_UNUSED,
+				      float f8 ATTRIBUTE_UNUSED,
+				      float f9 ATTRIBUTE_UNUSED,
+				      float f10 ATTRIBUTE_UNUSED,
+				      float f11 ATTRIBUTE_UNUSED,
+				      float f12 ATTRIBUTE_UNUSED,
+				      float f13 ATTRIBUTE_UNUSED,
+				      float f14 ATTRIBUTE_UNUSED,
+				      float f15 ATTRIBUTE_UNUSED,
+				      float f16 ATTRIBUTE_UNUSED,
+				      float f17 ATTRIBUTE_UNUSED,
+				      float f18 ATTRIBUTE_UNUSED,
+				      float f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_double8_values (double f0 ATTRIBUTE_UNUSED,
+					double f1 ATTRIBUTE_UNUSED,
+					double f2 ATTRIBUTE_UNUSED,
+					double f3 ATTRIBUTE_UNUSED,
+					double f4 ATTRIBUTE_UNUSED,
+					double f5 ATTRIBUTE_UNUSED,
+					double f6 ATTRIBUTE_UNUSED,
+					double f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+
+}
+
+void
+fun_check_float_passing_double8_regs (double f0 ATTRIBUTE_UNUSED,
+				      double f1 ATTRIBUTE_UNUSED,
+				      double f2 ATTRIBUTE_UNUSED,
+				      double f3 ATTRIBUTE_UNUSED,
+				      double f4 ATTRIBUTE_UNUSED,
+				      double f5 ATTRIBUTE_UNUSED,
+				      double f6 ATTRIBUTE_UNUSED,
+				      double f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_float_passing_double16_values (double f0 ATTRIBUTE_UNUSED,
+					 double f1 ATTRIBUTE_UNUSED,
+					 double f2 ATTRIBUTE_UNUSED,
+					 double f3 ATTRIBUTE_UNUSED,
+					 double f4 ATTRIBUTE_UNUSED,
+					 double f5 ATTRIBUTE_UNUSED,
+					 double f6 ATTRIBUTE_UNUSED,
+					 double f7 ATTRIBUTE_UNUSED,
+					 double f8 ATTRIBUTE_UNUSED,
+					 double f9 ATTRIBUTE_UNUSED,
+					 double f10 ATTRIBUTE_UNUSED,
+					 double f11 ATTRIBUTE_UNUSED,
+					 double f12 ATTRIBUTE_UNUSED,
+					 double f13 ATTRIBUTE_UNUSED,
+					 double f14 ATTRIBUTE_UNUSED,
+					 double f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+  assert (values_double.f8 == f8);
+  assert (values_double.f9 == f9);
+  assert (values_double.f10 == f10);
+  assert (values_double.f11 == f11);
+  assert (values_double.f12 == f12);
+  assert (values_double.f13 == f13);
+  assert (values_double.f14 == f14);
+  assert (values_double.f15 == f15);
+
+}
+
+void
+fun_check_float_passing_double16_regs (double f0 ATTRIBUTE_UNUSED,
+				       double f1 ATTRIBUTE_UNUSED,
+				       double f2 ATTRIBUTE_UNUSED,
+				       double f3 ATTRIBUTE_UNUSED,
+				       double f4 ATTRIBUTE_UNUSED,
+				       double f5 ATTRIBUTE_UNUSED,
+				       double f6 ATTRIBUTE_UNUSED,
+				       double f7 ATTRIBUTE_UNUSED,
+				       double f8 ATTRIBUTE_UNUSED,
+				       double f9 ATTRIBUTE_UNUSED,
+				       double f10 ATTRIBUTE_UNUSED,
+				       double f11 ATTRIBUTE_UNUSED,
+				       double f12 ATTRIBUTE_UNUSED,
+				       double f13 ATTRIBUTE_UNUSED,
+				       double f14 ATTRIBUTE_UNUSED,
+				       double f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_float_passing_double20_values (double f0 ATTRIBUTE_UNUSED,
+					 double f1 ATTRIBUTE_UNUSED,
+					 double f2 ATTRIBUTE_UNUSED,
+					 double f3 ATTRIBUTE_UNUSED,
+					 double f4 ATTRIBUTE_UNUSED,
+					 double f5 ATTRIBUTE_UNUSED,
+					 double f6 ATTRIBUTE_UNUSED,
+					 double f7 ATTRIBUTE_UNUSED,
+					 double f8 ATTRIBUTE_UNUSED,
+					 double f9 ATTRIBUTE_UNUSED,
+					 double f10 ATTRIBUTE_UNUSED,
+					 double f11 ATTRIBUTE_UNUSED,
+					 double f12 ATTRIBUTE_UNUSED,
+					 double f13 ATTRIBUTE_UNUSED,
+					 double f14 ATTRIBUTE_UNUSED,
+					 double f15 ATTRIBUTE_UNUSED,
+					 double f16 ATTRIBUTE_UNUSED,
+					 double f17 ATTRIBUTE_UNUSED,
+					 double f18 ATTRIBUTE_UNUSED,
+					 double f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+  assert (values_double.f8 == f8);
+  assert (values_double.f9 == f9);
+  assert (values_double.f10 == f10);
+  assert (values_double.f11 == f11);
+  assert (values_double.f12 == f12);
+  assert (values_double.f13 == f13);
+  assert (values_double.f14 == f14);
+  assert (values_double.f15 == f15);
+  assert (values_double.f16 == f16);
+  assert (values_double.f17 == f17);
+  assert (values_double.f18 == f18);
+  assert (values_double.f19 == f19);
+
+}
+
+void
+fun_check_float_passing_double20_regs (double f0 ATTRIBUTE_UNUSED,
+				       double f1 ATTRIBUTE_UNUSED,
+				       double f2 ATTRIBUTE_UNUSED,
+				       double f3 ATTRIBUTE_UNUSED,
+				       double f4 ATTRIBUTE_UNUSED,
+				       double f5 ATTRIBUTE_UNUSED,
+				       double f6 ATTRIBUTE_UNUSED,
+				       double f7 ATTRIBUTE_UNUSED,
+				       double f8 ATTRIBUTE_UNUSED,
+				       double f9 ATTRIBUTE_UNUSED,
+				       double f10 ATTRIBUTE_UNUSED,
+				       double f11 ATTRIBUTE_UNUSED,
+				       double f12 ATTRIBUTE_UNUSED,
+				       double f13 ATTRIBUTE_UNUSED,
+				       double f14 ATTRIBUTE_UNUSED,
+				       double f15 ATTRIBUTE_UNUSED,
+				       double f16 ATTRIBUTE_UNUSED,
+				       double f17 ATTRIBUTE_UNUSED,
+				       double f18 ATTRIBUTE_UNUSED,
+				       double f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble8_values (ldouble f0 ATTRIBUTE_UNUSED,
+				       ldouble f1 ATTRIBUTE_UNUSED,
+				       ldouble f2 ATTRIBUTE_UNUSED,
+				       ldouble f3 ATTRIBUTE_UNUSED,
+				       ldouble f4 ATTRIBUTE_UNUSED,
+				       ldouble f5 ATTRIBUTE_UNUSED,
+				       ldouble f6 ATTRIBUTE_UNUSED,
+				       ldouble f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+
+}
+
+void
+fun_check_x87_passing_ldouble8_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				     ldouble f1 ATTRIBUTE_UNUSED,
+				     ldouble f2 ATTRIBUTE_UNUSED,
+				     ldouble f3 ATTRIBUTE_UNUSED,
+				     ldouble f4 ATTRIBUTE_UNUSED,
+				     ldouble f5 ATTRIBUTE_UNUSED,
+				     ldouble f6 ATTRIBUTE_UNUSED,
+				     ldouble f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble16_values (ldouble f0 ATTRIBUTE_UNUSED,
+					ldouble f1 ATTRIBUTE_UNUSED,
+					ldouble f2 ATTRIBUTE_UNUSED,
+					ldouble f3 ATTRIBUTE_UNUSED,
+					ldouble f4 ATTRIBUTE_UNUSED,
+					ldouble f5 ATTRIBUTE_UNUSED,
+					ldouble f6 ATTRIBUTE_UNUSED,
+					ldouble f7 ATTRIBUTE_UNUSED,
+					ldouble f8 ATTRIBUTE_UNUSED,
+					ldouble f9 ATTRIBUTE_UNUSED,
+					ldouble f10 ATTRIBUTE_UNUSED,
+					ldouble f11 ATTRIBUTE_UNUSED,
+					ldouble f12 ATTRIBUTE_UNUSED,
+					ldouble f13 ATTRIBUTE_UNUSED,
+					ldouble f14 ATTRIBUTE_UNUSED,
+					ldouble f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+  assert (values_ldouble.f8 == f8);
+  assert (values_ldouble.f9 == f9);
+  assert (values_ldouble.f10 == f10);
+  assert (values_ldouble.f11 == f11);
+  assert (values_ldouble.f12 == f12);
+  assert (values_ldouble.f13 == f13);
+  assert (values_ldouble.f14 == f14);
+  assert (values_ldouble.f15 == f15);
+
+}
+
+void
+fun_check_x87_passing_ldouble16_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				      ldouble f1 ATTRIBUTE_UNUSED,
+				      ldouble f2 ATTRIBUTE_UNUSED,
+				      ldouble f3 ATTRIBUTE_UNUSED,
+				      ldouble f4 ATTRIBUTE_UNUSED,
+				      ldouble f5 ATTRIBUTE_UNUSED,
+				      ldouble f6 ATTRIBUTE_UNUSED,
+				      ldouble f7 ATTRIBUTE_UNUSED,
+				      ldouble f8 ATTRIBUTE_UNUSED,
+				      ldouble f9 ATTRIBUTE_UNUSED,
+				      ldouble f10 ATTRIBUTE_UNUSED,
+				      ldouble f11 ATTRIBUTE_UNUSED,
+				      ldouble f12 ATTRIBUTE_UNUSED,
+				      ldouble f13 ATTRIBUTE_UNUSED,
+				      ldouble f14 ATTRIBUTE_UNUSED,
+				      ldouble f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble20_values (ldouble f0 ATTRIBUTE_UNUSED,
+					ldouble f1 ATTRIBUTE_UNUSED,
+					ldouble f2 ATTRIBUTE_UNUSED,
+					ldouble f3 ATTRIBUTE_UNUSED,
+					ldouble f4 ATTRIBUTE_UNUSED,
+					ldouble f5 ATTRIBUTE_UNUSED,
+					ldouble f6 ATTRIBUTE_UNUSED,
+					ldouble f7 ATTRIBUTE_UNUSED,
+					ldouble f8 ATTRIBUTE_UNUSED,
+					ldouble f9 ATTRIBUTE_UNUSED,
+					ldouble f10 ATTRIBUTE_UNUSED,
+					ldouble f11 ATTRIBUTE_UNUSED,
+					ldouble f12 ATTRIBUTE_UNUSED,
+					ldouble f13 ATTRIBUTE_UNUSED,
+					ldouble f14 ATTRIBUTE_UNUSED,
+					ldouble f15 ATTRIBUTE_UNUSED,
+					ldouble f16 ATTRIBUTE_UNUSED,
+					ldouble f17 ATTRIBUTE_UNUSED,
+					ldouble f18 ATTRIBUTE_UNUSED,
+					ldouble f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+  assert (values_ldouble.f8 == f8);
+  assert (values_ldouble.f9 == f9);
+  assert (values_ldouble.f10 == f10);
+  assert (values_ldouble.f11 == f11);
+  assert (values_ldouble.f12 == f12);
+  assert (values_ldouble.f13 == f13);
+  assert (values_ldouble.f14 == f14);
+  assert (values_ldouble.f15 == f15);
+  assert (values_ldouble.f16 == f16);
+  assert (values_ldouble.f17 == f17);
+  assert (values_ldouble.f18 == f18);
+  assert (values_ldouble.f19 == f19);
+
+}
+
+void
+fun_check_x87_passing_ldouble20_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				      ldouble f1 ATTRIBUTE_UNUSED,
+				      ldouble f2 ATTRIBUTE_UNUSED,
+				      ldouble f3 ATTRIBUTE_UNUSED,
+				      ldouble f4 ATTRIBUTE_UNUSED,
+				      ldouble f5 ATTRIBUTE_UNUSED,
+				      ldouble f6 ATTRIBUTE_UNUSED,
+				      ldouble f7 ATTRIBUTE_UNUSED,
+				      ldouble f8 ATTRIBUTE_UNUSED,
+				      ldouble f9 ATTRIBUTE_UNUSED,
+				      ldouble f10 ATTRIBUTE_UNUSED,
+				      ldouble f11 ATTRIBUTE_UNUSED,
+				      ldouble f12 ATTRIBUTE_UNUSED,
+				      ldouble f13 ATTRIBUTE_UNUSED,
+				      ldouble f14 ATTRIBUTE_UNUSED,
+				      ldouble f15 ATTRIBUTE_UNUSED,
+				      ldouble f16 ATTRIBUTE_UNUSED,
+				      ldouble f17 ATTRIBUTE_UNUSED,
+				      ldouble f18 ATTRIBUTE_UNUSED,
+				      ldouble f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+#define def_check_float16_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6,\
+				   _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_float16_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, _f13, \
+				    _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_float16_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, \
+				    _f13, _f14, _f15, _f16, _f17, \
+				    _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, \
+		     _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, \
+		     _f17, _f18, _f19); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, \
+		     _f18, _f19);
+
+
+#define def_check_float_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_float_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_float_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19);
+
+#define def_check_x87_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_x87_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_x87_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19);
+
+void
+test_float16_on_stack ()
+{
+  def_check_float16_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			      fun_check_float16_passing_8_values,
+			      fun_check_float16_passing_8_regs, _Float16);
+
+  def_check_float16_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			       44, 45, 46, 47,
+			       fun_check_float16_passing_16_values,
+			       fun_check_float16_passing_16_regs, _Float16);
+}
+
+void
+test_too_many_float16 ()
+{
+  def_check_float16_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			       44, 45, 46, 47, 48, 49, 50, 51,
+			       fun_check_float16_passing_20_values,
+			       fun_check_float16_passing_20_regs, _Float16);
+}
+
+void
+test_floats_on_stack ()
+{
+  def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			    fun_check_float_passing_float8_values,
+			    fun_check_float_passing_float8_regs, float);
+
+  def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47,
+			     fun_check_float_passing_float16_values,
+			     fun_check_float_passing_float16_regs, float);
+}
+
+void
+test_too_many_floats ()
+{
+  def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47, 48, 49, 50, 51,
+			     fun_check_float_passing_float20_values,
+			     fun_check_float_passing_float20_regs, float);
+}
+
+void
+test_doubles_on_stack ()
+{
+  def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			    fun_check_float_passing_double8_values,
+			    fun_check_float_passing_double8_regs, double);
+
+  def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47,
+			     fun_check_float_passing_double16_values,
+			     fun_check_float_passing_double16_regs, double);
+}
+
+void
+test_too_many_doubles ()
+{
+  def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47, 48, 49, 50, 51,
+			     fun_check_float_passing_double20_values,
+			     fun_check_float_passing_double20_regs, double);
+}
+
+void
+test_long_doubles_on_stack ()
+{
+  def_check_x87_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			  fun_check_x87_passing_ldouble8_values,
+			  fun_check_x87_passing_ldouble8_regs, ldouble);
+}
+
+void
+test_too_many_long_doubles ()
+{
+  def_check_x87_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+			   45, 46, 47, 48, 49, 50, 51,
+			   fun_check_x87_passing_ldouble20_values,
+			   fun_check_x87_passing_ldouble20_regs, ldouble);
+}
+
+void
+test_float128s_on_stack ()
+{
+}
+
+void
+test_too_many_float128s ()
+{
+}
+
+
+static void
+do_test (void)
+{
+  test_float16_on_stack ();
+  test_too_many_float16 ();
+  test_floats_on_stack ();
+  test_too_many_floats ();
+  test_doubles_on_stack ();
+  test_too_many_doubles ();
+  test_long_doubles_on_stack ();
+  test_too_many_long_doubles ();
+  test_float128s_on_stack ();
+  test_too_many_float128s ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
new file mode 100644
index 00000000000..66c27aef7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
@@ -0,0 +1,510 @@
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m64_8_values (__m64 i0 ATTRIBUTE_UNUSED,
+				__m64 i1 ATTRIBUTE_UNUSED,
+				__m64 i2 ATTRIBUTE_UNUSED,
+				__m64 i3 ATTRIBUTE_UNUSED,
+				__m64 i4 ATTRIBUTE_UNUSED,
+				__m64 i5 ATTRIBUTE_UNUSED,
+				__m64 i6 ATTRIBUTE_UNUSED,
+				__m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m64);
+  compare (values.i1, i1, __m64);
+  compare (values.i2, i2, __m64);
+  compare (values.i3, i3, __m64);
+  compare (values.i4, i4, __m64);
+  compare (values.i5, i5, __m64);
+  compare (values.i6, i6, __m64);
+  compare (values.i7, i7, __m64);
+}
+
+void
+fun_check_passing_m64_8_regs (__m64 i0 ATTRIBUTE_UNUSED,
+			      __m64 i1 ATTRIBUTE_UNUSED,
+			      __m64 i2 ATTRIBUTE_UNUSED,
+			      __m64 i3 ATTRIBUTE_UNUSED,
+			      __m64 i4 ATTRIBUTE_UNUSED,
+			      __m64 i5 ATTRIBUTE_UNUSED,
+			      __m64 i6 ATTRIBUTE_UNUSED,
+			      __m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m64_arguments;
+}
+
+void
+fun_check_passing_m64_20_values (__m64 i0 ATTRIBUTE_UNUSED,
+				 __m64 i1 ATTRIBUTE_UNUSED,
+				 __m64 i2 ATTRIBUTE_UNUSED,
+				 __m64 i3 ATTRIBUTE_UNUSED,
+				 __m64 i4 ATTRIBUTE_UNUSED,
+				 __m64 i5 ATTRIBUTE_UNUSED,
+				 __m64 i6 ATTRIBUTE_UNUSED,
+				 __m64 i7 ATTRIBUTE_UNUSED,
+				 __m64 i8 ATTRIBUTE_UNUSED,
+				 __m64 i9 ATTRIBUTE_UNUSED,
+				 __m64 i10 ATTRIBUTE_UNUSED,
+				 __m64 i11 ATTRIBUTE_UNUSED,
+				 __m64 i12 ATTRIBUTE_UNUSED,
+				 __m64 i13 ATTRIBUTE_UNUSED,
+				 __m64 i14 ATTRIBUTE_UNUSED,
+				 __m64 i15 ATTRIBUTE_UNUSED,
+				 __m64 i16 ATTRIBUTE_UNUSED,
+				 __m64 i17 ATTRIBUTE_UNUSED,
+				 __m64 i18 ATTRIBUTE_UNUSED,
+				 __m64 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m64);
+  compare (values.i1, i1, __m64);
+  compare (values.i2, i2, __m64);
+  compare (values.i3, i3, __m64);
+  compare (values.i4, i4, __m64);
+  compare (values.i5, i5, __m64);
+  compare (values.i6, i6, __m64);
+  compare (values.i7, i7, __m64);
+  compare (values.i8, i8, __m64);
+  compare (values.i9, i9, __m64);
+  compare (values.i10, i10, __m64);
+  compare (values.i11, i11, __m64);
+  compare (values.i12, i12, __m64);
+  compare (values.i13, i13, __m64);
+  compare (values.i14, i14, __m64);
+  compare (values.i15, i15, __m64);
+  compare (values.i16, i16, __m64);
+  compare (values.i17, i17, __m64);
+  compare (values.i18, i18, __m64);
+  compare (values.i19, i19, __m64);
+}
+
+void
+fun_check_passing_m64_20_regs (__m64 i0 ATTRIBUTE_UNUSED,
+			       __m64 i1 ATTRIBUTE_UNUSED,
+			       __m64 i2 ATTRIBUTE_UNUSED,
+			       __m64 i3 ATTRIBUTE_UNUSED,
+			       __m64 i4 ATTRIBUTE_UNUSED,
+			       __m64 i5 ATTRIBUTE_UNUSED,
+			       __m64 i6 ATTRIBUTE_UNUSED,
+			       __m64 i7 ATTRIBUTE_UNUSED,
+			       __m64 i8 ATTRIBUTE_UNUSED,
+			       __m64 i9 ATTRIBUTE_UNUSED,
+			       __m64 i10 ATTRIBUTE_UNUSED,
+			       __m64 i11 ATTRIBUTE_UNUSED,
+			       __m64 i12 ATTRIBUTE_UNUSED,
+			       __m64 i13 ATTRIBUTE_UNUSED,
+			       __m64 i14 ATTRIBUTE_UNUSED,
+			       __m64 i15 ATTRIBUTE_UNUSED,
+			       __m64 i16 ATTRIBUTE_UNUSED,
+			       __m64 i17 ATTRIBUTE_UNUSED,
+			       __m64 i18 ATTRIBUTE_UNUSED,
+			       __m64 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m64_arguments;
+}
+
+void
+fun_check_passing_m128_8_values (__m128 i0 ATTRIBUTE_UNUSED,
+				 __m128 i1 ATTRIBUTE_UNUSED,
+				 __m128 i2 ATTRIBUTE_UNUSED,
+				 __m128 i3 ATTRIBUTE_UNUSED,
+				 __m128 i4 ATTRIBUTE_UNUSED,
+				 __m128 i5 ATTRIBUTE_UNUSED,
+				 __m128 i6 ATTRIBUTE_UNUSED,
+				 __m128 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+  compare (values.i4, i4, __m128);
+  compare (values.i5, i5, __m128);
+  compare (values.i6, i6, __m128);
+  compare (values.i7, i7, __m128);
+}
+
+void
+fun_check_passing_m128h_8_values (__m128h i0 ATTRIBUTE_UNUSED,
+				  __m128h i1 ATTRIBUTE_UNUSED,
+				  __m128h i2 ATTRIBUTE_UNUSED,
+				  __m128h i3 ATTRIBUTE_UNUSED,
+				  __m128h i4 ATTRIBUTE_UNUSED,
+				  __m128h i5 ATTRIBUTE_UNUSED,
+				  __m128h i6 ATTRIBUTE_UNUSED,
+				  __m128h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+  compare (values.i4, i4, __m128h);
+  compare (values.i5, i5, __m128h);
+  compare (values.i6, i6, __m128h);
+  compare (values.i7, i7, __m128h);
+}
+
+void
+fun_check_passing_m128_8_regs (__m128 i0 ATTRIBUTE_UNUSED,
+			       __m128 i1 ATTRIBUTE_UNUSED,
+			       __m128 i2 ATTRIBUTE_UNUSED,
+			       __m128 i3 ATTRIBUTE_UNUSED,
+			       __m128 i4 ATTRIBUTE_UNUSED,
+			       __m128 i5 ATTRIBUTE_UNUSED,
+			       __m128 i6 ATTRIBUTE_UNUSED,
+			       __m128 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128h_8_regs (__m128h i0 ATTRIBUTE_UNUSED,
+			        __m128h i1 ATTRIBUTE_UNUSED,
+			        __m128h i2 ATTRIBUTE_UNUSED,
+			        __m128h i3 ATTRIBUTE_UNUSED,
+			        __m128h i4 ATTRIBUTE_UNUSED,
+			        __m128h i5 ATTRIBUTE_UNUSED,
+			        __m128h i6 ATTRIBUTE_UNUSED,
+			        __m128h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128_20_values (__m128 i0 ATTRIBUTE_UNUSED,
+				  __m128 i1 ATTRIBUTE_UNUSED,
+				  __m128 i2 ATTRIBUTE_UNUSED,
+				  __m128 i3 ATTRIBUTE_UNUSED,
+				  __m128 i4 ATTRIBUTE_UNUSED,
+				  __m128 i5 ATTRIBUTE_UNUSED,
+				  __m128 i6 ATTRIBUTE_UNUSED,
+				  __m128 i7 ATTRIBUTE_UNUSED,
+				  __m128 i8 ATTRIBUTE_UNUSED,
+				  __m128 i9 ATTRIBUTE_UNUSED,
+				  __m128 i10 ATTRIBUTE_UNUSED,
+				  __m128 i11 ATTRIBUTE_UNUSED,
+				  __m128 i12 ATTRIBUTE_UNUSED,
+				  __m128 i13 ATTRIBUTE_UNUSED,
+				  __m128 i14 ATTRIBUTE_UNUSED,
+				  __m128 i15 ATTRIBUTE_UNUSED,
+				  __m128 i16 ATTRIBUTE_UNUSED,
+				  __m128 i17 ATTRIBUTE_UNUSED,
+				  __m128 i18 ATTRIBUTE_UNUSED,
+				  __m128 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+  compare (values.i4, i4, __m128);
+  compare (values.i5, i5, __m128);
+  compare (values.i6, i6, __m128);
+  compare (values.i7, i7, __m128);
+  compare (values.i8, i8, __m128);
+  compare (values.i9, i9, __m128);
+  compare (values.i10, i10, __m128);
+  compare (values.i11, i11, __m128);
+  compare (values.i12, i12, __m128);
+  compare (values.i13, i13, __m128);
+  compare (values.i14, i14, __m128);
+  compare (values.i15, i15, __m128);
+  compare (values.i16, i16, __m128);
+  compare (values.i17, i17, __m128);
+  compare (values.i18, i18, __m128);
+  compare (values.i19, i19, __m128);
+}
+
+void
+fun_check_passing_m128h_20_values (__m128h i0 ATTRIBUTE_UNUSED,
+				   __m128h i1 ATTRIBUTE_UNUSED,
+				   __m128h i2 ATTRIBUTE_UNUSED,
+				   __m128h i3 ATTRIBUTE_UNUSED,
+				   __m128h i4 ATTRIBUTE_UNUSED,
+				   __m128h i5 ATTRIBUTE_UNUSED,
+				   __m128h i6 ATTRIBUTE_UNUSED,
+				   __m128h i7 ATTRIBUTE_UNUSED,
+				   __m128h i8 ATTRIBUTE_UNUSED,
+				   __m128h i9 ATTRIBUTE_UNUSED,
+				   __m128h i10 ATTRIBUTE_UNUSED,
+				   __m128h i11 ATTRIBUTE_UNUSED,
+				   __m128h i12 ATTRIBUTE_UNUSED,
+				   __m128h i13 ATTRIBUTE_UNUSED,
+				   __m128h i14 ATTRIBUTE_UNUSED,
+				   __m128h i15 ATTRIBUTE_UNUSED,
+				   __m128h i16 ATTRIBUTE_UNUSED,
+				   __m128h i17 ATTRIBUTE_UNUSED,
+				   __m128h i18 ATTRIBUTE_UNUSED,
+				   __m128h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+  compare (values.i4, i4, __m128h);
+  compare (values.i5, i5, __m128h);
+  compare (values.i6, i6, __m128h);
+  compare (values.i7, i7, __m128h);
+  compare (values.i8, i8, __m128h);
+  compare (values.i9, i9, __m128h);
+  compare (values.i10, i10, __m128h);
+  compare (values.i11, i11, __m128h);
+  compare (values.i12, i12, __m128h);
+  compare (values.i13, i13, __m128h);
+  compare (values.i14, i14, __m128h);
+  compare (values.i15, i15, __m128h);
+  compare (values.i16, i16, __m128h);
+  compare (values.i17, i17, __m128h);
+  compare (values.i18, i18, __m128h);
+  compare (values.i19, i19, __m128h);
+}
+
+void
+fun_check_passing_m128_20_regs (__m128 i0 ATTRIBUTE_UNUSED,
+				__m128 i1 ATTRIBUTE_UNUSED,
+				__m128 i2 ATTRIBUTE_UNUSED,
+				__m128 i3 ATTRIBUTE_UNUSED,
+				__m128 i4 ATTRIBUTE_UNUSED,
+				__m128 i5 ATTRIBUTE_UNUSED,
+				__m128 i6 ATTRIBUTE_UNUSED,
+				__m128 i7 ATTRIBUTE_UNUSED,
+				__m128 i8 ATTRIBUTE_UNUSED,
+				__m128 i9 ATTRIBUTE_UNUSED,
+				__m128 i10 ATTRIBUTE_UNUSED,
+				__m128 i11 ATTRIBUTE_UNUSED,
+				__m128 i12 ATTRIBUTE_UNUSED,
+				__m128 i13 ATTRIBUTE_UNUSED,
+				__m128 i14 ATTRIBUTE_UNUSED,
+				__m128 i15 ATTRIBUTE_UNUSED,
+				__m128 i16 ATTRIBUTE_UNUSED,
+				__m128 i17 ATTRIBUTE_UNUSED,
+				__m128 i18 ATTRIBUTE_UNUSED,
+				__m128 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128h_20_regs (__m128h i0 ATTRIBUTE_UNUSED,
+				 __m128h i1 ATTRIBUTE_UNUSED,
+				 __m128h i2 ATTRIBUTE_UNUSED,
+				 __m128h i3 ATTRIBUTE_UNUSED,
+				 __m128h i4 ATTRIBUTE_UNUSED,
+				 __m128h i5 ATTRIBUTE_UNUSED,
+				 __m128h i6 ATTRIBUTE_UNUSED,
+				 __m128h i7 ATTRIBUTE_UNUSED,
+				 __m128h i8 ATTRIBUTE_UNUSED,
+				 __m128h i9 ATTRIBUTE_UNUSED,
+				 __m128h i10 ATTRIBUTE_UNUSED,
+				 __m128h i11 ATTRIBUTE_UNUSED,
+				 __m128h i12 ATTRIBUTE_UNUSED,
+				 __m128h i13 ATTRIBUTE_UNUSED,
+				 __m128h i14 ATTRIBUTE_UNUSED,
+				 __m128h i15 ATTRIBUTE_UNUSED,
+				 __m128h i16 ATTRIBUTE_UNUSED,
+				 __m128h i17 ATTRIBUTE_UNUSED,
+				 __m128h i18 ATTRIBUTE_UNUSED,
+				 __m128h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+#define def_check_int_passing8(_i0, _i1, _i2, _i3, \
+			       _i4, _i5, _i6, _i7, \
+			       _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_int_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, \
+				_i7, _i8, _i9, _i10, _i11, _i12, _i13, \
+				_i14, _i15, _i16, _i17, _i18, _i19, \
+				_func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19);
+
+void
+test_m64_on_stack ()
+{
+  __m64 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m64){32 + i, 0};
+  pass = "m64-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m64_8_values,
+			  fun_check_passing_m64_8_regs, _m64);
+}
+
+void
+test_too_many_m64 ()
+{
+  __m64 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m64){32 + i, 0};
+  pass = "m64-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m64_20_values,
+			   fun_check_passing_m64_20_regs, _m64);
+}
+
+void
+test_m128_on_stack ()
+{
+  __m128 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m128){32 + i, 0, 0, 0};
+  pass = "m128-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128_8_values,
+			  fun_check_passing_m128_8_regs, _m128);
+}
+
+void
+test_m128h_on_stack ()
+{
+  __m128h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m128h){1.1f16 + i, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+	             6.6f16, 7.7f16, 8.8f16};
+  pass = "m128h-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128h_8_values,
+			  fun_check_passing_m128h_8_regs, _m128h);
+}
+
+void
+test_too_many_m128 ()
+{
+  __m128 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128){32 + i, 0, 0, 0};
+  pass = "m128-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128_20_values,
+			   fun_check_passing_m128_20_regs, _m128);
+}
+
+void
+test_too_many_m128h ()
+{
+  __m128h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+	             6.6f16, 7.7f16, 8.8f16};
+  pass = "m128h-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128h_20_values,
+			   fun_check_passing_m128h_20_regs, _m128h);
+}
+
+static void
+do_test (void)
+{
+  test_m64_on_stack ();
+  test_too_many_m64 ();
+  test_m128_on_stack ();
+  test_too_many_m128 ();
+  test_m128h_on_stack ();
+  test_too_many_m128h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
new file mode 100644
index 00000000000..4d1956a846d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
@@ -0,0 +1,332 @@
+/* This tests passing of structs.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "args.h"
+#include <complex.h>
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct int_struct
+{
+  int i;
+};
+
+struct long_struct
+{
+  long long l;
+};
+
+struct long2_struct
+{
+  long long l1, l2;
+};
+
+struct long3_struct
+{
+  long long l1, l2, l3;
+};
+
+
+/* Check that the struct is passed as the individual members in iregs.  */
+void
+check_struct_passing1 (struct int_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing2 (struct long_struct ls ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing3 (struct long2_struct ls ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing4 (struct long3_struct ls ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ls.l1 == rsp+8);
+  assert ((unsigned long)&ls.l2 == rsp+16);
+  assert ((unsigned long)&ls.l3 == rsp+24);
+}
+
+#ifdef CHECK_M64_M128
+struct m128_struct
+{
+  __m128 x;
+};
+
+struct m128_2_struct
+{
+  __m128 x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing5 (struct m128_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_struct_passing6 (struct m128_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+24);
+}
+#endif
+
+struct flex1_struct
+{
+  long long i;
+  long long flex[];
+};
+
+struct flex2_struct
+{
+  long long i;
+  long long flex[0];
+};
+
+void
+check_struct_passing7 (struct flex1_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing8 (struct flex2_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+struct complex1_struct
+{
+  int c;
+  __complex__ float x;
+};
+
+struct complex1a_struct
+{
+  long long l;
+  float f;
+};
+
+struct complex2_struct
+{
+  int c;
+  __complex__ float x;
+  float y;
+};
+
+struct complex2a_struct
+{
+  long long l;
+  double d;
+};
+
+struct complex3_struct
+{
+  int c;
+  __complex__ _Float16 x;
+};
+
+struct complex3a_struct
+{
+  long long l;
+  _Float16 f;
+};
+
+struct complex4_struct
+{
+  int c;
+  __complex__ _Float16 x;
+  _Float16 y;
+};
+
+struct complex4a_struct
+{
+  long long l;
+  _Float16 f;
+};
+
+void
+check_struct_passing9 (struct complex1_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float_arguments;
+}
+
+void
+check_struct_passing10 (struct complex2_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_double_arguments;
+}
+
+void
+check_struct_passing11 (struct complex3_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float16_arguments;
+}
+
+void
+check_struct_passing12 (struct complex4_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float16_arguments;
+}
+
+static struct flex1_struct f1s = { 60, { } };
+static struct flex2_struct f2s = { 61, { } };
+
+static void
+do_test (void)
+{
+  struct int_struct is = { 48 };
+  struct long_struct ls = { 49 };
+#ifdef CHECK_LARGER_STRUCTS
+  struct long2_struct l2s = { 50, 51 };
+  struct long3_struct l3s = { 52, 53, 54 };
+#endif
+#ifdef CHECK_M64_M128
+  struct m128_struct m128s[8];
+  struct m128_2_struct m128_2s = {
+      { 48.394, 39.3, -397.9, 3484.9 },
+      { -8.394, -93.3, 7.9, 84.94 }
+  };
+  int i;
+#endif
+  struct complex1_struct c1s = { 4, ( -13.4 + 3.5*I ) };
+  union
+    {
+      struct complex1_struct c;
+      struct complex1a_struct u;
+    } c1u;
+  struct complex2_struct c2s = { 4, ( -13.4 + 3.5*I ), -34.5 };
+  union
+    {
+      struct complex2_struct c;
+      struct complex2a_struct u;
+    } c2u;
+
+  struct complex3_struct c3s = { 4, ( -13.4 + 3.5*I ) };
+  union
+    {
+      struct complex3_struct c;
+      struct complex3a_struct u;
+    } c3u;
+
+  struct complex4_struct c4s = { 4, ( -13.4 + 3.5*I ), -34.5 };
+  union
+    {
+      struct complex4_struct c;
+      struct complex4a_struct u;
+    } c4u;
+
+  clear_struct_registers;
+  iregs.I0 = is.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing1)(is);
+
+  clear_struct_registers;
+  iregs.I0 = ls.l;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing2)(ls);
+
+#ifdef CHECK_LARGER_STRUCTS
+  clear_struct_registers;
+  iregs.I0 = l2s.l1;
+  iregs.I1 = l2s.l2;
+  num_iregs = 2;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing3)(l2s);
+  WRAP_CALL (check_struct_passing4)(l3s);
+#endif
+
+#ifdef CHECK_M64_M128
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      m128s[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = m128s[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing5)(m128s[0], m128s[1], m128s[2], m128s[3],
+				    m128s[4], m128s[5], m128s[6], m128s[7]);
+  WRAP_CALL (check_struct_passing6)(m128_2s);
+#endif
+
+  clear_struct_registers;
+  iregs.I0 = f1s.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing7)(f1s);
+
+  clear_struct_registers;
+  iregs.I0 = f2s.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing8)(f2s);
+
+  clear_struct_registers;
+  c1u.c = c1s;
+  iregs.I0 = c1u.u.l;
+  num_iregs = 1;
+  fregs.xmm0._float [0] = c1u.u.f;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing9)(c1s);
+
+  clear_struct_registers;
+  c2u.c = c2s;
+  iregs.I0 = c2u.u.l;
+  num_iregs = 1;
+  fregs.xmm0._double[0] = c2u.u.d;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing10)(c2s);
+
+  clear_struct_registers;
+  c3u.c = c3s;
+  iregs.I0 = c3u.u.l;
+  num_iregs = 1;
+  num_fregs = 0;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing11)(c3s);
+
+  clear_struct_registers;
+  c4u.c = c4s;
+  iregs.I0 = c4u.u.l;
+  num_iregs = 1;
+  fregs.xmm0.__Float16 [0] = c4u.u.f;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing12)(c4s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
new file mode 100644
index 00000000000..640b3057f93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
@@ -0,0 +1,335 @@
+/* This tests passing of unions.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct int_struct
+{
+  int i;
+};
+
+struct long_struct
+{
+  long l;
+};
+
+union un1
+{
+  char c;
+  int i;
+};
+
+union un2
+{
+  char c1;
+  long l;
+  char c2;
+};
+
+union un3
+{
+  struct int_struct is;
+  struct long_struct ls;
+  union un1 un;
+};
+
+
+void
+check_union_passing1(union un1 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_union_passing3(union un3 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+
+#ifdef CHECK_M64_M128
+union un4
+{
+  __m128 x;
+  float f;
+};
+
+union un5
+{
+  __m128 x;
+  long i;
+};
+
+void
+check_union_passing4(union un4 u1 ATTRIBUTE_UNUSED,
+		     union un4 u2 ATTRIBUTE_UNUSED,
+		     union un4 u3 ATTRIBUTE_UNUSED,
+		     union un4 u4 ATTRIBUTE_UNUSED,
+		     union un4 u5 ATTRIBUTE_UNUSED,
+		     union un4 u6 ATTRIBUTE_UNUSED,
+		     union un4 u7 ATTRIBUTE_UNUSED,
+		     union un4 u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_vector_arguments(m128, 8);
+}
+
+union un4a
+{
+  __m128 x;
+  _Float16 f;
+};
+
+void
+check_union_passing4a(union un4a u1 ATTRIBUTE_UNUSED,
+		      union un4a u2 ATTRIBUTE_UNUSED,
+		      union un4a u3 ATTRIBUTE_UNUSED,
+		      union un4a u4 ATTRIBUTE_UNUSED,
+		      union un4a u5 ATTRIBUTE_UNUSED,
+		      union un4a u6 ATTRIBUTE_UNUSED,
+		      union un4a u7 ATTRIBUTE_UNUSED,
+		      union un4a u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+union un4b
+{
+  __m128h x;
+  _Float16 f;
+};
+
+void
+check_union_passing4b(union un4b u1 ATTRIBUTE_UNUSED,
+		      union un4b u2 ATTRIBUTE_UNUSED,
+		      union un4b u3 ATTRIBUTE_UNUSED,
+		      union un4b u4 ATTRIBUTE_UNUSED,
+		      union un4b u5 ATTRIBUTE_UNUSED,
+		      union un4b u6 ATTRIBUTE_UNUSED,
+		      union un4b u7 ATTRIBUTE_UNUSED,
+		      union un4b u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing4a WRAP_CALL(check_union_passing4a)
+#define check_union_passing4b WRAP_CALL(check_union_passing4b)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+#endif
+
+union un6
+{
+  long double ld;
+  int i;
+};
+
+
+void
+check_union_passing6(union un6 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.ld == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+#define check_union_passing6 WRAP_CALL(check_union_passing6)
+
+union un7
+{
+  long double ld;
+  _Float16 f;
+};
+
+void
+check_union_passing7(union un7 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.ld == rsp+8);
+  assert ((unsigned long)&u.f == rsp+8);
+}
+
+#define check_union_passing7 WRAP_CALL(check_union_passing7)
+
+union un8
+{
+  _Float16 f;
+  int i;
+};
+
+void
+check_union_passing8(union un8 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+#define check_union_passing8 WRAP_CALL(check_union_passing8)
+
+static void
+do_test (void)
+{
+  union un1 u1;
+#ifdef CHECK_LARGER_UNION_PASSING
+  union un2 u2;
+  union un3 u3;
+  struct int_struct is;
+  struct long_struct ls;
+#endif /* CHECK_LARGER_UNION_PASSING */
+#ifdef CHECK_M64_M128
+  union un4 u4[8];
+  union un4a u4a[8];
+  union un4b u4b[8];
+  union un5 u5 = { { 48.394, 39.3, -397.9, 3484.9 } };
+  int i;
+#endif
+  union un6 u6;
+  union un7 u7;
+  union un8 u8;
+
+  /* Check a union with char, int.  */
+  clear_struct_registers;
+  u1.i = 0;  /* clear the struct to not have high bits left */
+  u1.c = 32;
+  iregs.I0 = 32;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing1(u1);
+  u1.i = 0;  /* clear the struct to not have high bits left */
+  u1.i = 33;
+  iregs.I0 = 33;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing1(u1);
+
+  /* Check a union with char, long, char.  */
+#ifdef CHECK_LARGER_UNION_PASSING
+  clear_struct_registers;
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.c1 = 34;
+  iregs.I0 = 34;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.l = 35;
+  iregs.I0 = 35;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.c2 = 36;
+  iregs.I0 = 36;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+
+  /* check a union containing two structs and a union.  */
+  clear_struct_registers;
+  is.i = 37;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.is = is;
+  iregs.I0 = 37;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  ls.l = 38;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.ls = ls;
+  iregs.I0 = 38;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  u1.c = 39;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.un = u1;
+  iregs.I0 = 39;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  u1.i = 40;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.un = u1;
+  iregs.I0 = 40;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+#endif /* CHECK_LARGER_UNION_PASSING */
+
+#ifdef CHECK_M64_M128
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = u4[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4(u4[0], u4[1], u4[2], u4[3],
+		       u4[4], u4[5], u4[6], u4[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4a[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = u4a[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4a(u4a[0], u4a[1], u4a[2], u4a[3],
+		       u4a[4], u4a[5], u4a[6], u4a[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4b[i].x = (__m128h){33+i, 0, i, 0, -i, 1, 2 * i, i + 8};
+      (&fregs.xmm0)[i]._m128h[0] = u4b[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4b(u4b[0], u4b[1], u4b[2], u4b[3],
+		        u4b[4], u4b[5], u4b[6], u4b[7]);
+
+  clear_struct_registers;
+  fregs.xmm0._m128[0] = u5.x;
+  num_fregs = 1;
+  num_iregs = 1;
+  iregs.I0 = u5.i;
+  clear_float_hardware_registers;
+  check_union_passing5(u5);
+#endif
+
+  u6.i = 2;
+  check_union_passing6(u6);
+
+  u7.f = 2.0f16;
+  check_union_passing7(u7);
+
+  clear_struct_registers;
+  u8.i = 8;
+  num_iregs = 1;
+  iregs.I0 = u8.i;
+  clear_int_hardware_registers;
+  check_union_passing8(u8);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
new file mode 100644
index 00000000000..92578127be7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
@@ -0,0 +1,274 @@
+/* This tests returning of structures.  */
+
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+int current_test;
+int num_failed = 0;
+
+#undef assert
+#define assert(test) do { if (!(test)) {fprintf (stderr, "failed in test %d\n", current_test); num_failed++; } } while (0)
+
+#define xmm0h xmm_regs[0].__Float16
+#define xmm1h xmm_regs[1].__Float16
+#define xmm0f xmm_regs[0]._float
+#define xmm0d xmm_regs[0]._double
+#define xmm1f xmm_regs[1]._float
+#define xmm1d xmm_regs[1]._double
+
+typedef enum {
+  INT = 0,
+  SSE_H,
+  SSE_F,
+  SSE_D,
+  X87,
+  MEM,
+  INT_SSE,
+  SSE_INT,
+  SSE_F_V,
+  SSE_F_H,
+  SSE_F_H8
+} Type;
+
+/* Structures which should be returned in INTEGER.  */
+#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
+
+D(1,char m1, s.m1=42)
+D(2,short m1, s.m1=42)
+D(3,int m1, s.m1=42)
+D(4,long m1, s.m1=42)
+D(5,long long m1, s.m1=42)
+D(6,char m1;short s, s.m1=42)
+D(7,char m1;int i, s.m1=42)
+D(8,char m1; long l, s.m1=42)
+D(9,char m1; long long l, s.m1=42)
+D(10,char m1[16], s.m1[0]=42)
+D(11,short m1[8], s.m1[0]=42)
+D(12,int m1[4], s.m1[0]=42)
+D(13,long m1[2], s.m1[0]=42)
+D(14,long long m1[2], s.m1[0]=42)
+
+#undef D
+
+/* Structures which should be returned in SSE.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
+
+D(100,float f,SSE_F, s.f=42)
+D(101,double d,SSE_D, s.d=42)
+D(102,float f;float f2,SSE_F, s.f=42)
+D(103,float f;double d,SSE_F, s.f=42)
+D(104,double d; float f,SSE_D, s.d=42)
+D(105,double d; double d2,SSE_D, s.d=42)
+D(106,float f[2],SSE_F, s.f[0]=42)
+D(107,float f[3],SSE_F, s.f[0]=42)
+D(108,float f[4],SSE_F, s.f[0]=42)
+D(109,double d[2],SSE_D, s.d[0]=42)
+D(110,float f[2]; double d,SSE_F, s.f[0]=42)
+D(111,double d;float f[2],SSE_D, s.d=42)
+
+D(120,_Float16 f,SSE_H, s.f=42)
+D(121,_Float16 f;_Float16 f2,SSE_H, s.f=42)
+D(122,_Float16 f;float d,SSE_H, s.f=42)
+D(123,_Float16 f;double d,SSE_H, s.f=42)
+D(124,double d; _Float16 f,SSE_D, s.d=42)
+D(125,_Float16 f[2],SSE_H, s.f[0]=42)
+D(126,_Float16 f[3],SSE_H, s.f[0]=42)
+D(127,_Float16 f[4],SSE_H, s.f[0]=42)
+D(128,_Float16 f[2]; double d,SSE_H, s.f[0]=42)
+D(129,double d;_Float16 f[2],SSE_D, s.d=42)
+
+#undef D
+
+/* Structures which should be returned on x87 stack.  */
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = X87; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42 }; return s; }
+
+/* The only struct containing a long double, which is returned in
+   registers at all, is the singleton struct.  All others are too large.
+   This includes a struct containing complex long double, which is passed
+   in memory, although a complex long double type itself is returned in
+   two registers.  */
+D(200,long double ld)
+
+#undef D
+
+/* Structures which should be returned in INT (low) and SSE (high).  */
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT_SSE; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42,43 }; return s; }
+
+D(300,char m1; float m2)
+D(301,char m1; double m2)
+D(302,short m1; float m2)
+D(303,short m1; double m2)
+D(304,int m1; float m2)
+D(305,int m1; double m2)
+D(306,long long m1; float m2)
+D(307,long long m1; double m2)
+
+D(310,char m1; _Float16 m2)
+D(311,short m1; _Float16 m2)
+D(312,int m1; _Float16 m2)
+D(313,long long m1; _Float16 m2)
+
+#undef D
+
+void check_300 (void)
+{
+  XMM_T x;
+  x._ulong[0] = rax;
+  switch (current_test) {
+    case 300: assert ((rax & 0xff) == 42 && x._float[1] == 43); break;
+    case 301: assert ((rax & 0xff) == 42 && xmm0d[0] == 43); break;
+    case 302: assert ((rax & 0xffff) == 42 && x._float[1] == 43); break;
+    case 303: assert ((rax & 0xffff) == 42 && xmm0d[0] == 43); break;
+    case 304: assert ((rax & 0xffffffff) == 42 && x._float[1] == 43); break;
+    case 305: assert ((rax & 0xffffffff) == 42 && xmm0d[0] == 43); break;
+    case 306: assert (rax == 42 && xmm0f[0] == 43); break;
+    case 307: assert (rax == 42 && xmm0d[0] == 43); break;
+    case 310: assert ((rax & 0xff) == 42 && x.__Float16[1] == 43); break;
+    case 311: assert ((rax & 0xffff) == 42 && x.__Float16[1] == 43); break;
+    case 312: assert ((rax & 0xffffffff) == 42 && x.__Float16[2] == 43); break;
+    case 313: assert (rax == 42 && xmm0h[0] == 43); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in SSE (low) and INT (high).  */
+#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = SSE_INT; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s));  B; return s; }
+
+D(400,float f[2];char c, s.f[0]=42; s.c=43)
+D(401,double d;char c, s.d=42; s.c=43)
+
+D(402,_Float16 f[4];char c, s.f[0]=42; s.c=43)
+
+#undef D
+
+void check_400 (void)
+{
+  switch (current_test) {
+    case 400: assert (xmm0f[0] == 42 && (rax & 0xff) == 43); break;
+    case 401: assert (xmm0d[0] == 42 && (rax & 0xff) == 43); break;
+    case 402: assert (xmm0h[0] == 42 && (rax & 0xff) == 43); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in MEM.  */
+void *struct_addr;
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = MEM; \
+struct S_ ## I f_ ## I (void) { union {unsigned char c; struct S_ ## I s;} u; memset (&u.s, 0, sizeof(u.s)); u.c = 42; return u.s; }
+
+/* Too large.  */
+D(500,char m1[17])
+D(501,short m1[9])
+D(502,int m1[5])
+D(503,long m1[3])
+D(504,short m1[8];char c)
+D(505,char m1[1];int i[4])
+D(506,float m1[5])
+D(507,double m1[3])
+D(508,char m1[1];float f[4])
+D(509,char m1[1];double d[2])
+D(510,__complex long double m1[1])
+
+/* Too large due to padding.  */
+D(520,char m1[1];int i;char c2; int i2; char c3)
+
+/* Unnaturally aligned members.  */
+D(530,short m1[1];int i PACKED)
+
+D(540,_Float16 m1[10])
+D(541,char m1[1];_Float16 f[8])
+
+#undef D
+
+
+/* Special tests.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; B; return s; }
+D(600,float f[4], SSE_F_V, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42)
+D(601,_Float16 f[4], SSE_F_H, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42)
+D(602,_Float16 f[8], SSE_F_H8,
+  s.f[0] = s.f[1] = s.f[2] = s.f[3] = s.f[4] = s.f[5] = s.f[6] = s.f[7] = 42)
+#undef D
+
+void clear_all (void)
+{
+  clear_int_registers;
+  clear_float_registers;
+  clear_x87_registers;
+}
+
+void check_all (Type class, unsigned long size)
+{
+  switch (class) {
+    case INT: if (size < 8) rax &= ~0UL >> (64-8*size); assert (rax == 42); break;
+    case SSE_H: assert (xmm0h[0] == 42); break;
+    case SSE_F: assert (xmm0f[0] == 42); break;
+    case SSE_D: assert (xmm0d[0] == 42); break;
+    case SSE_F_V: assert (xmm0f[0] == 42 && xmm0f[1]==42 && xmm1f[0] == 42 && xmm1f[1] == 42); break;
+    case SSE_F_H: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42); break;
+    case SSE_F_H8: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42
+			   && xmm1h[0] == 42 && xmm1h[1]==42 && xmm1h[2] == 42 && xmm1h[3] == 42); break;
+    case X87: assert (x87_regs[0]._ldouble == 42); break;
+    case INT_SSE: check_300(); break;
+    case SSE_INT: check_400(); break;
+    /* Ideally we would like to check that rax == struct_addr.
+       Unfortunately the address of the target struct escapes (for setting
+       struct_addr), so the return struct is a temporary one whose address
+       is given to the f_* functions, otherwise a conforming program
+       could notice the struct changing already before the function returns.
+       This temporary struct could be anywhere.  For GCC it will be on
+       stack, but nothing forbids it from being a static variable
+       if there's no threading or proper locking.  In practice, though,
+       every implementation uses the stack for it.  */
+    case MEM: assert (*(unsigned char*)struct_addr == 42 && rdi == rax); break;
+  }
+}
+
+#define D(I) { struct S_ ## I s; current_test = I; struct_addr = (void*)&s; \
+  clear_all(); \
+  s = WRAP_RET(f_ ## I) (); \
+  check_all(class_ ## I, sizeof(s)); \
+}
+
+static void
+do_test (void)
+{
+  D(1) D(2) D(3) D(4) D(5) D(6) D(7) D(8) D(9) D(10) D(11) D(12) D(13) D(14)
+
+  D(100) D(101) D(102) D(103) D(104) D(105) D(106) D(107) D(108) D(109) D(110)
+  D(111)
+
+  D(120) D(121) D(122) D(123) D(124) D(125) D(126) D(127) D(128) D(129)
+
+  D(200)
+
+  D(300) D(301) D(302) D(303) D(304) D(305) D(306) D(307)
+  D(310) D(311) D(312) D(313)
+
+  D(400) D(401) D(402)
+
+  D(500) D(501) D(502) D(503) D(504) D(505) D(506) D(507) D(508) D(509)
+  D(520)
+  D(530)
+
+  D(540) D(541)
+
+  D(600) D(601) D(602)
+  if (num_failed)
+    abort ();
+}
+#undef D
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
new file mode 100644
index 00000000000..5bdc44db5f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
@@ -0,0 +1,164 @@
+/* Test variable number of 128-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m128_varargs (__m128 i0, __m128 i1, __m128 i2,
+				__m128 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m128 *argp;
+
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m128 *) (((char *) fp) + 8);
+
+  /* Check __m128 arguments passed on stack.  */
+  compare (values.i8, argp[0], __m128);
+  compare (values.i9, argp[1], __m128);
+
+  /* Check register contents.  */
+  compare (fregs.xmm0, xmm_regs[0], __m128);
+  compare (fregs.xmm1, xmm_regs[1], __m128);
+  compare (fregs.xmm2, xmm_regs[2], __m128);
+  compare (fregs.xmm3, xmm_regs[3], __m128);
+  compare (fregs.xmm4, xmm_regs[4], __m128);
+  compare (fregs.xmm5, xmm_regs[5], __m128);
+  compare (fregs.xmm6, xmm_regs[6], __m128);
+  compare (fregs.xmm7, xmm_regs[7], __m128);
+}
+
+void
+fun_check_passing_m128h_varargs (__m128h i0, __m128h i1, __m128h i2,
+				 __m128h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m128h *argp;
+
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m128h *) (((char *) fp) + 8);
+
+  /* Check __m128h arguments passed on stack.  */
+  compare (values.i8, argp[0], __m128h);
+  compare (values.i9, argp[1], __m128h);
+
+  /* Check register contents.  */
+  compare (fregs.xmm0, xmm_regs[0], __m128h);
+  compare (fregs.xmm1, xmm_regs[1], __m128h);
+  compare (fregs.xmm2, xmm_regs[2], __m128h);
+  compare (fregs.xmm3, xmm_regs[3], __m128h);
+  compare (fregs.xmm4, xmm_regs[4], __m128h);
+  compare (fregs.xmm5, xmm_regs[5], __m128h);
+  compare (fregs.xmm6, xmm_regs[6], __m128h);
+  compare (fregs.xmm7, xmm_regs[7], __m128h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m128_varargs (void)
+{
+  __m128 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m128){32+i, 0, 0, 0};
+  pass = "m128-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m128_varargs,
+				 _m128);
+}
+
+void
+test_m128h_varargs (void)
+{
+  __m128h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m128h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i
+    };
+  pass = "m128h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m128h_varargs,
+				 _m128h);
+}
+
+static void
+do_test (void)
+{
+  test_m128_varargs ();
+  test_m128h_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1
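[Editor's note: the varargs tests in this patch deliberately bypass <stdarg.h> and instead walk up from the frame pointer until the saved return address is found, so that raw register and stack contents can be inspected. For comparison, a portable consumer of variadic floating-point arguments would use va_arg, as in this minimal sketch; the function name and values here are illustrative only and are not part of the patch.]

```c
#include <stdarg.h>

/* Portable counterpart to the raw frame walk used by the ABI tests:
   let va_arg do the register-save-area and stack bookkeeping that the
   psABI mandates, instead of probing xmm_regs and rsp directly.  */
static double
sum_doubles (int count, ...)
{
  va_list ap;
  double total = 0.0;
  int i;

  va_start (ap, count);
  for (i = 0; i < count; i++)
    /* float arguments promote to double in variadic calls, so double
       is the type actually fetched here.  */
    total += va_arg (ap, double);
  va_end (ap);
  return total;
}
```

Under the x86-64 psABI the first eight floating-point variadic arguments arrive in xmm0-xmm7 and the rest on the stack, which is exactly the boundary the tests above probe by passing ten vector arguments.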


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 05/62] AVX512FP16: Add ABI test for ymm.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (3 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 04/62] AVX512FP16: Add ABI tests for xmm liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 06/62] AVX512FP16: Add abi test for zmm liuhongt
                   ` (56 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
	New exp file.
	* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
	* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.
---
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |  45 +++
 .../x86_64/abi/avx512fp16/m256h/args.h        | 182 +++++++++
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |  81 ++++
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |   3 +
 .../avx512fp16/m256h/test_m256_returning.c    |  54 +++
 .../abi/avx512fp16/m256h/test_passing_m256.c  | 370 ++++++++++++++++++
 .../avx512fp16/m256h/test_passing_structs.c   | 113 ++++++
 .../avx512fp16/m256h/test_passing_unions.c    | 337 ++++++++++++++++
 .../abi/avx512fp16/m256h/test_varargs-m256.c  | 160 ++++++++
 9 files changed, 1345 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
new file mode 100644
index 00000000000..ecf673bf796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
@@ -0,0 +1,45 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+torture-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
new file mode 100644
index 00000000000..136db48c144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
@@ -0,0 +1,182 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 ymm0
+#define F1 ymm1
+#define F2 ymm2
+#define F3 ymm3
+#define F4 ymm4
+#define F5 ymm5
+#define F6 ymm6
+#define F7 ymm7
+
+typedef union {
+  _Float16 __Float16[16];
+  float _float[8];
+  double _double[4];
+  long _long[4];
+  int _int[8];
+  unsigned long _ulong[4];
+  __m64 _m64[4];
+  __m128 _m128[2];
+  __m256 _m256[1];
+  __m256h _m256h[1];
+} YMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+YMM_T ymm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  YMM_T ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9,
+        ymm10, ymm11, ymm12, ymm13, ymm14, ymm15;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (ymm_regs, 0, sizeof (ymm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* TODO: Do the checking.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.ymm0._ ## T [0] == ymm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.ymm1._ ## T [0] == ymm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.ymm2._ ## T [0] == ymm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.ymm3._ ## T [0] == ymm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.ymm4._ ## T [0] == ymm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.ymm5._ ## T [0] == ymm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.ymm6._ ## T [0] == ymm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.ymm7._ ## T [0] == ymm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.ymm0) + (O), \
+		     &ymm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.ymm1) + (O), \
+		     &ymm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.ymm2) + (O), \
+		     &ymm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.ymm3) + (O), \
+		     &ymm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.ymm4) + (O), \
+		     &ymm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.ymm5) + (O), \
+		     &ymm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.ymm6) + (O), \
+		     &ymm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.ymm7) + (O), \
+		     &ymm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+#define check_m256_arguments check_vector_arguments(m256, 0)
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
new file mode 100644
index 00000000000..73a59191d6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
@@ -0,0 +1,81 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	vmovdqu	%ymm2, ymm_regs+64(%rip)
+	vmovdqu	%ymm3, ymm_regs+96(%rip)
+	vmovdqu	%ymm4, ymm_regs+128(%rip)
+	vmovdqu	%ymm5, ymm_regs+160(%rip)
+	vmovdqu	%ymm6, ymm_regs+192(%rip)
+	vmovdqu	%ymm7, ymm_regs+224(%rip)
+	vmovdqu	%ymm8, ymm_regs+256(%rip)
+	vmovdqu	%ymm9, ymm_regs+288(%rip)
+	vmovdqu	%ymm10, ymm_regs+320(%rip)
+	vmovdqu	%ymm11, ymm_regs+352(%rip)
+	vmovdqu	%ymm12, ymm_regs+384(%rip)
+	vmovdqu	%ymm13, ymm_regs+416(%rip)
+	vmovdqu	%ymm14, ymm_regs+448(%rip)
+	vmovdqu	%ymm15, ymm_regs+480(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	ymm_regs,512,32
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
new file mode 100644
index 00000000000..6a55030c0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
@@ -0,0 +1,3 @@
+#define AVX512VL(ebx) (ebx & bit_AVX512VL)
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK)
+#include "../avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
new file mode 100644
index 00000000000..48e0139f416
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
@@ -0,0 +1,54 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m256
+fun_test_returning___m256 (void)
+{
+  volatile_var++;
+  return (__m256){73,0,0,0,0,0,0,0};
+}
+
+__m256h
+fun_test_returning___m256h (void)
+{
+  volatile_var++;
+  return (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+                   5.1f16,6.1f16,7.1f16,8.1f16,
+                   9.1f16,10.1f16,11.1f16,12.1f16,
+		   13.1f16,14.1f16,15.1f16,16.1f16};
+}
+
+__m256 test_256;
+__m256h test_256h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  YMM_T ymmt1, ymmt2;
+
+  clear_struct_registers;
+  test_256 = (__m256){73,0,0,0,0,0,0,0};
+  ymmt1._m256[0] = test_256;
+  ymmt2._m256[0] = WRAP_RET (fun_test_returning___m256)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256\n"), failed++;
+
+  clear_struct_registers;
+  test_256h = (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+                        5.1f16,6.1f16,7.1f16,8.1f16,
+                        9.1f16,10.1f16,11.1f16,12.1f16,
+			13.1f16,14.1f16,15.1f16,16.1f16};
+  ymmt1._m256h[0] = test_256h;
+  ymmt2._m256h[0] = WRAP_RET (fun_test_returning___m256h)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
new file mode 100644
index 00000000000..bfa80d616ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
@@ -0,0 +1,370 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void fun_check_passing_m256_8_values (__m256 i0 ATTRIBUTE_UNUSED,
+				 __m256 i1 ATTRIBUTE_UNUSED,
+				 __m256 i2 ATTRIBUTE_UNUSED,
+				 __m256 i3 ATTRIBUTE_UNUSED,
+				 __m256 i4 ATTRIBUTE_UNUSED,
+				 __m256 i5 ATTRIBUTE_UNUSED,
+				 __m256 i6 ATTRIBUTE_UNUSED,
+				 __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+  compare (values.i4, i4, __m256);
+  compare (values.i5, i5, __m256);
+  compare (values.i6, i6, __m256);
+  compare (values.i7, i7, __m256);
+}
+
+void fun_check_passing_m256h_8_values (__m256h i0 ATTRIBUTE_UNUSED,
+				  __m256h i1 ATTRIBUTE_UNUSED,
+				  __m256h i2 ATTRIBUTE_UNUSED,
+				  __m256h i3 ATTRIBUTE_UNUSED,
+				  __m256h i4 ATTRIBUTE_UNUSED,
+				  __m256h i5 ATTRIBUTE_UNUSED,
+				  __m256h i6 ATTRIBUTE_UNUSED,
+				  __m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+  compare (values.i4, i4, __m256h);
+  compare (values.i5, i5, __m256h);
+  compare (values.i6, i6, __m256h);
+  compare (values.i7, i7, __m256h);
+}
+
+void
+fun_check_passing_m256_8_regs (__m256 i0 ATTRIBUTE_UNUSED,
+			       __m256 i1 ATTRIBUTE_UNUSED,
+			       __m256 i2 ATTRIBUTE_UNUSED,
+			       __m256 i3 ATTRIBUTE_UNUSED,
+			       __m256 i4 ATTRIBUTE_UNUSED,
+			       __m256 i5 ATTRIBUTE_UNUSED,
+			       __m256 i6 ATTRIBUTE_UNUSED,
+			       __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256h_8_regs (__m256h i0 ATTRIBUTE_UNUSED,
+				__m256h i1 ATTRIBUTE_UNUSED,
+				__m256h i2 ATTRIBUTE_UNUSED,
+				__m256h i3 ATTRIBUTE_UNUSED,
+				__m256h i4 ATTRIBUTE_UNUSED,
+				__m256h i5 ATTRIBUTE_UNUSED,
+				__m256h i6 ATTRIBUTE_UNUSED,
+				__m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256_20_values (__m256 i0 ATTRIBUTE_UNUSED,
+				  __m256 i1 ATTRIBUTE_UNUSED,
+				  __m256 i2 ATTRIBUTE_UNUSED,
+				  __m256 i3 ATTRIBUTE_UNUSED,
+				  __m256 i4 ATTRIBUTE_UNUSED,
+				  __m256 i5 ATTRIBUTE_UNUSED,
+				  __m256 i6 ATTRIBUTE_UNUSED,
+				  __m256 i7 ATTRIBUTE_UNUSED,
+				  __m256 i8 ATTRIBUTE_UNUSED,
+				  __m256 i9 ATTRIBUTE_UNUSED,
+				  __m256 i10 ATTRIBUTE_UNUSED,
+				  __m256 i11 ATTRIBUTE_UNUSED,
+				  __m256 i12 ATTRIBUTE_UNUSED,
+				  __m256 i13 ATTRIBUTE_UNUSED,
+				  __m256 i14 ATTRIBUTE_UNUSED,
+				  __m256 i15 ATTRIBUTE_UNUSED,
+				  __m256 i16 ATTRIBUTE_UNUSED,
+				  __m256 i17 ATTRIBUTE_UNUSED,
+				  __m256 i18 ATTRIBUTE_UNUSED,
+				  __m256 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+  compare (values.i4, i4, __m256);
+  compare (values.i5, i5, __m256);
+  compare (values.i6, i6, __m256);
+  compare (values.i7, i7, __m256);
+  compare (values.i8, i8, __m256);
+  compare (values.i9, i9, __m256);
+  compare (values.i10, i10, __m256);
+  compare (values.i11, i11, __m256);
+  compare (values.i12, i12, __m256);
+  compare (values.i13, i13, __m256);
+  compare (values.i14, i14, __m256);
+  compare (values.i15, i15, __m256);
+  compare (values.i16, i16, __m256);
+  compare (values.i17, i17, __m256);
+  compare (values.i18, i18, __m256);
+  compare (values.i19, i19, __m256);
+}
+
+void
+fun_check_passing_m256h_20_values (__m256h i0 ATTRIBUTE_UNUSED,
+				   __m256h i1 ATTRIBUTE_UNUSED,
+				   __m256h i2 ATTRIBUTE_UNUSED,
+				   __m256h i3 ATTRIBUTE_UNUSED,
+				   __m256h i4 ATTRIBUTE_UNUSED,
+				   __m256h i5 ATTRIBUTE_UNUSED,
+				   __m256h i6 ATTRIBUTE_UNUSED,
+				   __m256h i7 ATTRIBUTE_UNUSED,
+				   __m256h i8 ATTRIBUTE_UNUSED,
+				   __m256h i9 ATTRIBUTE_UNUSED,
+				   __m256h i10 ATTRIBUTE_UNUSED,
+				   __m256h i11 ATTRIBUTE_UNUSED,
+				   __m256h i12 ATTRIBUTE_UNUSED,
+				   __m256h i13 ATTRIBUTE_UNUSED,
+				   __m256h i14 ATTRIBUTE_UNUSED,
+				   __m256h i15 ATTRIBUTE_UNUSED,
+				   __m256h i16 ATTRIBUTE_UNUSED,
+				   __m256h i17 ATTRIBUTE_UNUSED,
+				   __m256h i18 ATTRIBUTE_UNUSED,
+				   __m256h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+  compare (values.i4, i4, __m256h);
+  compare (values.i5, i5, __m256h);
+  compare (values.i6, i6, __m256h);
+  compare (values.i7, i7, __m256h);
+  compare (values.i8, i8, __m256h);
+  compare (values.i9, i9, __m256h);
+  compare (values.i10, i10, __m256h);
+  compare (values.i11, i11, __m256h);
+  compare (values.i12, i12, __m256h);
+  compare (values.i13, i13, __m256h);
+  compare (values.i14, i14, __m256h);
+  compare (values.i15, i15, __m256h);
+  compare (values.i16, i16, __m256h);
+  compare (values.i17, i17, __m256h);
+  compare (values.i18, i18, __m256h);
+  compare (values.i19, i19, __m256h);
+}
+
+void
+fun_check_passing_m256_20_regs (__m256 i0 ATTRIBUTE_UNUSED,
+				__m256 i1 ATTRIBUTE_UNUSED,
+				__m256 i2 ATTRIBUTE_UNUSED,
+				__m256 i3 ATTRIBUTE_UNUSED,
+				__m256 i4 ATTRIBUTE_UNUSED,
+				__m256 i5 ATTRIBUTE_UNUSED,
+				__m256 i6 ATTRIBUTE_UNUSED,
+				__m256 i7 ATTRIBUTE_UNUSED,
+				__m256 i8 ATTRIBUTE_UNUSED,
+				__m256 i9 ATTRIBUTE_UNUSED,
+				__m256 i10 ATTRIBUTE_UNUSED,
+				__m256 i11 ATTRIBUTE_UNUSED,
+				__m256 i12 ATTRIBUTE_UNUSED,
+				__m256 i13 ATTRIBUTE_UNUSED,
+				__m256 i14 ATTRIBUTE_UNUSED,
+				__m256 i15 ATTRIBUTE_UNUSED,
+				__m256 i16 ATTRIBUTE_UNUSED,
+				__m256 i17 ATTRIBUTE_UNUSED,
+				__m256 i18 ATTRIBUTE_UNUSED,
+				__m256 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256h_20_regs (__m256h i0 ATTRIBUTE_UNUSED,
+				 __m256h i1 ATTRIBUTE_UNUSED,
+				 __m256h i2 ATTRIBUTE_UNUSED,
+				 __m256h i3 ATTRIBUTE_UNUSED,
+				 __m256h i4 ATTRIBUTE_UNUSED,
+				 __m256h i5 ATTRIBUTE_UNUSED,
+				 __m256h i6 ATTRIBUTE_UNUSED,
+				 __m256h i7 ATTRIBUTE_UNUSED,
+				 __m256h i8 ATTRIBUTE_UNUSED,
+				 __m256h i9 ATTRIBUTE_UNUSED,
+				 __m256h i10 ATTRIBUTE_UNUSED,
+				 __m256h i11 ATTRIBUTE_UNUSED,
+				 __m256h i12 ATTRIBUTE_UNUSED,
+				 __m256h i13 ATTRIBUTE_UNUSED,
+				 __m256h i14 ATTRIBUTE_UNUSED,
+				 __m256h i15 ATTRIBUTE_UNUSED,
+				 __m256h i16 ATTRIBUTE_UNUSED,
+				 __m256h i17 ATTRIBUTE_UNUSED,
+				 __m256h i18 ATTRIBUTE_UNUSED,
+				 __m256h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, \
+			    _i8, _i9, _i10, _i11, _i12, _i13, _i14, \
+			    _i15, _i16, _i17, _i18, _i19, _func1, \
+			    _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19);
+
+void
+test_m256_on_stack ()
+{
+  __m256 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m256_8_values,
+		      fun_check_passing_m256_8_regs, _m256);
+}
+
+void
+test_m256h_on_stack ()
+{
+  __m256h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i,
+	             5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i,
+	             9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i,
+	             13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i};
+  pass = "m256h-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m256h_8_values,
+		      fun_check_passing_m256h_8_regs, _m256h);
+}
+
+void
+test_too_many_m256 ()
+{
+  __m256 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m256_20_values,
+		       fun_check_passing_m256_20_regs, _m256);
+}
+
+void
+test_too_many_m256h ()
+{
+  __m256h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i,
+	             5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i,
+	             9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i,
+	             13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i};
+  pass = "m256h-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m256h_20_values,
+		       fun_check_passing_m256h_20_regs, _m256h);
+}
+
+static void
+do_test (void)
+{
+  test_m256_on_stack ();
+  test_too_many_m256 ();
+  test_m256h_on_stack ();
+  test_too_many_m256h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
new file mode 100644
index 00000000000..eff10badd6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
@@ -0,0 +1,113 @@
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct m256_struct
+{
+  __m256 x;
+};
+
+struct m256_2_struct
+{
+  __m256 x1, x2;
+};
+
+struct m256h_struct
+{
+  __m256h x;
+};
+
+struct m256h_2_struct
+{
+  __m256h x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1 (struct m256_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_struct_passing2 (struct m256_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+40);
+}
+
+void
+check_struct_passing1h (struct m256h_struct ms1 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms2 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms3 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms4 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms5 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms6 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms7 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_struct_passing2h (struct m256h_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+40);
+}
+
+static void
+do_test (void)
+{
+  struct m256_struct m256s [8];
+  struct m256h_struct m256hs [8];
+  struct m256_2_struct m256_2s = { 
+      { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94 },
+      { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3 }
+  };
+  struct m256h_2_struct m256h_2s = { 
+      { 47.364f16, 36.3f16, -367.6f16, 3474.6f16, -7.364f16, -63.3f16, 7.6f16, 74.64f16,
+        57.865f16, 86.8f16, -867.6f16, 8575.6f16, -7.865f16, -68.8f16, 7.6f16, 75.65f16  },
+      { -7.364f16, -3.3f16, -36.6f16, 34.6f16, 7.6f16, 74.64f16, -47.364f16, 36.3f16,
+        -8.364f16, -3.3f16, -36.6f16, 34.6f16, 8.6f16, 84.64f16, -48.364f16, 36.3f16  }
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m256s[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+
+      m256hs[i].x = (__m256h){33+i, 0, i, 0, -i, 0, i - 11, i + 9,
+                              31+i, 2, i, 3, -i, 4, i - 10, i + 7};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256[0] = m256s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1)(m256s[0], m256s[1], m256s[2], m256s[3],
+				    m256s[4], m256s[5], m256s[6], m256s[7]);
+  WRAP_CALL (check_struct_passing2)(m256_2s);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256h[0] = m256hs[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1h)(m256hs[0], m256hs[1], m256hs[2], m256hs[3],
+				    m256hs[4], m256hs[5], m256hs[6], m256hs[7]);
+  WRAP_CALL (check_struct_passing2h)(m256h_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
new file mode 100644
index 00000000000..76f300c3e5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
@@ -0,0 +1,337 @@
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+union un1
+{
+  __m256 x;
+  float f;
+};
+
+union un2
+{
+  __m256 x;
+  double d;
+};
+
+union un3
+{
+  __m256 x;
+  __m128 v;
+};
+
+union un4
+{
+  __m256 x;
+  long double ld;
+};
+
+union un5
+{
+  __m256 x;
+  int i;
+};
+
+union un1a
+{
+  __m256 x;
+  _Float16 f;
+};
+
+union un1h
+{
+  __m256h x;
+  float f;
+};
+
+union un1hh
+{
+  __m256h x;
+  _Float16 f;
+};
+
+union un2h
+{
+  __m256h x;
+  double d;
+};
+
+union un3h
+{
+  __m256h x;
+  __m128 v;
+};
+
+union un4h
+{
+  __m256h x;
+  long double ld;
+};
+
+union un5h
+{
+  __m256h x;
+  int i;
+};
+
+void
+check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED,
+		     union un1 u2 ATTRIBUTE_UNUSED,
+		     union un1 u3 ATTRIBUTE_UNUSED,
+		     union un1 u4 ATTRIBUTE_UNUSED,
+		     union un1 u5 ATTRIBUTE_UNUSED,
+		     union un1 u6 ATTRIBUTE_UNUSED,
+		     union un1 u7 ATTRIBUTE_UNUSED,
+		     union un1 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1a(union un1a u1 ATTRIBUTE_UNUSED,
+		      union un1a u2 ATTRIBUTE_UNUSED,
+		      union un1a u3 ATTRIBUTE_UNUSED,
+		      union un1a u4 ATTRIBUTE_UNUSED,
+		      union un1a u5 ATTRIBUTE_UNUSED,
+		      union un1a u6 ATTRIBUTE_UNUSED,
+		      union un1a u7 ATTRIBUTE_UNUSED,
+		      union un1a u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED,
+		      union un1h u2 ATTRIBUTE_UNUSED,
+		      union un1h u3 ATTRIBUTE_UNUSED,
+		      union un1h u4 ATTRIBUTE_UNUSED,
+		      union un1h u5 ATTRIBUTE_UNUSED,
+		      union un1h u6 ATTRIBUTE_UNUSED,
+		      union un1h u7 ATTRIBUTE_UNUSED,
+		      union un1h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED,
+		       union un1hh u2 ATTRIBUTE_UNUSED,
+		       union un1hh u3 ATTRIBUTE_UNUSED,
+		       union un1hh u4 ATTRIBUTE_UNUSED,
+		       union un1hh u5 ATTRIBUTE_UNUSED,
+		       union un1hh u6 ATTRIBUTE_UNUSED,
+		       union un1hh u7 ATTRIBUTE_UNUSED,
+		       union un1hh u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED,
+		     union un2 u2 ATTRIBUTE_UNUSED,
+		     union un2 u3 ATTRIBUTE_UNUSED,
+		     union un2 u4 ATTRIBUTE_UNUSED,
+		     union un2 u5 ATTRIBUTE_UNUSED,
+		     union un2 u6 ATTRIBUTE_UNUSED,
+		     union un2 u7 ATTRIBUTE_UNUSED,
+		     union un2 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED,
+		      union un2h u2 ATTRIBUTE_UNUSED,
+		      union un2h u3 ATTRIBUTE_UNUSED,
+		      union un2h u4 ATTRIBUTE_UNUSED,
+		      union un2h u5 ATTRIBUTE_UNUSED,
+		      union un2h u6 ATTRIBUTE_UNUSED,
+		      union un2h u7 ATTRIBUTE_UNUSED,
+		      union un2h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED,
+		     union un3 u2 ATTRIBUTE_UNUSED,
+		     union un3 u3 ATTRIBUTE_UNUSED,
+		     union un3 u4 ATTRIBUTE_UNUSED,
+		     union un3 u5 ATTRIBUTE_UNUSED,
+		     union un3 u6 ATTRIBUTE_UNUSED,
+		     union un3 u7 ATTRIBUTE_UNUSED,
+		     union un3 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED,
+		      union un3h u2 ATTRIBUTE_UNUSED,
+		      union un3h u3 ATTRIBUTE_UNUSED,
+		      union un3h u4 ATTRIBUTE_UNUSED,
+		      union un3h u5 ATTRIBUTE_UNUSED,
+		      union un3h u6 ATTRIBUTE_UNUSED,
+		      union un3h u7 ATTRIBUTE_UNUSED,
+		      union un3h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing4(union un4 u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing4h(union un4h u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing5h(union un5h u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+
+#define check_union_passing1h WRAP_CALL(check_union_passing1h)
+#define check_union_passing1a WRAP_CALL(check_union_passing1a)
+#define check_union_passing1hh WRAP_CALL(check_union_passing1hh)
+#define check_union_passing2h WRAP_CALL(check_union_passing2h)
+#define check_union_passing3h WRAP_CALL(check_union_passing3h)
+#define check_union_passing4h WRAP_CALL(check_union_passing4h)
+#define check_union_passing5h WRAP_CALL(check_union_passing5h)
+
+static void
+do_test (void)
+{
+  union un1 u1[8];
+  union un2 u2[8];
+  union un3 u3[8];
+  union un4 u4;
+  union un5 u5;
+  union un1a u1a[8];
+  union un1h u1h[8];
+  union un1hh u1hh[8];
+  union un2h u2h[8];
+  union un3h u3h[8];
+  union un4h u4h;
+  union un5h u5h;
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+      u1h[i].x = (__m256h){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+                           33+i, 1, i, 2, -i, 4, i - 11, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256[0] = u1[i].x;
+  num_fregs = 8;
+  check_union_passing1(u1[0], u1[1], u1[2], u1[3],
+		       u1[4], u1[5], u1[6], u1[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1a[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u1a[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1a(u1a[0], u1a[1], u1a[2], u1a[3],
+		        u1a[4], u1a[5], u1a[6], u1a[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256h[0] = u1h[i].x;
+  num_fregs = 8;
+  check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3],
+		        u1h[4], u1h[5], u1h[6], u1h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1hh[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u1hh[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3],
+		         u1hh[4], u1hh[5], u1hh[6], u1hh[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u2[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2(u2[0], u2[1], u2[2], u2[3],
+		       u2[4], u2[5], u2[6], u2[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2h[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u2h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3],
+		        u2h[4], u2h[5], u2h[6], u2h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u3[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3(u3[0], u3[1], u3[2], u3[3],
+		       u3[4], u3[5], u3[6], u3[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3h[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u3h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3],
+		        u3h[4], u3h[5], u3h[6], u3h[7]);
+
+  check_union_passing4(u4);
+  check_union_passing5(u5);
+
+  check_union_passing4h(u4h);
+  check_union_passing5h(u5h);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
new file mode 100644
index 00000000000..f15adb4a33b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
@@ -0,0 +1,160 @@
+/* Test variable number of 256-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m256_varargs (__m256 i0, __m256 i1, __m256 i2,
+				__m256 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256 *argp;
+
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m256 *)(((char *) fp) + 8);
+
+  /* Check __m256 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m256);
+  compare (values.i5, argp[1], __m256);
+  compare (values.i6, argp[2], __m256);
+  compare (values.i7, argp[3], __m256);
+  compare (values.i8, argp[4], __m256);
+  compare (values.i9, argp[5], __m256);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256);
+  compare (fregs.ymm1, ymm_regs[1], __m256);
+  compare (fregs.ymm2, ymm_regs[2], __m256);
+  compare (fregs.ymm3, ymm_regs[3], __m256);
+}
+
+void
+fun_check_passing_m256h_varargs (__m256h i0, __m256h i1, __m256h i2,
+				 __m256h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256h *argp;
+
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m256h *)(((char *) fp) + 8);
+
+  /* Check __m256h arguments passed on stack.  */
+  compare (values.i4, argp[0], __m256h);
+  compare (values.i5, argp[1], __m256h);
+  compare (values.i6, argp[2], __m256h);
+  compare (values.i7, argp[3], __m256h);
+  compare (values.i8, argp[4], __m256h);
+  compare (values.i9, argp[5], __m256h);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256h);
+  compare (fregs.ymm1, ymm_regs[1], __m256h);
+  compare (fregs.ymm2, ymm_regs[2], __m256h);
+  compare (fregs.ymm3, ymm_regs[3], __m256h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m256_varargs (void)
+{
+  __m256 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m256){32+i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m256_varargs,
+				 _m256);
+}
+
+void
+test_m256h_varargs (void)
+{
+  __m256h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m256h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+	9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+	13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i
+    };
+  pass = "m256h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m256h_varargs,
+				 _m256h);
+}
+
+void
+do_test (void)
+{
+  test_m256_varargs ();
+  test_m256h_varargs ();
+  if (failed)
+    abort ();
+}
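A note on the technique above: `fun_check_passing_m256_varargs` deliberately bypasses `<stdarg.h>`. It scans upward from `__builtin_frame_address (0)` until it finds the return address, then reads the stack-passed vectors directly, so the test can verify *where* the ABI placed each argument, not merely what it contains. For contrast, this is a minimal sketch (hypothetical helper, not part of the patch) of the portable way to consume a variadic call, where `va_arg` hides the register/stack split entirely:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical portable counterpart: sum n variadic doubles.  va_arg
   fetches each value from wherever the ABI put it (register save area
   or stack) without ever exposing the location, which is exactly why
   the ABI tests cannot use it and walk the frame by hand instead.  */
static double
sum_varargs (int n, ...)
{
  va_list ap;
  double s = 0.0;
  va_start (ap, n);
  for (int i = 0; i < n; i++)
    s += va_arg (ap, double);
  va_end (ap);
  return s;
}
```

The frame-walking version is inherently x86-64-specific and depends on the return address being findable above the frame pointer; the `va_arg` version is correct on any target but can never detect a register-allocation bug.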
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 06/62] AVX512FP16: Add abi test for zmm
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (4 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 05/62] AVX512FP16: Add ABI test for ymm liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph liuhongt
                   ` (55 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
	New file.
	* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
	Likewise.
---
 .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |  48 ++
 .../x86_64/abi/avx512fp16/m512h/args.h        | 186 ++++++++
 .../x86_64/abi/avx512fp16/m512h/asm-support.S |  97 ++++
 .../avx512fp16/m512h/avx512fp16-zmm-check.h   |   4 +
 .../avx512fp16/m512h/test_m512_returning.c    |  62 +++
 .../abi/avx512fp16/m512h/test_passing_m512.c  | 380 ++++++++++++++++
 .../avx512fp16/m512h/test_passing_structs.c   | 123 ++++++
 .../avx512fp16/m512h/test_passing_unions.c    | 415 ++++++++++++++++++
 .../abi/avx512fp16/m512h/test_varargs-m512.c  | 164 +++++++
 9 files changed, 1479 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
new file mode 100644
index 00000000000..33d24762788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
@@ -0,0 +1,48 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
new file mode 100644
index 00000000000..ec89fae4597
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
@@ -0,0 +1,186 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 zmm0
+#define F1 zmm1
+#define F2 zmm2
+#define F3 zmm3
+#define F4 zmm4
+#define F5 zmm5
+#define F6 zmm6
+#define F7 zmm7
+
+typedef union {
+  _Float16 __Float16[32];
+  float _float[16];
+  double _double[8];
+  long _long[8];
+  int _int[16];
+  unsigned long _ulong[8];
+  __m64 _m64[8];
+  __m128 _m128[4];
+  __m256 _m256[2];
+  __m512 _m512[1];
+  __m512h _m512h[1];
+} ZMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+ZMM_T zmm_regs[32];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  ZMM_T zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7, zmm8, zmm9,
+        zmm10, zmm11, zmm12, zmm13, zmm14, zmm15, zmm16, zmm17, zmm18,
+	zmm19, zmm20, zmm21, zmm22, zmm23, zmm24, zmm25, zmm26, zmm27,
+	zmm28, zmm29, zmm30, zmm31;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (zmm_regs, 0, sizeof (zmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* TODO: Do the checking.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.zmm0._ ## T [0] == zmm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.zmm1._ ## T [0] == zmm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.zmm2._ ## T [0] == zmm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.zmm3._ ## T [0] == zmm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.zmm4._ ## T [0] == zmm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.zmm5._ ## T [0] == zmm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.zmm6._ ## T [0] == zmm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.zmm7._ ## T [0] == zmm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.zmm0) + (O), \
+		     &zmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.zmm1) + (O), \
+		     &zmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.zmm2) + (O), \
+		     &zmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.zmm3) + (O), \
+		     &zmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.zmm4) + (O), \
+		     &zmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.zmm5) + (O), \
+		     &zmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.zmm6) + (O), \
+		     &zmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.zmm7) + (O), \
+		     &zmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+#define check_m256_arguments check_vector_arguments(m256, 0)
+#define check_m512_arguments check_vector_arguments(m512, 0)
+
+#endif /* INCLUDED_ARGS_H  */
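For readers new to this harness: `WRAP_CALL` stores the real callee in the global `callthis` and hands back `snapshot` instead, cast via `(typeof (&N)) snapshot` so the compiler still type-checks and marshals the arguments for the real callee. The assembly `snapshot` stub then dumps every register to memory and tail-jumps to `callthis`, which therefore sees its arguments untouched. A simplified C-level sketch of the same indirection (argument recording stands in for the register dump, which must be assembly; all names here are illustrative, not from the patch):

```c
#include <assert.h>

static int (*callthis2) (int, int);   /* stand-in for callthis         */
static int snapped_a, snapped_b;      /* stand-in for the register dump */

/* Simplified "snapshot": record what arrived, then forward to the real
   callee unchanged.  The ABI tests do this in assembly so that no
   argument register is clobbered before it is dumped.  */
static int
snapshot_like (int a, int b)
{
  snapped_a = a;
  snapped_b = b;
  return callthis2 (a, b);
}

/* Comma expression mirrors WRAP_CALL: latch the callee, yield the stub.  */
#define WRAP_CALL_LIKE(fn) (callthis2 = (fn), snapshot_like)

static int
add2 (int a, int b)
{
  return a + b;
}
```

So `WRAP_CALL_LIKE (add2) (3, 4)` returns 7 while leaving 3 and 4 in `snapped_a`/`snapped_b` for later inspection, just as the real tests compare `fregs` against `zmm_regs` after the call.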
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
new file mode 100644
index 00000000000..0ef82876dd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
@@ -0,0 +1,97 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu32 %zmm0, zmm_regs+0(%rip)
+	vmovdqu32 %zmm1, zmm_regs+64(%rip)
+	vmovdqu32 %zmm2, zmm_regs+128(%rip)
+	vmovdqu32 %zmm3, zmm_regs+192(%rip)
+	vmovdqu32 %zmm4, zmm_regs+256(%rip)
+	vmovdqu32 %zmm5, zmm_regs+320(%rip)
+	vmovdqu32 %zmm6, zmm_regs+384(%rip)
+	vmovdqu32 %zmm7, zmm_regs+448(%rip)
+	vmovdqu32 %zmm8, zmm_regs+512(%rip)
+	vmovdqu32 %zmm9, zmm_regs+576(%rip)
+	vmovdqu32 %zmm10, zmm_regs+640(%rip)
+	vmovdqu32 %zmm11, zmm_regs+704(%rip)
+	vmovdqu32 %zmm12, zmm_regs+768(%rip)
+	vmovdqu32 %zmm13, zmm_regs+832(%rip)
+	vmovdqu32 %zmm14, zmm_regs+896(%rip)
+	vmovdqu32 %zmm15, zmm_regs+960(%rip)
+	vmovdqu32 %zmm16, zmm_regs+1024(%rip)
+	vmovdqu32 %zmm17, zmm_regs+1088(%rip)
+	vmovdqu32 %zmm18, zmm_regs+1152(%rip)
+	vmovdqu32 %zmm19, zmm_regs+1216(%rip)
+	vmovdqu32 %zmm20, zmm_regs+1280(%rip)
+	vmovdqu32 %zmm21, zmm_regs+1344(%rip)
+	vmovdqu32 %zmm22, zmm_regs+1408(%rip)
+	vmovdqu32 %zmm23, zmm_regs+1472(%rip)
+	vmovdqu32 %zmm24, zmm_regs+1536(%rip)
+	vmovdqu32 %zmm25, zmm_regs+1600(%rip)
+	vmovdqu32 %zmm26, zmm_regs+1664(%rip)
+	vmovdqu32 %zmm27, zmm_regs+1728(%rip)
+	vmovdqu32 %zmm28, zmm_regs+1792(%rip)
+	vmovdqu32 %zmm29, zmm_regs+1856(%rip)
+	vmovdqu32 %zmm30, zmm_regs+1920(%rip)
+	vmovdqu32 %zmm31, zmm_regs+1984(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu32	%zmm0, zmm_regs+0(%rip)
+	vmovdqu32	%zmm1, zmm_regs+64(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	zmm_regs,2048,64
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
new file mode 100644
index 00000000000..4b882cc11fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
@@ -0,0 +1,4 @@
+#define AVX512VL(ebx) 1
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_ZMM \
+		     | XSTATE_HI_ZMM | XSTATE_OPMASK)
+#include "../avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
new file mode 100644
index 00000000000..5cb59436cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
@@ -0,0 +1,62 @@
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m512
+fun_test_returning___m512 (void)
+{
+  volatile_var++;
+  return (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+}
+
+__m512h
+fun_test_returning___m512h (void)
+{
+  volatile_var++;
+  return (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16,
+                    5.5f16, 6.6f16, 7.7f16, 8.8f16,
+                    9.9f16,  10.10f16,   11.11f16, 12.12f16,
+                    13.13f16, 14.14f16,  15.15f16, 16.16f16,
+                    17.17f16, 18.18f16,  19.19f16, 20.20f16,
+                    21.21f16, 22.22f16,  23.23f16, 24.24f16,
+                    25.25f16, 26.26f16,  27.27f16, 28.28f16,
+                    29.29f16, 30.30f16,  31.31f16, 32.32f16};
+}
+
+__m512 test_512;
+__m512h test_512h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  ZMM_T zmmt1, zmmt2;
+
+  clear_struct_registers;
+  test_512 = (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  zmmt1._m512[0] = test_512;
+  zmmt2._m512[0] = WRAP_RET (fun_test_returning___m512)();
+  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
+    printf ("fail m512\n"), failed++;
+
+  clear_struct_registers;
+  test_512h = (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16,
+                         5.5f16, 6.6f16, 7.7f16, 8.8f16,
+                         9.9f16,  10.10f16,   11.11f16, 12.12f16,
+                         13.13f16, 14.14f16,  15.15f16, 16.16f16,
+                         17.17f16, 18.18f16,  19.19f16, 20.20f16,
+                         21.21f16, 22.22f16,  23.23f16, 24.24f16,
+                         25.25f16, 26.26f16,  27.27f16, 28.28f16,
+                         29.29f16, 30.30f16,  31.31f16, 32.32f16};
+  zmmt1._m512h[0] = test_512h;
+  zmmt2._m512h[0] = WRAP_RET (fun_test_returning___m512h)();
+  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
+    printf ("fail m512h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
new file mode 100644
index 00000000000..ad5ba2e7f92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
@@ -0,0 +1,380 @@
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512_8_values (__m512 i0 ATTRIBUTE_UNUSED,
+				 __m512 i1 ATTRIBUTE_UNUSED,
+				 __m512 i2 ATTRIBUTE_UNUSED,
+				 __m512 i3 ATTRIBUTE_UNUSED,
+				 __m512 i4 ATTRIBUTE_UNUSED,
+				 __m512 i5 ATTRIBUTE_UNUSED,
+				 __m512 i6 ATTRIBUTE_UNUSED,
+				 __m512 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+  compare (values.i4, i4, __m512);
+  compare (values.i5, i5, __m512);
+  compare (values.i6, i6, __m512);
+  compare (values.i7, i7, __m512);
+}
+
+void
+fun_check_passing_m512h_8_values (__m512h i0 ATTRIBUTE_UNUSED,
+				  __m512h i1 ATTRIBUTE_UNUSED,
+				  __m512h i2 ATTRIBUTE_UNUSED,
+				  __m512h i3 ATTRIBUTE_UNUSED,
+				  __m512h i4 ATTRIBUTE_UNUSED,
+				  __m512h i5 ATTRIBUTE_UNUSED,
+				  __m512h i6 ATTRIBUTE_UNUSED,
+				  __m512h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+  compare (values.i4, i4, __m512h);
+  compare (values.i5, i5, __m512h);
+  compare (values.i6, i6, __m512h);
+  compare (values.i7, i7, __m512h);
+}
+
+void
+fun_check_passing_m512_8_regs (__m512 i0 ATTRIBUTE_UNUSED,
+			       __m512 i1 ATTRIBUTE_UNUSED,
+			       __m512 i2 ATTRIBUTE_UNUSED,
+			       __m512 i3 ATTRIBUTE_UNUSED,
+			       __m512 i4 ATTRIBUTE_UNUSED,
+			       __m512 i5 ATTRIBUTE_UNUSED,
+			       __m512 i6 ATTRIBUTE_UNUSED,
+			       __m512 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512h_8_regs (__m512h i0 ATTRIBUTE_UNUSED,
+				__m512h i1 ATTRIBUTE_UNUSED,
+				__m512h i2 ATTRIBUTE_UNUSED,
+				__m512h i3 ATTRIBUTE_UNUSED,
+				__m512h i4 ATTRIBUTE_UNUSED,
+				__m512h i5 ATTRIBUTE_UNUSED,
+				__m512h i6 ATTRIBUTE_UNUSED,
+				__m512h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512_20_values (__m512 i0 ATTRIBUTE_UNUSED,
+				  __m512 i1 ATTRIBUTE_UNUSED,
+				  __m512 i2 ATTRIBUTE_UNUSED,
+				  __m512 i3 ATTRIBUTE_UNUSED,
+				  __m512 i4 ATTRIBUTE_UNUSED,
+				  __m512 i5 ATTRIBUTE_UNUSED,
+				  __m512 i6 ATTRIBUTE_UNUSED,
+				  __m512 i7 ATTRIBUTE_UNUSED,
+				  __m512 i8 ATTRIBUTE_UNUSED,
+				  __m512 i9 ATTRIBUTE_UNUSED,
+				  __m512 i10 ATTRIBUTE_UNUSED,
+				  __m512 i11 ATTRIBUTE_UNUSED,
+				  __m512 i12 ATTRIBUTE_UNUSED,
+				  __m512 i13 ATTRIBUTE_UNUSED,
+				  __m512 i14 ATTRIBUTE_UNUSED,
+				  __m512 i15 ATTRIBUTE_UNUSED,
+				  __m512 i16 ATTRIBUTE_UNUSED,
+				  __m512 i17 ATTRIBUTE_UNUSED,
+				  __m512 i18 ATTRIBUTE_UNUSED,
+				  __m512 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+  compare (values.i4, i4, __m512);
+  compare (values.i5, i5, __m512);
+  compare (values.i6, i6, __m512);
+  compare (values.i7, i7, __m512);
+  compare (values.i8, i8, __m512);
+  compare (values.i9, i9, __m512);
+  compare (values.i10, i10, __m512);
+  compare (values.i11, i11, __m512);
+  compare (values.i12, i12, __m512);
+  compare (values.i13, i13, __m512);
+  compare (values.i14, i14, __m512);
+  compare (values.i15, i15, __m512);
+  compare (values.i16, i16, __m512);
+  compare (values.i17, i17, __m512);
+  compare (values.i18, i18, __m512);
+  compare (values.i19, i19, __m512);
+}
+
+void
+fun_check_passing_m512h_20_values (__m512h i0 ATTRIBUTE_UNUSED,
+				   __m512h i1 ATTRIBUTE_UNUSED,
+				   __m512h i2 ATTRIBUTE_UNUSED,
+				   __m512h i3 ATTRIBUTE_UNUSED,
+				   __m512h i4 ATTRIBUTE_UNUSED,
+				   __m512h i5 ATTRIBUTE_UNUSED,
+				   __m512h i6 ATTRIBUTE_UNUSED,
+				   __m512h i7 ATTRIBUTE_UNUSED,
+				   __m512h i8 ATTRIBUTE_UNUSED,
+				   __m512h i9 ATTRIBUTE_UNUSED,
+				   __m512h i10 ATTRIBUTE_UNUSED,
+				   __m512h i11 ATTRIBUTE_UNUSED,
+				   __m512h i12 ATTRIBUTE_UNUSED,
+				   __m512h i13 ATTRIBUTE_UNUSED,
+				   __m512h i14 ATTRIBUTE_UNUSED,
+				   __m512h i15 ATTRIBUTE_UNUSED,
+				   __m512h i16 ATTRIBUTE_UNUSED,
+				   __m512h i17 ATTRIBUTE_UNUSED,
+				   __m512h i18 ATTRIBUTE_UNUSED,
+				   __m512h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+  compare (values.i4, i4, __m512h);
+  compare (values.i5, i5, __m512h);
+  compare (values.i6, i6, __m512h);
+  compare (values.i7, i7, __m512h);
+  compare (values.i8, i8, __m512h);
+  compare (values.i9, i9, __m512h);
+  compare (values.i10, i10, __m512h);
+  compare (values.i11, i11, __m512h);
+  compare (values.i12, i12, __m512h);
+  compare (values.i13, i13, __m512h);
+  compare (values.i14, i14, __m512h);
+  compare (values.i15, i15, __m512h);
+  compare (values.i16, i16, __m512h);
+  compare (values.i17, i17, __m512h);
+  compare (values.i18, i18, __m512h);
+  compare (values.i19, i19, __m512h);
+}
+
+void
+fun_check_passing_m512_20_regs (__m512 i0 ATTRIBUTE_UNUSED,
+				__m512 i1 ATTRIBUTE_UNUSED,
+				__m512 i2 ATTRIBUTE_UNUSED,
+				__m512 i3 ATTRIBUTE_UNUSED,
+				__m512 i4 ATTRIBUTE_UNUSED,
+				__m512 i5 ATTRIBUTE_UNUSED,
+				__m512 i6 ATTRIBUTE_UNUSED,
+				__m512 i7 ATTRIBUTE_UNUSED,
+				__m512 i8 ATTRIBUTE_UNUSED,
+				__m512 i9 ATTRIBUTE_UNUSED,
+				__m512 i10 ATTRIBUTE_UNUSED,
+				__m512 i11 ATTRIBUTE_UNUSED,
+				__m512 i12 ATTRIBUTE_UNUSED,
+				__m512 i13 ATTRIBUTE_UNUSED,
+				__m512 i14 ATTRIBUTE_UNUSED,
+				__m512 i15 ATTRIBUTE_UNUSED,
+				__m512 i16 ATTRIBUTE_UNUSED,
+				__m512 i17 ATTRIBUTE_UNUSED,
+				__m512 i18 ATTRIBUTE_UNUSED,
+				__m512 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512h_20_regs (__m512h i0 ATTRIBUTE_UNUSED,
+				 __m512h i1 ATTRIBUTE_UNUSED,
+				 __m512h i2 ATTRIBUTE_UNUSED,
+				 __m512h i3 ATTRIBUTE_UNUSED,
+				 __m512h i4 ATTRIBUTE_UNUSED,
+				 __m512h i5 ATTRIBUTE_UNUSED,
+				 __m512h i6 ATTRIBUTE_UNUSED,
+				 __m512h i7 ATTRIBUTE_UNUSED,
+				 __m512h i8 ATTRIBUTE_UNUSED,
+				 __m512h i9 ATTRIBUTE_UNUSED,
+				 __m512h i10 ATTRIBUTE_UNUSED,
+				 __m512h i11 ATTRIBUTE_UNUSED,
+				 __m512h i12 ATTRIBUTE_UNUSED,
+				 __m512h i13 ATTRIBUTE_UNUSED,
+				 __m512h i14 ATTRIBUTE_UNUSED,
+				 __m512h i15 ATTRIBUTE_UNUSED,
+				 __m512h i16 ATTRIBUTE_UNUSED,
+				 __m512h i17 ATTRIBUTE_UNUSED,
+				 __m512h i18 ATTRIBUTE_UNUSED,
+				 __m512h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+			    _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+			    _i18, _i19, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19);
+
+void
+test_m512_on_stack ()
+{
+  __m512 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m512_8_values,
+		      fun_check_passing_m512_8_regs, _m512);
+}
+
+void
+test_m512h_on_stack ()
+{
+  __m512h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m512h){1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+		     5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+		     9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+		     13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+		     17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+		     21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+		     25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+		     29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i};
+
+  pass = "m512h-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m512h_8_values,
+		      fun_check_passing_m512h_8_regs, _m512h);
+}
+
+void
+test_too_many_m512 ()
+{
+  __m512 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m512_20_values,
+		       fun_check_passing_m512_20_regs, _m512);
+}
+
+void
+test_too_many_m512h ()
+{
+  __m512h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m512h){ 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+		      5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+		      9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+		      13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+		      17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+		      21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+		      25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+		      29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i};
+  pass = "m512h-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m512h_20_values,
+		       fun_check_passing_m512h_20_regs, _m512h);
+}
+
+static void
+do_test (void)
+{
+  test_m512_on_stack ();
+  test_too_many_m512 ();
+  test_m512h_on_stack ();
+  test_too_many_m512h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
new file mode 100644
index 00000000000..734e0f8e9e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
@@ -0,0 +1,123 @@
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct m512_struct
+{
+  __m512 x;
+};
+
+struct m512h_struct
+{
+  __m512h x;
+};
+
+struct m512_2_struct
+{
+  __m512 x1, x2;
+};
+
+struct m512h_2_struct
+{
+  __m512h x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1 (struct m512_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_struct_passing1h (struct m512h_struct ms1 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms2 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms3 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms4 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms5 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms6 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms7 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_struct_passing2 (struct m512_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+72);
+}
+
+void
+check_struct_passing2h (struct m512h_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+72);
+}
+
+static void
+do_test (void)
+{
+  struct m512_struct m512s [8];
+  struct m512h_struct m512hs [8];
+  struct m512_2_struct m512_2s = {
+      { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94,
+	48.3941, 39.31, -397.91, 3484.91, -8.3941, -93.31, 7.91, 84.941 },
+      { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3,
+	-8.3942, -3.32, -39.92, 34.92, 7.92, 84.942, -48.3942, 39.32 }
+  };
+  struct m512h_2_struct m512h_2s = {
+      { 58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+        58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+        58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+	58.3951f16, 39.31f16, -397.91f16, 3585.91f16, -8.3951f16, -93.31f16, 7.91f16, 85.951f16},
+      { 67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+        67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+        67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+	67.3961f16, 39.31f16, -397.91f16, 3676.91f16, -7.3961f16, -93.31f16, 7.91f16, 76.961f16},
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m512s[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+			    32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+      m512hs[i].x = (__m512h){33+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      34+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      35+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      36+i, 1, i, 2, -i, 0, i - 15, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512[0] = m512s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1)(m512s[0], m512s[1], m512s[2], m512s[3],
+				    m512s[4], m512s[5], m512s[6], m512s[7]);
+  WRAP_CALL (check_struct_passing2)(m512_2s);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512h[0] = m512hs[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1h)(m512hs[0], m512hs[1], m512hs[2], m512hs[3],
+				    m512hs[4], m512hs[5], m512hs[6], m512hs[7]);
+  WRAP_CALL (check_struct_passing2h)(m512h_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
new file mode 100644
index 00000000000..fa801fbf7ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
@@ -0,0 +1,415 @@
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+union un1
+{
+  __m512 x;
+  float f;
+};
+
+union un2
+{
+  __m512 x;
+  double d;
+};
+
+union un3
+{
+  __m512 x;
+  __m128 v;
+};
+
+union un4
+{
+  __m512 x;
+  long double ld;
+};
+
+union un5
+{
+  __m512 x;
+  int i;
+};
+
+union un6
+{
+  __m512 x;
+  __m256 v;
+};
+
+union un1h
+{
+  __m512 x;
+  _Float16 f;
+};
+
+union un1hf
+{
+  __m512h x;
+  float f;
+};
+
+union un1hh
+{
+  __m512h x;
+  _Float16 f;
+};
+
+union un2h
+{
+  __m512h x;
+  double d;
+};
+
+union un3h
+{
+  __m512h x;
+  __m128 v;
+};
+
+union un4h
+{
+  __m512h x;
+  long double ld;
+};
+
+union un5h
+{
+  __m512h x;
+  int i;
+};
+
+union un6h
+{
+  __m512h x;
+  __m256 v;
+};
+
+void
+check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED,
+		     union un1 u2 ATTRIBUTE_UNUSED,
+		     union un1 u3 ATTRIBUTE_UNUSED,
+		     union un1 u4 ATTRIBUTE_UNUSED,
+		     union un1 u5 ATTRIBUTE_UNUSED,
+		     union un1 u6 ATTRIBUTE_UNUSED,
+		     union un1 u7 ATTRIBUTE_UNUSED,
+		     union un1 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED,
+		      union un1h u2 ATTRIBUTE_UNUSED,
+		      union un1h u3 ATTRIBUTE_UNUSED,
+		      union un1h u4 ATTRIBUTE_UNUSED,
+		      union un1h u5 ATTRIBUTE_UNUSED,
+		      union un1h u6 ATTRIBUTE_UNUSED,
+		      union un1h u7 ATTRIBUTE_UNUSED,
+		      union un1h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1hf(union un1hf u1 ATTRIBUTE_UNUSED,
+		       union un1hf u2 ATTRIBUTE_UNUSED,
+		       union un1hf u3 ATTRIBUTE_UNUSED,
+		       union un1hf u4 ATTRIBUTE_UNUSED,
+		       union un1hf u5 ATTRIBUTE_UNUSED,
+		       union un1hf u6 ATTRIBUTE_UNUSED,
+		       union un1hf u7 ATTRIBUTE_UNUSED,
+		       union un1hf u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED,
+		       union un1hh u2 ATTRIBUTE_UNUSED,
+		       union un1hh u3 ATTRIBUTE_UNUSED,
+		       union un1hh u4 ATTRIBUTE_UNUSED,
+		       union un1hh u5 ATTRIBUTE_UNUSED,
+		       union un1hh u6 ATTRIBUTE_UNUSED,
+		       union un1hh u7 ATTRIBUTE_UNUSED,
+		       union un1hh u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED,
+		     union un2 u2 ATTRIBUTE_UNUSED,
+		     union un2 u3 ATTRIBUTE_UNUSED,
+		     union un2 u4 ATTRIBUTE_UNUSED,
+		     union un2 u5 ATTRIBUTE_UNUSED,
+		     union un2 u6 ATTRIBUTE_UNUSED,
+		     union un2 u7 ATTRIBUTE_UNUSED,
+		     union un2 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED,
+		      union un2h u2 ATTRIBUTE_UNUSED,
+		      union un2h u3 ATTRIBUTE_UNUSED,
+		      union un2h u4 ATTRIBUTE_UNUSED,
+		      union un2h u5 ATTRIBUTE_UNUSED,
+		      union un2h u6 ATTRIBUTE_UNUSED,
+		      union un2h u7 ATTRIBUTE_UNUSED,
+		      union un2h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED,
+		     union un3 u2 ATTRIBUTE_UNUSED,
+		     union un3 u3 ATTRIBUTE_UNUSED,
+		     union un3 u4 ATTRIBUTE_UNUSED,
+		     union un3 u5 ATTRIBUTE_UNUSED,
+		     union un3 u6 ATTRIBUTE_UNUSED,
+		     union un3 u7 ATTRIBUTE_UNUSED,
+		     union un3 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED,
+		      union un3h u2 ATTRIBUTE_UNUSED,
+		      union un3h u3 ATTRIBUTE_UNUSED,
+		      union un3h u4 ATTRIBUTE_UNUSED,
+		      union un3h u5 ATTRIBUTE_UNUSED,
+		      union un3h u6 ATTRIBUTE_UNUSED,
+		      union un3h u7 ATTRIBUTE_UNUSED,
+		      union un3h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing4(union un4 u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing4h(union un4h u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing5h(union un5h u ATTRIBUTE_UNUSED)
+{
+   /* Check the passing on the stack by comparing the address of the
+      stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing6(union un6 u1 ATTRIBUTE_UNUSED,
+		     union un6 u2 ATTRIBUTE_UNUSED,
+		     union un6 u3 ATTRIBUTE_UNUSED,
+		     union un6 u4 ATTRIBUTE_UNUSED,
+		     union un6 u5 ATTRIBUTE_UNUSED,
+		     union un6 u6 ATTRIBUTE_UNUSED,
+		     union un6 u7 ATTRIBUTE_UNUSED,
+		     union un6 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing6h(union un6h u1 ATTRIBUTE_UNUSED,
+		      union un6h u2 ATTRIBUTE_UNUSED,
+		      union un6h u3 ATTRIBUTE_UNUSED,
+		      union un6h u4 ATTRIBUTE_UNUSED,
+		      union un6h u5 ATTRIBUTE_UNUSED,
+		      union un6h u6 ATTRIBUTE_UNUSED,
+		      union un6h u7 ATTRIBUTE_UNUSED,
+		      union un6h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+#define check_union_passing6 WRAP_CALL(check_union_passing6)
+
+#define check_union_passing1h WRAP_CALL(check_union_passing1h)
+#define check_union_passing1hf WRAP_CALL(check_union_passing1hf)
+#define check_union_passing1hh WRAP_CALL(check_union_passing1hh)
+#define check_union_passing2h WRAP_CALL(check_union_passing2h)
+#define check_union_passing3h WRAP_CALL(check_union_passing3h)
+#define check_union_passing4h WRAP_CALL(check_union_passing4h)
+#define check_union_passing5h WRAP_CALL(check_union_passing5h)
+#define check_union_passing6h WRAP_CALL(check_union_passing6h)
+
+
+static void
+do_test (void)
+{
+  union un1 u1[8];
+  union un2 u2[8];
+  union un3 u3[8];
+  union un4 u4;
+  union un5 u5;
+  union un6 u6[8];
+  union un1h u1h[8];
+  union un1hf u1hf[8];
+  union un1hh u1hh[8];
+  union un2h u2h[8];
+  union un3h u3h[8];
+  union un4h u4h;
+  union un5h u5h;
+  union un6h u6h[8];
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+	                 32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+
+      u1hf[i].x =  (__m512h){ 33+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                              34+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                              35+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                              36+i, 1, i, 2, -i, 0, i - 15, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512[0] = u1[i].x;
+  num_fregs = 8;
+  check_union_passing1(u1[0], u1[1], u1[2], u1[3],
+		       u1[4], u1[5], u1[6], u1[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1h[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u1h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3],
+		        u1h[4], u1h[5], u1h[6], u1h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512h[0] = u1hf[i].x;
+  num_fregs = 8;
+  check_union_passing1hf(u1hf[0], u1hf[1], u1hf[2], u1hf[3],
+		         u1hf[4], u1hf[5], u1hf[6], u1hf[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1hh[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u1hh[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3],
+		         u1hh[4], u1hh[5], u1hh[6], u1hh[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u2[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2(u2[0], u2[1], u2[2], u2[3],
+		       u2[4], u2[5], u2[6], u2[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u2h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3],
+		        u2h[4], u2h[5], u2h[6], u2h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u3[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3(u3[0], u3[1], u3[2], u3[3],
+		       u3[4], u3[5], u3[6], u3[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u3h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3],
+		        u3h[4], u3h[5], u3h[6], u3h[7]);
+
+  check_union_passing4(u4);
+  check_union_passing5(u5);
+
+  check_union_passing4h(u4h);
+  check_union_passing5h(u5h);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u6[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u6[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing6(u6[0], u6[1], u6[2], u6[3],
+		       u6[4], u6[5], u6[6], u6[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u6h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u6h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing6h(u6h[0], u6h[1], u6h[2], u6h[3],
+		        u6h[4], u6h[5], u6h[6], u6h[7]);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
new file mode 100644
index 00000000000..e6d165a8247
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
@@ -0,0 +1,164 @@
+/* Test variable number of 512-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512_varargs (__m512 i0, __m512 i1, __m512 i2,
+				__m512 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m512 *argp;
+
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m512 *)(((char *) fp) + 8);
+
+  /* Check __m512 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m512);
+  compare (values.i5, argp[1], __m512);
+  compare (values.i6, argp[2], __m512);
+  compare (values.i7, argp[3], __m512);
+  compare (values.i8, argp[4], __m512);
+  compare (values.i9, argp[5], __m512);
+
+  /* Check register contents.  */
+  compare (fregs.zmm0, zmm_regs[0], __m512);
+  compare (fregs.zmm1, zmm_regs[1], __m512);
+  compare (fregs.zmm2, zmm_regs[2], __m512);
+  compare (fregs.zmm3, zmm_regs[3], __m512);
+}
+
+void
+fun_check_passing_m512h_varargs (__m512h i0, __m512h i1, __m512h i2,
+				 __m512h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m512h *argp;
+
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m512h *)(((char *) fp) + 8);
+
+  /* Check __m512h arguments passed on stack.  */
+  compare (values.i4, argp[0], __m512h);
+  compare (values.i5, argp[1], __m512h);
+  compare (values.i6, argp[2], __m512h);
+  compare (values.i7, argp[3], __m512h);
+  compare (values.i8, argp[4], __m512h);
+  compare (values.i9, argp[5], __m512h);
+
+  /* Check register contents.  */
+  compare (fregs.zmm0, zmm_regs[0], __m512h);
+  compare (fregs.zmm1, zmm_regs[1], __m512h);
+  compare (fregs.zmm2, zmm_regs[2], __m512h);
+  compare (fregs.zmm3, zmm_regs[3], __m512h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m512_varargs (void)
+{
+  __m512 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m512){32+i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m512_varargs,
+				 _m512);
+}
+
+void
+test_m512h_varargs (void)
+{
+  __m512h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m512h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+	9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+	13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+	17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+	21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+	25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+	29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i
+    };
+  pass = "m512h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m512h_varargs,
+				 _m512h);
+}
+
+void
+do_test (void)
+{
+  test_m512_varargs ();
+  test_m512h_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (5 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 06/62] AVX512FP16: Add abi test for zmm liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-09-09  7:48   ` Hongtao Liu
  2021-07-01  6:15 ` [PATCH 08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph liuhongt
                   ` (54 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config.gcc: Add avx512fp16vlintrin.h.
	* config/i386/avx512fp16intrin.h (_mm512_add_ph): New intrinsic.
	(_mm512_mask_add_ph): Likewise.
	(_mm512_maskz_add_ph): Likewise.
	(_mm512_sub_ph): Likewise.
	(_mm512_mask_sub_ph): Likewise.
	(_mm512_maskz_sub_ph): Likewise.
	(_mm512_mul_ph): Likewise.
	(_mm512_mask_mul_ph): Likewise.
	(_mm512_maskz_mul_ph): Likewise.
	(_mm512_div_ph): Likewise.
	(_mm512_mask_div_ph): Likewise.
	(_mm512_maskz_div_ph): Likewise.
	(_mm512_add_round_ph): Likewise.
	(_mm512_mask_add_round_ph): Likewise.
	(_mm512_maskz_add_round_ph): Likewise.
	(_mm512_sub_round_ph): Likewise.
	(_mm512_mask_sub_round_ph): Likewise.
	(_mm512_maskz_sub_round_ph): Likewise.
	(_mm512_mul_round_ph): Likewise.
	(_mm512_mask_mul_round_ph): Likewise.
	(_mm512_maskz_mul_round_ph): Likewise.
	(_mm512_div_round_ph): Likewise.
	(_mm512_mask_div_round_ph): Likewise.
	(_mm512_maskz_div_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h: New header.
	* config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
	Add new builtin types.
	* config/i386/i386-builtin.def: Add corresponding builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Likewise.
	* config/i386/immintrin.h: Include avx512fp16vlintrin.h.
	* config/i386/sse.md (VFH): New mode_iterator.
	(VF2H): Likewise.
	(avx512fmaskmode): Add HF vector modes.
	(avx512fmaskhalfmode): Likewise.
	(<plusminus_insn><mode>3<mask_name><round_name>): Adjust for
	HF vector modes.
	(*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
	(mul<mode>3<mask_name><round_name>): Likewise.
	(*mul<mode>3<mask_name><round_name>): Likewise.
	(div<mode>3): Likewise.
	(<sse>_div<mode>3<mask_name><round_name>): Likewise.
	* config/i386/subst.md (SUBST_V): Add HF vector modes.
	(SUBST_A): Likewise.
	(round_mode512bit_condition): Adjust for V32HFmode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
	* gcc.target/i386/avx-2.c: Add -mavx512vl.
	* gcc.target/i386/avx512fp16-11a.c: New test.
	* gcc.target/i386/avx512fp16-11b.c: Ditto.
	* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
	* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
	* gcc.target/i386/sse-13.c: Add test for new builtins.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx512fp16intrin.h            | 251 ++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 219 +++++++++++++++
 gcc/config/i386/i386-builtin-types.def        |   7 +
 gcc/config/i386/i386-builtin.def              |  20 ++
 gcc/config/i386/i386-expand.c                 |   5 +
 gcc/config/i386/immintrin.h                   |   2 +
 gcc/config/i386/sse.md                        |  62 +++--
 gcc/config/i386/subst.md                      |   6 +-
 gcc/testsuite/gcc.target/i386/avx-1.c         |   8 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
 .../gcc.target/i386/avx512fp16-11a.c          |  36 +++
 .../gcc.target/i386/avx512fp16-11b.c          |  75 ++++++
 .../gcc.target/i386/avx512vlfp16-11a.c        |  68 +++++
 .../gcc.target/i386/avx512vlfp16-11b.c        |  96 +++++++
 gcc/testsuite/gcc.target/i386/sse-13.c        |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c        |  14 +
 gcc/testsuite/gcc.target/i386/sse-22.c        |  14 +
 gcc/testsuite/gcc.target/i386/sse-23.c        |   6 +
 19 files changed, 872 insertions(+), 27 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16vlintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5b4f894185a..d64a8b9407e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
 		       amxbf16intrin.h x86gprintrin.h uintrintrin.h
 		       hresetintrin.h keylockerintrin.h avxvnniintrin.h
-		       mwaitintrin.h avx512fp16intrin.h"
+		       mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 3fc0770986e..3e9d676dc39 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -217,6 +217,257 @@ _mm_store_sh (void *__P, __m128h __A)
   *(_Float16 *) __P = ((__v8hf)__A)[0];
 }
 
+/* Intrinsics v[add,sub,mul,div]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A + (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vaddph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vaddph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A - (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vsubph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vsubph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A * (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vmulph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vmulph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_ph (__m512h __A, __m512h __B)
+{
+  return (__m512h) ((__v32hf) __A / (__v32hf) __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vdivph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vdivph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vaddph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vaddph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vaddph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vsubph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vsubph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vsubph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vmulph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vmulph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vmulph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vdivph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vdivph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vdivph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+#else
+#define _mm512_add_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((A), (B),		\
+						   _mm512_setzero_ph (),\
+						   (__mmask32)-1, (C)))
+
+#define _mm512_mask_add_round_ph(A, B, C, D, E)			\
+  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_add_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((B), (C),		\
+						   _mm512_setzero_ph (),\
+						   (A), (D)))
+
+#define _mm512_sub_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((A), (B),		\
+						   _mm512_setzero_ph (),\
+						   (__mmask32)-1, (C)))
+
+#define _mm512_mask_sub_round_ph(A, B, C, D, E)			\
+  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_sub_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((B), (C),		\
+						   _mm512_setzero_ph (),\
+						   (A), (D)))
+
+#define _mm512_mul_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((A), (B),		\
+						   _mm512_setzero_ph (),\
+						   (__mmask32)-1, (C)))
+
+#define _mm512_mask_mul_round_ph(A, B, C, D, E)			\
+  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_mul_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((B), (C),		\
+						   _mm512_setzero_ph (),\
+						   (A), (D)))
+
+#define _mm512_div_round_ph(A, B, C)					\
+  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((A), (B),		\
+						   _mm512_setzero_ph (),\
+						   (__mmask32)-1, (C)))
+
+#define _mm512_mask_div_round_ph(A, B, C, D, E)			\
+  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_div_round_ph(A, B, C, D)				\
+  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((B), (C),		\
+						   _mm512_setzero_ph (),\
+						   (A), (D)))
+#endif  /* __OPTIMIZE__  */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
new file mode 100644
index 00000000000..75fa9eb29e7
--- /dev/null
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -0,0 +1,219 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx512fp16vlintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef __AVX512FP16VLINTRIN_H_INCLUDED
+#define __AVX512FP16VLINTRIN_H_INCLUDED
+
+#if !defined(__AVX512VL__) || !defined(__AVX512FP16__)
+#pragma GCC push_options
+#pragma GCC target("avx512fp16,avx512vl")
+#define __DISABLE_AVX512FP16VL__
+#endif /* __AVX512FP16VL__ */
+
+/* Intrinsics v[add,sub,mul,div]ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_ph (__m128h __A, __m128h __B)
+{
+  return (__m128h) ((__v8hf) __A + (__v8hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_add_ph (__m256h __A, __m256h __B)
+{
+  return (__m256h) ((__v16hf) __A + (__v16hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_add_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vaddph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_add_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vaddph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vaddph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_add_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vaddph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_ph (__m128h __A, __m128h __B)
+{
+  return (__m128h) ((__v8hf) __A - (__v8hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sub_ph (__m256h __A, __m256h __B)
+{
+  return (__m256h) ((__v16hf) __A - (__v16hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vsubph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_sub_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vsubph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vsubph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_sub_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vsubph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_ph (__m128h __A, __m128h __B)
+{
+  return (__m128h) ((__v8hf) __A * (__v8hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mul_ph (__m256h __A, __m256h __B)
+{
+  return (__m256h) ((__v16hf) __A * (__v16hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mul_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vmulph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_mul_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vmulph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_mul_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vmulph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_mul_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vmulph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_ph (__m128h __A, __m128h __B)
+{
+  return (__m128h) ((__v8hf) __A / (__v8hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_div_ph (__m256h __A, __m256h __B)
+{
+  return (__m256h) ((__v16hf) __A / (__v16hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_div_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vdivph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_div_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vdivph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_div_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vdivph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_div_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vdivph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
+#ifdef __DISABLE_AVX512FP16VL__
+#undef __DISABLE_AVX512FP16VL__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16VL__ */
+
+#endif /* __AVX512FP16VLINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index eb5153002ae..ee3b8c30589 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -98,6 +98,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI)
 # AVX vectors
 DEF_VECTOR_TYPE (V4DF, DOUBLE)
 DEF_VECTOR_TYPE (V8SF, FLOAT)
+DEF_VECTOR_TYPE (V16HF, FLOAT16)
 DEF_VECTOR_TYPE (V4DI, DI)
 DEF_VECTOR_TYPE (V8SI, SI)
 DEF_VECTOR_TYPE (V16HI, HI)
@@ -108,6 +109,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI)
 
 # AVX512F vectors
 DEF_VECTOR_TYPE (V32SF, FLOAT)
+DEF_VECTOR_TYPE (V32HF, FLOAT16)
 DEF_VECTOR_TYPE (V16SF, FLOAT)
 DEF_VECTOR_TYPE (V8DF, DOUBLE)
 DEF_VECTOR_TYPE (V8DI, DI)
@@ -1302,3 +1304,8 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 
 # FP16 builtins
 DEF_FUNCTION_TYPE (V8HF, V8HI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
+DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 1cc0cc6968c..b783d266dd8 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2774,6 +2774,20 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builti
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
 
+/* AVX512FP16.  */
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_vaddph_v8hf_mask", IX86_BUILTIN_VADDPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_vaddph_v16hf_mask", IX86_BUILTIN_VADDPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_vaddph_v32hf_mask", IX86_BUILTIN_VADDPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_vsubph_v8hf_mask", IX86_BUILTIN_VSUBPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_vsubph_v16hf_mask", IX86_BUILTIN_VSUBPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_vsubph_v32hf_mask", IX86_BUILTIN_VSUBPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_vmulph_v8hf_mask", IX86_BUILTIN_VMULPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_vmulph_v16hf_mask", IX86_BUILTIN_VMULPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_vmulph_v32hf_mask", IX86_BUILTIN_VMULPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_vdivph_v8hf_mask", IX86_BUILTIN_VDIVPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_vdivph_v16hf_mask", IX86_BUILTIN_VDIVPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_vdivph_v32hf_mask", IX86_BUILTIN_VDIVPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
 
@@ -2973,6 +2987,12 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
 
+/* AVX512FP16.  */
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_vaddph_v32hf_mask_round", IX86_BUILTIN_VADDPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_vsubph_v32hf_mask_round", IX86_BUILTIN_VSUBPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_vmulph_v32hf_mask_round", IX86_BUILTIN_VMULPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_vdivph_v32hf_mask_round", IX86_BUILTIN_VDIVPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
 /* FMA4 and XOP.  */
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 5ce7163b241..39647eb2cf1 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9760,6 +9760,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V8SI_V8SI_V16HI_UHI:
     case V8HI_FTYPE_V4SI_V4SI_V8HI_UQI:
     case V4DF_FTYPE_V4DF_V4DI_V4DF_UQI:
+    case V32HF_FTYPE_V32HF_V32HF_V32HF_USI:
     case V8SF_FTYPE_V8SF_V8SI_V8SF_UQI:
     case V4SF_FTYPE_V4SF_V4SI_V4SF_UQI:
     case V2DF_FTYPE_V2DF_V2DI_V2DF_UQI:
@@ -9777,6 +9778,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI:
     case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI:
     case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI:
+    case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI:
     case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI:
     case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_V16HI_UHI:
@@ -9784,6 +9786,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI:
     case V4DI_FTYPE_V4DI_V4DI_V4DI_UQI:
     case V4DF_FTYPE_V4DF_V4DF_V4DF_UQI:
+    case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI:
     case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI:
     case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI:
     case V8DF_FTYPE_V8DF_V8DI_V8DF_UQI:
@@ -10460,6 +10463,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case INT_FTYPE_V4SF_INT:
       nargs = 2;
       break;
+    case V32HF_FTYPE_V32HF_V32HF_INT:
     case V4SF_FTYPE_V4SF_UINT_INT:
     case V4SF_FTYPE_V4SF_UINT64_INT:
     case V2DF_FTYPE_V2DF_UINT64_INT:
@@ -10500,6 +10504,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT:
     case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT:
     case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT:
+    case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT:
     case V2DF_FTYPE_V2DF_V4SF_V2DF_QI_INT:
     case V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT:
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 5344e22c9c8..e08efb9dff3 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -96,6 +96,8 @@
 
 #include <avx512fp16intrin.h>
 
+#include <avx512fp16vlintrin.h>
+
 #include <shaintrin.h>
 
 #include <fmaintrin.h>
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1009d656cbb..2c1b6fbcd86 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -295,6 +295,13 @@ (define_mode_iterator VF
   [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
+(define_mode_iterator VFH
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+
 ;; 128- and 256-bit float vector modes
 (define_mode_iterator VF_128_256
   [(V8SF "TARGET_AVX") V4SF
@@ -318,6 +325,13 @@ (define_mode_iterator VF1_128_256VL
 (define_mode_iterator VF2
   [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
 
+;; All DFmode & HFmode vector float modes
+(define_mode_iterator VF2H
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
+
 ;; 128- and 256-bit DF vector modes
 (define_mode_iterator VF2_128_256
   [(V4DF "TARGET_AVX") V2DF])
@@ -824,6 +838,7 @@ (define_mode_attr avx512fmaskmode
    (V32HI "SI") (V16HI "HI") (V8HI  "QI") (V4HI "QI")
    (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
    (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
+   (V32HF "SI") (V16HF "HI") (V8HF  "QI")
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
@@ -842,6 +857,7 @@ (define_mode_attr avx512fmaskhalfmode
    (V32HI "HI") (V16HI "QI") (V8HI  "QI") (V4HI "QI")
    (V16SI "QI") (V8SI  "QI") (V4SI  "QI")
    (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
+   (V32HF "HI") (V16HF "QI") (V8HF  "QI")
    (V16SF "QI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
@@ -1940,18 +1956,18 @@ (define_insn_and_split "*nabs<mode>2"
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
 (define_expand "<insn><mode>3<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand")
-	(plusminus:VF
-	  (match_operand:VF 1 "<round_nimm_predicate>")
-	  (match_operand:VF 2 "<round_nimm_predicate>")))]
+  [(set (match_operand:VFH 0 "register_operand")
+	(plusminus:VFH
+	  (match_operand:VFH 1 "<round_nimm_predicate>")
+	  (match_operand:VFH 2 "<round_nimm_predicate>")))]
   "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*<insn><mode>3<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(plusminus:VF
-	  (match_operand:VF 1 "<bcst_round_nimm_predicate>" "<comm>0,v")
-	  (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(plusminus:VFH
+	  (match_operand:VFH 1 "<bcst_round_nimm_predicate>" "<comm>0,v")
+	  (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
   "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
    && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
@@ -2002,18 +2018,18 @@ (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "mul<mode>3<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand")
-	(mult:VF
-	  (match_operand:VF 1 "<round_nimm_predicate>")
-	  (match_operand:VF 2 "<round_nimm_predicate>")))]
+  [(set (match_operand:VFH 0 "register_operand")
+	(mult:VFH
+	  (match_operand:VFH 1 "<round_nimm_predicate>")
+	  (match_operand:VFH 2 "<round_nimm_predicate>")))]
   "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
 (define_insn "*mul<mode>3<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(mult:VF
-	  (match_operand:VF 1 "<bcst_round_nimm_predicate>" "%0,v")
-	  (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(mult:VFH
+	  (match_operand:VFH 1 "<bcst_round_nimm_predicate>" "%0,v")
+	  (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
   "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands)
    && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
@@ -2067,9 +2083,9 @@ (define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_n
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_expand "div<mode>3"
-  [(set (match_operand:VF2 0 "register_operand")
-	(div:VF2 (match_operand:VF2 1 "register_operand")
-		 (match_operand:VF2 2 "vector_operand")))]
+  [(set (match_operand:VF2H 0 "register_operand")
+	(div:VF2H (match_operand:VF2H 1 "register_operand")
+		  (match_operand:VF2H 2 "vector_operand")))]
   "TARGET_SSE2")
 
 (define_expand "div<mode>3"
@@ -2090,10 +2106,10 @@ (define_expand "div<mode>3"
 })
 
 (define_insn "<sse>_div<mode>3<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(div:VF
-	  (match_operand:VF 1 "register_operand" "0,v")
-	  (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(div:VFH
+	  (match_operand:VFH 1 "register_operand" "0,v")
+	  (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
   "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    div<ssemodesuffix>\t{%2, %0|%0, %2}
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 477a89803fa..762383bfd11 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -24,6 +24,7 @@ (define_mode_iterator SUBST_V
    V32HI V16HI V8HI
    V16SI V8SI  V4SI
    V8DI  V4DI  V2DI
+   V32HF V16HF V8HF
    V16SF V8SF  V4SF
    V8DF  V4DF  V2DF])
 
@@ -35,6 +36,7 @@ (define_mode_iterator SUBST_A
    V32HI V16HI V8HI
    V16SI V8SI  V4SI
    V8DI  V4DI  V2DI
+   V32HF V16HF V8HF
    V16SF V8SF  V4SF
    V8DF  V4DF  V2DF
    QI HI SI DI SF DF])
@@ -142,7 +144,9 @@ (define_subst_attr "round_prefix" "round" "vex" "evex")
 (define_subst_attr "round_mode512bit_condition" "round" "1" "(<MODE>mode == V16SFmode
 							      || <MODE>mode == V8DFmode
 							      || <MODE>mode == V8DImode
-							      || <MODE>mode == V16SImode)")
+							      || <MODE>mode == V16SImode
+							      || <MODE>mode == V32HFmode)")
+
 (define_subst_attr "round_modev8sf_condition" "round" "1" "(<MODE>mode == V8SFmode)")
 (define_subst_attr "round_modev4sf_condition" "round" "1" "(<MODE>mode == V4SFmode)")
 (define_subst_attr "round_codefor" "round" "*" "")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index f3676077743..1eaee861141 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16 -mavx512vl" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
@@ -685,6 +685,12 @@
 #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
 #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
 
+/* avx512fp16intrin.h */
+#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index 1751c52565c..642ae4d7bfb 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16 -mavx512vl" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
new file mode 100644
index 00000000000..28492fa3f7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+__m512h
+__attribute__ ((noinline, noclone))
+vadd512 (__m512h a, __m512h b)
+{
+  return a + b;
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+vsub512 (__m512h a, __m512h b)
+{
+  return a - b;
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+vmul512 (__m512h a, __m512h b)
+{
+  return a * b;
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+vdiv512 (__m512h a, __m512h b)
+{
+  return a / b;
+}
+
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
new file mode 100644
index 00000000000..fc105152d2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+#include <stdlib.h>
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-11a.c"
+
+/* Get a random _Float16 between -50.x and 50.x.  */
+_Float16
+get_float16_noround()
+{
+  return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50)
+    + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0));
+}
+
+static void
+do_test (void)
+{
+  _Float16 x[32];
+  _Float16 y[32];
+  _Float16 res_add[32];
+  _Float16 res_sub[32];
+  _Float16 res_mul[32];
+  _Float16 res_div[32];
+  for (int i = 0 ; i != 32; i++)
+    {
+      x[i] = get_float16_noround ();
+      y[i] = get_float16_noround ();
+      if (y[i] == 0)
+	y[i] = 1.0f;
+      res_add[i] = x[i] + y[i];
+      res_sub[i] = x[i] - y[i];
+      res_mul[i] = x[i] * y[i];
+      res_div[i] = x[i] / y[i];
+
+    }
+
+  union512h u512 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+      x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15],
+      x[16], x[17], x[18], x[19], x[20], x[21], x[22], x[23],
+      x[24], x[25], x[26], x[27], x[28], x[29], x[30], x[31] };
+  union512h u512_1 = {y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7],
+      y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15],
+      y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23],
+      y[24], y[25], y[26], y[27], y[28], y[29], y[30], y[31] };
+
+  __m512h v512;
+  union512h a512;
+
+  memset (&v512, -1, sizeof (v512));
+  v512 = vadd512 (u512.x, u512_1.x);
+  a512.x = v512;
+  if (check_union512h (a512, res_add))
+    abort ();
+  memset (&v512, -1, sizeof (v512));
+  v512 = vsub512 (u512.x, u512_1.x);
+  a512.x = v512;
+  if (check_union512h (a512, res_sub))
+    abort ();
+  memset (&v512, -1, sizeof (v512));
+  v512 = vmul512 (u512.x, u512_1.x);
+  a512.x = v512;
+  if (check_union512h (a512, res_mul))
+    abort ();
+  memset (&v512, -1, sizeof (v512));
+  v512 = vdiv512 (u512.x, u512_1.x);
+  a512.x = v512;
+  if (check_union512h (a512, res_div))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
new file mode 100644
index 00000000000..a8c6296f504
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+#include <immintrin.h>
+__m128h
+__attribute__ ((noinline, noclone))
+vadd128 (__m128h a, __m128h b)
+{
+  return a + b;
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+vadd256 (__m256h a, __m256h b)
+{
+  return a + b;
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+vsub128 (__m128h a, __m128h b)
+{
+  return a - b;
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+vsub256 (__m256h a, __m256h b)
+{
+  return a - b;
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+vmul128 (__m128h a, __m128h b)
+{
+  return a * b;
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+vmul256 (__m256h a, __m256h b)
+{
+  return a * b;
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+vdiv128 (__m128h a, __m128h b)
+{
+  return a / b;
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+vdiv256 (__m256h a, __m256h b)
+{
+  return a / b;
+}
+
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
new file mode 100644
index 00000000000..b8d3e8a4e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
@@ -0,0 +1,96 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+#include <string.h>
+#include <stdlib.h>
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512vlfp16-11a.c"
+
+/* Get a random _Float16 between -50.x and 50.x.  */
+_Float16
+get_float16_noround()
+{
+  return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50)
+    + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0));
+}
+
+static void
+do_test (void)
+{
+  _Float16 x[16];
+  _Float16 y[16];
+  _Float16 res_add[16];
+  _Float16 res_sub[16];
+  _Float16 res_mul[16];
+  _Float16 res_div[16];
+  for (int i = 0 ; i != 16; i++)
+    {
+      x[i] = get_float16_noround ();
+      y[i] = get_float16_noround ();
+      if (y[i] == 0)
+	y[i] = 1.0f;
+      res_add[i] = x[i] + y[i];
+      res_sub[i] = x[i] - y[i];
+      res_mul[i] = x[i] * y[i];
+      res_div[i] = x[i] / y[i];
+
+    }
+
+  union128h u128 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7] };
+  union128h u128_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7] };
+  union256h u256 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+      x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15] };
+  union256h u256_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7],
+      y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15]};
+
+  __m128h v128;
+  __m256h v256;
+  union128h a128;
+  union256h a256;
+
+  memset (&v128, -1, sizeof (v128));
+  v128 = vadd128 (u128.x, u128_1.x);
+  a128.x = v128;
+  if (check_union128h (a128, res_add))
+    abort ();
+  memset (&v128, -1, sizeof (v128));
+  v128 = vsub128 (u128.x, u128_1.x);
+  a128.x = v128;
+  if (check_union128h (a128, res_sub))
+    abort ();
+  memset (&v128, -1, sizeof (v128));
+  v128 = vmul128 (u128.x, u128_1.x);
+  a128.x = v128;
+  if (check_union128h (a128, res_mul))
+    abort ();
+  memset (&v128, -1, sizeof (v128));
+  v128 = vdiv128 (u128.x, u128_1.x);
+  a128.x = v128;
+  if (check_union128h (a128, res_div))
+    abort ();
+
+  memset (&v256, -1, sizeof (v256));
+  v256 = vadd256 (u256.x, u256_1.x);
+  a256.x = v256;
+  if (check_union256h (a256, res_add))
+    abort ();
+  memset (&v256, -1, sizeof (v256));
+  v256 = vsub256 (u256.x, u256_1.x);
+  a256.x = v256;
+  if (check_union256h (a256, res_sub))
+    abort ();
+  memset (&v256, -1, sizeof (v256));
+  v256 = vmul256 (u256.x, u256_1.x);
+  a256.x = v256;
+  if (check_union256h (a256, res_mul))
+    abort ();
+  memset (&v256, -1, sizeof (v256));
+  v256 = vdiv256 (u256.x, u256_1.x);
+  a256.x = v256;
+  if (check_union256h (a256, res_div))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index f5f5c113612..50ed74cd6d6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -702,6 +702,12 @@
 #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
 #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
 
+/* avx512fp16intrin.h */
+#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 747d504cedb..26a5e94c7ca 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -667,6 +667,20 @@ test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
 test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 
+/* avx512fp16intrin.h */
+test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 33411969901..8d25effd724 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -772,6 +772,20 @@ test_2 (_mm_rcp28_round_ss, __m128, __m128, __m128, 8)
 test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8)
 test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
 
+/* avx512fp16intrin.h */
+test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 86590ca5ffb..f7dd5d7495c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -703,6 +703,12 @@
 #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
 #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
 
+/* avx512fp16intrin.h */
+#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
-- 
2.18.1
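A side note for reviewers on the bare rounding immediates in the macro wrappers above: the literal 8 passed to the `*_mask_round` builtins, and the 8/11 used in the 1a compile tests of the next patch, are combinations of the standard `_MM_FROUND_*` flags. The defines in this sketch merely restate the documented values from the x86 intrinsics headers for illustration:

```c
#include <assert.h>

/* _MM_FROUND_* values restated from the x86 intrinsics headers;
   shown here only to decode the magic numbers used in the tests.  */
#define _MM_FROUND_TO_NEAREST_INT 0x00
#define _MM_FROUND_TO_ZERO        0x03
#define _MM_FROUND_NO_EXC         0x08

/* 8 == {rn-sae}: round-to-nearest with exceptions suppressed,
   i.e. the immediate used with _mm512_add_round_ph and friends.  */
static const int rn_sae = _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC;

/* 11 == {rz-sae}: round-toward-zero with exceptions suppressed,
   matching the {rz-sae} scan-assembler patterns in the 1a tests.  */
static const int rz_sae = _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC;
```

This is why the maskz variants invoked with immediate 11 are expected to emit the `{rz-sae}` operand form.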


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (6 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization liuhongt
                   ` (53 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h: New header file for
	FP16 runtime test.
	* gcc.target/i386/avx512fp16-vaddph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vaddph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vaddph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vdivph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vdivph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmulph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsubph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsubph-1b.c: Ditto.
---
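The helper added below checks results by widening each _Float16 lane to float, redoing the operation in float, converting back, and then comparing the 16-bit patterns while tolerating a one-ulp difference. A standalone sketch of that tolerance test (the function name here is hypothetical; the real check_results() does this comparison inline on V512.u16 lanes):

```c
#include <assert.h>

/* Accept two FP16 bit patterns when they differ by at most one unit
   in the last place, mirroring check_results() in the helper header.
   Integer promotion of the unsigned short arguments makes the
   exp == 0 edge case behave like the original expression.  */
static int
fp16_bits_close (unsigned short got, unsigned short exp)
{
  if (got == exp)
    return 1;
  /* Only the last bit may differ between the real result and the
     float-emulated result.  */
  return got <= exp + 1 && got >= exp - 1;
}
```

With this scheme a hardware vaddph result is accepted even when rounding through float lands one ulp away from rounding directly in half precision.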
 .../gcc.target/i386/avx512fp16-helper.h       | 207 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vaddph-1a.c    |  26 +++
 .../gcc.target/i386/avx512fp16-vaddph-1b.c    |  92 ++++++++
 .../gcc.target/i386/avx512fp16-vdivph-1a.c    |  26 +++
 .../gcc.target/i386/avx512fp16-vdivph-1b.c    |  97 ++++++++
 .../gcc.target/i386/avx512fp16-vmulph-1a.c    |  26 +++
 .../gcc.target/i386/avx512fp16-vmulph-1b.c    |  92 ++++++++
 .../gcc.target/i386/avx512fp16-vsubph-1a.c    |  26 +++
 .../gcc.target/i386/avx512fp16-vsubph-1b.c    |  93 ++++++++
 .../gcc.target/i386/avx512fp16vl-vaddph-1a.c  |  29 +++
 .../gcc.target/i386/avx512fp16vl-vaddph-1b.c  |  16 ++
 .../gcc.target/i386/avx512fp16vl-vdivph-1a.c  |  29 +++
 .../gcc.target/i386/avx512fp16vl-vdivph-1b.c  |  16 ++
 .../gcc.target/i386/avx512fp16vl-vmulph-1a.c  |  29 +++
 .../gcc.target/i386/avx512fp16vl-vmulph-1b.c  |  16 ++
 .../gcc.target/i386/avx512fp16vl-vsubph-1a.c  |  29 +++
 .../gcc.target/i386/avx512fp16vl-vsubph-1b.c  |  16 ++
 17 files changed, 865 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
new file mode 100644
index 00000000000..9fde88a4f7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -0,0 +1,207 @@
+/* This file is used for emulating avx512fp16 runtime tests.  To
+   verify the correctness of _Float16 calculations, the idea is to
+   convert _Float16 to float and do the emulation with float
+   instructions; _Float16 results are not emulated or checked directly.  */
+
+#include "avx512f-helper.h"
+#ifndef AVX512FP16_HELPER_INCLUDED
+#define AVX512FP16_HELPER_INCLUDED
+
+#ifdef DEBUG
+#include <string.h>
+#endif
+#include <math.h>
+#include <limits.h>
+#include <float.h>
+
+/* Useful macros.  */
+#define NOINLINE __attribute__((noinline,noclone))
+#define _ROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)
+#define AVX512F_MAX_ELEM 512 / 32
+
+/* Structure for _Float16 emulation  */
+typedef union
+{
+  __m512          zmm;
+  __m512h         zmmh;
+  __m256          ymm[2];
+  __m256h         ymmh[2];
+  __m256i         ymmi[2];
+  __m128h         xmmh[4];
+  unsigned short  u16[32];
+  unsigned int    u32[16];
+  float           f32[16];
+  _Float16        f16[32];
+} V512;
+
+/* Global variables.  */
+V512 src1, src2, src3;
+int n_errs = 0;
+
+/* Helper functions for packing/unpacking ph operands.  */
+void NOINLINE 
+unpack_ph_2twops(V512 src, V512 *op1, V512 *op2)
+{
+    V512 v1;
+
+    op1->zmm = _mm512_cvtph_ps(src.ymmi[0]);
+    v1.ymm[0] = _mm512_extractf32x8_ps(src.zmm, 1);
+    op2->zmm = _mm512_cvtph_ps(v1.ymmi[0]);
+}
+
+V512 NOINLINE
+pack_twops_2ph(V512 op1, V512 op2)
+{
+    V512 v1, v2, v3;
+
+    v1.ymmi[0] = _mm512_cvtps_ph(op1.zmm, _MM_FROUND_TO_NEAREST_INT);
+    v2.ymmi[0] = _mm512_cvtps_ph(op2.zmm, _MM_FROUND_TO_NEAREST_INT);
+
+    v3.zmm = _mm512_insertf32x8(v1.zmm, v2.ymm[0], 1);
+
+    return v3;
+}
+
+/* Helper function used for debugging results.  */
+#ifdef DEBUG
+void NOINLINE
+display_ps(const void *p, const char *banner, int n_elems)
+{
+    int i;
+    V512 *v = (V512*)p;
+
+    if (banner) {
+        printf("%s", banner);
+    }
+
+    for (i = 15; i >= n_elems; i--) {
+        printf(" --------");
+        if (i == 8) {
+            printf("\n");
+            if (banner) {
+                printf("%*s", (int)strlen(banner), "");
+            }
+        }
+    }
+
+    for (; i >= 0; i--) {
+        printf(" %x", v->u32[i]);
+        if (i == 8) {
+            printf("\n");
+            if (banner) {
+                printf("%*s", (int)strlen(banner), "");
+            }
+        }
+    }
+    printf("\n");
+}
+#endif
+
+/* Functions/macros used for init/result checking.
+   Only check components within AVX512F_LEN.  */
+#define TO_STRING(x) #x
+#define STRINGIFY(x) TO_STRING(x)
+#define NAME_OF(NAME) STRINGIFY(INTRINSIC (NAME))
+
+#define CHECK_RESULT(res, exp, size, intrin) \
+  check_results ((void*)res, (void*)exp, size,\
+		 NAME_OF(intrin))
+
+/* To evaluate whether a result matches _Float16 precision,
+   only the last bit of the real and emulated results may
+   differ.  */
+void NOINLINE
+check_results(void *got, void *exp, int n_elems, char *banner)
+{
+    int i;
+    V512 *v1 = (V512*)got;
+    V512 *v2 = (V512*)exp;
+
+    for (i = 0; i < n_elems; i++) {
+        if (v1->u16[i] != v2->u16[i] &&
+            ((v1->u16[i] > (v2->u16[i] + 1)) ||
+             (v1->u16[i] < (v2->u16[i] - 1)))) {
+
+#ifdef DEBUG
+            printf("ERROR: %s failed at %d'th element: %x(%f) != %x(%f)\n",
+                   banner ? banner : "", i,
+                   v1->u16[i], *(float *)(&v1->u16[i]),
+                   v2->u16[i], *(float *)(&v2->u16[i]));
+            display_ps(got, "got:", n_elems);
+            display_ps(exp, "exp:", n_elems);
+#endif
+            n_errs++;
+            break;
+        }
+    }
+}
+
+/* Functions for src/dest initialization */
+void NOINLINE
+init_src()
+{
+    V512 v1, v2, v3, v4;
+    int i;
+
+    for (i = 0; i < AVX512F_MAX_ELEM; i++) {
+        v1.f32[i] = -i + 1;
+        v2.f32[i] = i * 0.5f;
+        v3.f32[i] = i * 2.5f;
+        v4.f32[i] = i - 0.5f;
+
+        src3.u32[i] = (i + 1) * 10;
+    }
+
+    src1 = pack_twops_2ph(v1, v2);
+    src2 = pack_twops_2ph(v3, v4);
+}
+
+void NOINLINE
+init_dest(V512 * res, V512 * exp)
+{
+    int i;
+    V512 v1;
+
+    for (i = 0; i < AVX512F_MAX_ELEM; i++) {
+        v1.f32[i] = 12 + 0.5f * i;
+    }
+    *res = *exp = pack_twops_2ph(v1, v1);
+}
+
+#define EMULATE(NAME) EVAL(emulate_, NAME, AVX512F_LEN)
+
+#endif /* AVX512FP16_HELPER_INCLUDED */
+
+/* Macros for AVX512VL testing.  Select the V512 components
+   and mask type used for emulation.  */
+
+#if AVX512F_LEN == 256
+#undef HF
+#undef SF
+#undef NET_MASK
+#undef MASK_VALUE
+#undef ZMASK_VALUE
+#define NET_MASK 0xffff
+#define MASK_VALUE 0xcccc
+#define ZMASK_VALUE 0xfcc1
+#define HF(x) x.ymmh[0]
+#define SF(x) x.ymm[0]
+#elif AVX512F_LEN == 128
+#undef HF
+#undef SF
+#undef NET_MASK
+#undef MASK_VALUE
+#undef ZMASK_VALUE
+#define NET_MASK 0xff
+#define MASK_VALUE 0xcc
+#define ZMASK_VALUE 0xc1
+#define HF(x) x.xmmh[0]
+#define SF(x) x.xmm[0]
+#else
+#define NET_MASK 0xffffffff
+#define MASK_VALUE 0xcccccccc
+#define ZMASK_VALUE 0xfcc1fcc1
+#define HF(x) x.zmmh
+#define SF(x) x.zmm
+#endif
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c
new file mode 100644
index 00000000000..0590c34cebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_add_ph (x1, x2);
+  res1 = _mm512_mask_add_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_add_ph (m32, x1, x2);
+
+  res = _mm512_add_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_add_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_add_round_ph (m32, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c
new file mode 100644
index 00000000000..1c412b5c10e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddph-1b.c
@@ -0,0 +1,92 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(add_ph) (V512 * dest, V512 op1, V512 op2,
+		 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] + v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] + v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(add_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_add_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _add_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(add_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_add_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_add_ph);
+
+  EMULATE(add_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_add_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_add_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(add_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_add_round_ph) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _add_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(add_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_add_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_add_round_ph);
+
+  EMULATE(add_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_add_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_add_round_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c
new file mode 100644
index 00000000000..63f111f3196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_div_ph (x1, x2);
+  res1 = _mm512_mask_div_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_div_ph (m32, x1, x2);
+
+  res = _mm512_div_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_div_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_div_round_ph (m32, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c
new file mode 100644
index 00000000000..c8b38210e87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivph-1b.c
@@ -0,0 +1,97 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(div_ph) (V512 * dest, V512 op1, V512 op2,
+                __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] / v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] / v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(div_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_div_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _div_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(div_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_div_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_div_ph);
+
+  EMULATE(div_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_div_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_div_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(div_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_div_round_ph) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _div_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(div_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_div_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_div_ph);
+
+  EMULATE(div_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_div_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_div_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c
new file mode 100644
index 00000000000..1088e255786
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_mul_ph (x1, x2);
+  res1 = _mm512_mask_mul_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_mul_ph (m32, x1, x2);
+
+  res = _mm512_mul_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_mul_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_mul_round_ph (m32, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c
new file mode 100644
index 00000000000..0d67e874d53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulph-1b.c
@@ -0,0 +1,92 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(mul_ph) (V512 * dest, V512 op1, V512 op2,
+                __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] * v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] * v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(mul_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_mul_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mul_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(mul_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_mul_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_mul_ph);
+
+  EMULATE(mul_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_mul_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_mul_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(mul_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_mul_round_ph) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mul_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(mul_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_mul_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_mul_ph);
+
+  EMULATE(mul_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_mul_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_mul_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c
new file mode 100644
index 00000000000..bb5eda64e37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_sub_ph (x1, x2);
+  res1 = _mm512_mask_sub_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_sub_ph (m32, x1, x2);
+
+  res = _mm512_sub_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_sub_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_sub_round_ph (m32, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c
new file mode 100644
index 00000000000..bd31d98f43d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubph-1b.c
@@ -0,0 +1,93 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(sub_ph) (V512 * dest, V512 op1, V512 op2,
+                __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] - v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] - v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(sub_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_sub_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _sub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(sub_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_sub_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sub_ph);
+
+  EMULATE(sub_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_sub_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sub_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(sub_ph) (&exp, src1, src2, NET_MASK, 0);
+  HF(res) = INTRINSIC (_sub_round_ph) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _sub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(sub_ph) (&exp, src1, src2, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_sub_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sub_ph);
+
+  EMULATE(sub_ph) (&exp, src1, src2, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_sub_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sub_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c
new file mode 100644
index 00000000000..354d897dd9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_add_ph (x1, x2);
+  res1 = _mm256_mask_add_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_add_ph (m16, x1, x2);
+
+  res2 = _mm_add_ph (x3, x4);
+  res2 = _mm_mask_add_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_add_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
new file mode 100644
index 00000000000..fcf6a9058f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vaddph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vaddph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vaddph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c
new file mode 100644
index 00000000000..038d9e42fce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_div_ph (x1, x2);
+  res1 = _mm256_mask_div_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_div_ph (m16, x1, x2);
+
+  res2 = _mm_div_ph (x3, x4);
+  res2 = _mm_mask_div_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_div_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c
new file mode 100644
index 00000000000..48965c6cfb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vdivph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vdivph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vdivph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c
new file mode 100644
index 00000000000..26663c5ca8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_mul_ph (x1, x2);
+  res1 = _mm256_mask_mul_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_mul_ph (m16, x1, x2);
+
+  res2 = _mm_mul_ph (x3, x4);
+  res2 = _mm_mask_mul_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_mul_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c
new file mode 100644
index 00000000000..2b3ba050533
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmulph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vmulph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vmulph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c
new file mode 100644
index 00000000000..10e5cbfed92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_sub_ph (x1, x2);
+  res1 = _mm256_mask_sub_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_sub_ph (m16, x1, x2);
+
+  res2 = _mm_sub_ph (x3, x4);
+  res2 = _mm_mask_sub_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_sub_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c
new file mode 100644
index 00000000000..fa162185e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsubph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vsubph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vsubph-1b.c"
+
-- 
2.18.1



* [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (7 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-09-10  7:03   ` Hongtao Liu
  2021-07-01  6:15 ` [PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh liuhongt
                   ` (52 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
	* config/i386/i386.c
	(ix86_preferred_simd_mode): Handle HF mode.
	* config/i386/sse.md (V_256H): New mode iterator.
	(avx_vextractf128<mode>): Use it.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-float16-1.c: New test.
	* gcc.target/i386/vect-float16-10.c: Ditto.
	* gcc.target/i386/vect-float16-11.c: Ditto.
	* gcc.target/i386/vect-float16-12.c: Ditto.
	* gcc.target/i386/vect-float16-2.c: Ditto.
	* gcc.target/i386/vect-float16-3.c: Ditto.
	* gcc.target/i386/vect-float16-4.c: Ditto.
	* gcc.target/i386/vect-float16-5.c: Ditto.
	* gcc.target/i386/vect-float16-6.c: Ditto.
	* gcc.target/i386/vect-float16-7.c: Ditto.
	* gcc.target/i386/vect-float16-8.c: Ditto.
	* gcc.target/i386/vect-float16-9.c: Ditto.
---
 gcc/config/i386/i386-expand.c                   |  4 ++++
 gcc/config/i386/i386.c                          | 14 ++++++++++++++
 gcc/config/i386/sse.md                          |  7 ++++++-
 gcc/testsuite/gcc.target/i386/vect-float16-1.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-10.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-11.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-12.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-2.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-3.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-4.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-5.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-6.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-7.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-8.c  | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-float16-9.c  | 14 ++++++++++++++
 15 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 39647eb2cf1..df50c72ab16 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -498,6 +498,10 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
       extract = gen_avx_vextractf128v32qi;
       mode = V16QImode;
       break;
+    case E_V16HFmode:
+      extract = gen_avx_vextractf128v16hf;
+      mode = V8HFmode;
+      break;
     case E_V8SFmode:
       extract = gen_avx_vextractf128v8sf;
       mode = V4SFmode;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 79e6880d9dd..dc0d440061b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22360,6 +22360,20 @@ ix86_preferred_simd_mode (scalar_mode mode)
       else
 	return V2DImode;
 
+    case E_HFmode:
+      if (TARGET_AVX512FP16)
+	{
+	  if (TARGET_AVX512VL)
+	    {
+	      if (TARGET_PREFER_AVX128)
+		return V8HFmode;
+	      else if (TARGET_PREFER_AVX256)
+		return V16HFmode;
+	    }
+	  return V32HFmode;
+	}
+      return word_mode;
+
     case E_SFmode:
       if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
 	return V16SFmode;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2c1b6fbcd86..a0cfd611006 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -276,6 +276,11 @@ (define_mode_iterator V_128
 (define_mode_iterator V_256
   [V32QI V16HI V8SI V4DI V8SF V4DF])
 
+;; All 256bit vector modes including HF vector mode
+(define_mode_iterator V_256H
+  [V32QI V16HI V8SI V4DI V8SF V4DF
+   (V16HF "TARGET_AVX512F && TARGET_AVX512VL")])
+
 ;; All 128bit and 256bit vector modes
 (define_mode_iterator V_128_256
   [V32QI V16QI V16HI V8HI V8SI V4SI V4DI V2DI V8SF V4SF V4DF V2DF])
@@ -9045,7 +9050,7 @@ (define_expand "avx512vl_vextractf128<mode>"
 
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
-   (match_operand:V_256 1 "register_operand")
+   (match_operand:V_256H 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"
 {
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-1.c b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
new file mode 100644
index 00000000000..0f82cf94932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-10.c b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
new file mode 100644
index 00000000000..217645692ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-11.c b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
new file mode 100644
index 00000000000..e0409ce9d3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-12.c b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
new file mode 100644
index 00000000000..d92a25dc255
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-2.c b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
new file mode 100644
index 00000000000..974fca4ce09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-3.c b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
new file mode 100644
index 00000000000..9bca9142df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-4.c b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
new file mode 100644
index 00000000000..e6f26f0aa40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-5.c b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
new file mode 100644
index 00000000000..38f287b1dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-6.c b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
new file mode 100644
index 00000000000..bc9f7870061
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-7.c b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
new file mode 100644
index 00000000000..b4849cf77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-8.c b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
new file mode 100644
index 00000000000..71631b17cc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-9.c b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
new file mode 100644
index 00000000000..1be5c7f022f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (8 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh liuhongt
                   ` (51 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub, Liu, Hongtao

From: "Liu, Hongtao" <hongtao.liu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_add_sh): New intrinsic.
	(_mm_mask_add_sh): Likewise.
	(_mm_maskz_add_sh): Likewise.
	(_mm_sub_sh): Likewise.
	(_mm_mask_sub_sh): Likewise.
	(_mm_maskz_sub_sh): Likewise.
	(_mm_mul_sh): Likewise.
	(_mm_mask_mul_sh): Likewise.
	(_mm_maskz_mul_sh): Likewise.
	(_mm_div_sh): Likewise.
	(_mm_mask_div_sh): Likewise.
	(_mm_maskz_div_sh): Likewise.
	(_mm_add_round_sh): Likewise.
	(_mm_mask_add_round_sh): Likewise.
	(_mm_maskz_add_round_sh): Likewise.
	(_mm_sub_round_sh): Likewise.
	(_mm_mask_sub_round_sh): Likewise.
	(_mm_maskz_sub_round_sh): Likewise.
	(_mm_mul_round_sh): Likewise.
	(_mm_mask_mul_round_sh): Likewise.
	(_mm_maskz_mul_round_sh): Likewise.
	(_mm_div_round_sh): Likewise.
	(_mm_mask_div_round_sh): Likewise.
	(_mm_maskz_div_round_sh): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_round_builtin): Handle new builtins.
	* config/i386/sse.md (VF_128): Change description.
	(<sse>_vm<plusminus_insn><mode>3<mask_scalar_name><round_scalar_name>):
	Adjust to support HF vector modes.
	(<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 254 +++++++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   2 +
 gcc/config/i386/i386-builtin.def       |   8 +
 gcc/config/i386/i386-expand.c          |   2 +
 gcc/config/i386/sse.md                 |  22 +--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   4 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-23.c |   4 +
 10 files changed, 313 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 3e9d676dc39..6ae12ebf920 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -468,6 +468,260 @@ _mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
 						   (A), (D)))
 #endif  /* __OPTIMIZE__  */
 
+/* Intrinsics of v[add,sub,mul,div]sh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_sh (__m128h __A, __m128h __B)
+{
+  __A[0] += __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_add_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vaddsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vaddsh_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_sh (__m128h __A, __m128h __B)
+{
+  __A[0] -= __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vsubsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vsubsh_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_sh (__m128h __A, __m128h __B)
+{
+  __A[0] *= __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mul_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vmulsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_mul_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vmulsh_v8hf_mask (__B, __C, _mm_setzero_ph (), __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_sh (__m128h __A, __m128h __B)
+{
+  __A[0] /= __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_div_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vdivsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_div_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vdivsh_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vaddsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_add_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vaddsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_add_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vaddsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vsubsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sub_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vsubsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sub_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vsubsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vmulsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_mul_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vmulsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_mul_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vmulsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vdivsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_div_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vdivsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vdivsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+#else
+#define _mm_add_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((A), (B),		\
+						   _mm_setzero_ph (),	\
+						   (__mmask8)-1, (C)))
+
+#define _mm_mask_add_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_add_round_sh(A, B, C, D)				\
+  ((__m128h)__builtin_ia32_vaddsh_v8hf_mask_round ((B), (C),		\
+						   _mm_setzero_ph (),	\
+						   (A), (D)))
+
+#define _mm_sub_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((A), (B),		\
+						   _mm_setzero_ph (),	\
+						   (__mmask8)-1, (C)))
+
+#define _mm_mask_sub_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_sub_round_sh(A, B, C, D)				\
+  ((__m128h)__builtin_ia32_vsubsh_v8hf_mask_round ((B), (C),		\
+						   _mm_setzero_ph (),	\
+						   (A), (D)))
+
+#define _mm_mul_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((A), (B),		\
+						   _mm_setzero_ph (),	\
+						   (__mmask8)-1, (C)))
+
+#define _mm_mask_mul_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_mul_round_sh(A, B, C, D)				\
+  ((__m128h)__builtin_ia32_vmulsh_v8hf_mask_round ((B), (C),		\
+						   _mm_setzero_ph (),	\
+						   (A), (D)))
+
+#define _mm_div_round_sh(A, B, C)					\
+  ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((A), (B),		\
+						   _mm_setzero_ph (),	\
+						   (__mmask8)-1, (C)))
+
+#define _mm_mask_div_round_sh(A, B, C, D, E)				\
+  ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_div_round_sh(A, B, C, D)				\
+  ((__m128h)__builtin_ia32_vdivsh_v8hf_mask_round ((B), (C),		\
+						   _mm_setzero_ph (),	\
+						   (A), (D)))
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index ee3b8c30589..ed738f71927 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1304,7 +1304,9 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 
 # FP16 builtins
 DEF_FUNCTION_TYPE (V8HF, V8HI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index b783d266dd8..60e2b75be14 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2787,6 +2787,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_vdivph_v8hf_mask", IX86_BUILTIN_VDIVPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_vdivph_v16hf_mask", IX86_BUILTIN_VDIVPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_vdivph_v32hf_mask", IX86_BUILTIN_VDIVPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__builtin_ia32_vaddsh_v8hf_mask", IX86_BUILTIN_VADDSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_vsubsh_v8hf_mask", IX86_BUILTIN_VSUBSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_vmulsh_v8hf_mask", IX86_BUILTIN_VMULSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_vdivsh_v8hf_mask", IX86_BUILTIN_VDIVSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -2992,6 +2996,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_vsubph_v32hf_mask_round", IX86_BUILTIN_VSUBPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_vmulph_v32hf_mask_round", IX86_BUILTIN_VMULPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_vdivph_v32hf_mask_round", IX86_BUILTIN_VDIVPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round, "__builtin_ia32_vaddsh_v8hf_mask_round", IX86_BUILTIN_VADDSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_vsubsh_v8hf_mask_round", IX86_BUILTIN_VSUBSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_vmulsh_v8hf_mask_round", IX86_BUILTIN_VMULSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_vdivsh_v8hf_mask_round", IX86_BUILTIN_VDIVSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index df50c72ab16..d2a47150e1b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10468,6 +10468,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
       nargs = 2;
       break;
     case V32HF_FTYPE_V32HF_V32HF_INT:
+    case V8HF_FTYPE_V8HF_V8HF_INT:
     case V4SF_FTYPE_V4SF_UINT_INT:
     case V4SF_FTYPE_V4SF_UINT64_INT:
     case V2DF_FTYPE_V2DF_UINT64_INT:
@@ -10515,6 +10516,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V4SF_FTYPE_V4SF_V4SF_V4SF_QI_INT:
     case V4SF_FTYPE_V4SF_V2DF_V4SF_QI_INT:
     case V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT:
+    case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT:
       nargs = 5;
       break;
     case V16SF_FTYPE_V16SF_INT_V16SF_HI_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a0cfd611006..8fa3f8ddac9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -347,7 +347,7 @@ (define_mode_iterator VF2_512_256
 (define_mode_iterator VF2_512_256VL
   [V8DF (V4DF "TARGET_AVX512VL")])
 
-;; All 128bit vector float modes
+;; All 128bit vector SF/DF modes
 (define_mode_iterator VF_128
   [V4SF (V2DF "TARGET_SSE2")])
 
@@ -2006,11 +2006,11 @@ (define_insn "*<sse>_vm<insn><mode>3"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (plusminus:VF_128
-	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (plusminus:VFH_128
+	    (match_operand:VFH_128 1 "register_operand" "0,v")
+	    (match_operand:VFH_128 2 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
@@ -2070,11 +2070,11 @@ (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (multdiv:VF_128
-	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (multdiv:VFH_128
+	    (match_operand:VFH_128 1 "register_operand" "0,v")
+	    (match_operand:VFH_128 2 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_SSE"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 1eaee861141..26ca87ce2f5 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -690,6 +690,10 @@
 #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 50ed74cd6d6..ae35adb5ead 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -707,6 +707,10 @@
 #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 26a5e94c7ca..e79edf0a5bb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -672,14 +672,26 @@ test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_maskz_add_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_div_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 8d25effd724..2c1f27d881a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -777,14 +777,26 @@ test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_maskz_add_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_div_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index f7dd5d7495c..a89aef2aa8e 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -708,6 +708,10 @@
 #define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (9 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh liuhongt
                   ` (50 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vaddsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vaddsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vdivsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmulsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsubsh-1b.c: Ditto.
	* gcc.target/i386/pr54855-11.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vaddsh-1a.c    |  27 +++++
 .../gcc.target/i386/avx512fp16-vaddsh-1b.c    | 104 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vdivsh-1a.c    |  27 +++++
 .../gcc.target/i386/avx512fp16-vdivsh-1b.c    |  76 +++++++++++++
 .../gcc.target/i386/avx512fp16-vmulsh-1a.c    |  27 +++++
 .../gcc.target/i386/avx512fp16-vmulsh-1b.c    |  77 +++++++++++++
 .../gcc.target/i386/avx512fp16-vsubsh-1a.c    |  27 +++++
 .../gcc.target/i386/avx512fp16-vsubsh-1b.c    |  76 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr54855-11.c    |  16 +++
 9 files changed, 457 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-11.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c
new file mode 100644
index 00000000000..97aac3fd131
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_add_sh (x1, x2);
+  res = _mm_mask_add_sh (res, m8, x1, x2);
+  res = _mm_maskz_add_sh (m8, x1, x2);
+
+  res = _mm_add_round_sh (x1, x2, 8);
+  res = _mm_mask_add_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_add_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c
new file mode 100644
index 00000000000..724112c8fc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vaddsh-1b.c
@@ -0,0 +1,104 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_add_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] + v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_add_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_add_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_add_sh");
+
+  //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0]
+  emulate_add_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_add_sh(res.xmmh[0], 0x1,
+			       	src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_add_sh");
+
+  //dest.fp16[0] remains unchanged
+  init_dest(&res, &exp);
+  emulate_add_sh(&exp, src1, src2,  0x2, 0);
+  res.xmmh[0] = _mm_mask_add_sh(res.xmmh[0], 0x2,
+			       	src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_add_sh");
+
+  //dest.fp16[0] = 0
+  emulate_add_sh(&exp, src1, src2,  0x2, 1);
+  res.xmmh[0] = _mm_maskz_add_sh(0x2, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_sh");
+
+  //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0]
+  emulate_add_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_add_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_sh");
+
+  //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0]
+  emulate_add_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_add_round_sh(src1.xmmh[0], 
+				 src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_add_round_sh");
+
+  //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0]
+  emulate_add_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_add_round_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+				      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_add_round_sh");
+
+  //dest.fp16[0] remains unchanged
+  init_dest(&res, &exp);
+  emulate_add_sh(&exp, src1, src2,  0x2, 0);
+  res.xmmh[0] = _mm_mask_add_round_sh(res.xmmh[0], 0x2, src1.xmmh[0], 
+				      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_add_round_sh");
+
+  //dest.fp16[0] = 0
+  emulate_add_sh(&exp, src1, src2,  0x2, 1);
+  res.xmmh[0] = _mm_maskz_add_round_sh(0x2, src1.xmmh[0], 
+				       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_round_sh");
+
+  //DEST.fp16[0] := SRC1.fp16[0] + SRC2.fp16[0]
+  emulate_add_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_add_round_sh(0x3, src1.xmmh[0],
+				       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_add_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c
new file mode 100644
index 00000000000..39f26f5d77a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdivsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_div_sh (x1, x2);
+  res = _mm_mask_div_sh (res, m8, x1, x2);
+  res = _mm_maskz_div_sh (m8, x1, x2);
+
+  res = _mm_div_round_sh (x1, x2, 8);
+  res = _mm_mask_div_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_div_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c
new file mode 100644
index 00000000000..467f5d20155
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vdivsh-1b.c
@@ -0,0 +1,76 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_div_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] / v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_div_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_div_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_div_sh");
+
+  init_dest(&res, &exp);
+  emulate_div_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_div_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+			       	src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_div_sh");
+
+  emulate_div_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_div_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_div_sh");
+
+  emulate_div_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_div_round_sh(src1.xmmh[0], src2.xmmh[0],
+				 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_div_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_div_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_div_round_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+				      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_div_round_sh");
+
+  emulate_div_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_div_round_sh(0x3, src1.xmmh[0],
+				       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_div_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c
new file mode 100644
index 00000000000..85707b5f169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmulsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_mul_sh (x1, x2);
+  res = _mm_mask_mul_sh (res, m8, x1, x2);
+  res = _mm_maskz_mul_sh (m8, x1, x2);
+
+  res = _mm_mul_round_sh (x1, x2, 8);
+  res = _mm_mask_mul_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_mul_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c
new file mode 100644
index 00000000000..36b6930a516
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmulsh-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_mul_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] * v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_mul_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mul_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mul_sh");
+
+  init_dest(&res, &exp);
+  emulate_mul_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_mul_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+			       	src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_mul_sh");
+
+  emulate_mul_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_mul_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_mul_sh");
+
+  emulate_mul_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mul_round_sh(src1.xmmh[0], src2.xmmh[0],
+				 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mul_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_mul_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_mul_round_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+				      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_mul_round_sh");
+
+  emulate_mul_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_mul_round_sh(0x3, src1.xmmh[0],
+				       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_mul_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c
new file mode 100644
index 00000000000..8ea1eea615b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsubsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_sub_sh (x1, x2);
+  res = _mm_mask_sub_sh (res, m8, x1, x2);
+  res = _mm_maskz_sub_sh (m8, x1, x2);
+
+  res = _mm_sub_round_sh (x1, x2, 8);
+  res = _mm_mask_sub_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_sub_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c
new file mode 100644
index 00000000000..df3680ebee1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsubsh-1b.c
@@ -0,0 +1,76 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_sub_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] - v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_sub_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_sub_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_sub_sh");
+
+  init_dest(&res, &exp);
+  emulate_sub_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_sub_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+			       	src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_sub_sh");
+
+  emulate_sub_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_sub_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_sub_sh");
+
+  emulate_sub_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_sub_round_sh(src1.xmmh[0], src2.xmmh[0],
+				 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_sub_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_sub_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_sub_round_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+				      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_sub_round_sh");
+
+  emulate_sub_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_sub_round_sh(0x3, src1.xmmh[0],
+				       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_sub_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-11.c b/gcc/testsuite/gcc.target/i386/pr54855-11.c
new file mode 100644
index 00000000000..a7095665d76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-11.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vaddsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vpextrw\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovw\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovd\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vpunpckldq\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vpunpcklqdq\[ \\t\]" } } */
+
+#include <immintrin.h>
+
+__m128h
+foo (__m128h x, __m128h y)
+{
+  return _mm_add_sh (x, y);
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (10 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:15 ` [PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh liuhongt
                   ` (49 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_max_ph): New intrinsic.
	(_mm512_mask_max_ph): Likewise.
	(_mm512_maskz_max_ph): Likewise.
	(_mm512_min_ph): Likewise.
	(_mm512_mask_min_ph): Likewise.
	(_mm512_maskz_min_ph): Likewise.
	(_mm512_max_round_ph): Likewise.
	(_mm512_mask_max_round_ph): Likewise.
	(_mm512_maskz_max_round_ph): Likewise.
	(_mm512_min_round_ph): Likewise.
	(_mm512_mask_min_round_ph): Likewise.
	(_mm512_maskz_min_round_ph): Likewise.
	(_mm_max_sh): Likewise.
	(_mm_mask_max_sh): Likewise.
	(_mm_maskz_max_sh): Likewise.
	(_mm_min_sh): Likewise.
	(_mm_mask_min_sh): Likewise.
	(_mm_maskz_min_sh): Likewise.
	(_mm_max_round_sh): Likewise.
	(_mm_mask_max_round_sh): Likewise.
	(_mm_maskz_max_round_sh): Likewise.
	(_mm_min_round_sh): Likewise.
	(_mm_mask_min_round_sh): Likewise.
	(_mm_maskz_min_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_max_ph): New intrinsic.
	(_mm256_max_ph): Likewise.
	(_mm_mask_max_ph): Likewise.
	(_mm256_mask_max_ph): Likewise.
	(_mm_maskz_max_ph): Likewise.
	(_mm256_maskz_max_ph): Likewise.
	(_mm_min_ph): Likewise.
	(_mm256_min_ph): Likewise.
	(_mm_mask_min_ph): Likewise.
	(_mm256_mask_min_ph): Likewise.
	(_mm_maskz_min_ph): Likewise.
	(_mm256_maskz_min_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	* config/i386/sse.md
	(<code><mode>3<mask_name><round_saeonly_name>): Adjust to
	support HF vector modes.
	(*<code><mode>3<mask_name><round_saeonly_name>): Likewise.
	(ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>):
	Likewise.
	(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
	Likewise.
	* config/i386/subst.md (round_saeonly_mode512bit_condition):
	Adjust for HF vector modes.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 263 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   |  97 +++++++++
 gcc/config/i386/i386-builtin-types.def |   2 +
 gcc/config/i386/i386-builtin.def       |  12 ++
 gcc/config/i386/i386-expand.c          |   2 +
 gcc/config/i386/sse.md                 |  43 ++--
 gcc/config/i386/subst.md               |   4 +-
 gcc/testsuite/gcc.target/i386/avx-1.c  |   4 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-23.c |   4 +
 12 files changed, 438 insertions(+), 21 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 6ae12ebf920..c232419b4db 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -722,6 +722,269 @@ _mm_maskz_div_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 						   (A), (D)))
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsic vmaxph vminph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_max_ph (__m512h __A, __m512h __B)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask (__A, __B,
+					   _mm512_setzero_ph (),
+					   (__mmask32) -1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_min_ph (__m512h __A, __m512h __B)
+{
+  return __builtin_ia32_vminph_v32hf_mask (__A, __B,
+					   _mm512_setzero_ph (),
+					   (__mmask32) -1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_min_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vminph_v32hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_min_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vminph_v32hf_mask (__B, __C,
+					   _mm512_setzero_ph (), __A);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_max_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vmaxph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_min_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vminph_v32hf_mask_round (__A, __B,
+						 _mm512_setzero_ph (),
+						 (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_min_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			  __m512h __D, const int __E)
+{
+  return __builtin_ia32_vminph_v32hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_min_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vminph_v32hf_mask_round (__B, __C,
+						 _mm512_setzero_ph (),
+						 __A, __D);
+}
+
+#else
+#define _mm512_max_round_ph(A, B, C)					\
+  (__builtin_ia32_vmaxph_v32hf_mask_round ((A), (B),			\
+					   _mm512_setzero_ph (),	\
+					   (__mmask32)-1, (C)))
+
+#define _mm512_mask_max_round_ph(A, B, C, D, E)				\
+  (__builtin_ia32_vmaxph_v32hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_max_round_ph(A, B, C, D)				\
+  (__builtin_ia32_vmaxph_v32hf_mask_round ((B), (C),			\
+					   _mm512_setzero_ph (),	\
+					   (A), (D)))
+
+#define _mm512_min_round_ph(A, B, C)					\
+  (__builtin_ia32_vminph_v32hf_mask_round ((A), (B),			\
+					   _mm512_setzero_ph (),	\
+					   (__mmask32)-1, (C)))
+
+#define _mm512_mask_min_round_ph(A, B, C, D, E)				\
+  (__builtin_ia32_vminph_v32hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_min_round_ph(A, B, C, D)				\
+  (__builtin_ia32_vminph_v32hf_mask_round ((B), (C),			\
+					   _mm512_setzero_ph (),	\
+					   (A), (D)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsic vmaxsh vminsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_sh (__m128h __A, __m128h __B)
+{
+  __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_max_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vmaxsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_max_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vmaxsh_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_sh (__m128h __A, __m128h __B)
+{
+  __A[0] = __A[0] < __B[0] ? __A[0] : __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_min_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vminsh_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_min_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vminsh_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vmaxsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_max_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vmaxsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_max_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vmaxsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vminsh_v8hf_mask_round (__A, __B,
+						_mm_setzero_ph (),
+						(__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_min_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		       __m128h __D, const int __E)
+{
+  return __builtin_ia32_vminsh_v8hf_mask_round (__C, __D, __A, __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			const int __D)
+{
+  return __builtin_ia32_vminsh_v8hf_mask_round (__B, __C,
+						_mm_setzero_ph (),
+						__A, __D);
+}
+
+#else
+#define _mm_max_round_sh(A, B, C)					\
+  (__builtin_ia32_vmaxsh_v8hf_mask_round ((A), (B),			\
+					  _mm_setzero_ph (),		\
+					  (__mmask8)-1, (C)))
+
+#define _mm_mask_max_round_sh(A, B, C, D, E)				\
+  (__builtin_ia32_vmaxsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_max_round_sh(A, B, C, D)				\
+  (__builtin_ia32_vmaxsh_v8hf_mask_round ((B), (C),			\
+					  _mm_setzero_ph (),		\
+					  (A), (D)))
+
+#define _mm_min_round_sh(A, B, C)					\
+  (__builtin_ia32_vminsh_v8hf_mask_round ((A), (B),			\
+					  _mm_setzero_ph (),		\
+					  (__mmask8)-1, (C)))
+
+#define _mm_mask_min_round_sh(A, B, C, D, E)				\
+  (__builtin_ia32_vminsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_min_round_sh(A, B, C, D)				\
+  (__builtin_ia32_vminsh_v8hf_mask_round ((B), (C),			\
+					  _mm_setzero_ph (),		\
+					  (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 75fa9eb29e7..bd60b4cd4ca 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -211,6 +211,103 @@ _mm256_maskz_div_ph (__mmask16 __A, __m256h __B, __m256h __C)
 					   _mm256_setzero_ph (), __A);
 }
 
+/* Intrinsics v[max,min]ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_ph (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vmaxph_v8hf_mask (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_max_ph (__m256h __A, __m256h __B)
+{
+  return __builtin_ia32_vmaxph_v16hf_mask (__A, __B,
+					  _mm256_setzero_ph (),
+					  (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_max_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vmaxph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_max_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vmaxph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_max_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vmaxph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_max_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vmaxph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_ph (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vminph_v8hf_mask (__A, __B,
+					  _mm_setzero_ph (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_min_ph (__m256h __A, __m256h __B)
+{
+  return __builtin_ia32_vminph_v16hf_mask (__A, __B,
+					  _mm256_setzero_ph (),
+					  (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_min_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vminph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_min_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
+{
+  return __builtin_ia32_vminph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_min_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vminph_v8hf_mask (__B, __C, _mm_setzero_ph (),
+					  __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vminph_v16hf_mask (__B, __C,
+					   _mm256_setzero_ph (), __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index ed738f71927..3bd2670e229 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1304,9 +1304,11 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 
 # FP16 builtins
 DEF_FUNCTION_TYPE (V8HF, V8HI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 60e2b75be14..28e5627ca4c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2791,6 +2791,14 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask, "__b
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask, "__builtin_ia32_vsubsh_v8hf_mask", IX86_BUILTIN_VSUBSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask, "__builtin_ia32_vmulsh_v8hf_mask", IX86_BUILTIN_VMULSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask, "__builtin_ia32_vdivsh_v8hf_mask", IX86_BUILTIN_VDIVSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv8hf3_mask, "__builtin_ia32_vmaxph_v8hf_mask", IX86_BUILTIN_VMAXPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv16hf3_mask, "__builtin_ia32_vmaxph_v16hf_mask", IX86_BUILTIN_VMAXPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask, "__builtin_ia32_vmaxph_v32hf_mask", IX86_BUILTIN_VMAXPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv8hf3_mask, "__builtin_ia32_vminph_v8hf_mask", IX86_BUILTIN_VMINPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf3_mask, "__builtin_ia32_vminph_v16hf_mask", IX86_BUILTIN_VMINPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_vminph_v32hf_mask", IX86_BUILTIN_VMINPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_vmaxsh_v8hf_mask", IX86_BUILTIN_VMAXSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_vminsh_v8hf_mask", IX86_BUILTIN_VMINSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3000,6 +3008,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmaddv8hf3_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsubv8hf3_mask_round, "__builtin_ia32_vsubsh_v8hf_mask_round", IX86_BUILTIN_VSUBSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmmulv8hf3_mask_round, "__builtin_ia32_vmulsh_v8hf_mask_round", IX86_BUILTIN_VMULSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmdivv8hf3_mask_round, "__builtin_ia32_vdivsh_v8hf_mask_round", IX86_BUILTIN_VDIVSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builtin_ia32_vmaxph_v32hf_mask_round", IX86_BUILTIN_VMAXPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_vminph_v32hf_mask_round", IX86_BUILTIN_VMINPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_vmaxsh_v8hf_mask_round", IX86_BUILTIN_VMAXSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d2a47150e1b..90f8e3a6d4c 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9349,12 +9349,14 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case FLOAT128_FTYPE_FLOAT128_FLOAT128:
     case V16QI_FTYPE_V16QI_V16QI:
     case V16QI_FTYPE_V8HI_V8HI:
+    case V16HF_FTYPE_V16HF_V16HF:
     case V16SF_FTYPE_V16SF_V16SF:
     case V8QI_FTYPE_V8QI_V8QI:
     case V8QI_FTYPE_V4HI_V4HI:
     case V8HI_FTYPE_V8HI_V8HI:
     case V8HI_FTYPE_V16QI_V16QI:
     case V8HI_FTYPE_V4SI_V4SI:
+    case V8HF_FTYPE_V8HF_V8HF:
     case V8SF_FTYPE_V8SF_V8SF:
     case V8SF_FTYPE_V8SF_V8SI:
     case V8DF_FTYPE_V8DF_V8DF:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8fa3f8ddac9..976803f2a1d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2384,11 +2384,12 @@ (define_insn "*sse_vmrsqrtv4sf2"
    (set_attr "mode" "SF")])
 
 (define_expand "<code><mode>3<mask_name><round_saeonly_name>"
-  [(set (match_operand:VF 0 "register_operand")
-	(smaxmin:VF
-	  (match_operand:VF 1 "<round_saeonly_nimm_predicate>")
-	  (match_operand:VF 2 "<round_saeonly_nimm_predicate>")))]
-  "TARGET_SSE && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+  [(set (match_operand:VFH 0 "register_operand")
+	(smaxmin:VFH
+	  (match_operand:VFH 1 "<round_saeonly_nimm_predicate>")
+	  (match_operand:VFH 2 "<round_saeonly_nimm_predicate>")))]
+  "TARGET_SSE && <mask_mode512bit_condition>
+   && <round_saeonly_mode512bit_condition>"
 {
   if (!flag_finite_math_only || flag_signed_zeros)
     {
@@ -2409,13 +2410,14 @@ (define_expand "<code><mode>3<mask_name><round_saeonly_name>"
 ;; are undefined in this condition, we're certain this is correct.
 
 (define_insn "*<code><mode>3<mask_name><round_saeonly_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(smaxmin:VF
-	  (match_operand:VF 1 "<round_saeonly_nimm_predicate>" "%0,v")
-	  (match_operand:VF 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")))]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(smaxmin:VFH
+	  (match_operand:VFH 1 "<round_saeonly_nimm_predicate>" "%0,v")
+	  (match_operand:VFH 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")))]
   "TARGET_SSE
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))
-   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+   && <mask_mode512bit_condition>
+   && <round_saeonly_mode512bit_condition>"
   "@
    <maxmin_float><ssemodesuffix>\t{%2, %0|%0, %2}
    v<maxmin_float><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
@@ -2432,13 +2434,14 @@ (define_insn "*<code><mode>3<mask_name><round_saeonly_name>"
 ;; presence of -0.0 and NaN.
 
 (define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(unspec:VF
-	  [(match_operand:VF 1 "register_operand" "0,v")
-	   (match_operand:VF 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(unspec:VFH
+	  [(match_operand:VFH 1 "register_operand" "0,v")
+	   (match_operand:VFH 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")]
 	  IEEE_MAXMIN))]
   "TARGET_SSE
-   && <mask_mode512bit_condition> && <round_saeonly_mode512bit_condition>"
+   && <mask_mode512bit_condition>
+   && <round_saeonly_mode512bit_condition>"
   "@
    <ieee_maxmin><ssemodesuffix>\t{%2, %0|%0, %2}
    v<ieee_maxmin><ssemodesuffix>\t{<round_saeonly_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_saeonly_mask_op3>}"
@@ -2473,11 +2476,11 @@ (define_insn "*ieee_<ieee_maxmin><mode>3"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (smaxmin:VF_128
-	    (match_operand:VF_128 1 "register_operand" "0,v")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,<round_saeonly_scalar_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (smaxmin:VFH_128
+	    (match_operand:VFH_128 1 "register_operand" "0,v")
+	    (match_operand:VFH_128 2 "nonimmediate_operand" "xm,<round_saeonly_scalar_constraint>"))
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 762383bfd11..ecb158f07e5 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -187,7 +187,9 @@ (define_subst_attr "round_saeonly_nimm_scalar_predicate" "round_saeonly" "nonimm
 (define_subst_attr "round_saeonly_mode512bit_condition" "round_saeonly" "1" "(<MODE>mode == V16SFmode
 									      || <MODE>mode == V8DFmode
 									      || <MODE>mode == V8DImode
-									      || <MODE>mode == V16SImode)")
+									      || <MODE>mode == V16SImode
+									      || <MODE>mode == V32HFmode)")
+
 (define_subst_attr "round_saeonly_modev8sf_condition" "round_saeonly" "1" "(<MODE>mode == V8SFmode)")
 
 (define_subst "round_saeonly"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 26ca87ce2f5..7106076b2a3 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -694,6 +694,10 @@
 #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index ae35adb5ead..1732b50be6b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -711,6 +711,10 @@
 #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index e79edf0a5bb..135b4463941 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -676,6 +676,10 @@ test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -684,6 +688,10 @@ test_3 (_mm_maskz_add_round_sh, __m128h, __mmask32, __m128h, __m128h, 8)
 test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask32, __m128h, __m128h, 8)
 test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask32, __m128h, __m128h, 8)
 test_3 (_mm_maskz_div_round_sh, __m128h, __mmask32, __m128h, __m128h, 8)
+test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -692,6 +700,10 @@ test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8)
 test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8)
 test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8)
 test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask32, __m128h, __m128h, 8)
+test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 2c1f27d881a..da3f5606207 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -781,6 +781,10 @@ test_2 (_mm_add_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_sub_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_mul_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_div_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -789,6 +793,10 @@ test_3 (_mm_maskz_add_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_sub_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_mul_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_div_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -797,6 +805,10 @@ test_4 (_mm_mask_add_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_sub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_mul_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_div_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index a89aef2aa8e..c3fee655288 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -712,6 +712,10 @@
 #define __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (11 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh liuhongt
@ 2021-07-01  6:15 ` liuhongt
  2021-07-01  6:16 ` [PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish liuhongt
                   ` (48 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vmaxph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vmaxsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vminph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1.c: Ditto.
	* gcc.target/i386/avx512fp16-vminsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vmaxph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vminph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vmaxph-1a.c    | 26 +++++
 .../gcc.target/i386/avx512fp16-vmaxph-1b.c    | 94 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vmaxsh-1.c     | 27 ++++++
 .../gcc.target/i386/avx512fp16-vmaxsh-1b.c    | 72 ++++++++++++++
 .../gcc.target/i386/avx512fp16-vminph-1a.c    | 26 +++++
 .../gcc.target/i386/avx512fp16-vminph-1b.c    | 93 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vminsh-1.c     | 27 ++++++
 .../gcc.target/i386/avx512fp16-vminsh-1b.c    | 72 ++++++++++++++
 .../gcc.target/i386/avx512fp16vl-vmaxph-1a.c  | 29 ++++++
 .../gcc.target/i386/avx512fp16vl-vmaxph-1b.c  | 16 ++++
 .../gcc.target/i386/avx512fp16vl-vminph-1a.c  | 29 ++++++
 .../gcc.target/i386/avx512fp16vl-vminph-1b.c  | 16 ++++
 12 files changed, 527 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c
new file mode 100644
index 00000000000..b91f4bd1154
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_max_ph (x1, x2);
+  res1 = _mm512_mask_max_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_max_ph (m32, x1, x2);
+
+  res = _mm512_max_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_max_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_max_round_ph (m32, x1, x2, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c
new file mode 100644
index 00000000000..0dd4c11e9aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxph-1b.c
@@ -0,0 +1,94 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(max_ph) (V512 * dest, V512 op1, V512 op2,
+	       __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = v1.f32[i] > v3.f32[i] ? v1.f32[i] : v3.f32[i];
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = v2.f32[i] > v4.f32[i] ? v2.f32[i] : v4.f32[i];
+      }
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(max_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_max_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _max_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(max_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_max_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_max_ph);
+
+  EMULATE(max_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_max_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_max_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(max_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_max_round_ph) (HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _max_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(max_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_max_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_max_ph);
+
+  EMULATE(max_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_max_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_max_ph);
+
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c
new file mode 100644
index 00000000000..d5198dcebdc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_max_sh (x1, x2);
+  res = _mm_mask_max_sh (res, m8, x1, x2);
+  res = _mm_maskz_max_sh (m8, x1, x2);
+
+  res = _mm_max_round_sh (x1, x2, 8);
+  res = _mm_mask_max_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_max_round_sh (m8, x1, x2, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c
new file mode 100644
index 00000000000..fe49de3147f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmaxsh-1b.c
@@ -0,0 +1,72 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_max_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] > v3.f32[0] ? v1.f32[0] : v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_max_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_max_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_max_sh");
+
+  init_dest(&res, &exp);
+  emulate_max_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_max_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_max_sh");
+
+  emulate_max_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_max_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_max_sh");
+
+  emulate_max_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_max_round_sh(src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_max_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_max_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_max_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_max_round_sh");
+
+  emulate_max_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_max_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_max_round_sh");
+
+  if (n_errs != 0)
+      abort ();
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
new file mode 100644
index 00000000000..810a93e3870
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_min_ph (x1, x2);
+  res1 = _mm512_mask_min_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_min_ph (m32, x1, x2);
+
+  res = _mm512_min_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_min_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_min_round_ph (m32, x1, x2, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c
new file mode 100644
index 00000000000..3315ce13813
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminph-1b.c
@@ -0,0 +1,93 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(min_ph) (V512 * dest, V512 op1, V512 op2,
+                __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] < v3.f32[i] ? v1.f32[i] : v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] < v4.f32[i] ? v2.f32[i] : v4.f32[i];
+        }
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(min_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_min_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _min_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(min_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_min_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_min_ph);
+
+  EMULATE(min_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_min_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_min_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(min_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_min_round_ph) (HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _min_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(min_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_min_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_min_ph);
+
+  EMULATE(min_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_min_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_min_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c
new file mode 100644
index 00000000000..9f1d6e7da4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res;
+volatile __m128h x1, x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_min_sh (x1, x2);
+  res = _mm_mask_min_sh (res, m8, x1, x2);
+  res = _mm_maskz_min_sh (m8, x1, x2);
+
+  res = _mm_min_round_sh (x1, x2, 8);
+  res = _mm_mask_min_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_min_round_sh (m8, x1, x2, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c
new file mode 100644
index 00000000000..13b8d86689c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vminsh-1b.c
@@ -0,0 +1,72 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_min_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] < v3.f32[0] ? v1.f32[0] : v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_min_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_min_sh(src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_min_sh");
+
+  init_dest(&res, &exp);
+  emulate_min_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_min_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_min_sh");
+
+  emulate_min_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_min_sh(0x3, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_min_sh");
+
+  emulate_min_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_min_round_sh(src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_min_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_min_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_min_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_min_round_sh");
+
+  emulate_min_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_min_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 8);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_min_round_sh");
+
+  if (n_errs != 0)
+      abort ();
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
new file mode 100644
index 00000000000..adadc4ed8d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_max_ph (x1, x2);
+  res1 = _mm256_mask_max_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_max_ph (m16, x1, x2);
+
+  res2 = _mm_max_ph (x3, x4);
+  res2 = _mm_mask_max_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_max_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c
new file mode 100644
index 00000000000..f9a3b70d47c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vmaxph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vmaxph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vmaxph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c
new file mode 100644
index 00000000000..7909541aa34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vminph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_min_ph (x1, x2);
+  res1 = _mm256_mask_min_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_min_ph (m16, x1, x2);
+
+  res2 = _mm_min_ph (x3, x4);
+  res2 = _mm_mask_min_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_min_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c
new file mode 100644
index 00000000000..98808b0eddd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vminph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vminph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vminph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (12 preceding siblings ...)
  2021-07-01  6:15 ` [PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish liuhongt
                   ` (47 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cmp_ph_mask): New
	intrinsic.
	(_mm512_mask_cmp_ph_mask): Likewise.
	(_mm512_cmp_round_ph_mask): Likewise.
	(_mm512_mask_cmp_round_ph_mask): Likewise.
	(_mm_cmp_sh_mask): Likewise.
	(_mm_mask_cmp_sh_mask): Likewise.
	(_mm_cmp_round_sh_mask): Likewise.
	(_mm_mask_cmp_round_sh_mask): Likewise.
	(_mm_comieq_sh): Likewise.
	(_mm_comilt_sh): Likewise.
	(_mm_comile_sh): Likewise.
	(_mm_comigt_sh): Likewise.
	(_mm_comige_sh): Likewise.
	(_mm_comineq_sh): Likewise.
	(_mm_ucomieq_sh): Likewise.
	(_mm_ucomilt_sh): Likewise.
	(_mm_ucomile_sh): Likewise.
	(_mm_ucomigt_sh): Likewise.
	(_mm_ucomige_sh): Likewise.
	(_mm_ucomineq_sh): Likewise.
	(_mm_comi_round_sh): Likewise.
	(_mm_comi_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cmp_ph_mask): New intrinsic.
	(_mm_mask_cmp_ph_mask): Likewise.
	(_mm256_cmp_ph_mask): Likewise.
	(_mm256_mask_cmp_ph_mask): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/i386.md (ssevecmode): Add HF mode.
	* config/i386/sse.md
	(V48H_AVX512VL): New mode iterator to support HF vector modes.
	Adjust corresponding description.
	(ssecmpintprefix): New.
	(VI12_AVX512VL): Adjust to support HF vector modes.
	(cmp_imm_predicate): Likewise.
	(<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>):
	Likewise.
	(avx512f_vmcmp<mode>3<round_saeonly_name>): Likewise.
	(avx512f_vmcmp<mode>3_mask<round_saeonly_name>): Likewise.
	(<sse>_<unord>comi<round_saeonly_name>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 250 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   |  50 +++++
 gcc/config/i386/i386-builtin-types.def |   5 +
 gcc/config/i386/i386-builtin.def       |   5 +
 gcc/config/i386/i386-expand.c          |  10 +
 gcc/config/i386/i386.md                |   2 +-
 gcc/config/i386/sse.md                 |  56 ++++--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   7 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  16 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  16 ++
 gcc/testsuite/gcc.target/i386/sse-23.c |   7 +
 12 files changed, 413 insertions(+), 18 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index c232419b4db..ed8ad84a105 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -985,6 +985,256 @@ _mm_maskz_min_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* vcmpph */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cmp_ph_mask (__m512h __A, __m512h __B, const int __C)
+{
+  return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask (__A, __B, __C,
+						       (__mmask32) -1);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cmp_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+			 const int __D)
+{
+  return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask (__B, __C, __D,
+						       __A);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cmp_round_ph_mask (__m512h __A, __m512h __B, const int __C,
+			  const int __D)
+{
+  return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask_round (__A, __B,
+							     __C, (__mmask32) -1,
+							     __D);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cmp_round_ph_mask (__mmask32 __A, __m512h __B, __m512h __C,
+			       const int __D, const int __E)
+{
+  return (__mmask32) __builtin_ia32_vcmpph_v32hf_mask_round (__B, __C,
+							     __D, __A,
+							     __E);
+}
+
+#else
+#define _mm512_cmp_ph_mask(A, B, C)			\
+  (__builtin_ia32_vcmpph_v32hf_mask ((A), (B), (C), (-1)))
+
+#define _mm512_mask_cmp_ph_mask(A, B, C, D)		\
+  (__builtin_ia32_vcmpph_v32hf_mask ((B), (C), (D), (A)))
+
+#define _mm512_cmp_round_ph_mask(A, B, C, D)		\
+  (__builtin_ia32_vcmpph_v32hf_mask_round ((A), (B), (C), (-1), (D)))
+
+#define _mm512_mask_cmp_round_ph_mask(A, B, C, D, E)	\
+  (__builtin_ia32_vcmpph_v32hf_mask_round ((B), (C), (D), (A), (E)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcmpsh.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_sh_mask (__m128h __A, __m128h __B, const int __C)
+{
+  return (__mmask8)
+    __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B,
+					   __C, (__mmask8) -1,
+					   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+		      const int __D)
+{
+  return (__mmask8)
+    __builtin_ia32_vcmpsh_v8hf_mask_round (__B, __C,
+					   __D, __A,
+					   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_round_sh_mask (__m128h __A, __m128h __B, const int __C,
+		       const int __D)
+{
+  return (__mmask8) __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B,
+							   __C, (__mmask8) -1,
+							   __D);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_round_sh_mask (__mmask8 __A, __m128h __B, __m128h __C,
+			    const int __D, const int __E)
+{
+  return (__mmask8) __builtin_ia32_vcmpsh_v8hf_mask_round (__B, __C,
+							   __D, __A,
+							   __E);
+}
+
+#else
+#define _mm_cmp_sh_mask(A, B, C)		\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (C), (-1), \
+					  (_MM_FROUND_CUR_DIRECTION)))
+
+#define _mm_mask_cmp_sh_mask(A, B, C, D)	\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((B), (C), (D), (A),		\
+					  (_MM_FROUND_CUR_DIRECTION)))
+
+#define _mm_cmp_round_sh_mask(A, B, C, D)				\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (C), (-1), (D)))
+
+#define _mm_mask_cmp_round_sh_mask(A, B, C, D, E)	\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((B), (C), (D), (A), (E)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcomish.  */
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comieq_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_EQ_OS,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comilt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LT_OS,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comile_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LE_OS,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comigt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GT_OS,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comige_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GE_OS,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comineq_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_NEQ_US,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomieq_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_EQ_OQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomilt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LT_OQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomile_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_LE_OQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomigt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GT_OQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomige_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_GE_OQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ucomineq_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, _CMP_NEQ_UQ,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+  _mm_comi_sh (__m128h __A, __m128h __B, const int __P)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, __P,
+						(__mmask8) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
+{
+  return __builtin_ia32_vcmpsh_v8hf_mask_round (__A, __B, __P,
+						(__mmask8) -1, __R);
+}
+
+#else
+#define _mm_comi_round_sh(A, B, P, R)		\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (P), (__mmask8) (-1), (R)))
+#define _mm_comi_sh(A, B, P)		\
+  (__builtin_ia32_vcmpsh_v8hf_mask_round ((A), (B), (P), (__mmask8) (-1), \
+					  _MM_FROUND_CUR_DIRECTION))
+
+#endif /* __OPTIMIZE__  */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index bd60b4cd4ca..1787ed5f4ff 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -308,6 +308,56 @@ _mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C)
 					   _mm256_setzero_ph (), __A);
 }
 
+/* vcmpph */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_ph_mask (__m128h __A, __m128h __B, const int __C)
+{
+  return (__mmask8) __builtin_ia32_vcmpph_v8hf_mask (__A, __B, __C,
+						     (__mmask8) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_ph_mask (__mmask8 __A, __m128h __B, __m128h __C,
+		      const int __D)
+{
+  return (__mmask8) __builtin_ia32_vcmpph_v8hf_mask (__B, __C, __D, __A);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cmp_ph_mask (__m256h __A, __m256h __B, const int __C)
+{
+  return (__mmask16) __builtin_ia32_vcmpph_v16hf_mask (__A, __B, __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cmp_ph_mask (__mmask16 __A, __m256h __B, __m256h __C,
+		      const int __D)
+{
+  return (__mmask16) __builtin_ia32_vcmpph_v16hf_mask (__B, __C, __D,
+						       __A);
+}
+
+#else
+#define _mm_cmp_ph_mask(A, B, C)		\
+  (__builtin_ia32_vcmpph_v8hf_mask ((A), (B), (C), (-1)))
+
+#define _mm_mask_cmp_ph_mask(A, B, C, D)	\
+  (__builtin_ia32_vcmpph_v8hf_mask ((B), (C), (D), (A)))
+
+#define _mm256_cmp_ph_mask(A, B, C)		\
+  (__builtin_ia32_vcmpph_v16hf_mask ((A), (B), (C), (-1)))
+
+#define _mm256_mask_cmp_ph_mask(A, B, C, D)	\
+  (__builtin_ia32_vcmpph_v16hf_mask ((B), (C), (D), (A)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 3bd2670e229..e3070ad00bd 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1306,10 +1306,15 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 DEF_FUNCTION_TYPE (V8HF, V8HI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
+DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
+DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
+DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
+DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
+DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 28e5627ca4c..045cf561ec7 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2799,6 +2799,9 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv16hf
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask, "__builtin_ia32_vminph_v32hf_mask", IX86_BUILTIN_VMINPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask, "__builtin_ia32_vmaxsh_v8hf_mask", IX86_BUILTIN_VMAXSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__builtin_ia32_vminsh_v8hf_mask", IX86_BUILTIN_VMINSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_vcmpph_v8hf_mask", IX86_BUILTIN_VCMPPH_V8HF_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_vcmpph_v16hf_mask", IX86_BUILTIN_VCMPPH_V16HF_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_vcmpph_v32hf_mask", IX86_BUILTIN_VCMPPH_V32HF_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3012,6 +3015,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_smaxv32hf3_mask_round, "__builti
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_sminv32hf3_mask_round, "__builtin_ia32_vminph_v32hf_mask_round", IX86_BUILTIN_VMINPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_round, "__builtin_ia32_vmaxsh_v8hf_mask_round", IX86_BUILTIN_VMAXSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_vcmpph_v32hf_mask_round", IX86_BUILTIN_VCMPPH_V32HF_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 90f8e3a6d4c..a79cc324ceb 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9821,14 +9821,17 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case UQI_FTYPE_V8SI_V8SI_INT_UQI:
     case QI_FTYPE_V4DF_V4DF_INT_UQI:
     case QI_FTYPE_V8SF_V8SF_INT_UQI:
+    case UHI_FTYPE_V16HF_V16HF_INT_UHI:
     case UQI_FTYPE_V2DI_V2DI_INT_UQI:
     case UQI_FTYPE_V4SI_V4SI_INT_UQI:
     case UQI_FTYPE_V2DF_V2DF_INT_UQI:
     case UQI_FTYPE_V4SF_V4SF_INT_UQI:
+    case UQI_FTYPE_V8HF_V8HF_INT_UQI:
     case UDI_FTYPE_V64QI_V64QI_INT_UDI:
     case USI_FTYPE_V32QI_V32QI_INT_USI:
     case UHI_FTYPE_V16QI_V16QI_INT_UHI:
     case USI_FTYPE_V32HI_V32HI_INT_USI:
+    case USI_FTYPE_V32HF_V32HF_INT_USI:
     case UHI_FTYPE_V16HI_V16HI_INT_UHI:
     case UQI_FTYPE_V8HI_V8HI_INT_UQI:
       nargs = 4;
@@ -10112,6 +10115,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
 	      case CODE_FOR_avx512f_cmpv16sf3_mask:
 	      case CODE_FOR_avx512f_vmcmpv2df3_mask:
 	      case CODE_FOR_avx512f_vmcmpv4sf3_mask:
+	      case CODE_FOR_avx512bw_cmpv32hf3_mask:
+	      case CODE_FOR_avx512vl_cmpv16hf3_mask:
+	      case CODE_FOR_avx512fp16_cmpv8hf3_mask:
 		error ("the last argument must be a 5-bit immediate");
 		return const0_rtx;
 
@@ -10532,6 +10538,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case UQI_FTYPE_V2DF_V2DF_INT_UQI_INT:
     case UHI_FTYPE_V16SF_V16SF_INT_UHI_INT:
     case UQI_FTYPE_V4SF_V4SF_INT_UQI_INT:
+    case USI_FTYPE_V32HF_V32HF_INT_USI_INT:
+    case UQI_FTYPE_V8HF_V8HF_INT_UQI_INT:
       nargs_constant = 3;
       nargs = 5;
       break;
@@ -10587,6 +10595,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
 		case CODE_FOR_avx512f_cmpv16sf3_mask_round:
 		case CODE_FOR_avx512f_vmcmpv2df3_mask_round:
 		case CODE_FOR_avx512f_vmcmpv4sf3_mask_round:
+		case CODE_FOR_avx512f_vmcmpv8hf3_mask_round:
+		case CODE_FOR_avx512bw_cmpv32hf3_mask_round:
 		  error ("the immediate argument must be a 5-bit immediate");
 		  return const0_rtx;
 		default:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 25cee502f97..014aba187e1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1252,7 +1252,7 @@ (define_mode_attr ssevecmodesuffix [(SF "ps") (DF "pd")])
 
 ;; SSE vector mode corresponding to a scalar mode
 (define_mode_attr ssevecmode
-  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (SF "V4SF") (DF "V2DF")])
+  [(QI "V16QI") (HI "V8HI") (SI "V4SI") (DI "V2DI") (HF "V8HF") (SF "V4SF") (DF "V2DF")])
 (define_mode_attr ssevecmodelower
   [(QI "v16qi") (HI "v8hi") (SI "v4si") (DI "v2di") (SF "v4sf") (DF "v2df")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 976803f2a1d..b7e22e0ec80 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -230,13 +230,23 @@ (define_mode_iterator VMOVE
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
-;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline.
+;; All AVX-512{F,VL} vector modes without HF. Supposed TARGET_AVX512F baseline.
 (define_mode_iterator V48_AVX512VL
   [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    V8DI  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
    V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+;; All AVX-512{F,VL} vector modes. Supposed TARGET_AVX512F baseline.
+(define_mode_iterator V48H_AVX512VL
+  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   V8DI  (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+
 ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline.
 (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
@@ -974,10 +984,10 @@ (define_mode_attr iptr
   [(V64QI "b") (V32HI "w") (V16SI "k") (V8DI "q")
    (V32QI "b") (V16HI "w") (V8SI "k") (V4DI "q")
    (V16QI "b") (V8HI "w") (V4SI "k") (V2DI "q")
-   (V16SF "k") (V8DF "q")
-   (V8SF "k") (V4DF "q")
-   (V4SF "k") (V2DF "q")
-   (SF "k") (DF "q")])
+   (V32HF "w") (V16SF "k") (V8DF "q")
+   (V16HF "w") (V8SF "k") (V4DF "q")
+   (V8HF "w") (V4SF "k") (V2DF "q")
+   (HF "w") (SF "k") (DF "q")])
 
 ;; Mapping of vector modes to VPTERNLOG suffix
 (define_mode_attr ternlogsuffix
@@ -1024,6 +1034,18 @@ (define_mode_attr sseintprefix
    (V32QI "p") (V16HI "p") (V16HF "p")
    (V64QI "p") (V32HI "p") (V32HF "p")])
 
+;; SSE prefix for integer and HF vector comparison.
+(define_mode_attr ssecmpintprefix
+  [(V2DI  "p") (V2DF  "")
+   (V4DI  "p") (V4DF  "")
+   (V8DI  "p") (V8DF  "")
+   (V4SI  "p") (V4SF  "")
+   (V8SI  "p") (V8SF  "")
+   (V16SI "p") (V16SF "")
+   (V16QI "p") (V8HI "p") (V8HF "")
+   (V32QI "p") (V16HI "p") (V16HF "")
+   (V64QI "p") (V32HI "p") (V32HF "")])
+
 ;; SSE scalar suffix for vector modes
 (define_mode_attr ssescalarmodesuffix
   [(HF "sh") (SF "ss") (DF "sd")
@@ -3263,11 +3285,11 @@ (define_insn "<sse>_vmmaskcmp<mode>3"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_mode_attr cmp_imm_predicate
-  [(V16SF "const_0_to_31_operand")  (V8DF "const_0_to_31_operand")
+  [(V32HF "const_0_to_31_operand") (V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
    (V16SI "const_0_to_7_operand")   (V8DI "const_0_to_7_operand")
-   (V8SF "const_0_to_31_operand")   (V4DF "const_0_to_31_operand")
+   (V16HF "const_0_to_31_operand") (V8SF "const_0_to_31_operand") (V4DF "const_0_to_31_operand")
    (V8SI "const_0_to_7_operand")    (V4DI "const_0_to_7_operand")
-   (V4SF "const_0_to_31_operand")   (V2DF "const_0_to_31_operand")
+   (V8HF "const_0_to_31_operand") (V4SF "const_0_to_31_operand") (V2DF "const_0_to_31_operand")
    (V4SI "const_0_to_7_operand")    (V2DI "const_0_to_7_operand")
    (V32HI "const_0_to_7_operand")   (V64QI "const_0_to_7_operand")
    (V16HI "const_0_to_7_operand")   (V32QI "const_0_to_7_operand")
@@ -3276,12 +3298,12 @@ (define_mode_attr cmp_imm_predicate
 (define_insn "<avx512>_cmp<mode>3<mask_scalar_merge_name><round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
-	  [(match_operand:V48_AVX512VL 1 "register_operand" "v")
-	   (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "<round_saeonly_constraint>")
+	  [(match_operand:V48H_AVX512VL 1 "register_operand" "v")
+	   (match_operand:V48H_AVX512VL 2 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 3 "<cmp_imm_predicate>" "n")]
 	  UNSPEC_PCMP))]
   "TARGET_AVX512F && <round_saeonly_mode512bit_condition>"
-  "v<sseintprefix>cmp<ssemodesuffix>\t{%3, <round_saeonly_mask_scalar_merge_op4>%2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2<round_saeonly_mask_scalar_merge_op4>, %3}"
+  "v<ssecmpintprefix>cmp<ssemodesuffix>\t{%3, <round_saeonly_mask_scalar_merge_op4>%2, %1, %0<mask_scalar_merge_operand4>|%0<mask_scalar_merge_operand4>, %1, %2<round_saeonly_mask_scalar_merge_op4>, %3}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
@@ -3428,8 +3450,8 @@ (define_insn "avx512f_vmcmp<mode>3<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_nimm_scalar_predicate>" "<round_saeonly_constraint>")
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_saeonly_nimm_scalar_predicate>" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (const_int 1)))]
@@ -3444,8 +3466,8 @@ (define_insn "avx512f_vmcmp<mode>3_mask<round_saeonly_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_nimm_scalar_predicate>" "<round_saeonly_constraint>")
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_saeonly_nimm_scalar_predicate>" "<round_saeonly_constraint>")
 	     (match_operand:SI 3 "const_0_to_31_operand" "n")]
 	    UNSPEC_PCMP)
 	  (and:<avx512fmaskmode>
@@ -3461,10 +3483,10 @@ (define_insn "avx512f_vmcmp<mode>3_mask<round_saeonly_name>"
 (define_insn "<sse>_<unord>comi<round_saeonly_name>"
   [(set (reg:CCFP FLAGS_REG)
 	(compare:CCFP
-	  (vec_select:MODEF
+	  (vec_select:MODEFH
 	    (match_operand:<ssevecmode> 0 "register_operand" "v")
 	    (parallel [(const_int 0)]))
-	  (vec_select:MODEF
+	  (vec_select:MODEFH
 	    (match_operand:<ssevecmode> 1 "<round_saeonly_nimm_scalar_predicate>" "<round_saeonly_constraint>")
 	    (parallel [(const_int 0)]))))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 7106076b2a3..d9aa8a70e35 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -698,6 +698,13 @@
 #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+
+/* avx512fp16vlintrin.h */
+#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 1732b50be6b..9a2833d78f2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -715,6 +715,13 @@
 #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+
+/* avx512fp16vlintrin.h */
+#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 135b4463941..ce0ad71f190 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -286,6 +286,7 @@ test_2 (_mm_add_round_sd, __m128d, __m128d, __m128d, 9)
 test_2 (_mm_add_round_ss, __m128, __m128, __m128, 9)
 test_2 (_mm_cmp_sd_mask, __mmask8, __m128d, __m128d, 1)
 test_2 (_mm_cmp_ss_mask, __mmask8, __m128, __m128, 1)
+test_2 (_mm_cmp_sh_mask, __mmask8, __m128h, __m128h, 1)
 #ifdef __x86_64__
 test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 9)
 test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 9)
@@ -470,6 +471,7 @@ test_3 (_mm256_maskz_shldi_epi64, __m256i, __mmask8, __m256i, __m256i, 1)
 test_3 (_mm_maskz_shldi_epi16, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm_maskz_shldi_epi32, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm_maskz_shldi_epi64, __m128i, __mmask8, __m128i, __m128i, 1)
+test_3 (_mm_mask_cmp_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1)
 test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1)
 test_3v (_mm512_i32scatter_epi64, void *, __m256i, __m512i, 1)
 test_3v (_mm512_i32scatter_pd, void *, __m256i, __m512d, 1)
@@ -680,6 +682,11 @@ test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
+test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
+test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
+test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
+test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -692,6 +699,9 @@ test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
+test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
+test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -705,6 +715,12 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
+/* avx512fp16vlintrin.h */
+test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
+test_2 (_mm256_cmp_ph_mask, __mmask16, __m256h, __m256h, 1)
+test_3 (_mm_mask_cmp_ph_mask, __mmask8, __mmask8, __m128h, __m128h, 1)
+test_3 (_mm256_mask_cmp_ph_mask, __mmask16, __mmask16, __m256h, __m256h, 1)
+
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index da3f5606207..439346490bd 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -457,6 +457,7 @@ test_2 (_mm256_shldi_epi64, __m256i, __m256i, __m256i, 1)
 test_2 (_mm_shldi_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_shldi_epi32, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_shldi_epi64, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_cmp_sh_mask, __mmask8, __m128h, __m128h, 1)
 #ifdef __x86_64__
 test_2 (_mm_cvt_roundi64_sd, __m128d, __m128d, long long, 9)
 test_2 (_mm_cvt_roundi64_ss, __m128, __m128, long long, 9)
@@ -581,6 +582,7 @@ test_3 (_mm256_maskz_shldi_epi64, __m256i, __mmask8, __m256i, __m256i, 1)
 test_3 (_mm_maskz_shldi_epi16, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm_maskz_shldi_epi32, __m128i, __mmask8, __m128i, __m128i, 1)
 test_3 (_mm_maskz_shldi_epi64, __m128i, __mmask8, __m128i, __m128i, 1)
+test_3 (_mm_mask_cmp_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1)
 test_3v (_mm512_i32scatter_epi32, void *, __m512i, __m512i, 1)
 test_3v (_mm512_i32scatter_epi64, void *, __m256i, __m512i, 1)
 test_3v (_mm512_i32scatter_pd, void *, __m256i, __m512d, 1)
@@ -785,6 +787,11 @@ test_2 (_mm512_max_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_min_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
+test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
+test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
+test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
+test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -797,6 +804,9 @@ test_3 (_mm512_maskz_max_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
+test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
+test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -810,6 +820,12 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
+/* avx512fp16vlintrin.h */
+test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
+test_2 (_mm256_cmp_ph_mask, __mmask16, __m256h, __m256h, 1)
+test_3 (_mm_mask_cmp_ph_mask, __mmask8, __mmask8, __m128h, __m128h, 1)
+test_3 (_mm256_mask_cmp_ph_mask, __mmask16, __mmask16, __m256h, __m256h, 1)
+
 /* shaintrin.h */
 test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index c3fee655288..f6768bac345 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -716,6 +716,13 @@
 #define __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vminph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vmaxsh_v8hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vminsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+
+/* avx512fp16vlintrin.h */
+#define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
+#define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
-- 
2.18.1



* [PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (13 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh liuhongt
                   ` (46 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h
	(check_results_mask): New check function.
	* gcc.target/i386/avx512fp16-vcmpph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcmpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcmpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcomish-1c.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcmpph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       | 37 ++++++++++
 .../gcc.target/i386/avx512fp16-vcmpph-1a.c    | 22 ++++++
 .../gcc.target/i386/avx512fp16-vcmpph-1b.c    | 70 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcmpsh-1a.c    | 21 ++++++
 .../gcc.target/i386/avx512fp16-vcmpsh-1b.c    | 45 ++++++++++++
 .../gcc.target/i386/avx512fp16-vcomish-1a.c   | 41 +++++++++++
 .../gcc.target/i386/avx512fp16-vcomish-1b.c   | 66 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcomish-1c.c   | 66 +++++++++++++++++
 .../gcc.target/i386/avx512fp16vl-vcmpph-1a.c  | 24 +++++++
 .../gcc.target/i386/avx512fp16vl-vcmpph-1b.c  | 16 +++++
 10 files changed, 408 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index 9fde88a4f7b..5d3539bf312 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -107,6 +107,10 @@ display_ps(const void *p, const char *banner, int n_elems)
   check_results ((void*)res, (void*)exp, size,\
 		 NAME_OF(intrin))
 
+#define CHECK_RESULT_MASK(res, exp, size, intrin) \
+  check_results_mask ((__mmask32)res, (__mmask32)exp, size,\
+		 NAME_OF(intrin))
+
 /* To evaluate whether result match _Float16 precision,
    only the last bit of real/emulate result could be
    different.  */
@@ -136,6 +140,18 @@ check_results(void *got, void *exp, int n_elems, char *banner)
     }
 }
 
+void NOINLINE
+check_results_mask(__mmask32 got, __mmask32 exp, int n_elems, char *banner)
+{
+  if (got != exp) {
+#ifdef DEBUG
+      printf("ERROR: %s failed : got mask %x != exp mask %x\n",
+	     banner ? banner : "", got, exp);
+#endif
+      n_errs++;
+  }
+}
+
 /* Functions for src/dest initialization */
 void NOINLINE
 init_src()
@@ -156,6 +172,27 @@ init_src()
     src2 = pack_twops_2ph(v3, v4);
 }
 
+void NOINLINE
+init_src_nanf()
+{
+  V512 v1, v2, v3, v4;
+  int i;
+
+  for (i = 0; i < 16; i++) {
+    v1.f32[i] = i + 1 + 0.5;
+    v2.f32[i] = i + 17 + 0.5;
+    v3.f32[i] = i * 2 + 2 + 0.5;
+    v4.f32[i] = i * 2 + 34 + 0.5;
+
+    src3.u32[i] = (i + 1) * 10;
+  }
+
+  v1.f32[0] = __builtin_nanf("");
+  src1 = pack_twops_2ph(v1, v2);
+  src2 = pack_twops_2ph(v3, v4);
+}
+
+
 void NOINLINE
 init_dest(V512 * res, V512 * exp)
 {
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c
new file mode 100644
index 00000000000..6425c4644c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1a.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$1\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$2\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __mmask32 res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cmp_ph_mask (x1, x2, 1);
+  res1 = _mm512_mask_cmp_ph_mask (m32, x1, x2, 2);
+  res = _mm512_cmp_round_ph_mask (x1, x2, 3, 8);
+  res1 = _mm512_mask_cmp_round_ph_mask (m32, x1, x2, 4, 4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c
new file mode 100644
index 00000000000..ec5eccfccb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpph-1b.c
@@ -0,0 +1,70 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+__mmask32 NOINLINE
+EMULATE(cmp_ph) (V512 op1, V512 op2,
+	       __mmask32 k, int predicate)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i, j;
+  __mmask16 mr1 = 0, mr2 = 0;
+  __mmask16 m1, m2;
+  __mmask32 mr = 0;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) != 0) {
+	  j = v1.f32[i] == v3.f32[i] ? 1 : 0;
+	  mr1 = mr1 | (j << i);
+      }
+
+      if (((1 << i) & m2) != 0) {
+	  j = v2.f32[i] == v4.f32[i] ? 1 : 0;
+	  mr2 = mr2 | (j << i);
+      }
+  }
+
+  mr = mr1 | (mr2 << 16);
+  return mr;
+}
+
+void
+TEST (void)
+{
+  __mmask32 res, exp;
+
+  init_src();
+
+  exp = EMULATE(cmp_ph) (src1, src2,  NET_MASK, 0);
+  res = INTRINSIC (_cmp_ph_mask) (HF(src1), HF(src2), 0);
+  CHECK_RESULT_MASK (res, exp, N_ELEMS, _cmp_ph_mask);
+
+  exp = EMULATE(cmp_ph) (src1, src2,  MASK_VALUE, 0); 
+  res = INTRINSIC (_mask_cmp_ph_mask) (MASK_VALUE, HF(src1), HF(src2), 0);
+  CHECK_RESULT_MASK (res, exp, N_ELEMS, _mask_cmp_ph_mask);
+
+#if AVX512F_LEN == 512
+  exp = EMULATE(cmp_ph) (src1, src2,  NET_MASK, 0); 
+  res = INTRINSIC (_cmp_round_ph_mask) (HF(src1), HF(src2), 0, 8);
+  CHECK_RESULT_MASK (res, exp, N_ELEMS, _cmp_round_ph_mask);
+
+  exp = EMULATE(cmp_ph) (src1, src2,  MASK_VALUE, 0);
+  res = INTRINSIC (_mask_cmp_round_ph_mask) (MASK_VALUE, HF(src1), HF(src2), 0, 8);
+  CHECK_RESULT_MASK (res, exp, N_ELEMS, _mask_cmp_round_ph_mask);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c
new file mode 100644
index 00000000000..5cce097d661
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\[^\{\n\]*\\\$4\[^\n\r]*\{sae\}\[^\n\r\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __mmask8 res, res1, res2;
+volatile __m128h x1, x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_cmp_sh_mask (x1, x2, 3);
+  res = _mm_mask_cmp_sh_mask (m8, x1, x2, 4);
+  res = _mm_cmp_round_sh_mask (x1, x2, 3, 8);
+  res1 = _mm_mask_cmp_round_sh_mask (m8, x1, x2, 4, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c
new file mode 100644
index 00000000000..9deae52b41d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcmpsh-1b.c
@@ -0,0 +1,45 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+__mmask8 NOINLINE
+emulate_cmp_sh(V512 op1, V512 op2,
+	       __mmask8 k, int predicate)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  __mmask8 mr = 0;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+
+  if ((k&1) || !k)
+    mr = v1.f32[0] == v3.f32[0] ? 1 : 0;
+
+  return mr;
+}
+
+void
+test_512 (void)
+{
+  __mmask8 res, exp;
+
+  init_src();
+
+  exp = emulate_cmp_sh(src1, src2,  0x1, 0);
+  res = _mm_cmp_round_sh_mask(src1.xmmh[0], src2.xmmh[0], 0, 8);
+  check_results_mask(res, exp, 1, "_mm_cmp_round_sh_mask");
+
+  exp = emulate_cmp_sh(src1, src2,  0x1, 0);
+  res = _mm_mask_cmp_round_sh_mask(0x1, src1.xmmh[0], src2.xmmh[0], 0, 8);
+  check_results_mask(res, exp, 1, "_mm_mask_cmp_round_sh_mask");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c
new file mode 100644
index 00000000000..b87ffd9b80f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1a.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$3\[^\n\r]*\{sae\}\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$7\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$16\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$1\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$2\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$14\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$13\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$20\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$0\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$17\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$18\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$30\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$29\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpsh\[ \\t\]+\\\$4\[^\n\r0-9]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x, y;
+volatile int res;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_comi_round_sh (x, y, 3, 8);
+  res = _mm_comi_sh (x, y, 7);
+  res = _mm_comieq_sh (x, y);
+  res = _mm_comilt_sh (x, y);
+  res = _mm_comile_sh (x, y);
+  res = _mm_comigt_sh (x, y);
+  res = _mm_comige_sh (x, y);
+  res = _mm_comineq_sh (x, y);
+  res = _mm_ucomieq_sh (x, y);
+  res = _mm_ucomilt_sh (x, y);
+  res = _mm_ucomile_sh (x, y);
+  res = _mm_ucomigt_sh (x, y);
+  res = _mm_ucomige_sh (x, y);
+  res = _mm_ucomineq_sh (x, y);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c
new file mode 100644
index 00000000000..8c398003cb9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1b.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+
+#define CMP(imm, rel)					\
+  dst_ref = 0;						\
+  dst_ref = ((int) rel) | dst_ref;			\
+  dst = _mm_comi_round_sh(src1.xmmh[0], src2.xmmh[0], imm,    \
+			  _MM_FROUND_NO_EXC);		\
+  if (dst_ref != dst) abort();				\
+
+void
+test_512 (void)
+{
+  V512 v1,v2,v3,v4;
+  float s1,s2;
+  int res,exp,dst;
+  __mmask8 dst_ref;
+
+  init_src();
+  unpack_ph_2twops(src1, &v1, &v2);
+  unpack_ph_2twops(src2, &v3, &v4);
+  s1 = v1.f32[0];
+  s2 = v3.f32[0];
+
+  CMP(_CMP_EQ_OQ, !isunordered(s1, s2) && s1 == s2);
+  CMP(_CMP_LT_OS, !isunordered(s1, s2) && s1 < s2);
+  CMP(_CMP_LE_OS, !isunordered(s1, s2) && s1 <= s2);
+  CMP(_CMP_UNORD_Q, isunordered(s1, s2));
+  CMP(_CMP_NEQ_UQ, isunordered(s1, s2) || s1 != s2);
+  CMP(_CMP_NLT_US, isunordered(s1, s2) || s1 >= s2);
+  CMP(_CMP_NLE_US, isunordered(s1, s2) || s1 > s2);
+  CMP(_CMP_ORD_Q, !isunordered(s1, s2));
+
+  CMP(_CMP_EQ_UQ, isunordered(s1, s2) || s1 == s2);
+  CMP(_CMP_NGE_US, isunordered(s1, s2) || s1 < s2);
+  CMP(_CMP_NGT_US, isunordered(s1, s2) || s1 <= s2);
+
+  CMP(_CMP_FALSE_OQ, 0);
+  CMP(_CMP_NEQ_OQ, !isunordered(s1, s2) && s1 != s2);
+  CMP(_CMP_GE_OS, !isunordered(s1, s2) && s1 >= s2);
+  CMP(_CMP_GT_OS, !isunordered(s1, s2) && s1 > s2);
+  CMP(_CMP_TRUE_UQ, 1);
+
+  CMP(_CMP_EQ_OS, !isunordered(s1, s2) && s1 == s2);
+  CMP(_CMP_LT_OQ, !isunordered(s1, s2) && s1 < s2);
+  CMP(_CMP_LE_OQ, !isunordered(s1, s2) && s1 <= s2);
+  CMP(_CMP_UNORD_S, isunordered(s1, s2));
+  CMP(_CMP_NEQ_US, isunordered(s1, s2) || s1 != s2);
+  CMP(_CMP_NLT_UQ, isunordered(s1, s2) || s1 >= s2);
+  CMP(_CMP_NLE_UQ, isunordered(s1, s2) || s1 > s2);
+  CMP(_CMP_ORD_S, !isunordered(s1, s2));
+  CMP(_CMP_EQ_US, isunordered(s1, s2) || s1 == s2);
+  CMP(_CMP_NGE_UQ, isunordered(s1, s2) || s1 < s2);
+  CMP(_CMP_NGT_UQ, isunordered(s1, s2) || s1 <= s2);
+  CMP(_CMP_FALSE_OS, 0);
+  CMP(_CMP_NEQ_OS, !isunordered(s1, s2) && s1 != s2);
+  CMP(_CMP_GE_OQ, !isunordered(s1, s2) && s1 >= s2);
+  CMP(_CMP_GT_OQ, !isunordered(s1, s2) && s1 > s2);
+  CMP(_CMP_TRUE_US, 1);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c
new file mode 100644
index 00000000000..77366a8a30e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcomish-1c.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+
+#define CMP(imm, rel)					\
+  dst_ref = 0;						\
+  dst_ref = ((int) rel) | dst_ref;			\
+  dst = _mm_comi_round_sh(src1.xmmh[0], src2.xmmh[0], imm,    \
+			  _MM_FROUND_NO_EXC);		\
+  if (dst_ref != dst) abort();				\
+
+void
+test_512 (void)
+{
+  V512 v1, v2, v3, v4;
+  float s1, s2;
+  int res, exp, dst;
+  __mmask8 dst_ref;
+
+  init_src_nanf();
+  unpack_ph_2twops(src1, &v1, &v2);
+  unpack_ph_2twops(src2, &v3, &v4);
+  s1 = v1.f32[0];
+  s2 = v3.f32[0];
+
+  CMP(_CMP_EQ_OQ, !isunordered(s1, s2) && s1 == s2);
+  CMP(_CMP_LT_OS, !isunordered(s1, s2) && s1 < s2);
+  CMP(_CMP_LE_OS, !isunordered(s1, s2) && s1 <= s2);
+  CMP(_CMP_UNORD_Q, isunordered(s1, s2));
+  CMP(_CMP_NEQ_UQ, isunordered(s1, s2) || s1 != s2);
+  CMP(_CMP_NLT_US, isunordered(s1, s2) || s1 >= s2);
+  CMP(_CMP_NLE_US, isunordered(s1, s2) || s1 > s2);
+  CMP(_CMP_ORD_Q, !isunordered(s1, s2));
+
+  CMP(_CMP_EQ_UQ, isunordered(s1, s2) || s1 == s2);
+  CMP(_CMP_NGE_US, isunordered(s1, s2) || s1 < s2);
+  CMP(_CMP_NGT_US, isunordered(s1, s2) || s1 <= s2);
+
+  CMP(_CMP_FALSE_OQ, 0);
+  CMP(_CMP_NEQ_OQ, !isunordered(s1, s2) && s1 != s2);
+  CMP(_CMP_GE_OS, !isunordered(s1, s2) && s1 >= s2);
+  CMP(_CMP_GT_OS, !isunordered(s1, s2) && s1 > s2);
+  CMP(_CMP_TRUE_UQ, 1);
+
+  CMP(_CMP_EQ_OS, !isunordered(s1, s2) && s1 == s2);
+  CMP(_CMP_LT_OQ, !isunordered(s1, s2) && s1 < s2);
+  CMP(_CMP_LE_OQ, !isunordered(s1, s2) && s1 <= s2);
+  CMP(_CMP_UNORD_S, isunordered(s1, s2));
+  CMP(_CMP_NEQ_US, isunordered(s1, s2) || s1 != s2);
+  CMP(_CMP_NLT_UQ, isunordered(s1, s2) || s1 >= s2);
+  CMP(_CMP_NLE_UQ, isunordered(s1, s2) || s1 > s2);
+  CMP(_CMP_ORD_S, !isunordered(s1, s2));
+  CMP(_CMP_EQ_US, isunordered(s1, s2) || s1 == s2);
+  CMP(_CMP_NGE_UQ, isunordered(s1, s2) || s1 < s2);
+  CMP(_CMP_NGT_UQ, isunordered(s1, s2) || s1 <= s2);
+  CMP(_CMP_FALSE_OS, 0);
+  CMP(_CMP_NEQ_OS, !isunordered(s1, s2) && s1 != s2);
+  CMP(_CMP_GE_OQ, !isunordered(s1, s2) && s1 >= s2);
+  CMP(_CMP_GT_OQ, !isunordered(s1, s2) && s1 > s2);
+  CMP(_CMP_TRUE_US, 1);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
new file mode 100644
index 00000000000..31da2b235f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$1\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$2\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpph\[ \\t\]+\\\$4\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%k\[0-9\]\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __mmask16 res;
+volatile __mmask8 res1;
+volatile __m256h x1, x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm256_cmp_ph_mask (x1, x2, 1);
+  res = _mm256_mask_cmp_ph_mask (m16, x1, x2, 2);
+  res1 = _mm_cmp_ph_mask (x3, x4, 3);
+  res1 = _mm_mask_cmp_ph_mask (m8, x3, x4, 4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
new file mode 100644
index 00000000000..c201a9258bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcmpph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcmpph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcmpph-1b.c"
+
-- 
2.18.1



* [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
  2021-07-01  6:16 ` [PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-14  3:50   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh liuhongt
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_sqrt_ph): New intrinsic.
	(_mm512_mask_sqrt_ph): Likewise.
	(_mm512_maskz_sqrt_ph): Likewise.
	(_mm512_sqrt_round_ph): Likewise.
	(_mm512_mask_sqrt_round_ph): Likewise.
	(_mm512_maskz_sqrt_round_ph): Likewise.
	(_mm512_rsqrt_ph): Likewise.
	(_mm512_mask_rsqrt_ph): Likewise.
	(_mm512_maskz_rsqrt_ph): Likewise.
	(_mm_rsqrt_sh): Likewise.
	(_mm_mask_rsqrt_sh): Likewise.
	(_mm_maskz_rsqrt_sh): Likewise.
	(_mm_sqrt_sh): Likewise.
	(_mm_mask_sqrt_sh): Likewise.
	(_mm_maskz_sqrt_sh): Likewise.
	(_mm_sqrt_round_sh): Likewise.
	(_mm_mask_sqrt_round_sh): Likewise.
	(_mm_maskz_sqrt_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_sqrt_ph): New intrinsic.
	(_mm256_sqrt_ph): Likewise.
	(_mm_mask_sqrt_ph): Likewise.
	(_mm256_mask_sqrt_ph): Likewise.
	(_mm_maskz_sqrt_ph): Likewise.
	(_mm256_maskz_sqrt_ph): Likewise.
	(_mm_rsqrt_ph): Likewise.
	(_mm256_rsqrt_ph): Likewise.
	(_mm_mask_rsqrt_ph): Likewise.
	(_mm256_mask_rsqrt_ph): Likewise.
	(_mm_maskz_rsqrt_ph): Likewise.
	(_mm256_maskz_rsqrt_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtins.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/sse.md (VF_AVX512FP16VL): New.
	(sqrt<mode>2): Adjust for HF vector modes.
	(<sse>_sqrt<mode>2<mask_name><round_name>): Likewise.
	(<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>):
	Likewise.
	(<sse>_rsqrt<mode>2<mask_name>): New.
	(avx512fp16_vmrsqrtv8hf2<mask_scalar_name>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 193 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   |  93 ++++++++++++
 gcc/config/i386/i386-builtin-types.def |   4 +
 gcc/config/i386/i386-builtin.def       |   8 +
 gcc/config/i386/i386-expand.c          |   4 +
 gcc/config/i386/sse.md                 |  44 ++++--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   2 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   2 +
 gcc/testsuite/gcc.target/i386/sse-14.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-22.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   2 +
 11 files changed, 355 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index ed8ad84a105..50db5d12140 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -1235,6 +1235,199 @@ _mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
 
 #endif /* __OPTIMIZE__  */
 
+/* Intrinsics vsqrtph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sqrt_ph (__m512h __A)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__A,
+						  _mm512_setzero_ph (),
+						  (__mmask32) -1,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__B,
+						  _mm512_setzero_ph (),
+						  __A,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_sqrt_round_ph (__m512h __A, const int __B)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__A,
+						  _mm512_setzero_ph (),
+						  (__mmask32) -1, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vsqrtph_v32hf_mask_round (__B,
+						  _mm512_setzero_ph (),
+						  __A, __C);
+}
+
+#else
+#define _mm512_sqrt_round_ph(A, B)					\
+  (__builtin_ia32_vsqrtph_v32hf_mask_round ((A),			\
+					    _mm512_setzero_ph (),	\
+					    (__mmask32)-1, (B)))
+
+#define _mm512_mask_sqrt_round_ph(A, B, C, D)				\
+  (__builtin_ia32_vsqrtph_v32hf_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_sqrt_round_ph(A, B, C)				\
+  (__builtin_ia32_vsqrtph_v32hf_mask_round ((B),			\
+					    _mm512_setzero_ph (),	\
+					    (A), (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrsqrtph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_rsqrt_ph (__m512h __A)
+{
+  return __builtin_ia32_vrsqrtph_v32hf_mask (__A, _mm512_setzero_ph (),
+					     (__mmask32) -1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
+{
+  return __builtin_ia32_vrsqrtph_v32hf_mask (__C, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
+{
+  return __builtin_ia32_vrsqrtph_v32hf_mask (__B, _mm512_setzero_ph (),
+					     __A);
+}
+
+/* Intrinsics vrsqrtsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vrsqrtsh_v8hf_mask (__B, __A, _mm_setzero_ph (),
+					    (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vrsqrtsh_v8hf_mask (__D, __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vrsqrtsh_v8hf_mask (__C, __B, _mm_setzero_ph (),
+					    __A);
+}
+
+/* Intrinsics vsqrtsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B,
+						 _mm_setzero_ph (),
+						 __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A,
+						 _mm_setzero_ph (),
+						 (__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			__m128h __D, const int __E)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B,
+						 __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			 const int __D)
+{
+  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B,
+						 _mm_setzero_ph (),
+						 __A, __D);
+}
+
+#else
+#define _mm_sqrt_round_sh(A, B, C)				\
+  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((B), (A),		\
+					   _mm_setzero_ph (),	\
+					   (__mmask8)-1, (C)))
+
+#define _mm_mask_sqrt_round_sh(A, B, C, D, E)			\
+  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((D), (C), (A), (B), (E)))
+
+#define _mm_maskz_sqrt_round_sh(A, B, C, D)			\
+  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((C), (B),		\
+					   _mm_setzero_ph (),	\
+					   (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 1787ed5f4ff..aaed85203c9 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -358,6 +358,99 @@ _mm_mask_cmp_ph_mask (__mmask16 __A, __m256h __B, __m256h __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vsqrtph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_ph (__m128h __A)
+{
+  return __builtin_ia32_vsqrtph_v8hf_mask (__A, _mm_setzero_ph (),
+					   (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_sqrt_ph (__m256h __A)
+{
+  return __builtin_ia32_vsqrtph_v16hf_mask (__A, _mm256_setzero_ph (),
+					    (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_sqrt_ph (__m128h __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vsqrtph_v8hf_mask (__C, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_sqrt_ph (__m256h __A, __mmask16 __B, __m256h __C)
+{
+  return __builtin_ia32_vsqrtph_v16hf_mask (__C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_sqrt_ph (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vsqrtph_v8hf_mask (__B, _mm_setzero_ph (),
+					   __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_sqrt_ph (__mmask16 __A, __m256h __B)
+{
+  return __builtin_ia32_vsqrtph_v16hf_mask (__B, _mm256_setzero_ph (),
+					    __A);
+}
+
+/* Intrinsics vrsqrtph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rsqrt_ph (__m128h __A)
+{
+  return __builtin_ia32_vrsqrtph_v8hf_mask (__A, _mm_setzero_ph (),
+					    (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_rsqrt_ph (__m256h __A)
+{
+  return __builtin_ia32_vrsqrtph_v16hf_mask (__A, _mm256_setzero_ph (),
+					     (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_rsqrt_ph (__m128h __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vrsqrtph_v8hf_mask (__C, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_rsqrt_ph (__m256h __A, __mmask16 __B, __m256h __C)
+{
+  return __builtin_ia32_vrsqrtph_v16hf_mask (__C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_rsqrt_ph (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vrsqrtph_v8hf_mask (__B, _mm_setzero_ph (), __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_rsqrt_ph (__mmask16 __A, __m256h __B)
+{
+  return __builtin_ia32_vrsqrtph_v16hf_mask (__B, _mm256_setzero_ph (),
+					     __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index e3070ad00bd..9ebad6b5f49 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1305,16 +1305,20 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 # FP16 builtins
 DEF_FUNCTION_TYPE (V8HF, V8HI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
+DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 045cf561ec7..999b2e1abb5 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2802,6 +2802,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_vcmpph_v8hf_mask", IX86_BUILTIN_VCMPPH_V8HF_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_vcmpph_v16hf_mask", IX86_BUILTIN_VCMPPH_V16HF_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_vcmpph_v32hf_mask", IX86_BUILTIN_VCMPPH_V32HF_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_vsqrtph_v8hf_mask", IX86_BUILTIN_VSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_vsqrtph_v16hf_mask", IX86_BUILTIN_VSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_vrsqrtph_v8hf_mask", IX86_BUILTIN_VRSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_vrsqrtph_v16hf_mask", IX86_BUILTIN_VRSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_vrsqrtph_v32hf_mask", IX86_BUILTIN_VRSQRTPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_vrsqrtsh_v8hf_mask", IX86_BUILTIN_VRSQRTSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3017,6 +3023,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_roun
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_vcmpph_v32hf_mask_round", IX86_BUILTIN_VCMPPH_V32HF_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_vsqrtph_v32hf_mask_round", IX86_BUILTIN_VSQRTPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index a79cc324ceb..d76e4405413 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9532,6 +9532,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V16SI_V16HI_UHI:
     case V16QI_FTYPE_V16SI_V16QI_UHI:
     case V16QI_FTYPE_V8DI_V16QI_UQI:
+    case V32HF_FTYPE_V32HF_V32HF_USI:
     case V16SF_FTYPE_V16SF_V16SF_UHI:
     case V16SF_FTYPE_V4SF_V16SF_UHI:
     case V16SI_FTYPE_SI_V16SI_UHI:
@@ -9561,12 +9562,14 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_HI_V16HI_UHI:
     case V8HI_FTYPE_V8HI_V8HI_UQI:
     case V8HI_FTYPE_HI_V8HI_UQI:
+    case V16HF_FTYPE_V16HF_V16HF_UHI:
     case V8SF_FTYPE_V8HI_V8SF_UQI:
     case V4SF_FTYPE_V8HI_V4SF_UQI:
     case V8SI_FTYPE_V8SF_V8SI_UQI:
     case V4SI_FTYPE_V4SF_V4SI_UQI:
     case V4DI_FTYPE_V4SF_V4DI_UQI:
     case V2DI_FTYPE_V4SF_V2DI_UQI:
+    case V8HF_FTYPE_V8HF_V8HF_UQI:
     case V4SF_FTYPE_V4DI_V4SF_UQI:
     case V4SF_FTYPE_V2DI_V4SF_UQI:
     case V4DF_FTYPE_V4DI_V4DF_UQI:
@@ -10495,6 +10498,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8DF_V8DI_QI_INT:
     case V8SF_FTYPE_V8DI_V8SF_QI_INT:
     case V8DF_FTYPE_V8DI_V8DF_QI_INT:
+    case V32HF_FTYPE_V32HF_V32HF_USI_INT:
     case V16SF_FTYPE_V16SF_V16SF_HI_INT:
     case V8DI_FTYPE_V8SF_V8DI_QI_INT:
     case V16SF_FTYPE_V16SI_V16SF_HI_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b7e22e0ec80..4763fd0558d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -395,6 +395,9 @@ (define_mode_iterator VF1_AVX512VL
 (define_mode_iterator VF_AVX512FP16
   [V32HF V16HF V8HF])
 
+(define_mode_iterator VF_AVX512FP16VL
+  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
@@ -2238,8 +2241,8 @@ (define_insn "srcp14<mode>_mask"
    (set_attr "mode" "<MODE>")])
 
 (define_expand "sqrt<mode>2"
-  [(set (match_operand:VF2 0 "register_operand")
-	(sqrt:VF2 (match_operand:VF2 1 "vector_operand")))]
+  [(set (match_operand:VF2H 0 "register_operand")
+	(sqrt:VF2H (match_operand:VF2H 1 "vector_operand")))]
   "TARGET_SSE2")
 
 (define_expand "sqrt<mode>2"
@@ -2259,8 +2262,8 @@ (define_expand "sqrt<mode>2"
 })
 
 (define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
-	(sqrt:VF (match_operand:VF 1 "<round_nimm_predicate>" "xBm,<round_constraint>")))]
+  [(set (match_operand:VFH 0 "register_operand" "=x,v")
+	(sqrt:VFH (match_operand:VFH 1 "<round_nimm_predicate>" "xBm,<round_constraint>")))]
   "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    sqrt<ssemodesuffix>\t{%1, %0|%0, %1}
@@ -2273,11 +2276,11 @@ (define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (sqrt:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
-	  (match_operand:VF_128 2 "register_operand" "0,v")
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (sqrt:VFH_128
+	    (match_operand:VFH_128 1 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
+	  (match_operand:VFH_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
@@ -2330,6 +2333,16 @@ (define_insn "<sse>_rsqrt<mode>2"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<sse>_rsqrt<mode>2<mask_name>"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
+	(unspec:VF_AVX512FP16VL
+	  [(match_operand:VF_AVX512FP16VL 1 "vector_operand" "vBm")] UNSPEC_RSQRT))]
+  "TARGET_AVX512FP16"
+  "vrsqrtph\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "<mask_codefor>rsqrt14<mode><mask_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
 	(unspec:VF_AVX512VL
@@ -2405,6 +2418,19 @@ (define_insn "*sse_vmrsqrtv4sf2"
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
+(define_insn "avx512fp16_vmrsqrtv8hf2<mask_scalar_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (unspec:V8HF [(match_operand:V8HF 1 "nonimmediate_operand" "vm")]
+		       UNSPEC_RSQRT)
+	  (match_operand:V8HF 2 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vrsqrtsh\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %w1}"
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_expand "<code><mode>3<mask_name><round_saeonly_name>"
   [(set (match_operand:VFH 0 "register_operand")
 	(smaxmin:VFH
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index d9aa8a70e35..651cb1c80fb 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -701,6 +701,8 @@
 #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
+#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 9a2833d78f2..94553dec9e7 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -718,6 +718,8 @@
 #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
+#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index ce0ad71f190..7281bffdf2b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -670,6 +670,7 @@ test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
 test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 
 /* avx512fp16intrin.h */
+test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -684,6 +685,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
 test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
+test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
+test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -700,6 +703,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
+test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
+test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -714,6 +719,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 439346490bd..04326e0e37d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -775,6 +775,7 @@ test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8)
 test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
 
 /* avx512fp16intrin.h */
+test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -789,6 +790,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
 test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
+test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
+test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -805,6 +808,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
+test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
+test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -819,6 +824,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index f6768bac345..7559d335dbc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -719,6 +719,8 @@
 #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
+#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
+#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (15 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
                   ` (44 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vrsqrtph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vrsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrsqrtsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrsqrtsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vsqrtsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsqrtph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vsqrtph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vrsqrtph-1a.c  | 19 ++++
 .../gcc.target/i386/avx512fp16-vrsqrtph-1b.c  | 77 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1a.c  | 18 ++++
 .../gcc.target/i386/avx512fp16-vrsqrtsh-1b.c  | 59 ++++++++++++
 .../gcc.target/i386/avx512fp16-vsqrtph-1a.c   | 24 +++++
 .../gcc.target/i386/avx512fp16-vsqrtph-1b.c   | 92 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vsqrtsh-1a.c   | 23 +++++
 .../gcc.target/i386/avx512fp16-vsqrtsh-1b.c   | 60 ++++++++++++
 .../i386/avx512fp16vl-vrsqrtph-1a.c           | 29 ++++++
 .../i386/avx512fp16vl-vrsqrtph-1b.c           | 16 ++++
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1a.c | 29 ++++++
 .../gcc.target/i386/avx512fp16vl-vsqrtph-1b.c | 16 ++++
 12 files changed, 462 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c
new file mode 100644
index 00000000000..c9671e8ed0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1a.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res;
+volatile __m512h x1;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_rsqrt_ph (x1);
+  res = _mm512_mask_rsqrt_ph (res, m32, x1);
+  res = _mm512_maskz_rsqrt_ph (m32, x1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c
new file mode 100644
index 00000000000..237971dbaa7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtph-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(rsqrt_ph) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = v7.u32[i];
+      }
+    }
+    else {
+      v5.f32[i] = 1. / sqrtf(v1.f32[i]);
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	v6.f32[i] = 0;
+      }
+      else {
+	v6.u32[i] = v8.u32[i];
+      }
+    }
+    else {
+      v6.f32[i] = 1. / sqrtf(v2.f32[i]);
+    }
+
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(rsqrt_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_rsqrt_ph) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _rsqrt_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(rsqrt_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_rsqrt_ph) (HF(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_rsqrt_ph);
+
+  EMULATE(rsqrt_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_rsqrt_ph) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_rsqrt_ph);
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c
new file mode 100644
index 00000000000..060ce33f164
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1a.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1, x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_rsqrt_sh (x1, x2);
+  res = _mm_mask_rsqrt_sh (res, m8, x1, x2);
+  res = _mm_maskz_rsqrt_sh (m8, x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c
new file mode 100644
index 00000000000..5f20de7c24a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrsqrtsh-1b.c
@@ -0,0 +1,59 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_rsqrt_sh(V512 * dest, V512 op1,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = 1.0 / sqrtf(v1.f32[0]);
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_rsqrt_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_rsqrt_sh(exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_rsqrt_sh");
+
+  init_dest(&res, &exp);
+  emulate_rsqrt_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_rsqrt_sh(res.xmmh[0], 0x1, exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_rsqrt_sh");
+
+  emulate_rsqrt_sh(&exp, src1,  0x1, 1);
+  res.xmmh[0] = _mm_maskz_rsqrt_sh(0x1, exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_rsqrt_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c
new file mode 100644
index 00000000000..497b5bab1db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res;
+volatile __m512h x1;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_sqrt_ph (x1);
+  res = _mm512_mask_sqrt_ph (res, m32, x1);
+  res = _mm512_maskz_sqrt_ph (m32, x1);
+  res = _mm512_sqrt_round_ph (x1, 4);
+  res = _mm512_mask_sqrt_round_ph (res, m32, x1, 8);
+  res = _mm512_maskz_sqrt_round_ph (m32, x1, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c
new file mode 100644
index 00000000000..d4d047b194d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtph-1b.c
@@ -0,0 +1,92 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(sqrt_ph) (V512 * dest, V512 op1,
+		__mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = v7.u32[i];
+      }
+    }
+    else {
+      v5.f32[i] = sqrtf(v1.f32[i]);
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	v6.f32[i] = 0;
+      }
+      else {
+	v6.u32[i] = v8.u32[i];
+      }
+    }
+    else {
+      v6.f32[i] = sqrtf(v2.f32[i]);
+    }
+
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(sqrt_ph) (&exp, src1, NET_MASK, 0);
+  HF(res) = INTRINSIC (_sqrt_ph) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _sqrt_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(sqrt_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_sqrt_ph) (HF(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sqrt_ph);
+
+  EMULATE(sqrt_ph) (&exp, src1, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_sqrt_ph) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sqrt_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(sqrt_ph) (&exp, src1, NET_MASK, 0);
+  HF(res) = INTRINSIC (_sqrt_round_ph) (HF(src1), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _sqrt_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(sqrt_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_sqrt_round_ph) (HF(res), MASK_VALUE, HF(src1), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_sqrt_round_ph);
+
+  EMULATE(sqrt_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_sqrt_round_ph) (ZMASK_VALUE, HF(src1), 8);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_sqrt_round_ph);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c
new file mode 100644
index 00000000000..dd44534a2eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1, x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_sqrt_sh (x1, x2);
+  res = _mm_mask_sqrt_sh (res, m8, x1, x2);
+  res = _mm_maskz_sqrt_sh (m8, x1, x2);
+  res = _mm_sqrt_round_sh (x1, x2, 4);
+  res = _mm_mask_sqrt_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_sqrt_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c
new file mode 100644
index 00000000000..4744c6f1e55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vsqrtsh-1b.c
@@ -0,0 +1,60 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_sqrt_sh(V512 * dest, V512 op1,
+		__mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = sqrtf(v1.f32[0]);
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_sqrt_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_sqrt_round_sh(exp.xmmh[0], src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_sqrt_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_sqrt_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_sqrt_round_sh(res.xmmh[0], 0x1, exp.xmmh[0],
+                                         src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_sqrt_round_sh");
+
+  emulate_sqrt_sh(&exp, src1,  0x1, 1);
+  res.xmmh[0] = _mm_maskz_sqrt_round_sh(0x1, exp.xmmh[0], src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_sqrt_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c
new file mode 100644
index 00000000000..a5edc176b63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1;
+volatile __m128h x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_rsqrt_ph (x1);
+  res1 = _mm256_mask_rsqrt_ph (res1, m16, x1);
+  res1 = _mm256_maskz_rsqrt_ph (m16, x1);
+
+  res2 = _mm_rsqrt_ph (x2);
+  res2 = _mm_mask_rsqrt_ph (res2, m8, x2);
+  res2 = _mm_maskz_rsqrt_ph (m8, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c
new file mode 100644
index 00000000000..a5e796b8ebb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrsqrtph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vrsqrtph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vrsqrtph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c
new file mode 100644
index 00000000000..4acb137e6b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1;
+volatile __m128h x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_sqrt_ph (x1);
+  res1 = _mm256_mask_sqrt_ph (res1, m16, x1);
+  res1 = _mm256_maskz_sqrt_ph (m16, x1);
+
+  res2 = _mm_sqrt_ph (x2);
+  res2 = _mm_mask_sqrt_ph (res2, m8, x2);
+  res2 = _mm_maskz_sqrt_ph (m8, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c
new file mode 100644
index 00000000000..9b0a91d7b5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vsqrtph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vsqrtph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vsqrtph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (16 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
                   ` (43 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_rcp_ph): New intrinsic.
	(_mm512_mask_rcp_ph): Likewise.
	(_mm512_maskz_rcp_ph): Likewise.
	(_mm_rcp_sh): Likewise.
	(_mm_mask_rcp_sh): Likewise.
	(_mm_maskz_rcp_sh): Likewise.
	(_mm512_scalef_ph): Likewise.
	(_mm512_mask_scalef_ph): Likewise.
	(_mm512_maskz_scalef_ph): Likewise.
	(_mm512_scalef_round_ph): Likewise.
	(_mm512_mask_scalef_round_ph): Likewise.
	(_mm512_maskz_scalef_round_ph): Likewise.
	(_mm_scalef_sh): Likewise.
	(_mm_mask_scalef_sh): Likewise.
	(_mm_maskz_scalef_sh): Likewise.
	(_mm_scalef_round_sh): Likewise.
	(_mm_mask_scalef_round_sh): Likewise.
	(_mm_maskz_scalef_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_rcp_ph):
	New intrinsic.
	(_mm256_rcp_ph): Likewise.
	(_mm_mask_rcp_ph): Likewise.
	(_mm256_mask_rcp_ph): Likewise.
	(_mm_maskz_rcp_ph): Likewise.
	(_mm256_maskz_rcp_ph): Likewise.
	(_mm_scalef_ph): Likewise.
	(_mm256_scalef_ph): Likewise.
	(_mm_mask_scalef_ph): Likewise.
	(_mm256_mask_scalef_ph): Likewise.
	(_mm_maskz_scalef_ph): Likewise.
	(_mm256_maskz_scalef_ph): Likewise.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/sse.md (VFH_AVX512VL): New.
	(avx512fp16_rcp<mode>2<mask_name>): Ditto.
	(avx512fp16_vmrcpv8hf2<mask_scalar_name>): Ditto.
	(avx512f_vmscalef<mode><mask_scalar_name><round_scalar_name>):
	Adjust to support HF vector modes.
	(<avx512>_scalef<mode><mask_name><round_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 195 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   |  97 ++++++++++++
 gcc/config/i386/i386-builtin.def       |   8 +
 gcc/config/i386/sse.md                 |  49 +++++--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   2 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   2 +
 gcc/testsuite/gcc.target/i386/sse-14.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-22.c |   3 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   2 +
 9 files changed, 355 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 50db5d12140..9a52d2ac36e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -1428,6 +1428,201 @@ _mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vrcpph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_rcp_ph (__m512h __A)
+{
+  return __builtin_ia32_vrcpph_v32hf_mask (__A, _mm512_setzero_ph (),
+					   (__mmask32) -1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_rcp_ph (__m512h __A, __mmask32 __B, __m512h __C)
+{
+  return __builtin_ia32_vrcpph_v32hf_mask (__C, __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_rcp_ph (__mmask32 __A, __m512h __B)
+{
+  return __builtin_ia32_vrcpph_v32hf_mask (__B, _mm512_setzero_ph (),
+					   __A);
+}
+
+/* Intrinsics vrcpsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vrcpsh_v8hf_mask (__B, __A, _mm_setzero_ph (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_rcp_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vrcpsh_v8hf_mask (__D, __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_rcp_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vrcpsh_v8hf_mask (__C, __B, _mm_setzero_ph (),
+					  __A);
+}
+
+/* Intrinsics vscalefph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_scalef_ph (__m512h __A, __m512h __B)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__A, __B,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_scalef_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__C, __D, __A, __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_scalef_ph (__mmask32 __A, __m512h __B, __m512h __C)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__B, __C,
+						    _mm512_setzero_ph (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_scalef_round_ph (__m512h __A, __m512h __B, const int __C)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__A, __B,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_scalef_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			     __m512h __D, const int __E)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__C, __D, __A, __B,
+						    __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_scalef_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
+			      const int __D)
+{
+  return __builtin_ia32_vscalefph_v32hf_mask_round (__B, __C,
+						    _mm512_setzero_ph (),
+						    __A, __D);
+}
+
+#else
+#define _mm512_scalef_round_ph(A, B, C)					\
+  (__builtin_ia32_vscalefph_v32hf_mask_round ((A), (B),			\
+					      _mm512_setzero_ph (),	\
+					      (__mmask32)-1, (C)))
+
+#define _mm512_mask_scalef_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_vscalefph_v32hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_scalef_round_ph(A, B, C, D)			\
+  (__builtin_ia32_vscalefph_v32hf_mask_round ((B), (C),			\
+					      _mm512_setzero_ph (),	\
+					      (A), (D)))
+
+#endif  /* __OPTIMIZE__ */
+
+/* Intrinsics vscalefsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_sh (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__A, __B,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_scalef_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__C, __D, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_scalef_sh (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__B, __C,
+						   _mm_setzero_ph (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_round_sh (__m128h __A, __m128h __B, const int __C)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__A, __B,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1, __C);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_scalef_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__C, __D, __A, __B,
+						   __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			   const int __D)
+{
+  return __builtin_ia32_vscalefsh_v8hf_mask_round (__B, __C,
+						   _mm_setzero_ph (),
+						   __A, __D);
+}
+
+#else
+#define _mm_scalef_round_sh(A, B, C)					  \
+  (__builtin_ia32_vscalefsh_v8hf_mask_round ((A), (B),			  \
+					     _mm_setzero_ph (),		  \
+					     (__mmask8)-1, (C)))
+
+#define _mm_mask_scalef_round_sh(A, B, C, D, E)				  \
+  (__builtin_ia32_vscalefsh_v8hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm_maskz_scalef_round_sh(A, B, C, D)				  \
+  (__builtin_ia32_vscalefsh_v8hf_mask_round ((B), (C), _mm_setzero_ph (), \
+					     (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index aaed85203c9..ebda59b9f9a 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -451,6 +451,103 @@ _mm256_maskz_rsqrt_ph (__mmask16 __A, __m256h __B)
 					     __A);
 }
 
+/* Intrinsics vrcpph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rcp_ph (__m128h __A)
+{
+  return __builtin_ia32_vrcpph_v8hf_mask (__A, _mm_setzero_ph (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_rcp_ph (__m256h __A)
+{
+  return __builtin_ia32_vrcpph_v16hf_mask (__A, _mm256_setzero_ph (),
+					   (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_rcp_ph (__m128h __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vrcpph_v8hf_mask (__C, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_rcp_ph (__m256h __A, __mmask16 __B, __m256h __C)
+{
+  return __builtin_ia32_vrcpph_v16hf_mask (__C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_rcp_ph (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vrcpph_v8hf_mask (__B, _mm_setzero_ph (), __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_rcp_ph (__mmask16 __A, __m256h __B)
+{
+  return __builtin_ia32_vrcpph_v16hf_mask (__B, _mm256_setzero_ph (),
+					   __A);
+}
+
+/* Intrinsics vscalefph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_scalef_ph (__m128h __A, __m128h __B)
+{
+  return __builtin_ia32_vscalefph_v8hf_mask (__A, __B,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_scalef_ph (__m256h __A, __m256h __B)
+{
+  return __builtin_ia32_vscalefph_v16hf_mask (__A, __B,
+					      _mm256_setzero_ph (),
+					      (__mmask16) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_scalef_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return __builtin_ia32_vscalefph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_scalef_ph (__m256h __A, __mmask16 __B, __m256h __C,
+		       __m256h __D)
+{
+  return __builtin_ia32_vscalefph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_scalef_ph (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return __builtin_ia32_vscalefph_v8hf_mask (__B, __C,
+					     _mm_setzero_ph (), __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_scalef_ph (__mmask16 __A, __m256h __B, __m256h __C)
+{
+  return __builtin_ia32_vscalefph_v16hf_mask (__B, __C,
+					      _mm256_setzero_ph (),
+					      __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 999b2e1abb5..7b8ca3ba685 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2808,6 +2808,12 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_vrsqrtph_v16hf_mask", IX86_BUILTIN_VRSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_vrsqrtph_v32hf_mask", IX86_BUILTIN_VRSQRTPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_vrsqrtsh_v8hf_mask", IX86_BUILTIN_VRSQRTSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv8hf2_mask, "__builtin_ia32_vrcpph_v8hf_mask", IX86_BUILTIN_VRCPPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv16hf2_mask, "__builtin_ia32_vrcpph_v16hf_mask", IX86_BUILTIN_VRCPPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__builtin_ia32_vrcpph_v32hf_mask", IX86_BUILTIN_VRCPPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_vrcpsh_v8hf_mask", IX86_BUILTIN_VRCPSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_vscalefph_v8hf_mask", IX86_BUILTIN_VSCALEFPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_vscalefph_v16hf_mask", IX86_BUILTIN_VSCALEFPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3025,6 +3031,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_vsqrtph_v32hf_mask_round", IX86_BUILTIN_VSQRTPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_vscalefph_v32hf_mask_round", IX86_BUILTIN_VSCALEFPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_vscalefsh_v8hf_mask_round", IX86_BUILTIN_VSCALEFSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4763fd0558d..683efe4bb0e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -386,6 +386,13 @@ (define_mode_iterator VF_AVX512VL
 (define_mode_iterator VF1_AVX512ER_128_256
   [(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF])
 
+(define_mode_iterator VFH_AVX512VL
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -2198,6 +2205,30 @@ (define_insn "*sse_vmrcpv4sf2"
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
+(define_insn "avx512fp16_rcp<mode>2<mask_name>"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
+	(unspec:VF_AVX512FP16VL
+	  [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "vm")]
+	  UNSPEC_RCP))]
+  "TARGET_AVX512FP16"
+  "vrcpph\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512fp16_vmrcpv8hf2<mask_scalar_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (unspec:V8HF [(match_operand:V8HF 1 "nonimmediate_operand" "vm")]
+		       UNSPEC_RCP)
+	  (match_operand:V8HF 2 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vrcpsh\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %w1}"
+  [(set_attr "type" "sse")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "<mask_codefor>rcp14<mode><mask_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
 	(unspec:VF_AVX512VL
@@ -9948,11 +9979,11 @@ (define_split
 })
 
 (define_insn "avx512f_vmscalef<mode><mask_scalar_name><round_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (unspec:VF_128
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_scalar_nimm_predicate>" "<round_scalar_constraint>")]
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_scalar_nimm_predicate>" "<round_scalar_constraint>")]
 	    UNSPEC_SCALEF)
 	  (match_dup 1)
 	  (const_int 1)))]
@@ -9962,10 +9993,10 @@ (define_insn "avx512f_vmscalef<mode><mask_scalar_name><round_scalar_name>"
    (set_attr "mode"  "<ssescalarmode>")])
 
 (define_insn "<avx512>_scalef<mode><mask_name><round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(unspec:VF_AVX512VL
-	  [(match_operand:VF_AVX512VL 1 "register_operand" "v")
-	   (match_operand:VF_AVX512VL 2 "nonimmediate_operand" "<round_constraint>")]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(unspec:VFH_AVX512VL
+	  [(match_operand:VFH_AVX512VL 1 "register_operand" "v")
+	   (match_operand:VFH_AVX512VL 2 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_SCALEF))]
   "TARGET_AVX512F"
   "vscalef<ssemodesuffix>\t{<round_mask_op3>%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2<round_mask_op3>}"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 651cb1c80fb..17c396567f2 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -703,6 +703,8 @@
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
+#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 94553dec9e7..c1d95fc2ead 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -720,6 +720,8 @@
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
+#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 7281bffdf2b..5b6d0b082d1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -687,6 +687,8 @@ test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
 test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
 test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_scalef_round_sh, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -705,6 +707,8 @@ test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
 test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_maskz_scalef_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -720,6 +724,8 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 04326e0e37d..b2de5679bb6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -792,6 +792,7 @@ test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
 test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
 test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -810,6 +811,7 @@ test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
 test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -825,6 +827,7 @@ test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
 test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 7559d335dbc..5948622cc4f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -721,6 +721,8 @@
 #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
 #define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
+#define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1



* [PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
  2021-07-01  6:16 ` [PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vrcpph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vrcpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrcpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrcpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vscalefsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrcpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrcpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vscalefph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vscalefph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vrcpph-1a.c    | 19 ++++
 .../gcc.target/i386/avx512fp16-vrcpph-1b.c    | 79 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vrcpsh-1a.c    | 18 ++++
 .../gcc.target/i386/avx512fp16-vrcpsh-1b.c    | 57 +++++++++++
 .../gcc.target/i386/avx512fp16-vscalefph-1a.c | 25 +++++
 .../gcc.target/i386/avx512fp16-vscalefph-1b.c | 94 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vscalefsh-1a.c | 23 +++++
 .../gcc.target/i386/avx512fp16-vscalefsh-1b.c | 58 ++++++++++++
 .../gcc.target/i386/avx512fp16vl-vrcpph-1a.c  | 29 ++++++
 .../gcc.target/i386/avx512fp16vl-vrcpph-1b.c  | 16 ++++
 .../i386/avx512fp16vl-vscalefph-1a.c          | 29 ++++++
 .../i386/avx512fp16vl-vscalefph-1b.c          | 16 ++++
 12 files changed, 463 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c
new file mode 100644
index 00000000000..6a5c642d7d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1a.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res;
+volatile __m512h x1;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_rcp_ph (x1);
+  res = _mm512_mask_rcp_ph (res, m32, x1);
+  res = _mm512_maskz_rcp_ph (m32, x1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c
new file mode 100644
index 00000000000..4a65451af3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpph-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(rcp_ph) (V512 * dest, V512 op1,
+	       __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = 1. / v1.f32[i];
+
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = 1. / v2.f32[i];
+      }
+
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(rcp_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_rcp_ph) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _rcp_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(rcp_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_rcp_ph) (HF(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_rcp_ph);
+
+  EMULATE(rcp_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_rcp_ph) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_rcp_ph);
+
+  if (n_errs != 0)
+    abort ();
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c
new file mode 100644
index 00000000000..0a5a18e8b84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1a.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrcpsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1, x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_rcp_sh (x1, x2);
+  res = _mm_mask_rcp_sh (res, m8, x1, x2);
+  res = _mm_maskz_rcp_sh (m8, x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c
new file mode 100644
index 00000000000..531689569cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrcpsh-1b.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_rcp_sh(V512 * dest, V512 op1,
+                __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = 1. / v1.f32[0]; 
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+   
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_rcp_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_rcp_sh(exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_rcp_sh");
+
+  init_dest(&res, &exp);
+  emulate_rcp_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_rcp_sh(res.xmmh[0], 0x1, exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_rcp_sh");
+
+  emulate_rcp_sh(&exp, src1,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_rcp_sh(0x3, exp.xmmh[0], src1.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_rcp_sh");
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c
new file mode 100644
index 00000000000..f3d27898f27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_scalef_ph (x1, x2);
+  res1 = _mm512_mask_scalef_ph (res1, m32, x1, x2);
+  res2 = _mm512_maskz_scalef_ph (m32, x1, x2);
+  res = _mm512_scalef_round_ph (x1, x2, 8);
+  res1 = _mm512_mask_scalef_round_ph (res1, m32, x1, x2, 8);
+  res2 = _mm512_maskz_scalef_round_ph (m32, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
new file mode 100644
index 00000000000..7c7288d6eb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefph-1b.c
@@ -0,0 +1,94 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define DEBUG
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(scalef_ph) (V512 * dest, V512 op1, V512 op2,
+		  __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = v1.f32[i] * powf(2.0f, floorf(v3.f32[i]));
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = v2.f32[i] * powf(2.0f, floorf(v4.f32[i]));
+      }
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(scalef_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_scalef_ph) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _scalef_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(scalef_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_scalef_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_scalef_ph);
+
+  EMULATE(scalef_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_scalef_ph) (ZMASK_VALUE, HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_scalef_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(scalef_ph) (&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_scalef_round_ph) (HF(src1), HF(src2), 0x04);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _scalef_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(scalef_ph) (&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_scalef_round_ph) (HF(res), MASK_VALUE, HF(src1), HF(src2), 0x04);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_scalef_round_ph);
+
+  EMULATE(scalef_ph) (&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_scalef_round_ph) (ZMASK_VALUE, HF(src1), HF(src2), 0x04);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_scalef_round_ph);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c
new file mode 100644
index 00000000000..999c04849e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1, x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_scalef_sh (x1, x2);
+  res = _mm_mask_scalef_sh (res, m8, x1, x2);
+  res = _mm_maskz_scalef_sh (m8, x1, x2);
+  res = _mm_scalef_round_sh (x1, x2, 4);
+  res = _mm_mask_scalef_round_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_scalef_round_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c
new file mode 100644
index 00000000000..5db7be0715f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vscalefsh-1b.c
@@ -0,0 +1,58 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_scalef_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] * powf(2.0f, floorf(v3.f32[0])); 
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+   
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_scalef_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_scalef_round_sh(src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08));
+  check_results(&res, &exp, N_ELEMS, "_mm_scalef_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_scalef_sh(&exp, src1, src2,  0x1, 0);
+  res.xmmh[0] = _mm_mask_scalef_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08));
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_scalef_round_sh");
+
+  emulate_scalef_sh(&exp, src1, src2,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_scalef_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], (0x00 | 0x08));
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_scalef_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c
new file mode 100644
index 00000000000..5894dbc679f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrcpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1;
+volatile __m128h x2;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_rcp_ph (x1);
+  res1 = _mm256_mask_rcp_ph (res1, m16, x1);
+  res1 = _mm256_maskz_rcp_ph (m16, x1);
+
+  res2 = _mm_rcp_ph (x2);
+  res2 = _mm_mask_rcp_ph (res2, m8, x2);
+  res2 = _mm_maskz_rcp_ph (m8, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c
new file mode 100644
index 00000000000..a6b1e376a8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrcpph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vrcpph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vrcpph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c
new file mode 100644
index 00000000000..22231d628cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscalefph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1,x2;
+volatile __m128h x3, x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_scalef_ph (x1, x2);
+  res1 = _mm256_mask_scalef_ph (res1, m16, x1, x2);
+  res1 = _mm256_maskz_scalef_ph (m16, x1, x2);
+
+  res2 = _mm_scalef_ph (x3, x4);
+  res2 = _mm_mask_scalef_ph (res2, m8, x3, x4);
+  res2 = _mm_maskz_scalef_ph (m8, x3, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c
new file mode 100644
index 00000000000..5c12d08e2e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vscalefph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vscalefph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vscalefph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (18 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
                   ` (41 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_reduce_ph):
	New intrinsic.
	(_mm512_mask_reduce_ph): Likewise.
	(_mm512_maskz_reduce_ph): Likewise.
	(_mm512_reduce_round_ph): Likewise.
	(_mm512_mask_reduce_round_ph): Likewise.
	(_mm512_maskz_reduce_round_ph): Likewise.
	(_mm_reduce_sh): Likewise.
	(_mm_mask_reduce_sh): Likewise.
	(_mm_maskz_reduce_sh): Likewise.
	(_mm_reduce_round_sh): Likewise.
	(_mm_mask_reduce_round_sh): Likewise.
	(_mm_maskz_reduce_round_sh): Likewise.
	(_mm512_roundscale_ph): Likewise.
	(_mm512_mask_roundscale_ph): Likewise.
	(_mm512_maskz_roundscale_ph): Likewise.
	(_mm512_roundscale_round_ph): Likewise.
	(_mm512_mask_roundscale_round_ph): Likewise.
	(_mm512_maskz_roundscale_round_ph): Likewise.
	(_mm_roundscale_sh): Likewise.
	(_mm_mask_roundscale_sh): Likewise.
	(_mm_maskz_roundscale_sh): Likewise.
	(_mm_roundscale_round_sh): Likewise.
	(_mm_mask_roundscale_round_sh): Likewise.
	(_mm_maskz_roundscale_round_sh): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_reduce_ph):
	New intrinsic.
	(_mm_mask_reduce_ph): Likewise.
	(_mm_maskz_reduce_ph): Likewise.
	(_mm256_reduce_ph): Likewise.
	(_mm256_mask_reduce_ph): Likewise.
	(_mm256_maskz_reduce_ph): Likewise.
	(_mm_roundscale_ph): Likewise.
	(_mm_mask_roundscale_ph): Likewise.
	(_mm_maskz_roundscale_ph): Likewise.
	(_mm256_roundscale_ph): Likewise.
	(_mm256_mask_roundscale_ph): Likewise.
	(_mm256_maskz_roundscale_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/sse.md (<mask_codefor>reducep<mode><mask_name>):
	Renamed to ...
	(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
	... this, and adjust for round operands.
	(reduces<mode><mask_scalar_name>): Likewise, with ...
	(reduces<mode><mask_scalar_name><round_saeonly_scalar_name>):
	... this.
	(<avx512>_rndscale<mode><mask_name><round_saeonly_name>):
	Adjust for HF vector modes.
	(avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>):
	Ditto.
	(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 359 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 153 +++++++++++
 gcc/config/i386/i386-builtin-types.def |   4 +
 gcc/config/i386/i386-builtin.def       |   8 +
 gcc/config/i386/i386-expand.c          |   4 +
 gcc/config/i386/sse.md                 |  44 +--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   8 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   8 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  36 +++
 gcc/testsuite/gcc.target/i386/sse-22.c |  36 +++
 gcc/testsuite/gcc.target/i386/sse-23.c |   8 +
 11 files changed, 646 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 9a52d2ac36e..8c2c9b28987 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -1623,6 +1623,365 @@ _mm_maskz_scalef_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vreduceph.  */
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_ph (__m512h __A, int __B)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__A, __B,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_reduce_ph (__m512h __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__C, __D, __A, __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_reduce_ph (__mmask32 __A, __m512h __B, int __C)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__B, __C,
+						    _mm512_setzero_ph (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_round_ph (__m512h __A, int __B, const int __C)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__A, __B,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1, __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_reduce_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
+			     int __D, const int __E)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__C, __D, __A, __B,
+						    __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_reduce_round_ph (__mmask32 __A, __m512h __B, int __C,
+			      const int __D)
+{
+  return __builtin_ia32_vreduceph_v32hf_mask_round (__B, __C,
+						    _mm512_setzero_ph (),
+						    __A, __D);
+}
+
+#else
+#define _mm512_reduce_ph(A, B)						\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((A), (B),			\
+					      _mm512_setzero_ph (),	\
+					      (__mmask32)-1,		\
+					      _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_reduce_ph(A, B, C, D)				\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((C), (D), (A), (B),	\
+					      _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_reduce_ph(A, B, C)					\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((B), (C),			\
+					      _mm512_setzero_ph (),	\
+					      (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_reduce_round_ph(A, B, C)					\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((A), (B),			\
+					      _mm512_setzero_ph (),	\
+					      (__mmask32)-1, (C)))
+
+#define _mm512_mask_reduce_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_reduce_round_ph(A, B, C, D)			\
+  (__builtin_ia32_vreduceph_v32hf_mask_round ((B), (C),			\
+					      _mm512_setzero_ph (),	\
+					      (A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vreducesh.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_sh (__m128h __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__A, __B, __C,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_reduce_sh (__m128h __A, __mmask8 __B, __m128h __C,
+		    __m128h __D, int __E)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__C, __D, __E, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_reduce_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__B, __C, __D,
+						   _mm_setzero_ph (), __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__A, __B, __C,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_reduce_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, int __E, const int __F)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__C, __D, __E, __A,
+						   __B, __F);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_reduce_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			   int __D, const int __E)
+{
+  return __builtin_ia32_vreducesh_v8hf_mask_round (__B, __C, __D,
+						   _mm_setzero_ph (),
+						   __A, __E);
+}
+
+#else
+#define _mm_reduce_sh(A, B, C)						\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((A), (B), (C),		\
+					     _mm_setzero_ph (),	\
+					     (__mmask8)-1,		\
+					     _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_reduce_sh(A, B, C, D, E)				\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((C), (D), (E), (A), (B),	\
+					     _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_reduce_sh(A, B, C, D)					\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((B), (C), (D),		\
+					     _mm_setzero_ph (),	\
+					     (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_reduce_round_sh(A, B, C, D)				\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((A), (B), (C),	\
+					     _mm_setzero_ph (),	\
+					     (__mmask8)-1, (D)))
+
+#define _mm_mask_reduce_round_sh(A, B, C, D, E, F)			\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_reduce_round_sh(A, B, C, D, E)		\
+  (__builtin_ia32_vreducesh_v8hf_mask_round ((B), (C), (D),	\
+					     _mm_setzero_ph (),	\
+					     (A), (E)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrndscaleph.  */
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_roundscale_ph (__m512h __A, int __B)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__A, __B,
+						      _mm512_setzero_ph (),
+						      (__mmask32) -1,
+						      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_roundscale_ph (__m512h __A, __mmask32 __B,
+				 __m512h __C, int __D)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__C, __D, __A, __B,
+						      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_roundscale_ph (__mmask32 __A, __m512h __B, int __C)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__B, __C,
+						      _mm512_setzero_ph (),
+						      __A,
+						      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_roundscale_round_ph (__m512h __A, int __B, const int __C)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__A, __B,
+						      _mm512_setzero_ph (),
+						      (__mmask32) -1,
+						      __C);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_roundscale_round_ph (__m512h __A, __mmask32 __B,
+				 __m512h __C, int __D, const int __E)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__C, __D, __A,
+						      __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_roundscale_round_ph (__mmask32 __A, __m512h __B, int __C,
+				  const int __D)
+{
+  return __builtin_ia32_vrndscaleph_v32hf_mask_round (__B, __C,
+						      _mm512_setzero_ph (),
+						      __A, __D);
+}
+
+#else
+#define _mm512_roundscale_ph(A, B) \
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((A), (B),		\
+						_mm512_setzero_ph (),	\
+						(__mmask32)-1,		\
+						_MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_roundscale_ph(A, B, C, D) \
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((C), (D), (A), (B),	\
+						_MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_maskz_roundscale_ph(A, B, C) \
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((B), (C),		\
+						_mm512_setzero_ph (),	\
+						(A),			\
+						_MM_FROUND_CUR_DIRECTION))
+#define _mm512_roundscale_round_ph(A, B, C) \
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((A), (B),		\
+						_mm512_setzero_ph (),	\
+						(__mmask32)-1, (C)))
+
+#define _mm512_mask_roundscale_round_ph(A, B, C, D, E)			\
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((C), (D), (A), (B), (E)))
+
+#define _mm512_maskz_roundscale_round_ph(A, B, C, D) \
+  (__builtin_ia32_vrndscaleph_v32hf_mask_round ((B), (C),		\
+						_mm512_setzero_ph (),	\
+						(A), (D)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrndscalesh.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_sh (__m128h __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__A, __B, __C,
+						     _mm_setzero_ph (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			__m128h __D, int __E)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__C, __D, __E, __A, __B,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_sh (__mmask8 __A, __m128h __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__B, __C, __D,
+						     _mm_setzero_ph (), __A,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_round_sh (__m128h __A, __m128h __B, int __C, const int __D)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__A, __B, __C,
+						     _mm_setzero_ph (),
+						     (__mmask8) -1,
+						     __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
+			      __m128h __D, int __E, const int __F)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__C, __D, __E,
+						     __A, __B, __F);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
+			       int __D, const int __E)
+{
+  return __builtin_ia32_vrndscalesh_v8hf_mask_round (__B, __C, __D,
+						     _mm_setzero_ph (),
+						     __A, __E);
+}
+
+#else
+#define _mm_roundscale_sh(A, B, C)					\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((A), (B), (C),		\
+					       _mm_setzero_ph (),	\
+					       (__mmask8)-1, \
+					       _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_roundscale_sh(A, B, C, D, E)				\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((C), (D), (E), (A), (B), \
+					       _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_roundscale_sh(A, B, C, D)				\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((B), (C), (D),		\
+					       _mm_setzero_ph (),	\
+					       (A), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_roundscale_round_sh(A, B, C, D)				\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((A), (B), (C),		\
+					       _mm_setzero_ph (),	\
+					       (__mmask8)-1, (D)))
+
+#define _mm_mask_roundscale_round_sh(A, B, C, D, E, F)			\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((C), (D), (E), (A), (B), (F)))
+
+#define _mm_maskz_roundscale_round_sh(A, B, C, D, E)			\
+  (__builtin_ia32_vrndscalesh_v8hf_mask_round ((B), (C), (D),		\
+					       _mm_setzero_ph (),	\
+					       (A), (E)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index ebda59b9f9a..20b6716aa00 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -548,6 +548,159 @@ _mm256_maskz_scalef_ph (__mmask16 __A, __m256h __B, __m256h __C)
 					      __A);
 }
 
+/* Intrinsics vreduceph.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_ph (__m128h __A, int __B)
+{
+  return __builtin_ia32_vreduceph_v8hf_mask (__A, __B,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_reduce_ph (__m128h __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vreduceph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_reduce_ph (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vreduceph_v8hf_mask (__B, __C,
+					     _mm_setzero_ph (), __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_ph (__m256h __A, int __B)
+{
+  return __builtin_ia32_vreduceph_v16hf_mask (__A, __B,
+					      _mm256_setzero_ph (),
+					      (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_reduce_ph (__m256h __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return __builtin_ia32_vreduceph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_reduce_ph (__mmask16 __A, __m256h __B, int __C)
+{
+  return __builtin_ia32_vreduceph_v16hf_mask (__B, __C,
+					      _mm256_setzero_ph (),
+					      __A);
+}
+
+#else
+#define _mm_reduce_ph(A, B)					\
+  (__builtin_ia32_vreduceph_v8hf_mask ((A), (B),\
+				       _mm_setzero_ph (),	\
+				       ((__mmask8)-1)))
+
+#define _mm_mask_reduce_ph(A,  B,  C, D)		\
+  (__builtin_ia32_vreduceph_v8hf_mask ((C), (D), (A), (B)))
+
+#define _mm_maskz_reduce_ph(A,  B, C)				\
+  (__builtin_ia32_vreduceph_v8hf_mask ((B), (C), _mm_setzero_ph (), (A)))
+
+#define _mm256_reduce_ph(A, B)					\
+  (__builtin_ia32_vreduceph_v16hf_mask ((A), (B),\
+					_mm256_setzero_ph (),	\
+					((__mmask16)-1)))
+
+#define _mm256_mask_reduce_ph(A, B, C, D)		\
+  (__builtin_ia32_vreduceph_v16hf_mask ((C), (D), (A), (B)))
+
+#define _mm256_maskz_reduce_ph(A, B, C)				\
+  (__builtin_ia32_vreduceph_v16hf_mask ((B), (C), _mm256_setzero_ph (), (A)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vrndscaleph.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roundscale_ph (__m128h __A, int __B)
+{
+  return __builtin_ia32_vrndscaleph_v8hf_mask (__A, __B,
+					       _mm_setzero_ph (),
+					       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_roundscale_ph (__m128h __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vrndscaleph_v8hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_roundscale_ph (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vrndscaleph_v8hf_mask (__B, __C,
+					       _mm_setzero_ph (), __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_roundscale_ph (__m256h __A, int __B)
+{
+  return __builtin_ia32_vrndscaleph_v16hf_mask (__A, __B,
+						_mm256_setzero_ph (),
+						(__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_roundscale_ph (__m256h __A, __mmask16 __B, __m256h __C,
+			   int __D)
+{
+  return __builtin_ia32_vrndscaleph_v16hf_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_roundscale_ph (__mmask16 __A, __m256h __B, int __C)
+{
+  return __builtin_ia32_vrndscaleph_v16hf_mask (__B, __C,
+						_mm256_setzero_ph (),
+						__A);
+}
+
+#else
+#define _mm_roundscale_ph(A, B) \
+  (__builtin_ia32_vrndscaleph_v8hf_mask ((A), (B), _mm_setzero_ph (),	\
+					 ((__mmask8)-1)))
+
+#define _mm_mask_roundscale_ph(A, B, C, D) \
+  (__builtin_ia32_vrndscaleph_v8hf_mask ((C), (D), (A), (B)))
+
+#define _mm_maskz_roundscale_ph(A, B, C) \
+  (__builtin_ia32_vrndscaleph_v8hf_mask ((B), (C), _mm_setzero_ph (), (A)))
+
+#define _mm256_roundscale_ph(A, B) \
+  (__builtin_ia32_vrndscaleph_v16hf_mask ((A), (B),	      \
+					 _mm256_setzero_ph(), \
+					  ((__mmask16)-1)))
+
+#define _mm256_mask_roundscale_ph(A, B, C, D) \
+  (__builtin_ia32_vrndscaleph_v16hf_mask ((C), (D), (A), (B)))
+
+#define _mm256_maskz_roundscale_ph(A, B, C) \
+  (__builtin_ia32_vrndscaleph_v16hf_mask ((B), (C),			\
+					  _mm256_setzero_ph (), (A)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 9ebad6b5f49..d2ba1a5edac 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1307,12 +1307,15 @@ DEF_FUNCTION_TYPE (V8HF, V8HI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
+DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
@@ -1322,3 +1325,4 @@ DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
+DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 7b8ca3ba685..6964062c874 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2814,6 +2814,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rcpv32hf2_mask, "__bu
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrcpv8hf2_mask, "__builtin_ia32_vrcpsh_v8hf_mask", IX86_BUILTIN_VRCPSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_scalefv8hf_mask, "__builtin_ia32_vscalefph_v8hf_mask", IX86_BUILTIN_VSCALEFPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_scalefv16hf_mask, "__builtin_ia32_vscalefph_v16hf_mask", IX86_BUILTIN_VSCALEFPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv8hf_mask, "__builtin_ia32_vreduceph_v8hf_mask", IX86_BUILTIN_VREDUCEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv16hf_mask, "__builtin_ia32_vreduceph_v16hf_mask", IX86_BUILTIN_VREDUCEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rndscalev8hf_mask, "__builtin_ia32_vrndscaleph_v8hf_mask", IX86_BUILTIN_VRNDSCALEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_vrndscaleph_v16hf_mask", IX86_BUILTIN_VRNDSCALEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3033,6 +3037,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_scalefv32hf_mask_round, "__builtin_ia32_vscalefph_v32hf_mask_round", IX86_BUILTIN_VSCALEFPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmscalefv8hf_mask_round, "__builtin_ia32_vscalefsh_v8hf_mask_round", IX86_BUILTIN_VSCALEFSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__builtin_ia32_vreduceph_v32hf_mask_round", IX86_BUILTIN_VREDUCEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_vreducesh_v8hf_mask_round", IX86_BUILTIN_VREDUCESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_vrndscaleph_v32hf_mask_round", IX86_BUILTIN_VRNDSCALEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_vrndscalesh_v8hf_mask_round", IX86_BUILTIN_VRNDSCALESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d76e4405413..655234cbdd0 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9883,6 +9883,8 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_V16SF_INT_V16SF_UHI:
     case V16HI_FTYPE_V16SF_INT_V16HI_UHI:
     case V16SI_FTYPE_V16SI_INT_V16SI_UHI:
+    case V16HF_FTYPE_V16HF_INT_V16HF_UHI:
+    case V8HF_FTYPE_V8HF_INT_V8HF_UQI:
     case V4SI_FTYPE_V16SI_INT_V4SI_UQI:
     case V4DI_FTYPE_V8DI_INT_V4DI_UQI:
     case V4DF_FTYPE_V8DF_INT_V4DF_UQI:
@@ -10531,6 +10533,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT:
       nargs = 5;
       break;
+    case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT:
     case V16SF_FTYPE_V16SF_INT_V16SF_HI_INT:
     case V8DF_FTYPE_V8DF_INT_V8DF_QI_INT:
     case V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT:
@@ -10553,6 +10556,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_QI_INT:
     case V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT:
     case V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT:
+    case V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT:
       nargs = 6;
       nargs_constant = 4;
       break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 683efe4bb0e..f43651a95ce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3070,28 +3070,28 @@ (define_expand "reduc_umin_scal_v8hi"
 })
 
 (define_insn "<mask_codefor>reducep<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(unspec:VF_AVX512VL
-	  [(match_operand:VF_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(unspec:VFH_AVX512VL
+	  [(match_operand:VFH_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_REDUCE))]
-  "TARGET_AVX512DQ"
+  "TARGET_AVX512DQ || (VALID_AVX512FP16_REG_MODE (<MODE>mode))"
   "vreduce<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "reduces<mode><mask_scalar_name><round_saeonly_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (unspec:VF_128
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_REDUCE)
 	  (match_dup 1)
 	  (const_int 1)))]
-  "TARGET_AVX512DQ"
+  "TARGET_AVX512DQ || (VALID_AVX512FP16_REG_MODE (<MODE>mode))"
   "vreduce<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %<iptr>2<round_saeonly_scalar_mask_op4>, %3}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
@@ -10212,9 +10212,9 @@ (define_insn "avx512f_sfixupimm<mode>_mask<round_saeonly_name>"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<avx512>_rndscale<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(unspec:VF_AVX512VL
-	  [(match_operand:VF_AVX512VL 1 "nonimmediate_operand" "<round_saeonly_constraint>")
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(unspec:VFH_AVX512VL
+	  [(match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_AVX512F"
@@ -10224,13 +10224,13 @@ (define_insn "<avx512>_rndscale<mode><mask_name><round_saeonly_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
 	     (match_operand:SI 3 "const_0_to_255_operand")]
 	    UNSPEC_ROUND)
-	  (match_operand:VF_128 1 "register_operand" "v")
+	  (match_operand:VFH_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %<iptr>2<round_saeonly_scalar_mask_op4>, %3}"
@@ -10239,14 +10239,14 @@ (define_insn "avx512f_rndscale<mode><mask_scalar_name><round_saeonly_scalar_name
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
 	      [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
 	       (match_operand:SI 3 "const_0_to_255_operand")]
 	      UNSPEC_ROUND))
-          (match_operand:VF_128 1 "register_operand" "v")
+          (match_operand:VFH_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 17c396567f2..4c8e54e4c2a 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -705,6 +705,14 @@
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 #define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index c1d95fc2ead..044d427c932 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -722,6 +722,14 @@
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 #define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 5b6d0b082d1..b7ffdf7e1df 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -671,6 +671,14 @@ test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
 
 /* avx512fp16intrin.h */
 test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
+test_1 (_mm_reduce_ph, __m128h, __m128h, 123)
+test_1 (_mm256_reduce_ph, __m256h, __m256h, 123)
+test_1 (_mm512_reduce_ph, __m512h, __m512h, 123)
+test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
+test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
+test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
+test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
+test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -689,9 +697,21 @@ test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm_scalef_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_maskz_reduce_ph, __m128h, __mmask8, __m128h, 123)
+test_2 (_mm256_maskz_reduce_ph, __m256h, __mmask16, __m256h, 123)
+test_2 (_mm512_maskz_reduce_ph, __m512h, __mmask32, __m512h, 123)
+test_2 (_mm_reduce_sh, __m128h, __m128h, __m128h, 123)
+test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123)
+test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123)
+test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
+test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
+test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8)
+test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8)
+test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8)
+test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -709,8 +729,20 @@ test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm_maskz_scalef_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_mask_reduce_ph, __m128h, __m128h, __mmask8, __m128h, 123)
+test_3 (_mm256_mask_reduce_ph, __m256h, __m256h, __mmask16, __m256h, 123)
+test_3 (_mm512_mask_reduce_ph, __m512h, __m512h, __mmask32, __m512h, 123)
+test_3 (_mm_maskz_reduce_sh, __m128h, __mmask8, __m128h, __m128h, 123)
+test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123)
+test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123)
+test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
+test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
+test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
+test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
+test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -726,6 +758,10 @@ test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index b2de5679bb6..5dbe8cba5ea 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -776,6 +776,14 @@ test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
 
 /* avx512fp16intrin.h */
 test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
+test_1 (_mm_reduce_ph, __m128h, __m128h, 123)
+test_1 (_mm256_reduce_ph, __m256h, __m256h, 123)
+test_1 (_mm512_reduce_ph, __m512h, __m512h, 123)
+test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
+test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
+test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
+test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
+test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -793,9 +801,21 @@ test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
 test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_scalef_round_ph, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_maskz_reduce_ph, __m128h, __mmask8, __m128h, 123)
+test_2 (_mm256_maskz_reduce_ph, __m256h, __mmask16, __m256h, 123)
+test_2 (_mm512_maskz_reduce_ph, __m512h, __mmask32, __m512h, 123)
+test_2 (_mm_reduce_sh, __m128h, __m128h, __m128h, 123)
+test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123)
+test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123)
+test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
+test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
+test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8)
+test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8)
+test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8)
+test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -812,8 +832,20 @@ test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
 test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_scalef_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
+test_3 (_mm_mask_reduce_ph, __m128h, __m128h, __mmask8, __m128h, 123)
+test_3 (_mm256_mask_reduce_ph, __m256h, __m256h, __mmask16, __m256h, 123)
+test_3 (_mm512_mask_reduce_ph, __m512h, __m512h, __mmask32, __m512h, 123)
+test_3 (_mm_maskz_reduce_sh, __m128h, __mmask8, __m128h, __m128h, 123)
+test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123)
+test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123)
+test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
+test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
+test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
+test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
+test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -828,6 +860,10 @@ test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
+test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 5948622cc4f..2d968f07bc8 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -723,6 +723,14 @@
 #define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
 #define __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefph_v32hf_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vscalefsh_v8hf_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vreduceph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vreduceph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vreduceph_v8hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreduceph_v16hf_mask(A, B, C, D) __builtin_ia32_vreduceph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vreducesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vreducesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_vrndscaleph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vrndscaleph_v32hf_mask_round(A, 123, C, D, 8)
+#define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
+#define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (19 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions liuhongt
                   ` (40 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (_ROUND_CUR): New macro.
	* gcc.target/i386/avx512fp16-vreduceph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vreduceph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vreducesh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vreducesh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrndscaleph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrndscaleph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vrndscalesh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vrndscalesh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vreduceph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vreduceph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       |   1 +
 .../gcc.target/i386/avx512fp16-vreduceph-1a.c |  26 ++++
 .../gcc.target/i386/avx512fp16-vreduceph-1b.c | 116 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vreducesh-1a.c |  26 ++++
 .../gcc.target/i386/avx512fp16-vreducesh-1b.c |  78 ++++++++++++
 .../i386/avx512fp16-vrndscaleph-1a.c          |  26 ++++
 .../i386/avx512fp16-vrndscaleph-1b.c          | 101 +++++++++++++++
 .../i386/avx512fp16-vrndscalesh-1a.c          |  25 ++++
 .../i386/avx512fp16-vrndscalesh-1b.c          |  62 ++++++++++
 .../i386/avx512fp16vl-vreduceph-1a.c          |  30 +++++
 .../i386/avx512fp16vl-vreduceph-1b.c          |  16 +++
 .../i386/avx512fp16vl-vrndscaleph-1a.c        |  30 +++++
 .../i386/avx512fp16vl-vrndscaleph-1b.c        |  16 +++
 13 files changed, 553 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index 5d3539bf312..ec88888532c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -17,6 +17,7 @@
 /* Useful macros.  */
 #define NOINLINE __attribute__((noinline,noclone))
 #define _ROUND_NINT (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC)
+#define _ROUND_CUR 8
 #define AVX512F_MAX_ELEM 512 / 32
 
 /* Structure for _Float16 emulation  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c
new file mode 100644
index 00000000000..536c1ef6b02
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m512h x1;
+volatile __mmask32 m;
+
+void extern
+avx512fp16_test (void)
+{
+  x1 = _mm512_reduce_ph (x1, IMM);
+  x1 = _mm512_mask_reduce_ph (x1, m, x1, IMM);
+  x1 = _mm512_maskz_reduce_ph (m, x1, IMM);
+  x1 = _mm512_reduce_round_ph (x1, IMM, 8);
+  x1 = _mm512_mask_reduce_round_ph (x1, m, x1, IMM, 8);
+  x1 = _mm512_maskz_reduce_round_ph (m, x1, IMM, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c
new file mode 100644
index 00000000000..20d1ba59fda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreduceph-1b.c
@@ -0,0 +1,116 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+#ifndef __REDUCEPH__
+#define __REDUCEPH__
+V512 borrow_reduce_ps(V512 v, int imm8)
+{
+  V512 temp;
+  switch (imm8)
+    {
+    case 1: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 1);break;
+    case 2: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 2);break;
+    case 3: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 3);break;
+    case 4: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 4);break;
+    case 5: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 5);break;
+    case 6: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 6);break;
+    case 7: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 7);break;
+    case 8: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 8);break;
+    }
+  return temp;
+}
+#endif
+
+void NOINLINE
+EMULATE(reduce_ph) (V512 * dest, V512 op1,
+		  __mmask32 k, int imm8, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  V512 t1,t2;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+  t1 = borrow_reduce_ps(v1, imm8);
+  t2 = borrow_reduce_ps(v2, imm8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = t1.f32[i];
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = t2.f32[i];
+      }
+
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(reduce_ph) (&exp, src1,  NET_MASK, 6, 0);
+  HF(res) = INTRINSIC (_reduce_ph) (HF(src1), 6);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _reduce_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(reduce_ph) (&exp, src1, MASK_VALUE, 5, 0);
+  HF(res) = INTRINSIC (_mask_reduce_ph) (HF(res), MASK_VALUE, HF(src1), 5);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_reduce_ph);
+
+  EMULATE(reduce_ph) (&exp, src1,  ZMASK_VALUE, 4, 1);
+  HF(res) = INTRINSIC (_maskz_reduce_ph) (ZMASK_VALUE, HF(src1), 4);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_reduce_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(reduce_ph) (&exp, src1,  NET_MASK, 6, 0);
+  HF(res) = INTRINSIC (_reduce_round_ph) (HF(src1), 6, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _reduce_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(reduce_ph) (&exp, src1, MASK_VALUE, 5, 0);
+  HF(res) = INTRINSIC (_mask_reduce_round_ph) (HF(res), MASK_VALUE, HF(src1), 5, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_reduce_round_ph);
+
+  EMULATE(reduce_ph) (&exp, src1,  ZMASK_VALUE, 4, 1);
+  HF(res) = INTRINSIC (_maskz_reduce_round_ph) (ZMASK_VALUE, HF(src1), 4, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_reduce_round_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c
new file mode 100644
index 00000000000..80369918567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m128h x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512fp16_test (void)
+{
+  x1 = _mm_reduce_sh (x1, x2, IMM);
+  x1 = _mm_mask_reduce_sh(x1, m, x1, x2, IMM);
+  x1 = _mm_maskz_reduce_sh(m, x1, x2, IMM);
+  x1 = _mm_reduce_round_sh (x1, x2, IMM, 4);
+  x1 = _mm_mask_reduce_round_sh(x1, m, x1, x2, IMM, 8);
+  x1 = _mm_maskz_reduce_round_sh(m, x1, x2, IMM, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c
new file mode 100644
index 00000000000..4c5dfe73c3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vreducesh-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+V512 borrow_reduce_ps(V512 v, int imm8)
+{
+  V512 temp;
+  switch (imm8)
+    {
+    case 1: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 1);break;
+    case 2: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 2);break;
+    case 3: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 3);break;
+    case 4: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 4);break;
+    case 5: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 5);break;
+    case 6: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 6);break;
+    case 7: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 7);break;
+    case 8: temp.zmm =  _mm512_mask_reduce_ps (v.zmm, 0xffff, v.zmm, 8);break;
+    }
+  return temp;
+}
+
+void NOINLINE
+emulate_reduce_sh(V512 * dest, V512 op1,
+                  __mmask32 k, int imm8, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  V512 t1;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+  t1 = borrow_reduce_ps(v1, imm8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = t1.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_reduce_sh(&exp, src1,  0x1, 8, 0);
+  res.xmmh[0] = _mm_reduce_round_sh(src1.xmmh[0], exp.xmmh[0], 8, _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_reduce_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_reduce_sh(&exp, src1,  0x1, 7, 0);
+  res.xmmh[0] = _mm_mask_reduce_round_sh(res.xmmh[0], 0x1, src1.xmmh[0], exp.xmmh[0], 7, _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_reduce_round_sh");
+
+  emulate_reduce_sh(&exp, src1,  0x3, 6, 1);
+  res.xmmh[0] = _mm_maskz_reduce_round_sh(0x3, src1.xmmh[0], exp.xmmh[0], 6, _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_reduce_round_sh");
+
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c
new file mode 100644
index 00000000000..8a307274a9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m512h x1;
+volatile __mmask32 m;
+
+void extern
+avx512fp16_test (void)
+{
+  x1 = _mm512_roundscale_ph (x1, IMM);
+  x1 = _mm512_mask_roundscale_ph (x1, m, x1, IMM);
+  x1 = _mm512_maskz_roundscale_ph (m, x1, IMM);
+  x1 = _mm512_roundscale_round_ph (x1, IMM, 8);
+  x1 = _mm512_mask_roundscale_round_ph (x1, m, x1, IMM, 8);
+  x1 = _mm512_maskz_roundscale_round_ph (m, x1, IMM, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c
new file mode 100644
index 00000000000..d50e75585f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscaleph-1b.c
@@ -0,0 +1,101 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(roundscale_ph) (V512 * dest, V512 op1,
+		      __mmask32 k, int zero_mask, int round)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+  V512 t1, t2;
+  m1 = k & 0xffff; 
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+  if (round==0)
+  {
+    t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x11);
+    t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x11);
+  }  
+  else
+  {
+    t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x14);
+    t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x14);
+  }
+  for (i = 0; i < 16; i++) 
+  { 
+    if (((1 << i) & m1) == 0) {
+	if (zero_mask) {
+	    v5.f32[i] = 0;
+	}
+	else {
+	    v5.u32[i] = v7.u32[i];
+	}
+    }
+    else {
+	v5.f32[i] = t1.f32[i];
+    }
+
+    if (((1 << i) & m2) == 0) {
+	if (zero_mask) {
+	    v6.f32[i] = 0;
+	}
+	else {
+	    v6.u32[i] = v8.u32[i];
+	}
+    }
+    else {
+	v6.f32[i] = t2.f32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res, exp;
+
+  init_src();
+
+  EMULATE(roundscale_ph) (&exp, src1,  NET_MASK, 0, 1);
+  HF(res) = INTRINSIC (_roundscale_ph) (HF(src1), 0x13);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _roundscale_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(roundscale_ph) (&exp, src1, MASK_VALUE, 0, 1);
+  HF(res) = INTRINSIC (_mask_roundscale_ph) (HF(res), MASK_VALUE, HF(src1), 0x14);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_roundscale_ph);
+
+  EMULATE(roundscale_ph) (&exp, src1,  ZMASK_VALUE, 1, 1);
+  HF(res) = INTRINSIC (_maskz_roundscale_ph) (ZMASK_VALUE, HF(src1), 0x14);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_roundscale_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(roundscale_ph) (&exp, src1,  NET_MASK, 0, 1);
+  HF(res) = INTRINSIC (_roundscale_round_ph) (HF(src1), 0x13, 0x08);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _roundscale_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(roundscale_ph) (&exp, src1, MASK_VALUE, 0, 1);
+  HF(res) = INTRINSIC (_mask_roundscale_round_ph) (HF(res), MASK_VALUE, HF(src1), 0x14, 0x08);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_roundscale_round_ph);
+
+  EMULATE(roundscale_ph) (&exp, src1,  ZMASK_VALUE, 1, 1);
+  HF(res) = INTRINSIC (_maskz_roundscale_round_ph) (ZMASK_VALUE, HF(src1), 0x14, 0x08);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_roundscale_round_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c
new file mode 100644
index 00000000000..bd41b634aff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m128h x1, x2;
+volatile __mmask8 m;
+
+void extern
+avx512fp16_test (void)
+{
+  x1 = _mm_roundscale_sh (x1, x2, IMM);
+  x1 = _mm_mask_roundscale_sh(x1, m, x1, x2, IMM);
+  x1 = _mm_maskz_roundscale_sh(m, x1, x2, IMM);
+  x1 = _mm_roundscale_round_sh (x1, x2, IMM, 4);
+  x1 = _mm_mask_roundscale_round_sh(x1, m, x1, x2, IMM, 8);
+  x1 = _mm_maskz_roundscale_round_sh(m, x1, x2, IMM, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c
new file mode 100644
index 00000000000..c1033892878
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vrndscalesh-1b.c
@@ -0,0 +1,62 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_roundscale_sh(V512 * dest, V512 op1,
+	       __mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  V512 t1, t2;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+  t1.zmm = _mm512_maskz_roundscale_ps (0xffff, v1.zmm, 0x14);
+  t2.zmm = _mm512_maskz_roundscale_ps (0xffff, v2.zmm, 0x14);
+
+  if ((k&1) || !k)
+    v5.f32[0] = t1.f32[0]; 
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_roundscale_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_roundscale_round_sh(src1.xmmh[0], src1.xmmh[0], 0x1, 0x08);
+  check_results(&res, &exp, N_ELEMS, "_mm_roundscale_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_roundscale_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_roundscale_round_sh(res.xmmh[0],
+					     0x1, src1.xmmh[0], src1.xmmh[0], 0x1, 0x08);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_roundscale_round_sh");
+
+  emulate_roundscale_sh(&exp, src1,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_roundscale_round_sh(0x3, src1.xmmh[0], src1.xmmh[0], 0x1, 0x08);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_roundscale_round_sh");
+
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c
new file mode 100644
index 00000000000..4f43abd5411
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m256h x2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+volatile __mmask16 m16;
+
+void extern
+avx512fp16_test (void)
+{
+  x2 = _mm256_reduce_ph (x2, IMM);
+  x3 = _mm_reduce_ph (x3, IMM);
+
+  x2 = _mm256_mask_reduce_ph (x2, m16, x2, IMM);
+  x3 = _mm_mask_reduce_ph (x3, m8, x3, IMM);
+
+  x2 = _mm256_maskz_reduce_ph (m16, x2, IMM);
+  x3 = _mm_maskz_reduce_ph (m8, x3, IMM);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c
new file mode 100644
index 00000000000..38515976ce6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vreduceph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vreduceph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vreduceph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c
new file mode 100644
index 00000000000..9fcf7e9b7bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscaleph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m256h x2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+volatile __mmask16 m16;
+
+void extern
+avx512fp16_test (void)
+{
+  x2 = _mm256_roundscale_ph (x2, IMM);
+  x3 = _mm_roundscale_ph (x3, IMM);
+
+  x2 = _mm256_mask_roundscale_ph (x2, m16, x2, IMM);
+  x3 = _mm_mask_roundscale_ph (x3, m8, x3, IMM);
+
+  x2 = _mm256_maskz_roundscale_ph (m16, x2, IMM);
+  x3 = _mm_maskz_roundscale_ph (m8, x3, IMM);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c
new file mode 100644
index 00000000000..04b00e2db2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vrndscaleph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vrndscaleph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vrndscaleph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (20 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions liuhongt
                   ` (39 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

Add vfpclassph/vfpclasssh/vgetexpph/vgetexpsh/vgetmantph/vgetmantsh.

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_fpclass_sh_mask):
	New intrinsic.
	(_mm_mask_fpclass_sh_mask): Likewise.
	(_mm512_mask_fpclass_ph_mask): Likewise.
	(_mm512_fpclass_ph_mask): Likewise.
	(_mm_getexp_sh): Likewise.
	(_mm_mask_getexp_sh): Likewise.
	(_mm_maskz_getexp_sh): Likewise.
	(_mm512_getexp_ph): Likewise.
	(_mm512_mask_getexp_ph): Likewise.
	(_mm512_maskz_getexp_ph): Likewise.
	(_mm_getexp_round_sh): Likewise.
	(_mm_mask_getexp_round_sh): Likewise.
	(_mm_maskz_getexp_round_sh): Likewise.
	(_mm512_getexp_round_ph): Likewise.
	(_mm512_mask_getexp_round_ph): Likewise.
	(_mm512_maskz_getexp_round_ph): Likewise.
	(_mm_getmant_sh): Likewise.
	(_mm_mask_getmant_sh): Likewise.
	(_mm_maskz_getmant_sh): Likewise.
	(_mm512_getmant_ph): Likewise.
	(_mm512_mask_getmant_ph): Likewise.
	(_mm512_maskz_getmant_ph): Likewise.
	(_mm_getmant_round_sh): Likewise.
	(_mm_mask_getmant_round_sh): Likewise.
	(_mm_maskz_getmant_round_sh): Likewise.
	(_mm512_getmant_round_ph): Likewise.
	(_mm512_mask_getmant_round_ph): Likewise.
	(_mm512_maskz_getmant_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_mask_fpclass_ph_mask):
	New intrinsic.
	(_mm_fpclass_ph_mask): Likewise.
	(_mm256_mask_fpclass_ph_mask): Likewise.
	(_mm256_fpclass_ph_mask): Likewise.
	(_mm256_getexp_ph): Likewise.
	(_mm256_mask_getexp_ph): Likewise.
	(_mm256_maskz_getexp_ph): Likewise.
	(_mm_getexp_ph): Likewise.
	(_mm_mask_getexp_ph): Likewise.
	(_mm_maskz_getexp_ph): Likewise.
	(_mm256_getmant_ph): Likewise.
	(_mm256_mask_getmant_ph): Likewise.
	(_mm256_maskz_getmant_ph): Likewise.
	(_mm_getmant_ph): Likewise.
	(_mm_mask_getmant_ph): Likewise.
	(_mm_maskz_getmant_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/sse.md (vecmemsuffix): Add HF vector modes.
	(<avx512>_getexp<mode><mask_name><round_saeonly_name>): Adjust
	to support HF vector modes.
	(avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name):
	Ditto.
	(avx512dq_fpclass<mode><mask_scalar_merge_name>): Ditto.
	(avx512dq_vmfpclass<mode><mask_scalar_merge_name>): Ditto.
	(<avx512>_getmant<mode><mask_name><round_saeonly_name>): Ditto.
	(avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
	Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 471 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 229 ++++++++++++
 gcc/config/i386/i386-builtin-types.def |   3 +
 gcc/config/i386/i386-builtin.def       |  12 +
 gcc/config/i386/i386-expand.c          |   7 +
 gcc/config/i386/sse.md                 |  41 +--
 gcc/testsuite/gcc.target/i386/avx-1.c  |  10 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  10 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  10 +
 11 files changed, 809 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 8c2c9b28987..2fbfc140c44 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -1982,6 +1982,477 @@ _mm_maskz_roundscale_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vfpclasssh.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fpclass_sh_mask (__m128h __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm,
+						   (__mmask8) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_sh_mask (__mmask8 __U, __m128h __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) __A, __imm, __U);
+}
+
+#else
+#define _mm_fpclass_sh_mask(X, C)					\
+  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
+					     (int) (C), (__mmask8) (-1))) \
+
+#define _mm_mask_fpclass_sh_mask(U, X, C)				\
+  ((__mmask8) __builtin_ia32_fpclasssh_mask ((__v8hf) (__m128h) (X),	\
+					     (int) (C), (__mmask8) (U)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfpclassph.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fpclass_ph_mask (__mmask32 __U, __m512h __A,
+			     const int __imm)
+{
+  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+						       __imm, __U);
+}
+
+extern __inline __mmask32
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fpclass_ph_mask (__m512h __A, const int __imm)
+{
+  return (__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) __A,
+						       __imm,
+						       (__mmask32) -1);
+}
+
+#else
+#define _mm512_mask_fpclass_ph_mask(u, x, c)				\
+  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),\
+						 (int) (c), (__mmask32) (u)))
+
+#define _mm512_fpclass_ph_mask(x, c)                                    \
+  ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x),\
+						 (int) (c), (__mmask32) -1))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetexpph, vgetexpsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_sh (__m128h __A, __m128h __B)
+{
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) _mm_setzero_ph (),
+					(__mmask8) -1,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getexp_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) __W, (__mmask8) __U,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getexp_sh (__mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h)
+    __builtin_ia32_getexpsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					(__v8hf) _mm_setzero_ph (),
+					(__mmask8) __U,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_getexp_ph (__m512h __A)
+{
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+				     (__v32hf) _mm512_setzero_ph (),
+				     (__mmask32) -1, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_getexp_ph (__m512h __W, __mmask32 __U, __m512h __A)
+{
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A, (__v32hf) __W,
+				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_getexp_ph (__mmask32 __U, __m512h __A)
+{
+  return (__m512h)
+    __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+				     (__v32hf) _mm512_setzero_ph (),
+				     (__mmask32) __U, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_round_sh (__m128h __A, __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       _mm_setzero_ph (),
+						       (__mmask8) -1,
+						       __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getexp_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __W,
+						       (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getexp_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+			   const int __R)
+{
+  return (__m128h) __builtin_ia32_getexpsh_mask_round ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf)
+						       _mm_setzero_ph (),
+						       (__mmask8) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_getexp_round_ph (__m512h __A, const int __R)
+{
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						   (__v32hf)
+						   _mm512_setzero_ph (),
+						   (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_getexp_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			     const int __R)
+{
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						   (__v32hf) __W,
+						   (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_getexp_round_ph (__mmask32 __U, __m512h __A, const int __R)
+{
+  return (__m512h) __builtin_ia32_getexpph512_mask ((__v32hf) __A,
+						   (__v32hf)
+						   _mm512_setzero_ph (),
+						   (__mmask32) __U, __R);
+}
+
+#else
+#define _mm_getexp_round_sh(A, B, R)						\
+  ((__m128h)__builtin_ia32_getexpsh_mask_round((__v8hf)(__m128h)(A),		\
+					       (__v8hf)(__m128h)(B),		\
+					       (__v8hf)_mm_setzero_ph(),	\
+					       (__mmask8)-1, R))
+
+#define _mm_mask_getexp_round_sh(W, U, A, B, C)					\
+  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B, W, U, C)
+
+#define _mm_maskz_getexp_round_sh(U, A, B, C)					\
+  (__m128h)__builtin_ia32_getexpsh_mask_round(A, B,				\
+					      (__v8hf)_mm_setzero_ph(),		\
+					      U, C)
+
+#define _mm512_getexp_round_ph(A, R)						\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),		\
+  (__v32hf)_mm512_setzero_ph(), (__mmask32)-1, R))
+
+#define _mm512_mask_getexp_round_ph(W, U, A, R)					\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),		\
+  (__v32hf)(__m512h)(W), (__mmask32)(U), R))
+
+#define _mm512_maskz_getexp_round_ph(U, A, R)					\
+  ((__m512h)__builtin_ia32_getexpph512_mask((__v32hf)(__m512h)(A),		\
+  (__v32hf)_mm512_setzero_ph(), (__mmask32)(U), R))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetmantph, vgetmantsh.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_sh (__m128h __A, __m128h __B,
+		_MM_MANTISSA_NORM_ENUM __C,
+		_MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C, _mm_setzero_ph (),
+					 (__mmask8) -1,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_sh (__m128h __W, __mmask8 __U, __m128h __A,
+		     __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+		     _MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C, (__v8hf) __W,
+					 __U, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_sh (__mmask8 __U, __m128h __A, __m128h __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D)
+{
+  return (__m128h)
+    __builtin_ia32_getmantsh_mask_round ((__v8hf) __A, (__v8hf) __B,
+					 (__D << 2) | __C,
+					 (__v8hf) _mm_setzero_ph(),
+					 __U, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_getmant_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+		   _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     _mm512_setzero_ph (),
+						     (__mmask32) -1,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_getmant_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			_MM_MANTISSA_NORM_ENUM __B,
+			_MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf) __W, __U,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_getmant_ph (__mmask32 __U, __m512h __A,
+			 _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf)
+						     _mm512_setzero_ph (),
+						     __U,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_round_sh (__m128h __A, __m128h __B,
+		      _MM_MANTISSA_NORM_ENUM __C,
+		      _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							_mm_setzero_ph (),
+							(__mmask8) -1,
+							__R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_round_sh (__m128h __W, __mmask8 __U, __m128h __A,
+			   __m128h __B, _MM_MANTISSA_NORM_ENUM __C,
+			   _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							(__v8hf) __W,
+							__U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_round_sh (__mmask8 __U, __m128h __A, __m128h __B,
+			    _MM_MANTISSA_NORM_ENUM __C,
+			    _MM_MANTISSA_SIGN_ENUM __D, const int __R)
+{
+  return (__m128h) __builtin_ia32_getmantsh_mask_round ((__v8hf) __A,
+							(__v8hf) __B,
+							(__D << 2) | __C,
+							(__v8hf)
+							_mm_setzero_ph(),
+							__U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_getmant_round_ph (__m512h __A, _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     _mm512_setzero_ph (),
+						     (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_getmant_round_ph (__m512h __W, __mmask32 __U, __m512h __A,
+			      _MM_MANTISSA_NORM_ENUM __B,
+			      _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf) __W, __U,
+						     __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
+			       _MM_MANTISSA_NORM_ENUM __B,
+			       _MM_MANTISSA_SIGN_ENUM __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_getmantph512_mask ((__v32hf) __A,
+						     (__C << 2) | __B,
+						     (__v32hf)
+						     _mm512_setzero_ph (),
+						     __U, __R);
+}
+
+#else
+#define _mm512_getmant_ph(X, B, C)					\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)-1,		\
+					      _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_mask_getmant_ph(W, U, X, B, C)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)(W),	\
+					      (__mmask32)(U),		\
+					      _MM_FROUND_CUR_DIRECTION))
+
+
+#define _mm512_maskz_getmant_ph(U, X, B, C)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)(U),		\
+					      _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_getmant_sh(X, Y, C, D)					\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)-1,		\
+						 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_getmant_sh(W, U, X, Y, C, D)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)(W),	\
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_getmant_sh(U, X, Y, C, D)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph(),	\
+						 (__mmask8)(U),		\
+						 _MM_FROUND_CUR_DIRECTION))
+
+#define _mm512_getmant_round_ph(X, B, C, R)				\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)-1,		\
+					      (R)))
+
+#define _mm512_mask_getmant_round_ph(W, U, X, B, C, R)			\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)(W),	\
+					      (__mmask32)(U),		\
+					      (R)))
+
+
+#define _mm512_maskz_getmant_round_ph(U, X, B, C, R)			\
+  ((__m512h)__builtin_ia32_getmantph512_mask ((__v32hf)(__m512h)(X),	\
+					      (int)(((C)<<2) | (B)),	\
+					      (__v32hf)(__m512h)	\
+					      _mm512_setzero_ph(),	\
+					      (__mmask32)(U),		\
+					      (R)))
+
+#define _mm_getmant_round_sh(X, Y, C, D, R)				\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph (),	\
+						 (__mmask8)-1,		\
+						 (R)))
+
+#define _mm_mask_getmant_round_sh(W, U, X, Y, C, D, R)			\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)(W),	\
+						 (__mmask8)(U),		\
+						 (R)))
+
+#define _mm_maskz_getmant_round_sh(U, X, Y, C, D, R)			\
+  ((__m128h)__builtin_ia32_getmantsh_mask_round ((__v8hf)(__m128h)(X),	\
+						 (__v8hf)(__m128h)(Y),	\
+						 (int)(((D)<<2) | (C)),	\
+						 (__v8hf)(__m128h)	\
+						 _mm_setzero_ph(),	\
+						 (__mmask8)(U),		\
+						 (R)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 20b6716aa00..206d60407fc 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -701,6 +701,235 @@ _mm256_maskz_roundscale_ph (__mmask16 __A, __m256h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vfpclassph.  */
+#ifdef __OPTIMIZE__
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fpclass_ph_mask (__mmask8 __U, __m128h __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) __A,
+						      __imm, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fpclass_ph_mask (__m128h __A, const int __imm)
+{
+  return (__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) __A,
+						      __imm,
+						      (__mmask8) -1);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fpclass_ph_mask (__mmask16 __U, __m256h __A, const int __imm)
+{
+  return (__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) __A,
+						      __imm, __U);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fpclass_ph_mask (__m256h __A, const int __imm)
+{
+  return (__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) __A,
+						      __imm,
+						      (__mmask16) -1);
+}
+
+#else
+#define _mm_fpclass_ph_mask(X, C)                                       \
+  ((__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) (__m128h) (X),  \
+						(int) (C), (__mmask8)-1))
+
+#define _mm_mask_fpclass_ph_mask(u, X, C)                               \
+  ((__mmask8) __builtin_ia32_fpclassph128_mask ((__v8hf) (__m128h) (X),  \
+						(int) (C), (__mmask8)(u)))
+
+#define _mm256_fpclass_ph_mask(X, C)                                    \
+  ((__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) (__m256h) (X),  \
+						(int) (C), (__mmask16)-1))
+
+#define _mm256_mask_fpclass_ph_mask(u, X, C)				\
+  ((__mmask16) __builtin_ia32_fpclassph256_mask ((__v16hf) (__m256h) (X),  \
+						(int) (C), (__mmask16)(u)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vgetexpph, vgetexpsh.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_getexp_ph (__m256h __A)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A,
+						   (__v16hf)
+						   _mm256_setzero_ph (),
+						   (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_getexp_ph (__m256h __W, __mmask16 __U, __m256h __A)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A,
+						   (__v16hf) __W,
+						   (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_getexp_ph (__mmask16 __U, __m256h __A)
+{
+  return (__m256h) __builtin_ia32_getexpph256_mask ((__v16hf) __A,
+						   (__v16hf)
+						   _mm256_setzero_ph (),
+						   (__mmask16) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getexp_ph (__m128h __A)
+{
+  return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A,
+						   (__v8hf)
+						   _mm_setzero_ph (),
+						   (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getexp_ph (__m128h __W, __mmask8 __U, __m128h __A)
+{
+  return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A,
+						   (__v8hf) __W,
+						   (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getexp_ph (__mmask8 __U, __m128h __A)
+{
+  return (__m128h) __builtin_ia32_getexpph128_mask ((__v8hf) __A,
+						   (__v8hf)
+						   _mm_setzero_ph (),
+						   (__mmask8) __U);
+}
+
+
+/* Intrinsics vgetmantph, vgetmantsh.  */
+#ifdef __OPTIMIZE__
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_getmant_ph (__m256h __A, _MM_MANTISSA_NORM_ENUM __B,
+		   _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A,
+						     (__C << 2) | __B,
+						     (__v16hf)
+						     _mm256_setzero_ph (),
+						     (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_getmant_ph (__m256h __W, __mmask16 __U, __m256h __A,
+			_MM_MANTISSA_NORM_ENUM __B,
+			_MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A,
+						     (__C << 2) | __B,
+						     (__v16hf) __W,
+						     (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_getmant_ph (__mmask16 __U, __m256h __A,
+			 _MM_MANTISSA_NORM_ENUM __B,
+			 _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m256h) __builtin_ia32_getmantph256_mask ((__v16hf) __A,
+						     (__C << 2) | __B,
+						     (__v16hf)
+						     _mm256_setzero_ph (),
+						     (__mmask16) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_getmant_ph (__m128h __A, _MM_MANTISSA_NORM_ENUM __B,
+		_MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128h) __builtin_ia32_getmantph128_mask ((__v8hf) __A,
+						     (__C << 2) | __B,
+						     (__v8hf)
+						     _mm_setzero_ph (),
+						     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_getmant_ph (__m128h __W, __mmask8 __U, __m128h __A,
+		     _MM_MANTISSA_NORM_ENUM __B,
+		     _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128h) __builtin_ia32_getmantph128_mask ((__v8hf) __A,
+						     (__C << 2) | __B,
+						     (__v8hf) __W,
+						     (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_getmant_ph (__mmask8 __U, __m128h __A,
+		      _MM_MANTISSA_NORM_ENUM __B,
+		      _MM_MANTISSA_SIGN_ENUM __C)
+{
+  return (__m128h) __builtin_ia32_getmantph128_mask ((__v8hf) __A,
+						     (__C << 2) | __B,
+						     (__v8hf)
+						     _mm_setzero_ph (),
+						     (__mmask8) __U);
+}
+
+#else
+#define _mm256_getmant_ph(X, B, C)                                              \
+  ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v16hf)(__m256h)_mm256_setzero_ph (),\
+					  (__mmask16)-1))
+
+#define _mm256_mask_getmant_ph(W, U, X, B, C)                                   \
+  ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v16hf)(__m256h)(W),                 \
+					  (__mmask16)(U)))
+
+#define _mm256_maskz_getmant_ph(U, X, B, C)                                     \
+  ((__m256h) __builtin_ia32_getmantph256_mask ((__v16hf)(__m256h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v16hf)(__m256h)_mm256_setzero_ph (),\
+					  (__mmask16)(U)))
+
+#define _mm_getmant_ph(X, B, C)                                                 \
+  ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v8hf)(__m128h)_mm_setzero_ph (),   \
+					  (__mmask8)-1))
+
+#define _mm_mask_getmant_ph(W, U, X, B, C)                                      \
+  ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v8hf)(__m128h)(W),                 \
+					  (__mmask8)(U)))
+
+#define _mm_maskz_getmant_ph(U, X, B, C)                                        \
+  ((__m128h) __builtin_ia32_getmantph128_mask ((__v8hf)(__m128h) (X),           \
+					 (int)(((C)<<2) | (B)),                 \
+					  (__v8hf)(__m128h)_mm_setzero_ph (),   \
+					  (__mmask8)(U)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index d2ba1a5edac..79e7edf13e5 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1304,6 +1304,9 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
 
 # FP16 builtins
 DEF_FUNCTION_TYPE (V8HF, V8HI)
+DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
+DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
+DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 6964062c874..ed1a4a38b1c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2818,6 +2818,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv8
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv16hf_mask, "__builtin_ia32_vreduceph_v16hf_mask", IX86_BUILTIN_VREDUCEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rndscalev8hf_mask, "__builtin_ia32_vrndscaleph_v8hf_mask", IX86_BUILTIN_VRNDSCALEPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_rndscalev16hf_mask, "__builtin_ia32_vrndscaleph_v16hf_mask", IX86_BUILTIN_VRNDSCALEPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv16hf_mask, "__builtin_ia32_fpclassph256_mask", IX86_BUILTIN_FPCLASSPH256, UNKNOWN, (int) HI_FTYPE_V16HF_INT_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv8hf_mask, "__builtin_ia32_fpclassph128_mask", IX86_BUILTIN_FPCLASSPH128, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_fpclassv32hf_mask, "__builtin_ia32_fpclassph512_mask", IX86_BUILTIN_FPCLASSPH512, UNKNOWN, (int) SI_FTYPE_V32HF_INT_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512dq_vmfpclassv8hf_mask, "__builtin_ia32_fpclasssh_mask", IX86_BUILTIN_FPCLASSSH_MASK, UNKNOWN, (int) QI_FTYPE_V8HF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getexpv16hf_mask, "__builtin_ia32_getexpph256_mask", IX86_BUILTIN_GETEXPPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3041,6 +3049,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducepv32hf_mask_round, "__buil
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_reducesv8hf_mask_round, "__builtin_ia32_vreducesh_v8hf_mask_round", IX86_BUILTIN_VREDUCESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_rndscalev32hf_mask_round, "__builtin_ia32_vrndscaleph_v32hf_mask_round", IX86_BUILTIN_VRNDSCALEPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_rndscalev8hf_mask_round, "__builtin_ia32_vrndscalesh_v8hf_mask_round", IX86_BUILTIN_VRNDSCALESH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round, "__builtin_ia32_getexpph512_mask", IX86_BUILTIN_GETEXPPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 655234cbdd0..266aa411ddb 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9735,6 +9735,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case HI_FTYPE_V16SF_INT_UHI:
     case QI_FTYPE_V8SF_INT_UQI:
     case QI_FTYPE_V4SF_INT_UQI:
+    case QI_FTYPE_V8HF_INT_UQI:
+    case HI_FTYPE_V16HF_INT_UHI:
+    case SI_FTYPE_V32HF_INT_USI:
     case V4SI_FTYPE_V4SI_V4SI_UHI:
     case V8SI_FTYPE_V8SI_V8SI_UHI:
       nargs = 3;
@@ -10056,8 +10059,10 @@ ix86_expand_args_builtin (const struct builtin_description *d,
 	      case CODE_FOR_avx_vpermilv4df_mask:
 	      case CODE_FOR_avx512f_getmantv8df_mask:
 	      case CODE_FOR_avx512f_getmantv16sf_mask:
+	      case CODE_FOR_avx512vl_getmantv16hf_mask:
 	      case CODE_FOR_avx512vl_getmantv8sf_mask:
 	      case CODE_FOR_avx512vl_getmantv4df_mask:
+	      case CODE_FOR_avx512fp16_getmantv8hf_mask:
 	      case CODE_FOR_avx512vl_getmantv4sf_mask:
 	      case CODE_FOR_avx512vl_getmantv2df_mask:
 	      case CODE_FOR_avx512dq_rangepv8df_mask_round:
@@ -10593,10 +10598,12 @@ ix86_expand_round_builtin (const struct builtin_description *d,
 		{
 		case CODE_FOR_avx512f_getmantv8df_mask_round:
 		case CODE_FOR_avx512f_getmantv16sf_mask_round:
+		case CODE_FOR_avx512bw_getmantv32hf_mask_round:
 		case CODE_FOR_avx512f_vgetmantv2df_round:
 		case CODE_FOR_avx512f_vgetmantv2df_mask_round:
 		case CODE_FOR_avx512f_vgetmantv4sf_round:
 		case CODE_FOR_avx512f_vgetmantv4sf_mask_round:
+		case CODE_FOR_avx512f_vgetmantv8hf_mask_round:
 		  error ("the immediate argument must be a 4-bit immediate");
 		  return const0_rtx;
 		case CODE_FOR_avx512f_cmpv8df3_mask_round:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f43651a95ce..c4db778e25d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -701,7 +701,8 @@ (define_mode_attr ssequarterinsnmode
   [(V16SF "V4SF") (V8DF "V2DF") (V16SI "TI") (V8DI "TI")])
 
 (define_mode_attr vecmemsuffix
-  [(V16SF "{z}") (V8SF "{y}") (V4SF "{x}")
+  [(V32HF "{z}") (V16HF "{y}") (V8HF "{x}")
+   (V16SF "{z}") (V8SF "{y}") (V4SF "{x}")
    (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")])
 
 (define_mode_attr ssedoublemodelower
@@ -10050,8 +10051,8 @@ (define_insn "<avx512>_vternlog<mode>_mask"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_getexp<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-        (unspec:VF_AVX512VL [(match_operand:VF_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+        (unspec:VFH_AVX512VL [(match_operand:VFH_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")]
                         UNSPEC_GETEXP))]
    "TARGET_AVX512F"
    "vgetexp<ssemodesuffix>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}";
@@ -10059,11 +10060,11 @@ (define_insn "<avx512>_getexp<mode><mask_name><round_saeonly_name>"
      (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_sgetexp<mode><mask_scalar_name><round_saeonly_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (unspec:VF_128
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")]
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")]
 	    UNSPEC_GETEXP)
 	  (match_dup 1)
 	  (const_int 1)))]
@@ -23571,10 +23572,10 @@ (define_insn "avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>
 (define_insn "avx512dq_fpclass<mode><mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
           (unspec:<avx512fmaskmode>
-            [(match_operand:VF_AVX512VL 1 "vector_operand" "vm")
+            [(match_operand:VFH_AVX512VL 1 "vector_operand" "vm")
              (match_operand 2 "const_0_to_255_operand" "n")]
              UNSPEC_FPCLASS))]
-   "TARGET_AVX512DQ"
+   "TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE (<MODE>mode)"
    "vfpclass<ssemodesuffix><vecmemsuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}";
   [(set_attr "type" "sse")
    (set_attr "length_immediate" "1")
@@ -23585,11 +23586,11 @@ (define_insn "avx512dq_vmfpclass<mode><mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(and:<avx512fmaskmode>
 	  (unspec:<avx512fmaskmode>
-	    [(match_operand:VF_128 1 "nonimmediate_operand" "vm")
+	    [(match_operand:VFH_128 1 "nonimmediate_operand" "vm")
              (match_operand 2 "const_0_to_255_operand" "n")]
 	    UNSPEC_FPCLASS)
 	  (const_int 1)))]
-   "TARGET_AVX512DQ"
+   "TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE (<MODE>mode)"
    "vfpclass<ssescalarmodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}";
   [(set_attr "type" "sse")
    (set_attr "length_immediate" "1")
@@ -23597,9 +23598,9 @@ (define_insn "avx512dq_vmfpclass<mode><mask_scalar_merge_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_getmant<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(unspec:VF_AVX512VL
-	  [(match_operand:VF_AVX512VL 1 "nonimmediate_operand" "<round_saeonly_constraint>")
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(unspec:VFH_AVX512VL
+	  [(match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_GETMANT))]
   "TARGET_AVX512F"
@@ -23608,11 +23609,11 @@ (define_insn "<avx512>_getmant<mode><mask_name><round_saeonly_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (unspec:VF_128
-	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (unspec:VFH_128
+	    [(match_operand:VFH_128 1 "register_operand" "v")
+	     (match_operand:VFH_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_GETMANT)
 	  (match_dup 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 4c8e54e4c2a..b3cffa0644f 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -713,10 +713,20 @@
 #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_fpclassph512_mask(A, D, C) __builtin_ia32_fpclassph512_mask(A, 1, C)
+#define __builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U)
+#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8)
+#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
+#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
+#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
+#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C)
+#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C)
+#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D)
+#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 044d427c932..67ef567e437 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -730,10 +730,20 @@
 #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_fpclassph512_mask(A, D, C) __builtin_ia32_fpclassph512_mask(A, 1, C)
+#define __builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U)
+#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8)
+#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
+#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
+#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
+#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C)
+#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C)
+#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D)
+#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index b7ffdf7e1df..04163874f90 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -677,8 +677,11 @@ test_1 (_mm512_reduce_ph, __m512h, __m512h, 123)
 test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
 test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
 test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
+test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
+test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
+test_1y (_mm512_getmant_round_ph, __m512h, __m512h, 1, 1, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -705,6 +708,8 @@ test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123)
 test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123)
 test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
 test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
+test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
+test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -712,6 +717,10 @@ test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8)
 test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8)
 test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8)
 test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8)
+test_2x (_mm512_maskz_getmant_ph, __m512h, __mmask32, __m512h, 1, 1)
+test_2x (_mm_getmant_sh, __m128h, __m128h, __m128h, 1, 1)
+test_2y (_mm512_maskz_getmant_round_ph, __m512h, __mmask32, __m512h, 1, 1, 8)
+test_2y (_mm_getmant_round_sh, __m128h, __m128h, __m128h, 1, 1, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -737,12 +746,18 @@ test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123)
 test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123)
 test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
 test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
+test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
 test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
 test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_3x (_mm512_mask_getmant_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1)
+test_3x (_mm_maskz_getmant_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1)
+test_3y (_mm_maskz_getmant_round_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8)
+test_3y (_mm512_mask_getmant_round_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -760,8 +775,11 @@ test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m51
 test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
+test_4y (_mm_mask_getmant_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 5dbe8cba5ea..008600a393d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -782,8 +782,11 @@ test_1 (_mm512_reduce_ph, __m512h, __m512h, 123)
 test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
 test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
 test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
+test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
+test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
+test_1y (_mm512_getmant_round_ph, __m512h, __m512h, 1, 1, 8)
 test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
@@ -809,6 +812,8 @@ test_2 (_mm_maskz_roundscale_ph, __m128h, __mmask8, __m128h, 123)
 test_2 (_mm256_maskz_roundscale_ph, __m256h, __mmask16, __m256h, 123)
 test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
 test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
+test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
+test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -816,6 +821,10 @@ test_2x (_mm512_maskz_reduce_round_ph, __m512h, __mmask32, __m512h, 123, 8)
 test_2x (_mm512_maskz_roundscale_round_ph, __m512h, __mmask32, __m512h, 123, 8)
 test_2x (_mm_reduce_round_sh, __m128h, __m128h, __m128h, 123, 8)
 test_2x (_mm_roundscale_round_sh, __m128h, __m128h, __m128h, 123, 8)
+test_2x (_mm512_maskz_getmant_ph, __m512h, __mmask32, __m512h, 1, 1)
+test_2x (_mm_getmant_sh, __m128h, __m128h, __m128h, 1, 1)
+test_2y (_mm512_maskz_getmant_round_ph, __m512h, __mmask32, __m512h, 1, 1, 8)
+test_2y (_mm_getmant_round_sh, __m128h, __m128h, __m128h, 1, 1, 8)
 test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -840,12 +849,18 @@ test_3 (_mm_mask_roundscale_ph, __m128h, __m128h, __mmask8, __m128h, 123)
 test_3 (_mm256_mask_roundscale_ph, __m256h, __m256h, __mmask16, __m256h, 123)
 test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
 test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
+test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
 test_3x (_mm512_mask_roundscale_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
 test_3x (_mm_maskz_reduce_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_3x (_mm_maskz_roundscale_round_sh, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_3x (_mm512_mask_getmant_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1)
+test_3x (_mm_maskz_getmant_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1)
+test_3y (_mm_maskz_getmant_round_sh, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8)
+test_3y (_mm512_mask_getmant_round_ph, __m512h, __m512h, __mmask32, __m512h, 1, 1, 8)
 test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
@@ -862,8 +877,11 @@ test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
+test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
+test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
+test_4y (_mm_mask_getmant_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1, 8)
 
 /* avx512fp16vlintrin.h */
 test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 2d968f07bc8..b3f07587acb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -731,10 +731,20 @@
 #define __builtin_ia32_vrndscaleph_v8hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v8hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscaleph_v16hf_mask(A, B, C, D) __builtin_ia32_vrndscaleph_v16hf_mask(A, 123, C, D)
 #define __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, C, D, E, F) __builtin_ia32_vrndscalesh_v8hf_mask_round(A, B, 123, D, E, 8)
+#define __builtin_ia32_fpclassph512_mask(A, D, C) __builtin_ia32_fpclassph512_mask(A, 1, C)
+#define __builtin_ia32_fpclasssh_mask(A, D, U) __builtin_ia32_fpclasssh_mask(A, 1, U)
+#define __builtin_ia32_getexpph512_mask(A, B, C, D) __builtin_ia32_getexpph512_mask(A, B, C, 8)
+#define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
+#define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
+#define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
 #define __builtin_ia32_vcmpph_v16hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v16hf_mask(A, B, 1, D)
+#define __builtin_ia32_fpclassph256_mask(A, D, C) __builtin_ia32_fpclassph256_mask(A, 1, C)
+#define __builtin_ia32_fpclassph128_mask(A, D, C) __builtin_ia32_fpclassph128_mask(A, 1, C)
+#define __builtin_ia32_getmantph256_mask(A, E, C, D) __builtin_ia32_getmantph256_mask(A, 1, C, D)
+#define __builtin_ia32_getmantph128_mask(A, E, C, D) __builtin_ia32_getmantph128_mask(A, 1, C, D)
 
 /* vpclmulqdqintrin.h */
 #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1) 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (21 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh liuhongt
                   ` (38 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512):
	Add xmm component.
	* gcc.target/i386/avx512fp16-vfpclassph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfpclassph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfpclasssh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfpclasssh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetexpsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vgetmantsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfpclassph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfpclassph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetexpph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetexpph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetmantph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vgetmantph-1b.c: Ditto.
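For reviewers unfamiliar with vfpclass, the new -1b runtime tests check each
class bit of the immediate against a scalar reference (check_fp_class_hp in
the tests).  A minimal portable sketch of that immediate-bit mapping follows;
fp_class_match is an illustrative helper name, not part of the patch, and it
uses float rather than _Float16 so it compiles without AVX512FP16 support:

```c
#include <math.h>

/* Sketch of the vfpclass immediate-bit semantics the new tests exercise.
   Bit 0 = QNaN, bit 1 = +0, bit 2 = -0, bit 3 = +Inf, bit 4 = -Inf,
   bit 5 = denormal, bit 6 = finite negative, bit 7 = SNaN (not
   distinguishable from QNaN in portable C, so folded into the NaN check,
   mirroring what the tests' reference function does).  */
static int
fp_class_match (float src, int imm)
{
  int nan_res    = isnan (src);
  int pzero_res  = (src == 0.0f) && !signbit (src);
  int nzero_res  = (src == 0.0f) && signbit (src);
  int pinf_res   = isinf (src) && !signbit (src);
  int ninf_res   = isinf (src) && signbit (src);
  int denorm_res = (fpclassify (src) == FP_SUBNORMAL);
  int finneg_res = isfinite (src) && (src < 0);

  return (((imm & 0x01) && nan_res)
	  || ((imm & 0x02) && pzero_res)
	  || ((imm & 0x04) && nzero_res)
	  || ((imm & 0x08) && pinf_res)
	  || ((imm & 0x10) && ninf_res)
	  || ((imm & 0x20) && denorm_res)
	  || ((imm & 0x40) && finneg_res)
	  || ((imm & 0x80) && nan_res));
}
```

The tests below call the reference with imm 0xFF, so every class bit is
checked at once and the per-bit behavior only matters for which inputs match.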
---
 .../gcc.target/i386/avx512fp16-helper.h       |   1 +
 .../i386/avx512fp16-vfpclassph-1a.c           |  16 +++
 .../i386/avx512fp16-vfpclassph-1b.c           |  77 +++++++++++++
 .../i386/avx512fp16-vfpclasssh-1a.c           |  16 +++
 .../i386/avx512fp16-vfpclasssh-1b.c           |  76 +++++++++++++
 .../gcc.target/i386/avx512fp16-vgetexpph-1a.c |  24 +++++
 .../gcc.target/i386/avx512fp16-vgetexpph-1b.c |  99 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vgetexpsh-1a.c |  24 +++++
 .../gcc.target/i386/avx512fp16-vgetexpsh-1b.c |  61 +++++++++++
 .../i386/avx512fp16-vgetmantph-1a.c           |  24 +++++
 .../i386/avx512fp16-vgetmantph-1b.c           | 102 ++++++++++++++++++
 .../i386/avx512fp16-vgetmantsh-1a.c           |  24 +++++
 .../i386/avx512fp16-vgetmantsh-1b.c           |  62 +++++++++++
 .../i386/avx512fp16vl-vfpclassph-1a.c         |  22 ++++
 .../i386/avx512fp16vl-vfpclassph-1b.c         |  16 +++
 .../i386/avx512fp16vl-vgetexpph-1a.c          |  26 +++++
 .../i386/avx512fp16vl-vgetexpph-1b.c          |  16 +++
 .../i386/avx512fp16vl-vgetmantph-1a.c         |  30 ++++++
 .../i386/avx512fp16vl-vgetmantph-1b.c         |  16 +++
 19 files changed, 732 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index ec88888532c..f6f46872c35 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -29,6 +29,7 @@ typedef union
   __m256h         ymmh[2];
   __m256i         ymmi[2];
   __m128h         xmmh[4];
+  __m128	  xmm[4];
   unsigned short  u16[32];
   unsigned int    u32[16];
   float           f32[16];
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
new file mode 100644
index 00000000000..a97dddf6110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfpclassphz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclassphz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x512;
+volatile __mmask32 m32;
+
+void extern
+avx512dq_test (void)
+{
+  m32 = _mm512_fpclass_ph_mask (x512, 13);
+  m32 = _mm512_mask_fpclass_ph_mask (2, x512, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
new file mode 100644
index 00000000000..9ffb5606b81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include <math.h>
+#include <limits.h>
+#include <float.h>
+#include "avx512f-mask-type.h"
+#define SIZE (AVX512F_LEN / 16)
+
+#ifndef __FPCLASSPH__
+#define __FPCLASSPH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+		|| (((imm >> 1) & 1) && Pzero_res)
+		|| (((imm >> 2) & 1) && Nzero_res)
+		|| (((imm >> 3) & 1) && PInf_res)
+		|| (((imm >> 4) & 1) && NInf_res)
+		|| (((imm >> 5) & 1) && Denorm_res)
+		|| (((imm >> 6) & 1) && FinNeg_res)
+		|| (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+MASK_TYPE
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  MASK_TYPE res = 0;
+
+  for (i = 0; i < SIZE; i++)
+    if (check_fp_class_hp(s1[i], imm))
+      res = res | (1 << i);
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  MASK_TYPE res1, res2, res_ref = 0;
+  MASK_TYPE mask = MASK_VALUE;
+
+  src.a[0] = NAN;
+  src.a[1] = 1.0 / 0.0;
+  for (i = 2; i < SIZE; i++)
+    {
+      src.a[i] = -24.43 + 0.6 * i;
+    }
+
+  res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF);
+  res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF);
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+    abort ();
+
+  if ((mask & res_ref) != res2)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c
new file mode 100644
index 00000000000..7a31fd8b47d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfpclasssh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasssh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[0-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x128;
+volatile __mmask8 m8;
+
+void extern
+avx512dq_test (void)
+{
+  m8 = _mm_fpclass_sh_mask (x128, 13);
+  m8 = _mm_mask_fpclass_sh_mask (m8, x128, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c
new file mode 100644
index 00000000000..bdc6f9f059a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclasssh-1b.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include <math.h>
+#include <limits.h>
+#include <float.h>
+#include "avx512f-mask-type.h"
+#define SIZE (128 / 16)
+
+#ifndef __FPCLASSSH__
+#define __FPCLASSSH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+		|| (((imm >> 1) & 1) && Pzero_res)
+		|| (((imm >> 2) & 1) && Nzero_res)
+		|| (((imm >> 3) & 1) && PInf_res)
+		|| (((imm >> 4) & 1) && NInf_res)
+		|| (((imm >> 5) & 1) && Denorm_res)
+		|| (((imm >> 6) & 1) && FinNeg_res)
+		|| (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+__mmask8
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  __mmask8 res = 0;
+
+  if (check_fp_class_hp(s1[0], imm))
+    res = res | 1;
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  union128h src;
+  __mmask8 res1, res2, res_ref = 0;
+  __mmask8 mask = MASK_VALUE;
+
+  src.a[0] = 1.0 / 0.0;
+  for (i = 1; i < SIZE; i++)
+    {
+      src.a[i] = -24.43 + 0.6 * i;
+    }
+
+  res1 = _mm_fpclass_sh_mask (src.x, 0xFF);
+  res2 = _mm_mask_fpclass_sh_mask (mask, src.x, 0xFF);
+
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+    abort ();
+
+  if ((mask & res_ref) != res2)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c
new file mode 100644
index 00000000000..993cbd944d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1} } */
+
+#include <immintrin.h>
+
+volatile __m512h x;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm512_getexp_ph (x);
+  x = _mm512_mask_getexp_ph (x, m, x);
+  x = _mm512_maskz_getexp_ph (m, x);
+  x = _mm512_getexp_round_ph (x, _MM_FROUND_NO_EXC);
+  x = _mm512_mask_getexp_round_ph (x, m, x, _MM_FROUND_NO_EXC);
+  x = _mm512_maskz_getexp_round_ph (m, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c
new file mode 100644
index 00000000000..3483c9537dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpph-1b.c
@@ -0,0 +1,99 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(getexp_ph) (V512 * dest, V512 op1,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    float emu[32];
+    __mmask16 m1, m2;
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+    v3.zmm = _mm512_getexp_round_ps(v1.zmm, _ROUND_CUR);
+    v4.zmm = _mm512_getexp_round_ps(v2.zmm, _ROUND_CUR);
+    for (i=0; i<16; i++)
+      {
+      emu[i] = v3.f32[i];
+      emu[i+16] = v4.f32[i];
+      }
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            v5.f32[i] = emu[i];
+
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = emu[i+16];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(getexp_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_getexp_ph) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _getexp_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(getexp_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_getexp_ph) (HF(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getexp_ph);
+
+  EMULATE(getexp_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_getexp_ph) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getexp_ph);
+#if AVX512F_LEN == 512
+  EMULATE(getexp_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_getexp_round_ph) (HF(src1), _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _getexp_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(getexp_ph) (&exp, src1, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_getexp_round_ph) (HF(res), MASK_VALUE, HF(src1),
+					 _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getexp_round_ph);
+
+  EMULATE(getexp_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_getexp_round_ph) (ZMASK_VALUE, HF(src1), _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getexp_round_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c
new file mode 100644
index 00000000000..397fd3e14a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\n\]*%xmm\[0-9\]+\, %xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\, %xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetexpsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getexp_sh (x, x);
+  x = _mm_mask_getexp_sh (x, m, x, x);
+  x = _mm_maskz_getexp_sh (m, x, x);
+  x = _mm_getexp_round_sh (x, x, _MM_FROUND_NO_EXC);
+  x = _mm_mask_getexp_round_sh (x, m, x, x, _MM_FROUND_NO_EXC);
+  x = _mm_maskz_getexp_round_sh (m, x, x, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c
new file mode 100644
index 00000000000..ca9834df6e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetexpsh-1b.c
@@ -0,0 +1,61 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_getexp_sh(V512 * dest, V512 op1,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v0, v1, v2, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    v0.xmm[0] = _mm_getexp_round_ss (v1.xmm[0], v1.xmm[0], _ROUND_CUR);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v0.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_getexp_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_getexp_round_sh(exp.xmmh[0], src1.xmmh[0], _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_getexp_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_getexp_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_getexp_round_sh(res.xmmh[0], 0x1, exp.xmmh[0],
+					 src1.xmmh[0], _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_getexp_round_sh");
+
+  emulate_getexp_sh(&exp, src1,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_getexp_round_sh(0x3, exp.xmmh[0], src1.xmmh[0],
+					  _ROUND_CUR);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_getexp_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c
new file mode 100644
index 00000000000..69e0c72721b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x, y;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm512_getmant_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm512_mask_getmant_ph (x, m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm512_maskz_getmant_ph (m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm512_getmant_round_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+  x = _mm512_mask_getmant_round_ph (x, m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+  x = _mm512_maskz_getmant_round_ph (m, y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c
new file mode 100644
index 00000000000..c18d1aa5dc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantph-1b.c
@@ -0,0 +1,102 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(getmant_ph) (V512 * dest, V512 op1,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    float emu[32];
+    __mmask16 m1, m2;
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+    v3.zmm = _mm512_getmant_round_ps(v1.zmm, 2, 0, _ROUND_CUR);
+    v4.zmm = _mm512_getmant_round_ps(v2.zmm, 2, 0, _ROUND_CUR);
+    for (i=0; i<16; i++)
+      {
+      emu[i] = v3.f32[i];
+      emu[i+16] = v4.f32[i];
+      }
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            v5.f32[i] = emu[i];
+
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+               v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = emu[i+16];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(getmant_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_getmant_ph) (HF(src1), 2, 0);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _getmant_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(getmant_ph) (&exp, src1,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_getmant_ph) (HF(res), MASK_VALUE,
+					  HF(src1), 2, 0);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getmant_ph);
+
+  EMULATE(getmant_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_getmant_ph) (ZMASK_VALUE, HF(src1),
+					   2, 0);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getmant_ph);
+#if AVX512F_LEN == 512
+  EMULATE(getmant_ph) (&exp, src1,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_getmant_round_ph) (HF(src1), 2, 0, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _getmant_round_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(getmant_ph) (&exp, src1,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_getmant_round_ph) (HF(res), MASK_VALUE,
+					  HF(src1), 2, 0, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_getmant_round_ph);
+
+  EMULATE(getmant_ph) (&exp, src1,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_getmant_round_ph) (ZMASK_VALUE, HF(src1),
+					   2, 0, _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_getmant_round_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c
new file mode 100644
index 00000000000..b533f20341b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantsh\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x, y, z;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_getmant_sh (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_mask_getmant_sh (x, m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_maskz_getmant_sh (m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm_getmant_round_sh (y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+  x = _mm_mask_getmant_round_sh (x, m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+  x = _mm_maskz_getmant_round_sh (m, y, z, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c
new file mode 100644
index 00000000000..bee8b04dfc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vgetmantsh-1b.c
@@ -0,0 +1,62 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_getmant_sh(V512 * dest, V512 op1,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v0, v1, v2, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    v0.xmm[0] = _mm_getmant_round_ss (v1.xmm[0], v1.xmm[0], 2, 0, _ROUND_CUR);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v0.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  emulate_getmant_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_getmant_round_sh(src1.xmmh[0], exp.xmmh[0],
+				     2, 0, _ROUND_CUR);
+  check_results(&res, &exp, 1, "_mm_getmant_round_sh");
+
+  init_dest(&res, &exp);
+  emulate_getmant_sh(&exp, src1,  0x1, 0);
+  res.xmmh[0] = _mm_mask_getmant_round_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+					  exp.xmmh[0], 2, 0, _ROUND_CUR);
+  check_results(&res, &exp, 1, "_mm_mask_getmant_round_sh");
+
+  emulate_getmant_sh(&exp, src1,  0x3, 1);
+  res.xmmh[0] = _mm_maskz_getmant_round_sh(0x3, src1.xmmh[0], exp.xmmh[0],
+					   2, 0, _ROUND_CUR);
+  check_results(&res, &exp, 1, "_mm_maskz_getmant_round_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c
new file mode 100644
index 00000000000..897a3c83692
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1a.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfpclassphy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclassphx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclassphy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclassphx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h x256;
+volatile __m128h x128;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512dq_test (void)
+{
+  m16 = _mm256_fpclass_ph_mask (x256, 13);
+  m8 = _mm_fpclass_ph_mask (x128, 13);
+  m16 = _mm256_mask_fpclass_ph_mask (2, x256, 13);
+  m8 = _mm_mask_fpclass_ph_mask (2, x128, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c
new file mode 100644
index 00000000000..6745f137c27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfpclassph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfpclassph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfpclassph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c
new file mode 100644
index 00000000000..82c23b6e63d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1} } */
+/* { dg-final { scan-assembler-times "vgetexpph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1} } */
+
+#include <immintrin.h>
+
+volatile __m256h xx;
+volatile __m128h x2;
+volatile __mmask8 m8;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  xx = _mm256_getexp_ph (xx);
+  xx = _mm256_mask_getexp_ph (xx, m16, xx);
+  xx = _mm256_maskz_getexp_ph (m16, xx);
+  x2 = _mm_getexp_ph (x2);
+  x2 = _mm_mask_getexp_ph (x2, m8, x2);
+  x2 = _mm_maskz_getexp_ph (m8, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c
new file mode 100644
index 00000000000..7eb4fa4f537
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetexpph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vgetexpph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vgetexpph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c
new file mode 100644
index 00000000000..4ce6ed58cf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vl -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vgetmantph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h x, y;
+volatile __m128h a, b;
+volatile __mmask8 m8;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  x = _mm256_getmant_ph (y, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  x = _mm256_mask_getmant_ph (x, m16, y, _MM_MANT_NORM_p75_1p5,
+			      _MM_MANT_SIGN_src);
+  x = _mm256_maskz_getmant_ph (m16, y, _MM_MANT_NORM_p75_1p5,
+			       _MM_MANT_SIGN_src);
+  a = _mm_getmant_ph (b, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src);
+  a = _mm_mask_getmant_ph (a, m8, b, _MM_MANT_NORM_p75_1p5,
+			   _MM_MANT_SIGN_src);
+  a = _mm_maskz_getmant_ph (m8, b, _MM_MANT_NORM_p75_1p5,
+			    _MM_MANT_SIGN_src);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c
new file mode 100644
index 00000000000..e5f87401558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vgetmantph-1b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define DEBUG
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vgetmantph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vgetmantph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (22 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-16  5:08   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw liuhongt
                   ` (37 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_cvtsi16_si128): New
	intrinsic.
	(_mm_cvtsi128_si16): Likewise.
	(_mm_mask_load_sh): Likewise.
	(_mm_maskz_load_sh): Likewise.
	(_mm_mask_store_sh): Likewise.
	(_mm_move_sh): Likewise.
	(_mm_mask_move_sh): Likewise.
	(_mm_maskz_move_sh): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_special_args_builtin): Handle new builtin types.
	(ix86_expand_vector_init_one_nonzero): Adjust for FP16 target.
	* config/i386/sse.md (VI2F): New mode iterator.
	(vec_set<mode>_0): Use new mode iterator.
	(avx512f_mov<ssescalarmodelower>_mask): Adjust for HF vector mode.
	(avx512f_store<mode>_mask): Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 59 ++++++++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |  3 ++
 gcc/config/i386/i386-builtin.def       |  5 +++
 gcc/config/i386/i386-expand.c          | 11 +++++
 gcc/config/i386/sse.md                 | 33 +++++++-------
 5 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 2fbfc140c44..cdf6646c8c6 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2453,6 +2453,65 @@ _mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vmovw.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsi16_si128 (short __A)
+{
+  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
+}
+
+extern __inline short
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsi128_si16 (__m128i __A)
+{
+  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
+}
+
+/* Intrinsics vmovsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
+{
+  return __builtin_ia32_loadsh_mask (__C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
+{
+  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
+{
+  __builtin_ia32_storesh_mask (__A,  __C, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_move_sh (__m128h __A, __m128h  __B)
+{
+  __A[0] = __B[0];
+  return __A;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h  __C, __m128h __D)
+{
+  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_move_sh (__mmask8 __A, __m128h  __B, __m128h __C)
+{
+  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 79e7edf13e5..6cf3e354c78 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -134,6 +134,7 @@ DEF_POINTER_TYPE (PCVOID, VOID, CONST)
 DEF_POINTER_TYPE (PVOID, VOID)
 DEF_POINTER_TYPE (PDOUBLE, DOUBLE)
 DEF_POINTER_TYPE (PFLOAT, FLOAT)
+DEF_POINTER_TYPE (PCFLOAT16, FLOAT16, CONST)
 DEF_POINTER_TYPE (PSHORT, SHORT)
 DEF_POINTER_TYPE (PUSHORT, USHORT)
 DEF_POINTER_TYPE (PINT, INT)
@@ -1308,6 +1309,8 @@ DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
+DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index ed1a4a38b1c..be617b8f18a 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -393,6 +393,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mas
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
 
+/* AVX512FP16 */
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_storehf_mask, "__builtin_ia32_storesh_mask", IX86_BUILTIN_STORESH_MASK, UNKNOWN, (int) VOID_FTYPE_PCFLOAT16_V8HF_UQI)
+
 /* RDPKRU and WRPKRU.  */
 BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru,  "__builtin_ia32_rdpkru", IX86_BUILTIN_RDPKRU, UNKNOWN, (int) UNSIGNED_FTYPE_VOID)
 BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru,  "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
@@ -2826,6 +2830,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_movhf_mask, "__builtin_ia32_vmovsh_mask", IX86_BUILTIN_VMOVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 266aa411ddb..bfc7fc75b97 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10907,6 +10907,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case VOID_FTYPE_PFLOAT_V16SF_UHI:
     case VOID_FTYPE_PFLOAT_V8SF_UQI:
     case VOID_FTYPE_PFLOAT_V4SF_UQI:
+    case VOID_FTYPE_PCFLOAT16_V8HF_UQI:
     case VOID_FTYPE_PV32QI_V32HI_USI:
     case VOID_FTYPE_PV16QI_V16HI_UHI:
     case VOID_FTYPE_PUDI_V8HI_UQI:
@@ -10979,6 +10980,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_PCFLOAT_V16SF_UHI:
     case V8SF_FTYPE_PCFLOAT_V8SF_UQI:
     case V4SF_FTYPE_PCFLOAT_V4SF_UQI:
+    case V8HF_FTYPE_PCFLOAT16_V8HF_UQI:
       nargs = 3;
       klass = load;
       memory = 0;
@@ -13993,6 +13995,8 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       break;
     case E_V8HImode:
       use_vector_set = TARGET_SSE2;
+      gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0
+	? gen_vec_setv8hi_0 : NULL;
       break;
     case E_V8QImode:
       use_vector_set = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
@@ -14004,8 +14008,12 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_SSE4_1;
       break;
     case E_V32QImode:
+      use_vector_set = TARGET_AVX;
+      break;
     case E_V16HImode:
       use_vector_set = TARGET_AVX;
+      gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0
+	? gen_vec_setv16hi_0 : NULL;
       break;
     case E_V8SImode:
       use_vector_set = TARGET_AVX;
@@ -14053,6 +14061,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
       gen_vec_set_0 = gen_vec_setv32hf_0;
       break;
+    case E_V32HImode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32hi_0;
     default:
       break;
     }
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c4db778e25d..97f7c698d5d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -758,6 +758,7 @@ (define_mode_iterator VIHF_AVX512BW
   (V32HF "TARGET_AVX512FP16")])
 
 ;; Int-float size matches
+(define_mode_iterator VI2F [V8HI V16HI V32HI V8HF V16HF V32HF])
 (define_mode_iterator VI4F_128 [V4SI V4SF])
 (define_mode_iterator VI8F_128 [V2DI V2DF])
 (define_mode_iterator VI4F_256 [V8SI V8SF])
@@ -1317,13 +1318,13 @@ (define_insn_and_split "*<avx512>_load<mode>"
   [(set (match_dup 0) (match_dup 1))])
 
 (define_insn "avx512f_mov<ssescalarmodelower>_mask"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (match_operand:VF_128 2 "register_operand" "v")
-	    (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (match_operand:VFH_128 2 "register_operand" "v")
+	    (match_operand:VFH_128 3 "nonimm_or_0_operand" "0C")
 	    (match_operand:QI 4 "register_operand" "Yk"))
-	  (match_operand:VF_128 1 "register_operand" "v")
+	  (match_operand:VFH_128 1 "register_operand" "v")
 	  (const_int 1)))]
   "TARGET_AVX512F"
   "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
@@ -1336,7 +1337,7 @@ (define_expand "avx512f_load<mode>_mask"
 	(vec_merge:<ssevecmode>
 	  (vec_merge:<ssevecmode>
 	    (vec_duplicate:<ssevecmode>
-	      (match_operand:MODEF 1 "memory_operand"))
+	      (match_operand:MODEFH 1 "memory_operand"))
 	    (match_operand:<ssevecmode> 2 "nonimm_or_0_operand")
 	    (match_operand:QI 3 "register_operand"))
 	  (match_dup 4)
@@ -1349,7 +1350,7 @@ (define_insn "*avx512f_load<mode>_mask"
 	(vec_merge:<ssevecmode>
 	  (vec_merge:<ssevecmode>
 	    (vec_duplicate:<ssevecmode>
-	      (match_operand:MODEF 1 "memory_operand" "m"))
+	      (match_operand:MODEFH 1 "memory_operand" "m"))
 	    (match_operand:<ssevecmode> 2 "nonimm_or_0_operand" "0C")
 	    (match_operand:QI 3 "register_operand" "Yk"))
 	  (match_operand:<ssevecmode> 4 "const0_operand" "C")
@@ -1362,11 +1363,11 @@ (define_insn "*avx512f_load<mode>_mask"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_store<mode>_mask"
-  [(set (match_operand:MODEF 0 "memory_operand" "=m")
-	(if_then_else:MODEF
+  [(set (match_operand:MODEFH 0 "memory_operand" "=m")
+	(if_then_else:MODEFH
 	  (and:QI (match_operand:QI 2 "register_operand" "Yk")
 		 (const_int 1))
-	  (vec_select:MODEF
+	  (vec_select:MODEFH
 	    (match_operand:<ssevecmode> 1 "register_operand" "v")
 	    (parallel [(const_int 0)]))
 	  (match_dup 0)))]
@@ -8513,11 +8514,11 @@ (define_insn "vec_set<mode>_0"
 
 ;; vmovw also clears the higher bits
 (define_insn "vec_set<mode>_0"
-  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512FP16
-	  (vec_duplicate:VF_AVX512FP16
-	    (match_operand:HF 2 "nonimmediate_operand" "rm"))
-	  (match_operand:VF_AVX512FP16 1 "const0_operand" "C")
+  [(set (match_operand:VI2F 0 "register_operand" "=v")
+	(vec_merge:VI2F
+	  (vec_duplicate:VI2F
+	    (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "rm"))
+	  (match_operand:VI2F 1 "const0_operand" "C")
 	  (const_int 1)))]
   "TARGET_AVX512FP16"
   "vmovw\t{%2, %x0|%x0, %2}"
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (23 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq liuhongt
                   ` (36 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vmovsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vmovsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-2a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-2b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-3a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-3b.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-4a.c: Ditto.
	* gcc.target/i386/avx512fp16-vmovw-4b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vmovsh-1a.c    |  26 ++++
 .../gcc.target/i386/avx512fp16-vmovsh-1b.c    | 115 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vmovw-1a.c     |  15 +++
 .../gcc.target/i386/avx512fp16-vmovw-1b.c     |  27 ++++
 .../gcc.target/i386/avx512fp16-vmovw-2a.c     |  21 ++++
 .../gcc.target/i386/avx512fp16-vmovw-2b.c     |  53 ++++++++
 .../gcc.target/i386/avx512fp16-vmovw-3a.c     |  23 ++++
 .../gcc.target/i386/avx512fp16-vmovw-3b.c     |  52 ++++++++
 .../gcc.target/i386/avx512fp16-vmovw-4a.c     |  27 ++++
 .../gcc.target/i386/avx512fp16-vmovw-4b.c     |  52 ++++++++
 10 files changed, 411 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
new file mode 100644
index 00000000000..e35be10fcd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+\[^\n\r\]*%\[er\]ax+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^z\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovsh\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+extern _Float16 const* p;
+volatile __m128h x1, x2, res;
+volatile __mmask8 m8;
+
+void
+avx512f_test (void)
+{
+  x2 = _mm_mask_load_sh (x1, m8, p);
+  x2 = _mm_maskz_load_sh (m8, p);
+  _mm_mask_store_sh (p, m8, x1);
+
+  res = _mm_move_sh (x1, x2);
+  res = _mm_mask_move_sh (res, m8, x1, x2);
+  res = _mm_maskz_move_sh (m8, x1, x2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c
new file mode 100644
index 00000000000..cea224a62e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1b.c
@@ -0,0 +1,115 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+void NOINLINE
+emulate_mov2_load_sh(V512 * dest, V512 op1,
+		     __mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = v1.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0]; //remains unchanged
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = 0;
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+emulate_mov3_load_sh(V512 * dest, V512 op1, V512 op2,
+		     __mmask8 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = v3.f32[0];
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0]; //remains unchanged
+
+  for (i = 1; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+emulate_mov2_store_sh(V512 * dest, V512 op1, __mmask8 k)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k)
+    v5.f32[0] = v1.f32[0];
+  else
+    v5.f32[0] = v7.f32[0]; //remains unchanged
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  // no mask
+  emulate_mov2_load_sh (&exp, src1, 0x0, 0);
+  res.xmmh[0] = _mm_load_sh((const void *)&(src1.u16[0]));
+  check_results(&res, &exp, 8, "_mm_load_sh");
+
+  // with mask and mask bit is set
+  emulate_mov2_load_sh (&exp, src1, 0x1, 0);
+  res.xmmh[0] = _mm_mask_load_sh(res.xmmh[0], 0x1, (const void *)&(src1.u16[0]));
+  check_results(&res, &exp, 8, "_mm_mask_load_sh");
+
+  // with zero-mask
+  emulate_mov2_load_sh (&exp, src1, 0x0, 1);
+  res.xmmh[0] = _mm_maskz_load_sh(0x1, (const void *)&(src1.u16[0]));
+  check_results(&res, &exp, 8, "_mm_maskz_load_sh");
+
+  emulate_mov3_load_sh (&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_move_sh(res.xmmh[0], 0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, 8, "_mm_mask_move_sh");
+
+  emulate_mov3_load_sh (&exp, src1, src2, 0x1, 1);
+  res.xmmh[0] = _mm_maskz_move_sh(0x1, src1.xmmh[0], src2.xmmh[0]);
+  check_results(&res, &exp, 8, "_mm_maskz_move_sh");
+
+  // no mask
+  emulate_mov2_store_sh (&exp, src1, 0x0);
+  _mm_store_sh((void *)&(res.u16[0]), src1.xmmh[0]);
+  check_results(&exp, &res, 1, "_mm_store_sh");
+
+  // with mask
+  emulate_mov2_store_sh (&exp, src1, 0x1);
+  _mm_mask_store_sh((void *)&(res.u16[0]), 0x1, src1.xmmh[0]);
+  check_results(&exp, &res, 1, "_mm_mask_store_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c
new file mode 100644
index 00000000000..177802c6dcb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1a.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vmovw\[^-]" 1 } } */
+/* { dg-final { scan-assembler-times "vpextrw" 1 } } */
+#include <immintrin.h>
+
+volatile __m128i x1;
+volatile short x2;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm_cvtsi16_si128 (x2);
+  x2 = _mm_cvtsi128_si16 (x1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c
new file mode 100644
index 00000000000..a96007d6fd8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-1b.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+static void
+do_test (void)
+{
+  union128i_w u;
+  short b = 128;
+  short e[8] = {0,0,0,0,0,0,0,0};
+
+  u.x = _mm_cvtsi16_si128 (b);
+
+  e[0] = b;
+
+  if (check_union128i_w (u, e))
+    abort ();
+  u.a[0] = 123;
+  b = _mm_cvtsi128_si16 (u.x);
+  if (u.a[0] != b)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c
new file mode 100644
index 00000000000..efa24e5523c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef short __v8hi __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128i
+__attribute__ ((noinline, noclone))
+foo1 (short x)
+{
+  return __extension__ (__m128i)(__v8hi) { x, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+__m128i
+__attribute__ ((noinline, noclone))
+foo2 (short *x)
+{
+  return __extension__ (__m128i)(__v8hi) { *x, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2  } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c
new file mode 100644
index 00000000000..b680a16945f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-2b.c
@@ -0,0 +1,53 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-vmovw-2a.c"
+
+__m128i
+__attribute__ ((noinline,noclone))
+foo3 (__m128i x)
+{
+  return foo1 (((__v8hi) x)[0]);
+}
+
+static void
+do_test (void)
+{
+  short x;
+  union128i_w u = { -1, -1,};
+  union128i_w exp = { 0, 0};
+  __m128i v;
+  union128i_w a;
+
+  x = 25;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union128i_w (a, exp.a))
+    abort ();
+
+  x = 33;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union128i_w (a, exp.a))
+    abort ();
+
+  x = -33;
+  u.a[0] = x;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union128i_w (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c
new file mode 100644
index 00000000000..c60310710a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3a.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef short __v16hi __attribute__ ((__vector_size__ (32)));
+typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__));
+
+__m256i
+__attribute__ ((noinline, noclone))
+foo1 (short x)
+{
+  return __extension__ (__m256i)(__v16hi) { x, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+__m256i
+__attribute__ ((noinline, noclone))
+foo2 (short *x)
+{
+  return __extension__ (__m256i)(__v16hi) { *x, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c
new file mode 100644
index 00000000000..13c1f6518f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-3b.c
@@ -0,0 +1,52 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-vmovw-3a.c"
+
+__m256i
+__attribute__ ((noinline,noclone))
+foo3 (__m256i x)
+{
+  return foo1 (((__v16hi) x)[0]);
+}
+
+static void
+do_test (void)
+{
+  short x = 25;
+  union256i_w u = { -1, -1, -1, -1 };
+  union256i_w exp = { 0, 0, 0, 0 };
+
+  __m256i v;
+  union256i_w a;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union256i_w (a, exp.a))
+    abort ();
+
+  x = 33;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union256i_w (a, exp.a))
+    abort ();
+
+  x = -23;
+  u.a[0] = x;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union256i_w (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c
new file mode 100644
index 00000000000..2ba198dd7fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512i
+__attribute__ ((noinline, noclone))
+foo1 (short x)
+{
+  return __extension__ (__m512i)(__v32hi) { x, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+__m512i
+__attribute__ ((noinline, noclone))
+foo2 (short *x)
+{
+  return __extension__ (__m512i)(__v32hi) { *x, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0,
+					     0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^-\n\r]*xmm0" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c
new file mode 100644
index 00000000000..ec6477b793f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vmovw-4b.c
@@ -0,0 +1,52 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-vmovw-4a.c"
+
+__m512i
+__attribute__ ((noinline,noclone))
+foo3 (__m512i x)
+{
+  return foo1 (((__v32hi) x)[0]);
+}
+
+static void
+do_test (void)
+{
+  short x = 25;
+  union512i_w u = { -1, -1, -1, -1, -1, -1, -1, -1 };
+  union512i_w exp = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+  __m512i v;
+  union512i_w a;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union512i_w (a, exp.a))
+    abort ();
+
+  x = 55;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union512i_w (a, exp.a))
+    abort ();
+
+  x = 33;
+  u.a[0] = x;
+  exp.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union512i_w (a, exp.a))
+    abort ();
+}
-- 
2.18.1



* [PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (24 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq liuhongt
                   ` (35 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvtph_epi32):
	New intrinsic.
	(_mm512_mask_cvtph_epi32): Likewise.
	(_mm512_maskz_cvtph_epi32): Likewise.
	(_mm512_cvt_roundph_epi32): Likewise.
	(_mm512_mask_cvt_roundph_epi32): Likewise.
	(_mm512_maskz_cvt_roundph_epi32): Likewise.
	(_mm512_cvtph_epu32): Likewise.
	(_mm512_mask_cvtph_epu32): Likewise.
	(_mm512_maskz_cvtph_epu32): Likewise.
	(_mm512_cvt_roundph_epu32): Likewise.
	(_mm512_mask_cvt_roundph_epu32): Likewise.
	(_mm512_maskz_cvt_roundph_epu32): Likewise.
	(_mm512_cvtph_epi64): Likewise.
	(_mm512_mask_cvtph_epi64): Likewise.
	(_mm512_maskz_cvtph_epi64): Likewise.
	(_mm512_cvt_roundph_epi64): Likewise.
	(_mm512_mask_cvt_roundph_epi64): Likewise.
	(_mm512_maskz_cvt_roundph_epi64): Likewise.
	(_mm512_cvtph_epu64): Likewise.
	(_mm512_mask_cvtph_epu64): Likewise.
	(_mm512_maskz_cvtph_epu64): Likewise.
	(_mm512_cvt_roundph_epu64): Likewise.
	(_mm512_mask_cvt_roundph_epu64): Likewise.
	(_mm512_maskz_cvt_roundph_epu64): Likewise.
	(_mm512_cvtph_epi16): Likewise.
	(_mm512_mask_cvtph_epi16): Likewise.
	(_mm512_maskz_cvtph_epi16): Likewise.
	(_mm512_cvt_roundph_epi16): Likewise.
	(_mm512_mask_cvt_roundph_epi16): Likewise.
	(_mm512_maskz_cvt_roundph_epi16): Likewise.
	(_mm512_cvtph_epu16): Likewise.
	(_mm512_mask_cvtph_epu16): Likewise.
	(_mm512_maskz_cvtph_epu16): Likewise.
	(_mm512_cvt_roundph_epu16): Likewise.
	(_mm512_mask_cvt_roundph_epu16): Likewise.
	(_mm512_maskz_cvt_roundph_epu16): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvtph_epi32):
	New intrinsic.
	(_mm_mask_cvtph_epi32): Likewise.
	(_mm_maskz_cvtph_epi32): Likewise.
	(_mm256_cvtph_epi32): Likewise.
	(_mm256_mask_cvtph_epi32): Likewise.
	(_mm256_maskz_cvtph_epi32): Likewise.
	(_mm_cvtph_epu32): Likewise.
	(_mm_mask_cvtph_epu32): Likewise.
	(_mm_maskz_cvtph_epu32): Likewise.
	(_mm256_cvtph_epu32): Likewise.
	(_mm256_mask_cvtph_epu32): Likewise.
	(_mm256_maskz_cvtph_epu32): Likewise.
	(_mm_cvtph_epi64): Likewise.
	(_mm_mask_cvtph_epi64): Likewise.
	(_mm_maskz_cvtph_epi64): Likewise.
	(_mm256_cvtph_epi64): Likewise.
	(_mm256_mask_cvtph_epi64): Likewise.
	(_mm256_maskz_cvtph_epi64): Likewise.
	(_mm_cvtph_epu64): Likewise.
	(_mm_mask_cvtph_epu64): Likewise.
	(_mm_maskz_cvtph_epu64): Likewise.
	(_mm256_cvtph_epu64): Likewise.
	(_mm256_mask_cvtph_epu64): Likewise.
	(_mm256_maskz_cvtph_epu64): Likewise.
	(_mm_cvtph_epi16): Likewise.
	(_mm_mask_cvtph_epi16): Likewise.
	(_mm_maskz_cvtph_epi16): Likewise.
	(_mm256_cvtph_epi16): Likewise.
	(_mm256_mask_cvtph_epi16): Likewise.
	(_mm256_maskz_cvtph_epi16): Likewise.
	(_mm_cvtph_epu16): Likewise.
	(_mm_mask_cvtph_epu16): Likewise.
	(_mm_maskz_cvtph_epu16): Likewise.
	(_mm256_cvtph_epu16): Likewise.
	(_mm256_mask_cvtph_epu16): Likewise.
	(_mm256_maskz_cvtph_epu16): Likewise.
	* config/i386/i386-builtin-types.def: Add new builtin types.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/sse.md (sseintconvert): New.
	(ssePHmode): Ditto.
	(UNSPEC_US_FIX_NOTRUNC): Ditto.
	(sseintconvertsignprefix): Ditto.
	(avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>):
	Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 525 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 345 ++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   9 +
 gcc/config/i386/i386-builtin.def       |  18 +
 gcc/config/i386/i386-expand.c          |   9 +
 gcc/config/i386/sse.md                 |  35 ++
 gcc/testsuite/gcc.target/i386/avx-1.c  |   6 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   6 +
 11 files changed, 995 insertions(+)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index cdf6646c8c6..42576c4ae2e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2512,6 +2512,531 @@ _mm_maskz_move_sh (__mmask8 __A, __m128h  __B, __m128h __C)
   return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
 }
 
+/* Intrinsics vcvtph2dq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epi32 (__m256h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__A,
+					       (__v16si)
+					       _mm512_setzero_si512 (),
+					       (__mmask16) -1,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__C,
+					       (__v16si) __A,
+					       __B,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epi32 (__mmask16 __A, __m256h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__B,
+					       (__v16si)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epi32 (__m256h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__A,
+					       (__v16si)
+					       _mm512_setzero_si512 (),
+					       (__mmask16) -1,
+					       __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__C,
+					       (__v16si) __A,
+					       __B,
+					       __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2dq_v16si_mask_round (__B,
+					       (__v16si)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       __C);
+}
+
+#else
+#define _mm512_cvt_roundph_epi32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq_v16si_mask_round ((A),			\
+					      (__v16si)			\
+					      _mm512_setzero_si512 (),	\
+					      (__mmask16)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvt_roundph_epi32(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq_v16si_mask_round ((C), (__v16si)(A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_epi32(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2dq_v16si_mask_round ((B),			\
+					      (__v16si)			\
+					      _mm512_setzero_si512 (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2udq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epu32 (__m256h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__A,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						(__mmask16) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__C,
+						(__v16si) __A,
+						__B,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epu32 (__mmask16 __A, __m256h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__B,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epu32 (__m256h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__A,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						(__mmask16) -1,
+						__B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__C,
+						(__v16si) __A,
+						__B,
+						__D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2udq_v16si_mask_round (__B,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						__A,
+						__C);
+}
+
+#else
+#define _mm512_cvt_roundph_epu32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq_v16si_mask_round ((A),			\
+					       (__v16si)		\
+					       _mm512_setzero_si512 (),	\
+					       (__mmask16)-1,		\
+					       (B)))
+
+#define _mm512_mask_cvt_roundph_epu32(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq_v16si_mask_round ((C), (__v16si)(A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_epu32(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2udq_v16si_mask_round ((B),			\
+					       (__v16si)		\
+					       _mm512_setzero_si512 (),	\
+					       (A),			\
+					       (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2qq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__A,
+						   _mm512_setzero_si512 (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__C, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__B,
+						   _mm512_setzero_si512 (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epi64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__A,
+						   _mm512_setzero_si512 (),
+						   (__mmask8) -1,
+						   __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvtph2qq_v8di_mask_round (__B,
+						   _mm512_setzero_si512 (),
+						   __A,
+						   __C);
+}
+
+#else
+#define _mm512_cvt_roundph_epi64(A, B)					\
+  (__builtin_ia32_vcvtph2qq_v8di_mask_round ((A),			\
+					     _mm512_setzero_si512 (),	\
+					     (__mmask8)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvt_roundph_epi64(A, B, C, D)			\
+  (__builtin_ia32_vcvtph2qq_v8di_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_epi64(A, B, C)				\
+  (__builtin_ia32_vcvtph2qq_v8di_mask_round ((B),			\
+					     _mm512_setzero_si512 (),	\
+					     (A),			\
+					     (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2uqq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__A,
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__C, __A, __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__B,
+						    _mm512_setzero_si512 (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epu64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__A,
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvtph2uqq_v8di_mask_round (__B,
+						    _mm512_setzero_si512 (),
+						    __A,
+						    __C);
+}
+
+#else
+#define _mm512_cvt_roundph_epu64(A, B)					\
+  (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((A),			\
+					      _mm512_setzero_si512 (),	\
+					      (__mmask8)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvt_roundph_epu64(A, B, C, D)			\
+  (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_epu64(A, B, C)				\
+  (__builtin_ia32_vcvtph2uqq_v8di_mask_round ((B),			\
+					      _mm512_setzero_si512 (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2w.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epi16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__A,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      (__mmask32) -1,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__C,
+					      (__v32hi) __A,
+					      __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epi16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__B,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epi16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__A,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      (__mmask32) -1,
+					      __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__C,
+					      (__v32hi) __A,
+					      __B,
+					      __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2w_v32hi_mask_round (__B,
+					      (__v32hi)
+					      _mm512_setzero_si512 (),
+					      __A,
+					      __C);
+}
+
+#else
+#define _mm512_cvt_roundph_epi16(A, B)					\
+  ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((A),		\
+						      (__v32hi)		\
+						      _mm512_setzero_si512 (), \
+						      (__mmask32)-1,	\
+						      (B)))
+
+#define _mm512_mask_cvt_roundph_epi16(A, B, C, D)			\
+  ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((C),		\
+						      (__v32hi)(A),	\
+						      (B),		\
+						      (D)))
+
+#define _mm512_maskz_cvt_roundph_epi16(A, B, C)				\
+  ((__m512i)__builtin_ia32_vcvtph2w_v32hi_mask_round ((B),		\
+						      (__v32hi)		\
+						      _mm512_setzero_si512 (), \
+						      (A),		\
+						      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2uw.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_epu16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__C, (__v32hi) __A, __B,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_epu16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_epu16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__C, (__v32hi) __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvtph2uw_v32hi_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       __C);
+}
+
+#else
+#define _mm512_cvt_roundph_epu16(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw_v32hi_mask_round ((A),			\
+					      (__v32hi)			\
+					      _mm512_setzero_si512 (),	\
+					      (__mmask32)-1, (B)))
+
+#define _mm512_mask_cvt_roundph_epu16(A, B, C, D)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw_v32hi_mask_round ((C), (__v32hi)(A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_epu16(A, B, C)				\
+  ((__m512i)								\
+   __builtin_ia32_vcvtph2uw_v32hi_mask_round ((B),			\
+					      (__v32hi)			\
+					      _mm512_setzero_si512 (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 206d60407fc..8a7e0aaa6b1 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -930,6 +930,351 @@ _mm_maskz_getmant_ph (__mmask8 __U, __m128h __A,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtph2dq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epi32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2dq_v4si_mask (__A,
+					(__v4si)
+					_mm_setzero_si128 (),
+					(__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epi32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2dq_v4si_mask (__C, (__v4si) __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2dq_v4si_mask (__B,
+					(__v4si) _mm_setzero_si128 (),
+					__A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epi32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2dq_v8si_mask (__A,
+					(__v8si)
+					_mm256_setzero_si256 (),
+					(__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epi32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2dq_v8si_mask (__C, (__v8si) __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2dq_v8si_mask (__B,
+					(__v8si)
+					_mm256_setzero_si256 (),
+					__A);
+}
+
+/* Intrinsics vcvtph2udq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epu32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2udq_v4si_mask (__A,
+					 (__v4si)
+					 _mm_setzero_si128 (),
+					 (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epu32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2udq_v4si_mask (__C, (__v4si) __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2udq_v4si_mask (__B,
+					 (__v4si)
+					 _mm_setzero_si128 (),
+					 __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epu32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2udq_v8si_mask (__A,
+					 (__v8si)
+					 _mm256_setzero_si256 (),
+					 (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epu32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2udq_v8si_mask (__C, (__v8si) __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2udq_v8si_mask (__B,
+					 (__v8si) _mm256_setzero_si256 (),
+					 __A);
+}
+
+/* Intrinsics vcvtph2qq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epi64 (__m128h __A)
+{
+  return
+    __builtin_ia32_vcvtph2qq_v2di_mask (__A,
+					_mm_setzero_si128 (),
+					(__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epi64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2qq_v2di_mask (__C, __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2qq_v2di_mask (__B,
+					     _mm_setzero_si128 (),
+					     __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2qq_v4di_mask (__A,
+					     _mm256_setzero_si256 (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epi64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2qq_v4di_mask (__C, __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2qq_v4di_mask (__B,
+					     _mm256_setzero_si256 (),
+					     __A);
+}
+
+/* Intrinsics vcvtph2uqq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2uqq_v2di_mask (__A,
+					      _mm_setzero_si128 (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epu64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2uqq_v2di_mask (__C, __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2uqq_v2di_mask (__B,
+					      _mm_setzero_si128 (),
+					      __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2uqq_v4di_mask (__A,
+					      _mm256_setzero_si256 (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epu64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2uqq_v4di_mask (__C, __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2uqq_v4di_mask (__B,
+					      _mm256_setzero_si256 (),
+					      __A);
+}
+
+/* Intrinsics vcvtph2w.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epi16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2w_v8hi_mask (__A,
+				       (__v8hi)
+				       _mm_setzero_si128 (),
+				       (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epi16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2w_v8hi_mask (__C, (__v8hi) __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epi16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2w_v8hi_mask (__B,
+				       (__v8hi)
+				       _mm_setzero_si128 (),
+				       __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epi16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2w_v16hi_mask (__A,
+					(__v16hi)
+					_mm256_setzero_si256 (),
+					(__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epi16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2w_v16hi_mask (__C, (__v16hi) __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epi16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2w_v16hi_mask (__B,
+					(__v16hi)
+					_mm256_setzero_si256 (),
+					__A);
+}
+
+/* Intrinsics vcvtph2uw.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_epu16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2uw_v8hi_mask (__A,
+					(__v8hi)
+					_mm_setzero_si128 (),
+					(__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_epu16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2uw_v8hi_mask (__C, (__v8hi) __A, __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_epu16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvtph2uw_v8hi_mask (__B,
+					(__v8hi)
+					_mm_setzero_si128 (),
+					__A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_epu16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2uw_v16hi_mask (__A,
+					 (__v16hi)
+					 _mm256_setzero_si256 (),
+					 (__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_epu16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2uw_v16hi_mask (__C, (__v16hi) __A, __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvtph2uw_v16hi_mask (__B,
+					 (__v16hi)
+					 _mm256_setzero_si256 (),
+					 __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 6cf3e354c78..c430dc9ab48 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1311,21 +1311,30 @@ DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI)
+DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI)
+DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI)
+DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI)
+DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI)
+DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
+DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
+DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
+DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
+DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index be617b8f18a..dde8af53ff0 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2831,6 +2831,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_movhf_mask, "__builtin_ia32_vmovsh_mask", IX86_BUILTIN_VMOVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v4si_mask, "__builtin_ia32_vcvtph2dq_v4si_mask", IX86_BUILTIN_VCVTPH2DQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v8si_mask, "__builtin_ia32_vcvtph2dq_v8si_mask", IX86_BUILTIN_VCVTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v4si_mask, "__builtin_ia32_vcvtph2udq_v4si_mask", IX86_BUILTIN_VCVTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v8si_mask, "__builtin_ia32_vcvtph2udq_v8si_mask", IX86_BUILTIN_VCVTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v2di_mask, "__builtin_ia32_vcvtph2qq_v2di_mask", IX86_BUILTIN_VCVTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v4di_mask, "__builtin_ia32_vcvtph2qq_v4di_mask", IX86_BUILTIN_VCVTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v2di_mask, "__builtin_ia32_vcvtph2uqq_v2di_mask", IX86_BUILTIN_VCVTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v4di_mask, "__builtin_ia32_vcvtph2uqq_v4di_mask", IX86_BUILTIN_VCVTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v8hi_mask, "__builtin_ia32_vcvtph2w_v8hi_mask", IX86_BUILTIN_VCVTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3058,6 +3070,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getexpv32hf_mask_round,
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_sgetexpv8hf_mask_round, "__builtin_ia32_getexpsh_mask_round", IX86_BUILTIN_GETEXPSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round, "__builtin_ia32_getmantph512_mask", IX86_BUILTIN_GETMANTPH512, UNKNOWN, (int) V32HF_FTYPE_V32HF_INT_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq_v16si_mask_round", IX86_BUILTIN_VCVTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq_v16si_mask_round", IX86_BUILTIN_VCVTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq_v8di_mask_round", IX86_BUILTIN_VCVTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index bfc7fc75b97..59d1f4f5eea 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9565,9 +9565,13 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HF_FTYPE_V16HF_V16HF_UHI:
     case V8SF_FTYPE_V8HI_V8SF_UQI:
     case V4SF_FTYPE_V8HI_V4SF_UQI:
+    case V8SI_FTYPE_V8HF_V8SI_UQI:
     case V8SI_FTYPE_V8SF_V8SI_UQI:
     case V4SI_FTYPE_V4SF_V4SI_UQI:
+    case V4SI_FTYPE_V8HF_V4SI_UQI:
+    case V4DI_FTYPE_V8HF_V4DI_UQI:
     case V4DI_FTYPE_V4SF_V4DI_UQI:
+    case V2DI_FTYPE_V8HF_V2DI_UQI:
     case V2DI_FTYPE_V4SF_V2DI_UQI:
     case V8HF_FTYPE_V8HF_V8HF_UQI:
     case V4SF_FTYPE_V4DI_V4SF_UQI:
@@ -9578,6 +9582,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16QI_FTYPE_V16HI_V16QI_UHI:
     case V16QI_FTYPE_V4SI_V16QI_UQI:
     case V16QI_FTYPE_V8SI_V16QI_UQI:
+    case V8HI_FTYPE_V8HF_V8HI_UQI:
     case V8HI_FTYPE_V4SI_V8HI_UQI:
     case V8HI_FTYPE_V8SI_V8HI_UQI:
     case V16QI_FTYPE_V2DI_V16QI_UQI:
@@ -9635,6 +9640,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_DI_V8DI_UQI:
     case V16SF_FTYPE_V8SF_V16SF_UHI:
     case V16SI_FTYPE_V8SI_V16SI_UHI:
+    case V16HI_FTYPE_V16HF_V16HI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_UHI:
     case V8HI_FTYPE_V16QI_V8HI_UQI:
     case V16HI_FTYPE_V16QI_V16HI_UHI:
@@ -10501,7 +10507,9 @@ ix86_expand_round_builtin (const struct builtin_description *d,
       break;
     case V8SF_FTYPE_V8DF_V8SF_QI_INT:
     case V8DF_FTYPE_V8DF_V8DF_QI_INT:
+    case V32HI_FTYPE_V32HF_V32HI_USI_INT:
     case V8SI_FTYPE_V8DF_V8SI_QI_INT:
+    case V8DI_FTYPE_V8HF_V8DI_UQI_INT:
     case V8DI_FTYPE_V8DF_V8DI_QI_INT:
     case V8SF_FTYPE_V8DI_V8SF_QI_INT:
     case V8DF_FTYPE_V8DI_V8DF_QI_INT:
@@ -10510,6 +10518,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8SF_V8DI_QI_INT:
     case V16SF_FTYPE_V16SI_V16SF_HI_INT:
     case V16SI_FTYPE_V16SF_V16SI_HI_INT:
+    case V16SI_FTYPE_V16HF_V16SI_UHI_INT:
     case V8DF_FTYPE_V8SF_V8DF_QI_INT:
     case V16SF_FTYPE_V16HI_V16SF_HI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 97f7c698d5d..7b705422396 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -722,6 +722,11 @@ (define_mode_attr ssebytemode
   [(V8DI "V64QI") (V4DI "V32QI") (V2DI "V16QI")
    (V16SI "V64QI") (V8SI "V32QI") (V4SI "V16QI")])
 
+(define_mode_attr sseintconvert
+  [(V32HI "w") (V16HI "w") (V8HI "w")
+   (V16SI "dq") (V8SI "dq") (V4SI "dq")
+   (V8DI "qq") (V4DI "qq") (V2DI "qq")])
+
 ;; All 128bit vector integer modes
 (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 
@@ -943,6 +948,12 @@ (define_mode_attr ssehalfvecmodelower
    (V4SF  "v2sf")
    (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
 
+;; Mapping of vector modes to the HF vector mode used for conversions.
+(define_mode_attr ssePHmode
+  [(V32HI "V32HF") (V16HI "V16HF") (V8HI "V8HF")
+   (V16SI "V16HF") (V8SI "V8HF") (V4SI "V8HF")
+   (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF")])
+
 ;; Mapping of vector modes to packed single mode of the same size
 (define_mode_attr ssePSmode
   [(V16SI "V16SF") (V8DF "V16SF")
@@ -5408,6 +5419,30 @@ (define_insn "*fma4i_vmfnmsub_<mode>"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Parallel half-precision floating point conversion operations
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_int_iterator UNSPEC_US_FIX_NOTRUNC
+	[UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_FIX_NOTRUNC])
+
+(define_int_attr sseintconvertsignprefix
+	[(UNSPEC_UNSIGNED_FIX_NOTRUNC "u")
+	 (UNSPEC_FIX_NOTRUNC "")])
+
+(define_insn "avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>"
+  [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v")
+        (unspec:VI248_AVX512VL
+	   [(match_operand:<ssePHmode> 1 "<round_nimm_predicate>" "<round_constraint>")]
+	   UNSPEC_US_FIX_NOTRUNC))]
+  "TARGET_AVX512FP16"
+  "vcvtph2<sseintconvertsignprefix><sseintconvert>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index b3cffa0644f..cdfc2e3b69f 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -719,6 +719,12 @@
 #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
 #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
 #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
+#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 67ef567e437..5e4aaf8ce9b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -736,6 +736,12 @@
 #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
 #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
 #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
+#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 04163874f90..32aa4518703 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -678,6 +678,12 @@ test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
 test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
 test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
 test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
+test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
+test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -710,6 +716,12 @@ test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
 test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
 test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -748,6 +760,12 @@ test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
 test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
 test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 008600a393d..44ac10d602f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -783,6 +783,12 @@ test_1 (_mm_roundscale_ph, __m128h, __m128h, 123)
 test_1 (_mm256_roundscale_ph, __m256h, __m256h, 123)
 test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
 test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8)
+test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
+test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
+test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -814,6 +820,12 @@ test_2 (_mm512_maskz_roundscale_ph, __m512h, __mmask32, __m512h, 123)
 test_2 (_mm_roundscale_sh, __m128h, __m128h, __m128h, 123)
 test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -851,6 +863,12 @@ test_3 (_mm512_mask_roundscale_ph, __m512h, __m512h, __mmask32, __m512h, 123)
 test_3 (_mm_maskz_roundscale_sh, __m128h, __mmask8, __m128h, __m128h, 123)
 test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index b3f07587acb..ae6151b4a61 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -737,6 +737,12 @@
 #define __builtin_ia32_getexpsh_mask_round(A, B, C, D, E) __builtin_ia32_getexpsh_mask_round(A, B, C, D, 4)
 #define __builtin_ia32_getmantph512_mask(A, F, C, D, E) __builtin_ia32_getmantph512_mask(A, 1, C, D, 8)
 #define __builtin_ia32_getmantsh_mask_round(A, B, C, W, U, D) __builtin_ia32_getmantsh_mask_round(A, B, 1, W, U, 4)
+#define __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (25 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
                   ` (34 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512): Add integer
	vector and 64-bit array components.
	* gcc.target/i386/avx512fp16-vcvtph2dq-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2uw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2uw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2w-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2w-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       | 25 +++++-
 .../gcc.target/i386/avx512fp16-vcvtph2dq-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtph2dq-1b.c | 79 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtph2qq-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtph2qq-1b.c | 78 +++++++++++++++++
 .../i386/avx512fp16-vcvtph2udq-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvtph2udq-1b.c           | 79 +++++++++++++++++
 .../i386/avx512fp16-vcvtph2uqq-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvtph2uqq-1b.c           | 78 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtph2uw-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtph2uw-1b.c | 84 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtph2w-1a.c  | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtph2w-1b.c  | 83 ++++++++++++++++++
 .../i386/avx512fp16vl-vcvtph2dq-1a.c          | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2dq-1b.c          | 15 ++++
 .../i386/avx512fp16vl-vcvtph2qq-1a.c          | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2qq-1b.c          | 15 ++++
 .../i386/avx512fp16vl-vcvtph2udq-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2udq-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvtph2uqq-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2uqq-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvtph2uw-1a.c          | 29 +++++++
 .../i386/avx512fp16vl-vcvtph2uw-1b.c          | 15 ++++
 .../i386/avx512fp16vl-vcvtph2w-1a.c           | 29 +++++++
 .../i386/avx512fp16vl-vcvtph2w-1b.c           | 15 ++++
 25 files changed, 903 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index f6f46872c35..aa83b66998c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -25,13 +25,17 @@ typedef union
 {
   __m512          zmm;
   __m512h         zmmh;
+  __m512i         zmmi;
   __m256          ymm[2];
   __m256h         ymmh[2];
   __m256i         ymmi[2];
   __m128h         xmmh[4];
   __m128	  xmm[4];
+  __m128i	  xmmi[4];
   unsigned short  u16[32];
   unsigned int    u32[16];
+  long long	  s64[8];
+  unsigned long long u64[8];
   float           f32[16];
   _Float16        f16[32];
 } V512;
@@ -162,9 +166,9 @@ init_src()
     int i;
 
     for (i = 0; i < AVX512F_MAX_ELEM; i++) {
-        v1.f32[i] = -i + 1;
+        v1.f32[i] = i + 1;
         v2.f32[i] = i * 0.5f;
-        v3.f32[i] = i * 2.5f;
+        v3.f32[i] = i * 1.5f;
         v4.f32[i] = i - 0.5f;
 
         src3.u32[i] = (i + 1) * 10;
@@ -217,30 +221,45 @@ init_dest(V512 * res, V512 * exp)
 #if AVX512F_LEN == 256
 #undef HF
 #undef SF
+#undef SI
+#undef H_HF
 #undef NET_MASK 
-#undef MASK_VALUE 
+#undef MASK_VALUE
+#undef HALF_MASK
 #undef ZMASK_VALUE 
 #define NET_MASK 0xffff
 #define MASK_VALUE 0xcccc
 #define ZMASK_VALUE 0xfcc1
+#define HALF_MASK 0xcc
 #define HF(x) x.ymmh[0]
+#define H_HF(x) x.xmmh[0]
 #define SF(x) x.ymm[0]
+#define SI(x) x.ymmi[0]
 #elif AVX512F_LEN == 128
 #undef HF
 #undef SF
+#undef SI
+#undef H_HF
 #undef NET_MASK 
 #undef MASK_VALUE 
 #undef ZMASK_VALUE 
+#undef HALF_MASK
 #define NET_MASK 0xff
 #define MASK_VALUE 0xcc
+#define HALF_MASK MASK_VALUE
 #define ZMASK_VALUE 0xc1
 #define HF(x) x.xmmh[0]
 #define SF(x) x.xmm[0]
+#define SI(x) x.xmmi[0]
+#define H_HF(x) x.xmmh[0]
 #else
 #define NET_MASK 0xffffffff
 #define MASK_VALUE 0xcccccccc
 #define ZMASK_VALUE 0xfcc1fcc1
+#define HALF_MASK 0xcccc
 #define HF(x) x.zmmh
 #define SF(x) x.zmm
+#define SI(x) x.zmmi
+#define H_HF(x) x.ymmh[0]
 #endif
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c
new file mode 100644
index 00000000000..31a56393f0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epi32 (x1);
+  res1 = _mm512_mask_cvtph_epi32 (res, m16, x2);
+  res2 = _mm512_maskz_cvtph_epi32 (m16, x3);
+  res = _mm512_cvt_roundph_epi32 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epi32 (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epi32 (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c
new file mode 100644
index 00000000000..80a85828271
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2dq-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_d) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v5;
+  int i;
+  __mmask16 m1;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epi32) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epi32) (SI(res), HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epi32) (HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi32);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epi32) (H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epi32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epi32) (HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi32);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c
new file mode 100644
index 00000000000..d80ee611f3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epi64 (x1);
+  res1 = _mm512_mask_cvtph_epi64 (res, m8, x2);
+  res2 = _mm512_maskz_cvtph_epi64 (m8, x3);
+  res = _mm512_cvt_roundph_epi64 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epi64 (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epi64 (m8, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c
new file mode 100644
index 00000000000..42b21cf2e4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2qq-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_q) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v5;
+  int i;
+  __mmask16 m1;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epi64) (src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epi64) (SI(res), 0xcc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfa, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epi64) (0xfa, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi64);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epi64) (src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epi64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfa, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epi64) (0xfa, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi64);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c
new file mode 100644
index 00000000000..b4a833afdab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epu32 (x1);
+  res1 = _mm512_mask_cvtph_epu32 (res, m16, x2);
+  res2 = _mm512_maskz_cvtph_epu32 (m16, x3);
+  res = _mm512_cvt_roundph_epu32 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epu32 (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epu32 (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c
new file mode 100644
index 00000000000..15fa0ba2b4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2udq-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_d) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v5;
+  int i;
+  __mmask16 m1;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epu32) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epu32) (SI(res), HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epu32) (HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epu32);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epu32) (H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epu32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epu32) (HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu32);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c
new file mode 100644
index 00000000000..b4087798be9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epu64 (x1);
+  res1 = _mm512_mask_cvtph_epu64 (res, m8, x2);
+  res2 = _mm512_maskz_cvtph_epu64 (m8, x3);
+  res = _mm512_cvt_roundph_epu64 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epu64 (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epu64 (m8, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c
new file mode 100644
index 00000000000..7f34772aca6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uqq-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_q) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v5;
+  int i;
+  __mmask16 m1;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epu64) (src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epu64) (SI(res), 0xcc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfc, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epu64) (0xfc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epu64);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epu64) (src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epu64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfc, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epu64) (0xfc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu64);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c
new file mode 100644
index 00000000000..262274526b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epu16 (x1);
+  res1 = _mm512_mask_cvtph_epu16 (res, m32, x2);
+  res2 = _mm512_maskz_cvtph_epu16 (m32, x3);
+  res = _mm512_cvt_roundph_epu16 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epu16 (res, m32, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epu16 (m32, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c
new file mode 100644
index 00000000000..437a1f0eeae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2uw-1b.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_w) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	dest->u16[i] = 0;
+      }
+    }
+    else {
+      dest->u16[i] = v1.f32[i];
+
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	dest->u16[i+16] = 0;
+      }
+    }
+    else {
+      dest->u16[i+16] = v2.f32[i];
+    }
+  }
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epu16) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epu16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epu16) (SI(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epu16);
+
+  EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epu16) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epu16);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epu16) (HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epu16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epu16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epu16);
+
+  EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epu16) (ZMASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epu16);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c
new file mode 100644
index 00000000000..bcaa7446d34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_epi16 (x1);
+  res1 = _mm512_mask_cvtph_epi16 (res, m32, x2);
+  res2 = _mm512_maskz_cvtph_epi16 (m32, x3);
+  res = _mm512_cvt_roundph_epi16 (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_epi16 (res, m32, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_epi16 (m32, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c
new file mode 100644
index 00000000000..dfa20523932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2w-1b.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_w) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	dest->u16[i] = 0;
+      }
+    }
+    else {
+      dest->u16[i] = v1.f32[i];
+
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	dest->u16[i+16] = 0;
+      }
+    }
+    else {
+      dest->u16[i+16] = v2.f32[i];
+    }
+  }
+}
+
+void
+TEST (void)
+{
+  V512 res, exp;
+
+  init_src();
+
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtph_epi16) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_epi16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvtph_epi16) (SI(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_epi16);
+
+  EMULATE(cvtph2_w)(&exp, src1,  ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvtph_epi16) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_epi16);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvt_roundph_epi16) (HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_epi16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvt_roundph_epi16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_epi16);
+
+  EMULATE(cvtph2_w)(&exp, src1,  ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvt_roundph_epi16) (ZMASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_epi16);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c
new file mode 100644
index 00000000000..df653b0b2c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epi32 (x3);
+  res1 = _mm256_mask_cvtph_epi32 (res1, m8, x3);
+  res1 = _mm256_maskz_cvtph_epi32 (m8, x3);
+
+  res2 = _mm_cvtph_epi32 (x3);
+  res2 = _mm_mask_cvtph_epi32 (res2, m8, x3);
+  res2 = _mm_maskz_cvtph_epi32 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c
new file mode 100644
index 00000000000..93a3e903da4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2dq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2dq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2dq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c
new file mode 100644
index 00000000000..ddc6f2a702e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epi64 (x3);
+  res1 = _mm256_mask_cvtph_epi64 (res1, m8, x3);
+  res1 = _mm256_maskz_cvtph_epi64 (m8, x3);
+
+  res2 = _mm_cvtph_epi64 (x3);
+  res2 = _mm_mask_cvtph_epi64 (res2, m8, x3);
+  res2 = _mm_maskz_cvtph_epi64 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c
new file mode 100644
index 00000000000..5afc5a1836b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2qq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2qq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2qq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c
new file mode 100644
index 00000000000..d07d76647a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epu32 (x3);
+  res1 = _mm256_mask_cvtph_epu32 (res1, m8, x3);
+  res1 = _mm256_maskz_cvtph_epu32 (m8, x3);
+
+  res2 = _mm_cvtph_epu32 (x3);
+  res2 = _mm_mask_cvtph_epu32 (res2, m8, x3);
+  res2 = _mm_maskz_cvtph_epu32 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c
new file mode 100644
index 00000000000..d869a0ca259
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2udq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2udq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2udq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c
new file mode 100644
index 00000000000..26dbf227d81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epu64 (x3);
+  res1 = _mm256_mask_cvtph_epu64 (res1, m8, x3);
+  res1 = _mm256_maskz_cvtph_epu64 (m8, x3);
+
+  res2 = _mm_cvtph_epu64 (x3);
+  res2 = _mm_mask_cvtph_epu64 (res2, m8, x3);
+  res2 = _mm_maskz_cvtph_epu64 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c
new file mode 100644
index 00000000000..d9b10a82f8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uqq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2uqq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2uqq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c
new file mode 100644
index 00000000000..0f9fd27881c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m256h x3;
+volatile __m128h x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epu16 (x3);
+  res1 = _mm256_mask_cvtph_epu16 (res1, m16, x3);
+  res1 = _mm256_maskz_cvtph_epu16 (m16, x3);
+
+  res2 = _mm_cvtph_epu16 (x4);
+  res2 = _mm_mask_cvtph_epu16 (res2, m8, x4);
+  res2 = _mm_maskz_cvtph_epu16 (m8, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c
new file mode 100644
index 00000000000..280dcd75320
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2uw-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2uw-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2uw-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c
new file mode 100644
index 00000000000..8dee4ee25d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m256h x3;
+volatile __m128h x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_epi16 (x3);
+  res1 = _mm256_mask_cvtph_epi16 (res1, m16, x3);
+  res1 = _mm256_maskz_cvtph_epi16 (m16, x3);
+
+  res2 = _mm_cvtph_epi16 (x4);
+  res2 = _mm_mask_cvtph_epi16 (res2, m8, x4);
+  res2 = _mm_maskz_cvtph_epi16 (m8, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c
new file mode 100644
index 00000000000..739ba6478ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2w-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2w-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtph2w-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (26 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
                   ` (33 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvtepi32_ph): New
	intrinsic.
	(_mm512_mask_cvtepi32_ph): Likewise.
	(_mm512_maskz_cvtepi32_ph): Likewise.
	(_mm512_cvt_roundepi32_ph): Likewise.
	(_mm512_mask_cvt_roundepi32_ph): Likewise.
	(_mm512_maskz_cvt_roundepi32_ph): Likewise.
	(_mm512_cvtepu32_ph): Likewise.
	(_mm512_mask_cvtepu32_ph): Likewise.
	(_mm512_maskz_cvtepu32_ph): Likewise.
	(_mm512_cvt_roundepu32_ph): Likewise.
	(_mm512_mask_cvt_roundepu32_ph): Likewise.
	(_mm512_maskz_cvt_roundepu32_ph): Likewise.
	(_mm512_cvtepi64_ph): Likewise.
	(_mm512_mask_cvtepi64_ph): Likewise.
	(_mm512_maskz_cvtepi64_ph): Likewise.
	(_mm512_cvt_roundepi64_ph): Likewise.
	(_mm512_mask_cvt_roundepi64_ph): Likewise.
	(_mm512_maskz_cvt_roundepi64_ph): Likewise.
	(_mm512_cvtepu64_ph): Likewise.
	(_mm512_mask_cvtepu64_ph): Likewise.
	(_mm512_maskz_cvtepu64_ph): Likewise.
	(_mm512_cvt_roundepu64_ph): Likewise.
	(_mm512_mask_cvt_roundepu64_ph): Likewise.
	(_mm512_maskz_cvt_roundepu64_ph): Likewise.
	(_mm512_cvtepi16_ph): Likewise.
	(_mm512_mask_cvtepi16_ph): Likewise.
	(_mm512_maskz_cvtepi16_ph): Likewise.
	(_mm512_cvt_roundepi16_ph): Likewise.
	(_mm512_mask_cvt_roundepi16_ph): Likewise.
	(_mm512_maskz_cvt_roundepi16_ph): Likewise.
	(_mm512_cvtepu16_ph): Likewise.
	(_mm512_mask_cvtepu16_ph): Likewise.
	(_mm512_maskz_cvtepu16_ph): Likewise.
	(_mm512_cvt_roundepu16_ph): Likewise.
	(_mm512_mask_cvt_roundepu16_ph): Likewise.
	(_mm512_maskz_cvt_roundepu16_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvtepi32_ph): New
	intrinsic.
	(_mm_mask_cvtepi32_ph): Likewise.
	(_mm_maskz_cvtepi32_ph): Likewise.
	(_mm256_cvtepi32_ph): Likewise.
	(_mm256_mask_cvtepi32_ph): Likewise.
	(_mm256_maskz_cvtepi32_ph): Likewise.
	(_mm_cvtepu32_ph): Likewise.
	(_mm_mask_cvtepu32_ph): Likewise.
	(_mm_maskz_cvtepu32_ph): Likewise.
	(_mm256_cvtepu32_ph): Likewise.
	(_mm256_mask_cvtepu32_ph): Likewise.
	(_mm256_maskz_cvtepu32_ph): Likewise.
	(_mm_cvtepi64_ph): Likewise.
	(_mm_mask_cvtepi64_ph): Likewise.
	(_mm_maskz_cvtepi64_ph): Likewise.
	(_mm256_cvtepi64_ph): Likewise.
	(_mm256_mask_cvtepi64_ph): Likewise.
	(_mm256_maskz_cvtepi64_ph): Likewise.
	(_mm_cvtepu64_ph): Likewise.
	(_mm_mask_cvtepu64_ph): Likewise.
	(_mm_maskz_cvtepu64_ph): Likewise.
	(_mm256_cvtepu64_ph): Likewise.
	(_mm256_mask_cvtepu64_ph): Likewise.
	(_mm256_maskz_cvtepu64_ph): Likewise.
	(_mm_cvtepi16_ph): Likewise.
	(_mm_mask_cvtepi16_ph): Likewise.
	(_mm_maskz_cvtepi16_ph): Likewise.
	(_mm256_cvtepi16_ph): Likewise.
	(_mm256_mask_cvtepi16_ph): Likewise.
	(_mm256_maskz_cvtepi16_ph): Likewise.
	(_mm_cvtepu16_ph): Likewise.
	(_mm_mask_cvtepu16_ph): Likewise.
	(_mm_maskz_cvtepu16_ph): Likewise.
	(_mm256_cvtepu16_ph): Likewise.
	(_mm256_mask_cvtepu16_ph): Likewise.
	(_mm256_maskz_cvtepu16_ph): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c
	(ix86_expand_args_builtin): Handle new builtin types.
	(ix86_expand_round_builtin): Ditto.
	* config/i386/i386-modes.def: Declare V2HF and V6HF.
	* config/i386/sse.md (VI2H_AVX512VL): New.
	(qq2phsuff): Ditto.
	(sseintvecmode): Add HF vector modes.
	(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>):
	New.
	(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto.
	(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto.
	(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto.
	(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto.
	(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask_1): Ditto.
	(avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto.
	(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto.
	(avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto.
	(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto.
	(*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1): Ditto.
	* config/i386/subst.md (round_qq2phsuff): New subst_attr.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 492 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 312 ++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   9 +
 gcc/config/i386/i386-builtin.def       |  18 +
 gcc/config/i386/i386-expand.c          |   9 +
 gcc/config/i386/i386-modes.def         |   2 +
 gcc/config/i386/sse.md                 | 153 +++++++-
 gcc/config/i386/subst.md               |   1 +
 gcc/testsuite/gcc.target/i386/avx-1.c  |   6 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   6 +
 13 files changed, 1047 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 42576c4ae2e..bd801942365 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2702,6 +2702,172 @@ _mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtdq2ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepi32_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __A,
+						    _mm256_setzero_ph (),
+						    (__mmask16) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepi32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __C,
+						    __A,
+						    __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepi32_ph (__mmask16 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __B,
+						    _mm256_setzero_ph (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepi32_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __A,
+						    _mm256_setzero_ph (),
+						    (__mmask16) -1,
+						    __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepi32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __C,
+						    __A,
+						    __B,
+						    __D);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepi32_ph (__mmask16 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si) __B,
+						    _mm256_setzero_ph (),
+						    __A,
+						    __C);
+}
+
+#else
+#define _mm512_cvt_roundepi32_ph(A, B)					\
+  (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(A),		\
+					      _mm256_setzero_ph (),	\
+					      (__mmask16)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvt_roundepi32_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(C),	\
+					      (A),		\
+					      (B),		\
+					      (D)))
+
+#define _mm512_maskz_cvt_roundepi32_ph(A, B, C)				\
+  (__builtin_ia32_vcvtdq2ph_v16si_mask_round ((__v16si)(B),		\
+					      _mm256_setzero_ph (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtudq2ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu32_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __A,
+						     _mm256_setzero_ph (),
+						     (__mmask16) -1,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepu32_ph (__m256h __A, __mmask16 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __C,
+						     __A,
+						     __B,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepu32_ph (__mmask16 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __B,
+						     _mm256_setzero_ph (),
+						     __A,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepu32_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __A,
+						     _mm256_setzero_ph (),
+						     (__mmask16) -1,
+						     __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepu32_ph (__m256h __A, __mmask16 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __C,
+						     __A,
+						     __B,
+						     __D);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepu32_ph (__mmask16 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si) __B,
+						     _mm256_setzero_ph (),
+						     __A,
+						     __C);
+}
+
+#else
+#define _mm512_cvt_roundepu32_ph(A, B)					\
+  (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)(A),		\
+					       _mm256_setzero_ph (),	\
+					       (__mmask16)-1,		\
+					       B))
+
+#define _mm512_mask_cvt_roundepu32_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)C,	\
+					       A,		\
+					       B,		\
+					       D))
+
+#define _mm512_maskz_cvt_roundepu32_ph(A, B, C)				\
+  (__builtin_ia32_vcvtudq2ph_v16si_mask_round ((__v16si)B,		\
+					       _mm256_setzero_ph (),	\
+					       A,			\
+					       C))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtph2qq.  */
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -2853,6 +3019,166 @@ _mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtqq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepi64_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __A,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __C,
+						   __A,
+						   __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepi64_ph (__mmask8 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __B,
+						   _mm_setzero_ph (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepi64_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __A,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepi64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __C,
+						   __A,
+						   __B,
+						   __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepi64_ph (__mmask8 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di) __B,
+						   _mm_setzero_ph (),
+						   __A,
+						   __C);
+}
+
+#else
+#define _mm512_cvt_roundepi64_ph(A, B)				\
+  (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(A),	\
+					     _mm_setzero_ph (),	\
+					     (__mmask8)-1,	\
+					     (B)))
+
+#define _mm512_mask_cvt_roundepi64_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundepi64_ph(A, B, C)			\
+  (__builtin_ia32_vcvtqq2ph_v8di_mask_round ((__v8di)(B),	\
+					     _mm_setzero_ph (),	\
+					     (A),		\
+					     (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtuqq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu64_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __A,
+						    _mm_setzero_ph (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __C,
+						    __A,
+						    __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepu64_ph (__mmask8 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __B,
+						    _mm_setzero_ph (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepu64_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __A,
+						    _mm_setzero_ph (),
+						    (__mmask8) -1,
+						    __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepu64_ph (__m128h __A, __mmask8 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __C,
+						    __A,
+						    __B,
+						    __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepu64_ph (__mmask8 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di) __B,
+						    _mm_setzero_ph (),
+						    __A,
+						    __C);
+}
+
+#else
+#define _mm512_cvt_roundepu64_ph(A, B)					\
+  (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(A),		\
+					      _mm_setzero_ph (),	\
+					      (__mmask8)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvt_roundepu64_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundepu64_ph(A, B, C)				\
+  (__builtin_ia32_vcvtuqq2ph_v8di_mask_round ((__v8di)(B),		\
+					      _mm_setzero_ph (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtph2w.  */
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -3037,6 +3363,172 @@ _mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtw2ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepi16_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __A,
+						   _mm512_setzero_ph (),
+						   (__mmask32) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepi16_ph (__m512h __A, __mmask32 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __C,
+						   __A,
+						   __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepi16_ph (__mmask32 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __B,
+						   _mm512_setzero_ph (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepi16_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __A,
+						   _mm512_setzero_ph (),
+						   (__mmask32) -1,
+						   __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepi16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __C,
+						   __A,
+						   __B,
+						   __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepi16_ph (__mmask32 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi) __B,
+						   _mm512_setzero_ph (),
+						   __A,
+						   __C);
+}
+
+#else
+#define _mm512_cvt_roundepi16_ph(A, B)					\
+  (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(A),		\
+					     _mm512_setzero_ph (),	\
+					     (__mmask32)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvt_roundepi16_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(C),	\
+					     (A),		\
+					     (B),		\
+					     (D)))
+
+#define _mm512_maskz_cvt_roundepi16_ph(A, B, C)				\
+  (__builtin_ia32_vcvtw2ph_v32hi_mask_round ((__v32hi)(B),		\
+					     _mm512_setzero_ph (),	\
+					     (A),			\
+					     (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtuw2ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtepu16_ph (__m512i __A)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __A,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtepu16_ph (__m512h __A, __mmask32 __B, __m512i __C)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __C,
+						    __A,
+						    __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtepu16_ph (__mmask32 __A, __m512i __B)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __B,
+						    _mm512_setzero_ph (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundepu16_ph (__m512i __A, int __B)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __A,
+						    _mm512_setzero_ph (),
+						    (__mmask32) -1,
+						    __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundepu16_ph (__m512h __A, __mmask32 __B, __m512i __C, int __D)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __C,
+						    __A,
+						    __B,
+						    __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
+{
+  return __builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi) __B,
+						    _mm512_setzero_ph (),
+						    __A,
+						    __C);
+}
+
+#else
+#define _mm512_cvt_roundepu16_ph(A, B)					\
+  (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(A),		\
+					      _mm512_setzero_ph (),	\
+					      (__mmask32)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvt_roundepu16_ph(A, B, C, D)		\
+  (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(C),	\
+					      (A),		\
+					      (B),		\
+					      (D)))
+
+#define _mm512_maskz_cvt_roundepu16_ph(A, B, C)				\
+  (__builtin_ia32_vcvtuw2ph_v32hi_mask_round ((__v32hi)(B),		\
+					      _mm512_setzero_ph (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 8a7e0aaa6b1..93d9ff8bf3c 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -1050,6 +1050,110 @@ _mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B)
 					 __A);
 }
 
+/* Intrinsics vcvtdq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi32_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepi32_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepi32_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtdq2ph_v4si_mask ((__v4si) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepi32_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepi32_ph (__m128h __A, __mmask8 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepi32_ph (__mmask8 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtdq2ph_v8si_mask ((__v8si) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+/* Intrinsics vcvtudq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu32_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepu32_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __C,
+					      __A,
+					      __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepu32_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtudq2ph_v4si_mask ((__v4si) __B,
+					      _mm_setzero_ph (),
+					      __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepu32_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepu32_ph (__m128h __A, __mmask8 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepu32_ph (__mmask8 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtudq2ph_v8si_mask ((__v8si) __B,
+					      _mm_setzero_ph (),
+					      __A);
+}
+
 /* Intrinsics vcvtph2qq.  */
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1153,6 +1257,108 @@ _mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
 					      __A);
 }
 
+/* Intrinsics vcvtqq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi64_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepi64_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtqq2ph_v2di_mask ((__v2di) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepi64_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepi64_ph (__m128h __A, __mmask8 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepi64_ph (__mmask8 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtqq2ph_v4di_mask ((__v4di) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+/* Intrinsics vcvtuqq2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu64_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepu64_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtuqq2ph_v2di_mask ((__v2di) __B,
+					      _mm_setzero_ph (),
+					      __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepu64_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepu64_ph (__m128h __A, __mmask8 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepu64_ph (__mmask8 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtuqq2ph_v4di_mask ((__v4di) __B,
+					      _mm_setzero_ph (),
+					      __A);
+}
+
 /* Intrinsics vcvtph2w.  */
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1275,6 +1481,112 @@ _mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B)
 					 __A);
 }
 
+/* Intrinsics vcvtw2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __A,
+					    _mm_setzero_ph (),
+					    (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepi16_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __C,
+					    __A,
+					    __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepi16_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtw2ph_v8hi_mask ((__v8hi) __B,
+					    _mm_setzero_ph (),
+					    __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepi16_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __A,
+					     _mm256_setzero_ph (),
+					     (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepi16_ph (__m256h __A, __mmask16 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __C,
+					     __A,
+					     __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepi16_ph (__mmask16 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtw2ph_v16hi_mask ((__v16hi) __B,
+					     _mm256_setzero_ph (),
+					     __A);
+}
+
+/* Intrinsics vcvtuw2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu16_ph (__m128i __A)
+{
+  return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtepu16_ph (__m128h __A, __mmask8 __B, __m128i __C)
+{
+  return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtepu16_ph (__mmask8 __A, __m128i __B)
+{
+  return __builtin_ia32_vcvtuw2ph_v8hi_mask ((__v8hi) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtepu16_ph (__m256i __A)
+{
+  return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __A,
+					      _mm256_setzero_ph (),
+					      (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtepu16_ph (__m256h __A, __mmask16 __B, __m256i __C)
+{
+  return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __C, __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtepu16_ph (__mmask16 __A, __m256i __B)
+{
+  return __builtin_ia32_vcvtuw2ph_v16hi_mask ((__v16hi) __B,
+					      _mm256_setzero_ph (),
+					      __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index c430dc9ab48..57b9ea786e1 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1316,6 +1316,11 @@ DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI)
 DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI)
 DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI)
 DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI)
+DEF_FUNCTION_TYPE (V8HF, V4SI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8SI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V2DI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V4DI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8HI, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
@@ -1323,18 +1328,22 @@ DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
+DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
+DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
 DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT)
+DEF_FUNCTION_TYPE (V32HF, V32HI, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index dde8af53ff0..44c55876e48 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2843,6 +2843,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v8hi_mask, "__builtin_ia32_vcvtw2ph_v8hi_mask", IX86_BUILTIN_VCVTW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v16hi_mask, "__builtin_ia32_vcvtw2ph_v16hi_mask", IX86_BUILTIN_VCVTW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v8hi_mask, "__builtin_ia32_vcvtuw2ph_v8hi_mask", IX86_BUILTIN_VCVTUW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v16hi_mask, "__builtin_ia32_vcvtuw2ph_v16hi_mask", IX86_BUILTIN_VCVTUW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v4si_mask, "__builtin_ia32_vcvtdq2ph_v4si_mask", IX86_BUILTIN_VCVTDQ2PH_V4SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v8si_mask, "__builtin_ia32_vcvtdq2ph_v8si_mask", IX86_BUILTIN_VCVTDQ2PH_V8SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v4si_mask, "__builtin_ia32_vcvtudq2ph_v4si_mask", IX86_BUILTIN_VCVTUDQ2PH_V4SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v8si_mask, "__builtin_ia32_vcvtudq2ph_v8si_mask", IX86_BUILTIN_VCVTUDQ2PH_V8SI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v2di_mask, "__builtin_ia32_vcvtqq2ph_v2di_mask", IX86_BUILTIN_VCVTQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v4di_mask, "__builtin_ia32_vcvtqq2ph_v4di_mask", IX86_BUILTIN_VCVTQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v2di_mask, "__builtin_ia32_vcvtuqq2ph_v2di_mask", IX86_BUILTIN_VCVTUQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v4di_mask, "__builtin_ia32_vcvtuqq2ph_v4di_mask", IX86_BUILTIN_VCVTUQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3076,6 +3088,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_r
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTUW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph_v16si_mask_round", IX86_BUILTIN_VCVTDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph_v16si_mask_round", IX86_BUILTIN_VCVTUDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTUQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 59d1f4f5eea..7d9e1bd6a2d 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9574,6 +9574,11 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V2DI_FTYPE_V8HF_V2DI_UQI:
     case V2DI_FTYPE_V4SF_V2DI_UQI:
     case V8HF_FTYPE_V8HF_V8HF_UQI:
+    case V8HF_FTYPE_V8HI_V8HF_UQI:
+    case V8HF_FTYPE_V8SI_V8HF_UQI:
+    case V8HF_FTYPE_V4SI_V8HF_UQI:
+    case V8HF_FTYPE_V4DI_V8HF_UQI:
+    case V8HF_FTYPE_V2DI_V8HF_UQI:
     case V4SF_FTYPE_V4DI_V4SF_UQI:
     case V4SF_FTYPE_V2DI_V4SF_UQI:
     case V4DF_FTYPE_V4DI_V4DF_UQI:
@@ -9640,6 +9645,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_DI_V8DI_UQI:
     case V16SF_FTYPE_V8SF_V16SF_UHI:
     case V16SI_FTYPE_V8SI_V16SI_UHI:
+    case V16HF_FTYPE_V16HI_V16HF_UHI:
     case V16HI_FTYPE_V16HF_V16HI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_UHI:
     case V8HI_FTYPE_V16QI_V8HI_UQI:
@@ -10513,16 +10519,19 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8DF_V8DI_QI_INT:
     case V8SF_FTYPE_V8DI_V8SF_QI_INT:
     case V8DF_FTYPE_V8DI_V8DF_QI_INT:
+    case V32HF_FTYPE_V32HI_V32HF_USI_INT:
     case V32HF_FTYPE_V32HF_V32HF_USI_INT:
     case V16SF_FTYPE_V16SF_V16SF_HI_INT:
     case V8DI_FTYPE_V8SF_V8DI_QI_INT:
     case V16SF_FTYPE_V16SI_V16SF_HI_INT:
     case V16SI_FTYPE_V16SF_V16SI_HI_INT:
     case V16SI_FTYPE_V16HF_V16SI_UHI_INT:
+    case V16HF_FTYPE_V16SI_V16HF_UHI_INT:
     case V8DF_FTYPE_V8SF_V8DF_QI_INT:
     case V16SF_FTYPE_V16HI_V16SF_HI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_INT:
     case V4SF_FTYPE_V4SF_V4SF_V4SF_INT:
+    case V8HF_FTYPE_V8DI_V8HF_UQI_INT:
       nargs = 4;
       break;
     case V4SF_FTYPE_V4SF_V4SF_INT_INT:
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index fcadfcd4c94..699f9a234c9 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -90,6 +90,8 @@ VECTOR_MODES (FLOAT, 32);     /*   V16HF V8SF V4DF V2TF */
 VECTOR_MODES (FLOAT, 64);     /*  V32HF V16SF V8DF V4TF */
 VECTOR_MODES (FLOAT, 128);    /* V64HF V32SF V16DF V8TF */
 VECTOR_MODES (FLOAT, 256);    /* V128HF V64SF V32DF V16TF */
+VECTOR_MODE (FLOAT, HF, 2);  /*                   V2HF */
+VECTOR_MODE (FLOAT, HF, 6);  /*                   V6HF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7b705422396..8b23048a232 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -494,6 +494,11 @@ (define_mode_iterator VI48_AVX512F_AVX512VL
 (define_mode_iterator VI2_AVX512VL
   [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI])
 
+(define_mode_iterator VI2H_AVX512VL
+  [(V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL") V32HI
+   (V8SI "TARGET_AVX512VL") V16SI
+   V8DI])
+
 (define_mode_iterator VI1_AVX512VL_F
   [V32QI (V16QI "TARGET_AVX512VL") (V64QI "TARGET_AVX512F")])
 
@@ -895,9 +900,9 @@ (define_mode_attr avx512fmaskhalfmode
 
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
-  [(V16SF "V16SI") (V8DF  "V8DI")
-   (V8SF  "V8SI")  (V4DF  "V4DI")
-   (V4SF  "V4SI")  (V2DF  "V2DI")
+  [(V32HF "V32HI") (V16SF "V16SI") (V8DF  "V8DI")
+   (V16HF "V16HI") (V8SF  "V8SI")  (V4DF  "V4DI")
+   (V8HF "V8HI") (V4SF  "V4SI")  (V2DF  "V2DI")
    (V16SI "V16SI") (V8DI  "V8DI")
    (V8SI  "V8SI")  (V4DI  "V4DI")
    (V4SI  "V4SI")  (V2DI  "V2DI")
@@ -5432,6 +5437,11 @@ (define_int_attr sseintconvertsignprefix
 	[(UNSPEC_UNSIGNED_FIX_NOTRUNC "u")
 	 (UNSPEC_FIX_NOTRUNC "")])
 
+(define_mode_attr qq2phsuff
+  [(V32HI "") (V16HI "") (V8HI "")
+   (V16SI "") (V8SI "{y}") (V4SI "{x}")
+   (V8DI "{z}") (V4DI "{y}") (V2DI "{x}")])
+
 (define_insn "avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>"
   [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v")
         (unspec:VI248_AVX512VL
@@ -5443,6 +5453,143 @@ (define_insn "avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>"
+  [(set (match_operand:<ssePHmode> 0 "register_operand" "=v")
+	(any_float:<ssePHmode>
+	  (match_operand:VI2H_AVX512VL 1 "<round_nimm_predicate>" "<round_constraint>")))]
+  "TARGET_AVX512FP16"
+  "vcvt<floatsuffix><sseintconvert>2ph<round_qq2phsuff>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm"))
+	    (match_dup 2)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V4HFmode);")
+
+(define_insn "*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm"))
+	    (match_operand:V4HF 2 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix><sseintconvert>2ph<qq2phsuff>\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V4HF
+	    (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm"))
+            (vec_select:V4HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_dup 4)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[4] = CONST0_RTX (V4HFmode);")
+
+(define_insn "*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V4HF
+	    (any_float:V4HF (match_operand:VI4_128_8_256 1 "vector_operand" "vm"))
+            (vec_select:V4HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_operand:V4HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix><sseintconvert>2ph<qq2phsuff>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask_1"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+	(vec_merge:V4HF
+		(any_float:V4HF (match_operand:VI4_128_8_256 1
+				  "vector_operand" "vm"))
+	    (match_operand:V4HF 3 "const0_operand" "C")
+	    (match_operand:QI 2 "register_operand" "Yk"))
+	    (match_operand:V4HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix><sseintconvert>2ph<qq2phsuff>\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvt<floatsuffix>qq2ph_v2di"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm"))
+	    (match_dup 2)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V6HFmode);")
+
+(define_insn "*avx512fp16_vcvt<floatsuffix>qq2ph_v2di"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm"))
+	    (match_operand:V6HF 2 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix>qq2ph{x}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_expand "avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V2HF
+	    (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm"))
+            (vec_select:V2HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_dup 4)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[4] = CONST0_RTX (V6HFmode);")
+
+(define_insn "*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V2HF
+	    (any_float:V2HF (match_operand:V2DI 1 "vector_operand" "vm"))
+            (vec_select:V2HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_operand:V6HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix>qq2ph{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+	(vec_merge:V2HF
+		(any_float:V2HF (match_operand:V2DI 1
+				  "vector_operand" "vm"))
+	    (match_operand:V2HF 3 "const0_operand" "C")
+	    (match_operand:QI 2 "register_operand" "Yk"))
+	    (match_operand:V6HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<floatsuffix>qq2ph{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index ecb158f07e5..2e9c2b38e25 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -134,6 +134,7 @@ (define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
 (define_subst_attr "round_mask_op4" "round" "" "<round_mask_operand4>")
 (define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
 (define_subst_attr "round_constraint" "round" "vm" "v")
+(define_subst_attr "round_qq2phsuff" "round" "<qq2phsuff>" "")
 (define_subst_attr "bcst_round_constraint" "round" "vmBr" "v")
 (define_subst_attr "round_constraint2" "round" "m" "v")
 (define_subst_attr "round_constraint3" "round" "rm" "r")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index cdfc2e3b69f..b569cc0bdd9 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -725,6 +725,12 @@
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 5e4aaf8ce9b..07e59118438 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -742,6 +742,12 @@
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 32aa4518703..0530192d97e 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -684,6 +684,12 @@ test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8)
+test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
+test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -722,6 +728,12 @@ test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -766,6 +778,12 @@ test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 44ac10d602f..04e6340516b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -789,6 +789,12 @@ test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8)
+test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
+test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
+test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -826,6 +832,12 @@ test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -869,6 +881,12 @@ test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
+test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index ae6151b4a61..684891cc98b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -743,6 +743,12 @@
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (27 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
                   ` (32 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtw2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtw2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c | 24 +++++
 .../gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c | 79 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c | 24 +++++
 .../gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c | 84 +++++++++++++++++
 .../i386/avx512fp16-vcvtudq2ph-1a.c           | 24 +++++
 .../i386/avx512fp16-vcvtudq2ph-1b.c           | 79 ++++++++++++++++
 .../i386/avx512fp16-vcvtuqq2ph-1a.c           | 24 +++++
 .../i386/avx512fp16-vcvtuqq2ph-1b.c           | 83 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c | 24 +++++
 .../gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c | 93 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1a.c  | 24 +++++
 .../gcc.target/i386/avx512fp16-vcvtw2ph-1b.c  | 92 ++++++++++++++++++
 .../i386/avx512fp16vl-vcvtdq2ph-1a.c          | 27 ++++++
 .../i386/avx512fp16vl-vcvtdq2ph-1b.c          | 15 +++
 .../i386/avx512fp16vl-vcvtqq2ph-1a.c          | 28 ++++++
 .../i386/avx512fp16vl-vcvtqq2ph-1b.c          | 15 +++
 .../i386/avx512fp16vl-vcvtudq2ph-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvtudq2ph-1b.c         | 15 +++
 .../i386/avx512fp16vl-vcvtuqq2ph-1a.c         | 28 ++++++
 .../i386/avx512fp16vl-vcvtuqq2ph-1b.c         | 15 +++
 .../i386/avx512fp16vl-vcvtuw2ph-1a.c          | 29 ++++++
 .../i386/avx512fp16vl-vcvtuw2ph-1b.c          | 15 +++
 .../i386/avx512fp16vl-vcvtw2ph-1a.c           | 29 ++++++
 .../i386/avx512fp16vl-vcvtw2ph-1b.c           | 15 +++
 24 files changed, 912 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
new file mode 100644
index 00000000000..45697d94b1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res, res1, res2;
+volatile __m512i x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepi32_ph (x1);
+  res1 = _mm512_mask_cvtepi32_ph (res, m16, x2);
+  res2 = _mm512_maskz_cvtepi32_ph (m16, x3);
+  res = _mm512_cvt_roundepi32_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundepi32_ph (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvt_roundepi32_ph (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
new file mode 100644
index 00000000000..a2bb56c25d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtdq2ph-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 32)
+
+void NOINLINE
+EMULATE(cvtd2_ph) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = v7.u32[i];
+      }
+    }
+    else {
+      v5.f32[i] = op1.u32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v5, v5);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvtepi32_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepi32_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0);
+  H_HF(res) = INTRINSIC (_mask_cvtepi32_ph) (H_HF(res), HALF_MASK, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepi32_ph);
+
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvtepi32_ph) (HALF_MASK, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepi32_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvt_roundepi32_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepi32_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0);
+  H_HF(res) = INTRINSIC (_mask_cvt_roundepi32_ph) (H_HF(res), HALF_MASK, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepi32_ph);
+
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvt_roundepi32_ph) (HALF_MASK, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepi32_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c
new file mode 100644
index 00000000000..4e8515e9a3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m512i x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepi64_ph (x1);
+  res1 = _mm512_mask_cvtepi64_ph (res, m8, x2);
+  res2 = _mm512_maskz_cvtepi64_ph (m8, x3);
+  res = _mm512_cvt_roundepi64_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundepi64_ph (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundepi64_ph (m8, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c
new file mode 100644
index 00000000000..cb213b9d9f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtqq2ph-1b.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 64)
+
+void NOINLINE
+EMULATE(cvtq2_ph) (V512 * dest, V512 op1, int n_el,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < n_el; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = op1.u64[i];
+      }
+  }
+
+  /* The remaining upper elements should be zero.  */
+  for (i = n_el; i < 16; i++)
+    v5.f32[i] = 0;
+
+  *dest = pack_twops_2ph(v5, v5);
+}
+
+void
+TEST (void)
+{
+
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvtepi64_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _cvtepi64_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvtepi64_ph) (res.xmmh[0], 0xcc, SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _mask_cvtepi64_ph);
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xf1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvtepi64_ph) (0xf1, SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvtepi64_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvt_roundepi64_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _cvt_roundepi64_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvt_roundepi64_ph) (res.xmmh[0], 0xcc, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundepi64_ph);
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xf1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvt_roundepi64_ph) (0xf1, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundepi64_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c
new file mode 100644
index 00000000000..8d90ef6f168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res, res1, res2;
+volatile __m512i x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepu32_ph (x1);
+  res1 = _mm512_mask_cvtepu32_ph (res, m16, x2);
+  res2 = _mm512_maskz_cvtepu32_ph (m16, x3);
+  res = _mm512_cvt_roundepu32_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundepu32_ph (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvt_roundepu32_ph (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c
new file mode 100644
index 00000000000..e9c1cd1bcb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtudq2ph-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 32)
+
+void NOINLINE
+EMULATE(cvtd2_ph) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = v7.u32[i];
+      }
+    }
+    else {
+      v5.f32[i] = op1.u32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v5, v5);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvtepu32_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepu32_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0);
+  H_HF(res) = INTRINSIC (_mask_cvtepu32_ph) (H_HF(res), HALF_MASK, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepu32_ph);
+
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvtepu32_ph) (HALF_MASK, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepu32_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtd2_ph)(&exp, src3, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvt_roundepu32_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepu32_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 0);
+  H_HF(res) = INTRINSIC (_mask_cvt_roundepu32_ph) (H_HF(res), HALF_MASK, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepu32_ph);
+
+  EMULATE(cvtd2_ph)(&exp, src3, HALF_MASK, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvt_roundepu32_ph) (HALF_MASK, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepu32_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c
new file mode 100644
index 00000000000..a234bb50482
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m512i x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepu64_ph (x1);
+  res1 = _mm512_mask_cvtepu64_ph (res, m8, x2);
+  res2 = _mm512_maskz_cvtepu64_ph (m8, x3);
+  res = _mm512_cvt_roundepu64_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundepu64_ph (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundepu64_ph (m8, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c
new file mode 100644
index 00000000000..873d9109e47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuqq2ph-1b.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 64)
+
+void NOINLINE
+EMULATE(cvtq2_ph) (V512 * dest, V512 op1, int n_el,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < n_el; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = op1.u64[i];
+      }
+  }
+
+  /* The remaining upper elements should be zero.  */
+  for (i = n_el; i < 16; i++)
+    v5.f32[i] = 0;
+
+  *dest = pack_twops_2ph(v5, v5);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvtepu64_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _cvtepu64_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvtepu64_ph) (res.xmmh[0], 0xcc, SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _mask_cvtepu64_ph);
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xc1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvtepu64_ph) (0xc1, SI(src3));
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvtepu64_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvt_roundepu64_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _cvt_roundepu64_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvt_roundepu64_ph) (res.xmmh[0], 0xcc, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundepu64_ph);
+
+  EMULATE(cvtq2_ph)(&exp, src3, N_ELEMS, 0xc1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvt_roundepu64_ph) (0xc1, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundepu64_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c
new file mode 100644
index 00000000000..43c96a0d2fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res;
+volatile __m512i x1;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepu16_ph (x1);
+  res = _mm512_mask_cvtepu16_ph (res, m32, x1);
+  res = _mm512_maskz_cvtepu16_ph (m32, x1);
+  res = _mm512_cvt_roundepu16_ph (x1, 4);
+  res = _mm512_mask_cvt_roundepu16_ph (res, m32, x1, 8);
+  res = _mm512_maskz_cvt_roundepu16_ph (m32, x1, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c
new file mode 100644
index 00000000000..6d6b6da342f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtuw2ph-1b.c
@@ -0,0 +1,93 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtw2_ph) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.f32[i] = v7.f32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = op1.u16[i];
+
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.f32[i] = v8.f32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = op1.u16[i+16];
+      }
+  }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0);
+  HF(res) = INTRINSIC (_cvtepu16_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepu16_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_cvtepu16_ph) (HF(res), MASK_VALUE, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepu16_ph);
+
+  EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_cvtepu16_ph) (ZMASK_VALUE, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepu16_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0);
+  HF(res) = INTRINSIC (_cvt_roundepu16_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepu16_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_cvt_roundepu16_ph) (HF(res), MASK_VALUE, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepu16_ph);
+
+  EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_cvt_roundepu16_ph) (ZMASK_VALUE, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepu16_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c
new file mode 100644
index 00000000000..c6eaee1772b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res;
+volatile __m512i x1;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtepi16_ph (x1);
+  res = _mm512_mask_cvtepi16_ph (res, m32, x1);
+  res = _mm512_maskz_cvtepi16_ph (m32, x1);
+  res = _mm512_cvt_roundepi16_ph (x1, 4);
+  res = _mm512_mask_cvt_roundepi16_ph (res, m32, x1, 8);
+  res = _mm512_maskz_cvt_roundepi16_ph (m32, x1, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c
new file mode 100644
index 00000000000..e02b6fcdbf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtw2ph-1b.c
@@ -0,0 +1,92 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtw2_ph) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << i) & m1) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.f32[i] = v7.f32[i];
+	  }
+      }
+      else {
+	  v5.f32[i] = op1.u16[i];
+
+      }
+
+      if (((1 << i) & m2) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.f32[i] = v8.f32[i];
+	  }
+      }
+      else {
+	  v6.f32[i] = op1.u16[i+16];
+      }
+  }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0);
+  HF(res) = INTRINSIC (_cvtepi16_ph) (SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtepi16_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_cvtepi16_ph) (HF(res), MASK_VALUE, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtepi16_ph);
+
+  EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_cvtepi16_ph) (ZMASK_VALUE, SI(src3));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtepi16_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtw2_ph)(&exp, src3, NET_MASK, 0);
+  HF(res) = INTRINSIC (_cvt_roundepi16_ph) (SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundepi16_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtw2_ph)(&exp, src3, MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_cvt_roundepi16_ph) (HF(res), MASK_VALUE, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundepi16_ph);
+
+  EMULATE(cvtw2_ph)(&exp, src3, ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_cvt_roundepi16_ph) (ZMASK_VALUE, SI(src3), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundepi16_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c
new file mode 100644
index 00000000000..ab0541dce1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtepi32_ph (x2);
+  res3 = _mm256_mask_cvtepi32_ph (res3, m8, x2);
+  res3 = _mm256_maskz_cvtepi32_ph (m8, x2);
+
+  res3 = _mm_cvtepi32_ph (x3);
+  res3 = _mm_mask_cvtepi32_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepi32_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c
new file mode 100644
index 00000000000..033587a6704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtdq2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtdq2ph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtdq2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c
new file mode 100644
index 00000000000..8e42a4b29f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtepi64_ph (x2);
+  res3 = _mm256_mask_cvtepi64_ph (res3, m16, x2);
+  res3 = _mm256_maskz_cvtepi64_ph (m16, x2);
+
+  res3 = _mm_cvtepi64_ph (x3);
+  res3 = _mm_mask_cvtepi64_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepi64_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c
new file mode 100644
index 00000000000..6a4a329f368
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtqq2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtqq2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtqq2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c
new file mode 100644
index 00000000000..4fa2ab92245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtepu32_ph (x2);
+  res3 = _mm256_mask_cvtepu32_ph (res3, m8, x2);
+  res3 = _mm256_maskz_cvtepu32_ph (m8, x2);
+
+  res3 = _mm_cvtepu32_ph (x3);
+  res3 = _mm_mask_cvtepu32_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepu32_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c
new file mode 100644
index 00000000000..4ea2c268760
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtudq2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtudq2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtudq2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c
new file mode 100644
index 00000000000..a3ee951d4c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtepu64_ph (x2);
+  res3 = _mm256_mask_cvtepu64_ph (res3, m16, x2);
+  res3 = _mm256_maskz_cvtepu64_ph (m16, x2);
+
+  res3 = _mm_cvtepu64_ph (x3);
+  res3 = _mm_mask_cvtepu64_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepu64_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c
new file mode 100644
index 00000000000..c747e8de0dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtuqq2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtuqq2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c
new file mode 100644
index 00000000000..59393dc01a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res2;
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm256_cvtepu16_ph (x2);
+  res2 = _mm256_mask_cvtepu16_ph (res2, m16, x2);
+  res2 = _mm256_maskz_cvtepu16_ph (m16, x2);
+
+  res3 = _mm_cvtepu16_ph (x3);
+  res3 = _mm_mask_cvtepu16_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepu16_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c
new file mode 100644
index 00000000000..89d94df57b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtuw2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtuw2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtuw2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c
new file mode 100644
index 00000000000..ff5530f60a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res2;
+volatile __m128h res3;
+volatile __m256i x2;
+volatile __m128i x3;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm256_cvtepi16_ph (x2);
+  res2 = _mm256_mask_cvtepi16_ph (res2, m16, x2);
+  res2 = _mm256_maskz_cvtepi16_ph (m16, x2);
+
+  res3 = _mm_cvtepi16_ph (x3);
+  res3 = _mm_mask_cvtepi16_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtepi16_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c
new file mode 100644
index 00000000000..243e45bda62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtw2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtw2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtw2ph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (28 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-17  8:07   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
                   ` (31 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic.
	(_mm_cvtsh_u32): Likewise.
	(_mm_cvt_roundsh_i32): Likewise.
	(_mm_cvt_roundsh_u32): Likewise.
	(_mm_cvtsh_i64): Likewise.
	(_mm_cvtsh_u64): Likewise.
	(_mm_cvt_roundsh_i64): Likewise.
	(_mm_cvt_roundsh_u64): Likewise.
	(_mm_cvti32_sh): Likewise.
	(_mm_cvtu32_sh): Likewise.
	(_mm_cvt_roundi32_sh): Likewise.
	(_mm_cvt_roundu32_sh): Likewise.
	(_mm_cvti64_sh): Likewise.
	(_mm_cvtu64_sh): Likewise.
	(_mm_cvt_roundi64_sh): Likewise.
	(_mm_cvt_roundu64_sh): Likewise.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c (ix86_expand_round_builtin):
	Handle new builtin types.
	* config/i386/sse.md
	(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>):
	New define_insn.
	(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2): Likewise.
	(avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 158 +++++++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   8 ++
 gcc/config/i386/i386-builtin.def       |   8 ++
 gcc/config/i386/i386-expand.c          |   8 ++
 gcc/config/i386/sse.md                 |  46 +++++++
 gcc/testsuite/gcc.target/i386/avx-1.c  |   8 ++
 gcc/testsuite/gcc.target/i386/sse-13.c |   8 ++
 gcc/testsuite/gcc.target/i386/sse-14.c |  10 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  10 ++
 gcc/testsuite/gcc.target/i386/sse-23.c |   8 ++
 10 files changed, 272 insertions(+)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index bd801942365..7524a8d6a5b 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -3529,6 +3529,164 @@ _mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtsh2si, vcvtsh2usi.  */
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_i32 (__m128h __A)
+{
+  return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_u32 (__m128h __A)
+{
+  return (unsigned)
+    __builtin_ia32_vcvtsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
+{
+  return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
+}
+
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
+{
+  return (unsigned) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
+}
+
+#else
+#define _mm_cvt_roundsh_i32(A, B)		\
+  ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
+#define _mm_cvt_roundsh_u32(A, B)		\
+  ((unsigned)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
+
+#ifdef __x86_64__
+extern __inline long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_i64 (__m128h __A)
+{
+  return (long long)
+    __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_u64 (__m128h __A)
+{
+  return (unsigned long long)
+    __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
+{
+  return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
+{
+  return (unsigned long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
+}
+
+#else
+#define _mm_cvt_roundsh_i64(A, B)			\
+  ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
+#define _mm_cvt_roundsh_u64(A, B)			\
+  ((unsigned long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
+
+/* Intrinsics vcvtsi2sh, vcvtusi2sh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvti32_sh (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu32_sh (__m128h __A, unsigned int __B)
+{
+  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
+{
+  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
+{
+  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
+}
+
+#else
+#define _mm_cvt_roundi32_sh(A, B, C)		\
+  (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
+#define _mm_cvt_roundu32_sh(A, B, C)		\
+  (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
+
+#endif /* __OPTIMIZE__ */
+
+#ifdef __x86_64__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvti64_sh (__m128h __A, long long __B)
+{
+  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
+{
+  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
+{
+  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
+{
+  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
+}
+
+#else
+#define _mm_cvt_roundi64_sh(A, B, C)		\
+  (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
+#define _mm_cvt_roundu64_sh(A, B, C)		\
+  (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
+
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
+
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 57b9ea786e1..74bda59a65e 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1308,9 +1308,17 @@ DEF_FUNCTION_TYPE (V8HF, V8HI)
 DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
+DEF_FUNCTION_TYPE (INT, V8HF, INT)
+DEF_FUNCTION_TYPE (INT64, V8HF, INT)
+DEF_FUNCTION_TYPE (UINT, V8HF, INT)
+DEF_FUNCTION_TYPE (UINT64, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, INT, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, INT64, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, UINT, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, UINT64, INT)
 DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI)
 DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI)
 DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 44c55876e48..3602b40d6d5 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3094,6 +3094,14 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph_v16si_mask_round", IX86_BUILTIN_VCVTUDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTUQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usiq_round, "__builtin_ia32_vcvtsh2usi64_round", IX86_BUILTIN_VCVTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__builtin_ia32_vcvtsi2sh32_round", IX86_BUILTIN_VCVTSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 7d9e1bd6a2d..b83c6d9a92b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10489,16 +10489,24 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     {
     case UINT64_FTYPE_V2DF_INT:
     case UINT64_FTYPE_V4SF_INT:
+    case UINT64_FTYPE_V8HF_INT:
     case UINT_FTYPE_V2DF_INT:
     case UINT_FTYPE_V4SF_INT:
+    case UINT_FTYPE_V8HF_INT:
     case INT64_FTYPE_V2DF_INT:
     case INT64_FTYPE_V4SF_INT:
+    case INT64_FTYPE_V8HF_INT:
     case INT_FTYPE_V2DF_INT:
     case INT_FTYPE_V4SF_INT:
+    case INT_FTYPE_V8HF_INT:
       nargs = 2;
       break;
     case V32HF_FTYPE_V32HF_V32HF_INT:
     case V8HF_FTYPE_V8HF_V8HF_INT:
+    case V8HF_FTYPE_V8HF_INT_INT:
+    case V8HF_FTYPE_V8HF_UINT_INT:
+    case V8HF_FTYPE_V8HF_INT64_INT:
+    case V8HF_FTYPE_V8HF_UINT64_INT:
     case V4SF_FTYPE_V4SF_UINT_INT:
     case V4SF_FTYPE_V4SF_UINT64_INT:
     case V2DF_FTYPE_V2DF_UINT64_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8b23048a232..b312d26b806 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5589,6 +5589,52 @@ (define_insn "*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1"
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>"
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
+	(unspec:SWI48
+	  [(vec_select:HF
+	     (match_operand:V8HF 1 "<round_nimm_scalar_predicate>" "v,<round_constraint2>")
+	     (parallel [(const_int 0)]))]
+	  UNSPEC_US_FIX_NOTRUNC))]
+  "TARGET_AVX512FP16"
+  "%vcvtsh2<sseintconvertsignprefix>si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "athlon_decode" "double,vector")
+   (set_attr "bdver1_decode" "double,double")
+   (set_attr "prefix_rep" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2"
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
+	(unspec:SWI48 [(match_operand:HF 1 "nonimmediate_operand" "v,m")]
+		      UNSPEC_US_FIX_NOTRUNC))]
+  "TARGET_AVX512FP16"
+  "%vcvtsh2<sseintconvertsignprefix>si\t{%1, %0|%0, %k1}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "athlon_decode" "double,vector")
+   (set_attr "bdver1_decode" "double,double")
+   (set_attr "prefix_rep" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (any_float:HF (match_operand:SWI48 2 "<round_nimm_scalar_predicate>" "<round_constraint3>")))
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vcvt<floatsuffix>si2sh\t{%2, <round_op3>%1, %0|%0, %1<round_op3>, %2}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "athlon_decode" "*")
+   (set_attr "amdfam10_decode" "*")
+   (set_attr "bdver1_decode" "*")
+   (set_attr "btver2_decode" "double")
+   (set_attr "znver1_decode" "double")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index b569cc0bdd9..0aae949097a 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -731,6 +731,14 @@
 #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
+#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 07e59118438..997fb733132 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -748,6 +748,14 @@
 #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
+#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 0530192d97e..89a589e0d80 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -690,6 +690,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
+test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
+test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
+#ifdef __x86_64__
+test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
+test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
+test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
+test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
+#endif
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -734,6 +742,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
+test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 04e6340516b..fed12744c6c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -795,6 +795,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
+test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
+test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
+#ifdef __x86_64__
+test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
+test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
+test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
+test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
+#endif
 test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
 test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
@@ -838,6 +846,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
+test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 684891cc98b..6e8d8a1833c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -749,6 +749,14 @@
 #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
+#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
+#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (29 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq liuhongt
                   ` (30 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512): Add int32
	component.
	* gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtsh2si-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       |  1 +
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1a.c | 17 ++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2si-1b.c | 54 +++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2si64-1a.c          | 17 ++++++
 .../i386/avx512fp16-vcvtsh2si64-1b.c          | 52 ++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2usi-1a.c           | 17 ++++++
 .../i386/avx512fp16-vcvtsh2usi-1b.c           | 54 +++++++++++++++++++
 .../i386/avx512fp16-vcvtsh2usi64-1a.c         | 16 ++++++
 .../i386/avx512fp16-vcvtsh2usi64-1b.c         | 53 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c | 16 ++++++
 .../gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtsi2sh64-1a.c          | 16 ++++++
 .../i386/avx512fp16-vcvtsi2sh64-1b.c          | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtusi2sh-1a.c           | 16 ++++++
 .../i386/avx512fp16-vcvtusi2sh-1b.c           | 41 ++++++++++++++
 .../i386/avx512fp16-vcvtusi2sh64-1a.c         | 16 ++++++
 .../i386/avx512fp16-vcvtusi2sh64-1b.c         | 41 ++++++++++++++
 17 files changed, 509 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index aa83b66998c..cf1c536d9f7 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -34,6 +34,7 @@ typedef union
   __m128i	  xmmi[4];
   unsigned short  u16[32];
   unsigned int    u32[16];
+  int		  i32[16];
   long long	  s64[8];
   unsigned long long u64[8];
   float           f32[16];
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
new file mode 100644
index 00000000000..f29c953572d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile int res1;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm_cvtsh_i32 (x1);
+  res1 = _mm_cvt_roundsh_i32 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
new file mode 100644
index 00000000000..89c492cfc44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si-1b.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 2
+
+void NOINLINE
+emulate_cvtph2_d(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_d(&exp, src1,  NET_MASK, 0);
+  res.i32[0] = _mm_cvt_roundsh_i32(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_i32");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c
new file mode 100644
index 00000000000..0289ebf95ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2si\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile long long res2;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm_cvtsh_i64 (x1);
+  res2 = _mm_cvt_roundsh_i64 (x1, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c
new file mode 100644
index 00000000000..6a5e836fd7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c
@@ -0,0 +1,52 @@
+/* { dg-do run { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 4
+
+void NOINLINE
+emulate_cvtph2_q(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_q(&exp, src1,  NET_MASK, 0);
+  res.s64[0] = _mm_cvt_roundsh_i64(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_i64");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c
new file mode 100644
index 00000000000..7d00867247e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile unsigned int res1;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm_cvtsh_u32 (x1);
+  res1 = _mm_cvt_roundsh_u32 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c
new file mode 100644
index 00000000000..466ce6ead83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 2
+
+void NOINLINE
+emulate_cvtph2_d(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_d(&exp, src1,  NET_MASK, 0);
+  res.u32[0] = _mm_cvt_roundsh_u32(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_u32");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c
new file mode 100644
index 00000000000..363252d8d5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile  { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2 " } */
+/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2usi\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile unsigned long long res2;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm_cvtsh_u64 (x1);
+  res2 = _mm_cvt_roundsh_u64 (x1, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c
new file mode 100644
index 00000000000..74643ae2bd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c
@@ -0,0 +1,53 @@
+/* { dg-do run  { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 4
+
+void NOINLINE
+emulate_cvtph2_q(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_q(&exp, src1,  NET_MASK, 0);
+  res.u64[0] = _mm_cvt_roundsh_u64(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_u64");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c
new file mode 100644
index 00000000000..4cd69d9b4e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x;
+volatile int n;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_cvti32_sh (x, n);
+  x = _mm_cvt_roundi32_sh (x, n, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c
new file mode 100644
index 00000000000..d9c9a853a17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtsi2sh(V512 *dest, V512 op1, 
+		  int value_32, __int64_t value_64, int bits)
+{
+  V512 v1,v2,v5,v6;
+  unpack_ph_2twops(op1, &v1, &v2);
+  if (bits == 32)
+    v5.xmm[0] = _mm_cvt_roundi32_ss (v1.xmm[0], value_32, _ROUND_NINT);
+#ifdef __x86_64__
+  else 
+    v5.xmm[0] = _mm_cvt_roundi64_ss (v1.xmm[0], value_64, _ROUND_NINT);
+#endif
+  v5.xmm[1] = v1.xmm[1]; 
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_vcvtsi2sh(&exp, src1, 99, 0, 32);
+  res.xmmh[0] = _mm_cvt_roundi32_sh(src1.xmmh[0], 99, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundi32_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c
new file mode 100644
index 00000000000..5f3e5520bf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x;
+volatile long long n;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_cvti64_sh (x, n);
+  x = _mm_cvt_roundi64_sh (x, n, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c
new file mode 100644
index 00000000000..6f66a87a8e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtsi2sh(V512 *dest, V512 op1, 
+		  int value_32, __int64_t value_64, int bits)
+{
+  V512 v1,v2,v5,v6;
+  unpack_ph_2twops(op1, &v1, &v2);
+  if (bits == 32)
+    v5.xmm[0] = _mm_cvt_roundi32_ss (v1.xmm[0], value_32, _ROUND_NINT);
+#ifdef __x86_64__
+  else 
+    v5.xmm[0] = _mm_cvt_roundi64_ss (v1.xmm[0], value_64, _ROUND_NINT);
+#endif
+  v5.xmm[1] = v1.xmm[1]; 
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_vcvtsi2sh(&exp, src1, 0, 99, 64);
+  res.xmmh[0] = _mm_cvt_roundi64_sh(src1.xmmh[0], 99, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundi64_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c
new file mode 100644
index 00000000000..9c85da09e29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x;
+volatile unsigned n;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_cvtu32_sh (x, n);
+  x = _mm_cvt_roundu32_sh (x, n, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c
new file mode 100644
index 00000000000..d339f0a4043
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtusi2sh(V512 *dest, V512 op1, 
+		   int value_32, __int64_t value_64, int bits)
+{
+  V512 v1,v2,v5,v6;
+  unpack_ph_2twops(op1, &v1, &v2);
+  if (bits == 32)
+    v5.xmm[0] = _mm_cvt_roundu32_ss (v1.xmm[0], value_32, _ROUND_NINT);
+#ifdef __x86_64__
+  else 
+    v5.xmm[0] = _mm_cvt_roundu64_ss (v1.xmm[0], value_64, _ROUND_NINT);
+#endif
+  v5.xmm[1] = v1.xmm[1]; 
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_vcvtusi2sh(&exp, src1, 99, 0, 32);
+  res.xmmh[0] = _mm_cvt_roundu32_sh(src1.xmmh[0], 99, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundu32_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c
new file mode 100644
index 00000000000..1f22ac258e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtusi2sh\[ \\t\]+\[^%\n\]*%r\[^\{\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x;
+volatile unsigned long long n;
+
+void extern
+avx512f_test (void)
+{
+  x = _mm_cvtu64_sh (x, n);
+  x = _mm_cvt_roundu64_sh (x, n, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c
new file mode 100644
index 00000000000..20e711e1b0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtusi2sh(V512 *dest, V512 op1, 
+		   int value_32, __int64_t value_64, int bits)
+{
+  V512 v1,v2,v5,v6;
+  unpack_ph_2twops(op1, &v1, &v2);
+  if (bits == 32)
+    v5.xmm[0] = _mm_cvt_roundu32_ss (v1.xmm[0], value_32, _ROUND_NINT);
+#ifdef __x86_64__
+  else 
+    v5.xmm[0] = _mm_cvt_roundu64_ss (v1.xmm[0], value_64, _ROUND_NINT);
+#endif
+  v5.xmm[1] = v1.xmm[1]; 
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_vcvtusi2sh(&exp, src1, 0, 99, 64);
+  res.xmmh[0] = _mm_cvt_roundu64_sh(src1.xmmh[0], 99, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundu64_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (30 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 33/62] AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq liuhongt
                   ` (29 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvttph_epi32):
	New intrinsic.
	(_mm512_mask_cvttph_epi32): Likewise.
	(_mm512_maskz_cvttph_epi32): Likewise.
	(_mm512_cvtt_roundph_epi32): Likewise.
	(_mm512_mask_cvtt_roundph_epi32): Likewise.
	(_mm512_maskz_cvtt_roundph_epi32): Likewise.
	(_mm512_cvttph_epu32): Likewise.
	(_mm512_mask_cvttph_epu32): Likewise.
	(_mm512_maskz_cvttph_epu32): Likewise.
	(_mm512_cvtt_roundph_epu32): Likewise.
	(_mm512_mask_cvtt_roundph_epu32): Likewise.
	(_mm512_maskz_cvtt_roundph_epu32): Likewise.
	(_mm512_cvttph_epi64): Likewise.
	(_mm512_mask_cvttph_epi64): Likewise.
	(_mm512_maskz_cvttph_epi64): Likewise.
	(_mm512_cvtt_roundph_epi64): Likewise.
	(_mm512_mask_cvtt_roundph_epi64): Likewise.
	(_mm512_maskz_cvtt_roundph_epi64): Likewise.
	(_mm512_cvttph_epu64): Likewise.
	(_mm512_mask_cvttph_epu64): Likewise.
	(_mm512_maskz_cvttph_epu64): Likewise.
	(_mm512_cvtt_roundph_epu64): Likewise.
	(_mm512_mask_cvtt_roundph_epu64): Likewise.
	(_mm512_maskz_cvtt_roundph_epu64): Likewise.
	(_mm512_cvttph_epi16): Likewise.
	(_mm512_mask_cvttph_epi16): Likewise.
	(_mm512_maskz_cvttph_epi16): Likewise.
	(_mm512_cvtt_roundph_epi16): Likewise.
	(_mm512_mask_cvtt_roundph_epi16): Likewise.
	(_mm512_maskz_cvtt_roundph_epi16): Likewise.
	(_mm512_cvttph_epu16): Likewise.
	(_mm512_mask_cvttph_epu16): Likewise.
	(_mm512_maskz_cvttph_epu16): Likewise.
	(_mm512_cvtt_roundph_epu16): Likewise.
	(_mm512_mask_cvtt_roundph_epu16): Likewise.
	(_mm512_maskz_cvtt_roundph_epu16): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvttph_epi32):
	New intrinsic.
	(_mm_mask_cvttph_epi32): Likewise.
	(_mm_maskz_cvttph_epi32): Likewise.
	(_mm256_cvttph_epi32): Likewise.
	(_mm256_mask_cvttph_epi32): Likewise.
	(_mm256_maskz_cvttph_epi32): Likewise.
	(_mm_cvttph_epu32): Likewise.
	(_mm_mask_cvttph_epu32): Likewise.
	(_mm_maskz_cvttph_epu32): Likewise.
	(_mm256_cvttph_epu32): Likewise.
	(_mm256_mask_cvttph_epu32): Likewise.
	(_mm256_maskz_cvttph_epu32): Likewise.
	(_mm_cvttph_epi64): Likewise.
	(_mm_mask_cvttph_epi64): Likewise.
	(_mm_maskz_cvttph_epi64): Likewise.
	(_mm256_cvttph_epi64): Likewise.
	(_mm256_mask_cvttph_epi64): Likewise.
	(_mm256_maskz_cvttph_epi64): Likewise.
	(_mm_cvttph_epu64): Likewise.
	(_mm_mask_cvttph_epu64): Likewise.
	(_mm_maskz_cvttph_epu64): Likewise.
	(_mm256_cvttph_epu64): Likewise.
	(_mm256_mask_cvttph_epu64): Likewise.
	(_mm256_maskz_cvttph_epu64): Likewise.
	(_mm_cvttph_epi16): Likewise.
	(_mm_mask_cvttph_epi16): Likewise.
	(_mm_maskz_cvttph_epi16): Likewise.
	(_mm256_cvttph_epi16): Likewise.
	(_mm256_mask_cvttph_epi16): Likewise.
	(_mm256_maskz_cvttph_epi16): Likewise.
	(_mm_cvttph_epu16): Likewise.
	(_mm_mask_cvttph_epu16): Likewise.
	(_mm_maskz_cvttph_epu16): Likewise.
	(_mm256_cvttph_epu16): Likewise.
	(_mm256_mask_cvttph_epu16): Likewise.
	(_mm256_maskz_cvttph_epu16): Likewise.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/sse.md
	(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>):
	New.
	(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>): Ditto.
	(avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 539 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 365 +++++++++++++++++
 gcc/config/i386/i386-builtin.def       |  18 +
 gcc/config/i386/sse.md                 |  34 ++
 gcc/testsuite/gcc.target/i386/avx-1.c  |   6 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  18 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   6 +
 9 files changed, 1010 insertions(+)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 7524a8d6a5b..66de5b88927 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2702,6 +2702,201 @@ _mm512_maskz_cvt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvttph2dq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epi32 (__m256h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__A,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						(__mmask16) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epi32 (__m512i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__C,
+						(__v16si) __A,
+						__B,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epi32 (__mmask16 __A, __m256h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__B,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epi32 (__m256h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__A,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						(__mmask16) -1,
+						__B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epi32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__C,
+						(__v16si) __A,
+						__B,
+						__D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epi32 (__mmask16 __A, __m256h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2dq_v16si_mask_round (__B,
+						(__v16si)
+						_mm512_setzero_si512 (),
+						__A,
+						__C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epi32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2dq_v16si_mask_round ((A),			\
+					       (__v16si)		\
+					       (_mm512_setzero_si512 ()), \
+					       (__mmask16)(-1), (B)))
+
+#define _mm512_mask_cvtt_roundph_epi32(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2dq_v16si_mask_round ((C),		\
+					       (__v16si)(A),	\
+					       (B),		\
+					       (D)))
+
+#define _mm512_maskz_cvtt_roundph_epi32(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2dq_v16si_mask_round ((B),			\
+					       (__v16si)		\
+					       _mm512_setzero_si512 (),	\
+					       (A),			\
+					       (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2udq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epu32 (__m256h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__A,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) -1,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epu32 (__m512i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__C,
+						 (__v16si) __A,
+						 __B,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epu32 (__mmask16 __A, __m256h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 __A,
+						 _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epu32 (__m256h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__A,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) -1,
+						 __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epu32 (__m512i __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__C,
+						 (__v16si) __A,
+						 __B,
+						 __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epu32 (__mmask16 __A, __m256h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2udq_v16si_mask_round (__B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 __A,
+						 __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epu32(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2udq_v16si_mask_round ((A),			\
+						(__v16si)		\
+						_mm512_setzero_si512 (), \
+						(__mmask16)-1,		\
+						(B)))
+
+#define _mm512_mask_cvtt_roundph_epu32(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2udq_v16si_mask_round ((C),		\
+						(__v16si)(A),	\
+						(B),		\
+						(D)))
+
+#define _mm512_maskz_cvtt_roundph_epu32(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2udq_v16si_mask_round ((B),			\
+						(__v16si)		\
+						_mm512_setzero_si512 (), \
+						(A),			\
+						(C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtdq2ph.  */
 extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -3019,6 +3214,156 @@ _mm512_maskz_cvt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvttph2qq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__A,
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epi64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__C, __A, __B,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__B,
+						    _mm512_setzero_si512 (),
+						    __A,
+						    _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epi64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__A,
+						    _mm512_setzero_si512 (),
+						    (__mmask8) -1,
+						    __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epi64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epi64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvttph2qq_v8di_mask_round (__B,
+						    _mm512_setzero_si512 (),
+						    __A,
+						    __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epi64(A, B)					\
+  (__builtin_ia32_vcvttph2qq_v8di_mask_round ((A),			\
+					      _mm512_setzero_si512 (),	\
+					      (__mmask8)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvtt_roundph_epi64(A, B, C, D)			\
+  __builtin_ia32_vcvttph2qq_v8di_mask_round ((C), (A), (B), (D))
+
+#define _mm512_maskz_cvtt_roundph_epi64(A, B, C)			\
+  (__builtin_ia32_vcvttph2qq_v8di_mask_round ((B),			\
+					      _mm512_setzero_si512 (),	\
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2uqq.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__A,
+						     _mm512_setzero_si512 (),
+						     (__mmask8) -1,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epu64 (__m512i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__C, __A, __B,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__B,
+						     _mm512_setzero_si512 (),
+						     __A,
+						     _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epu64 (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__A,
+						     _mm512_setzero_si512 (),
+						     (__mmask8) -1,
+						     __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epu64 (__m512i __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epu64 (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v8di_mask_round (__B,
+						     _mm512_setzero_si512 (),
+						     __A,
+						     __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epu64(A, B)					\
+  (__builtin_ia32_vcvttph2uqq_v8di_mask_round ((A),			\
+					       _mm512_setzero_si512 (),	\
+					       (__mmask8)-1,		\
+					       (B)))
+
+#define _mm512_mask_cvtt_roundph_epu64(A, B, C, D)			\
+  __builtin_ia32_vcvttph2uqq_v8di_mask_round ((C), (A), (B), (D))
+
+#define _mm512_maskz_cvtt_roundph_epu64(A, B, C)			\
+  (__builtin_ia32_vcvttph2uqq_v8di_mask_round ((B),			\
+					       _mm512_setzero_si512 (),	\
+					       (A),			\
+					       (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtqq2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -3363,6 +3708,200 @@ _mm512_maskz_cvt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvttph2w.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epi16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epi16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__C,
+					       (__v32hi) __A,
+					       __B,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epi16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epi16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__A,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       (__mmask32) -1,
+					       __B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epi16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__C,
+					       (__v32hi) __A,
+					       __B,
+					       __D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epi16 (__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2w_v32hi_mask_round (__B,
+					       (__v32hi)
+					       _mm512_setzero_si512 (),
+					       __A,
+					       __C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epi16(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((A),			\
+					      (__v32hi)_mm512_setzero_si512 (), \
+					      (__mmask32)-1,		\
+					      (B)))
+
+#define _mm512_mask_cvtt_roundph_epi16(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((C),		\
+					      (__v32hi)(A),	\
+					      (B),		\
+					      (D)))
+
+#define _mm512_maskz_cvtt_roundph_epi16(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2w_v32hi_mask_round ((B),			\
+					      (__v32hi)_mm512_setzero_si512 (), \
+					      (A),			\
+					      (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvttph2uw.  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvttph_epu16 (__m512h __A)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__A,
+						(__v32hi)
+						_mm512_setzero_si512 (),
+						(__mmask32) -1,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvttph_epu16 (__m512i __A, __mmask32 __B, __m512h __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__C,
+						(__v32hi) __A,
+						__B,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvttph_epu16 (__mmask32 __A, __m512h __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__B,
+						(__v32hi)
+						_mm512_setzero_si512 (),
+						__A,
+						_MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtt_roundph_epu16 (__m512h __A, int __B)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__A,
+						(__v32hi)
+						_mm512_setzero_si512 (),
+						(__mmask32) -1,
+						__B);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtt_roundph_epu16 (__m512i __A, __mmask32 __B, __m512h __C, int __D)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__C,
+						(__v32hi) __A,
+						__B,
+						__D);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtt_roundph_epu16 (__mmask32 __A, __m512h __B, int __C)
+{
+  return (__m512i)
+    __builtin_ia32_vcvttph2uw_v32hi_mask_round (__B,
+						(__v32hi)
+						_mm512_setzero_si512 (),
+						__A,
+						__C);
+}
+
+#else
+#define _mm512_cvtt_roundph_epu16(A, B)					\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((A),			\
+					       (__v32hi)		\
+					       _mm512_setzero_si512 (),	\
+					       (__mmask32)-1,		\
+					       (B)))
+
+#define _mm512_mask_cvtt_roundph_epu16(A, B, C, D)		\
+  ((__m512i)							\
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((C),		\
+					       (__v32hi)(A),	\
+					       (B),		\
+					       (D)))
+
+#define _mm512_maskz_cvtt_roundph_epu16(A, B, C)			\
+  ((__m512i)								\
+   __builtin_ia32_vcvttph2uw_v32hi_mask_round ((B),			\
+					       (__v32hi)		\
+					       _mm512_setzero_si512 (),	\
+					       (A),			\
+					       (C)))
+
+#endif /* __OPTIMIZE__ */
+
 /* Intrinsics vcvtw2ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 93d9ff8bf3c..e1ee37edde6 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -1050,6 +1050,132 @@ _mm256_maskz_cvtph_epu32 (__mmask8 __A, __m128h __B)
 					 __A);
 }
 
+/* Intrinsics vcvttph2dq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2dq_v4si_mask (__A,
+					 (__v4si) _mm_setzero_si128 (),
+					 (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)__builtin_ia32_vcvttph2dq_v4si_mask (__C,
+						       (__v4si) __A,
+						       __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2dq_v4si_mask (__B,
+					 (__v4si) _mm_setzero_si128 (),
+					 __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__A,
+					 (__v8si)
+					 _mm256_setzero_si256 (),
+					 (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__C,
+					 (__v8si) __A,
+					 __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2dq_v8si_mask (__B,
+					 (__v8si)
+					 _mm256_setzero_si256 (),
+					 __A);
+}
+
+/* Intrinsics vcvttph2udq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu32 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__A,
+					  (__v4si)
+					  _mm_setzero_si128 (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu32 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__C,
+					  (__v4si) __A,
+					  __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2udq_v4si_mask (__B,
+					  (__v4si)
+					  _mm_setzero_si128 (),
+					  __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu32 (__m128h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__A,
+					  (__v8si)
+					  _mm256_setzero_si256 (),
+					  (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu32 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__C,
+					  (__v8si) __A,
+					  __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu32 (__mmask8 __A, __m128h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2udq_v8si_mask (__B,
+					  (__v8si)
+					  _mm256_setzero_si256 (),
+					  __A);
+}
+
 /* Intrinsics vcvtdq2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1257,6 +1383,116 @@ _mm256_maskz_cvtph_epu64 (__mmask8 __A, __m128h __B)
 					      __A);
 }
 
+/* Intrinsics vcvttph2qq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__A,
+					      _mm_setzero_si128 (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__C,
+					      __A,
+					      __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v2di_mask (__B,
+					      _mm_setzero_si128 (),
+					      __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__A,
+					      _mm256_setzero_si256 (),
+					      (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__C,
+					      __A,
+					      __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2qq_v4di_mask (__B,
+					      _mm256_setzero_si256 (),
+					      __A);
+}
+
+/* Intrinsics vcvttph2uqq.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__A,
+					       _mm_setzero_si128 (),
+					       (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu64 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__C,
+					       __A,
+					       __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v2di_mask (__B,
+					       _mm_setzero_si128 (),
+					       __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu64 (__m128h __A)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__A,
+					       _mm256_setzero_si256 (),
+					       (__mmask8) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu64 (__m256i __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__C,
+					       __A,
+					       __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu64 (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvttph2uqq_v4di_mask (__B,
+					       _mm256_setzero_si256 (),
+					       __A);
+}
+
 /* Intrinsics vcvtqq2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -1481,6 +1717,135 @@ _mm256_maskz_cvtph_epu16 (__mmask16 __A, __m256h __B)
 					 __A);
 }
 
+/* Intrinsics vcvttph2w.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epi16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__A,
+					(__v8hi)
+					_mm_setzero_si128 (),
+					(__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epi16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__C,
+					(__v8hi) __A,
+					__B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epi16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2w_v8hi_mask (__B,
+					(__v8hi)
+					_mm_setzero_si128 (),
+					__A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epi16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__A,
+					 (__v16hi)
+					 _mm256_setzero_si256 (),
+					 (__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epi16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__C,
+					 (__v16hi) __A,
+					 __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epi16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2w_v16hi_mask (__B,
+					 (__v16hi)
+					 _mm256_setzero_si256 (),
+					 __A);
+}
+
+/* Intrinsics vcvttph2uw.  */
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttph_epu16 (__m128h __A)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__A,
+					 (__v8hi)
+					 _mm_setzero_si128 (),
+					 (__mmask8) -1);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvttph_epu16 (__m128i __A, __mmask8 __B, __m128h __C)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__C,
+					 (__v8hi) __A,
+					 __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvttph_epu16 (__mmask8 __A, __m128h __B)
+{
+  return (__m128i)
+    __builtin_ia32_vcvttph2uw_v8hi_mask (__B,
+					 (__v8hi)
+					 _mm_setzero_si128 (),
+					 __A);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvttph_epu16 (__m256h __A)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__A,
+					  (__v16hi)
+					  _mm256_setzero_si256 (),
+					  (__mmask16) -1);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvttph_epu16 (__m256i __A, __mmask16 __B, __m256h __C)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__C,
+					  (__v16hi) __A,
+					  __B);
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvttph_epu16 (__mmask16 __A, __m256h __B)
+{
+  return (__m256i)
+    __builtin_ia32_vcvttph2uw_v16hi_mask (__B,
+					  (__v16hi)
+					  _mm256_setzero_si256 (),
+					  __A);
+}
+
 /* Intrinsics vcvtw2ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 3602b40d6d5..17571e3b4c3 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2835,14 +2835,26 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v8si_mask, "__builtin_ia32_vcvtph2dq_v8si_mask", IX86_BUILTIN_VCVTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v4si_mask, "__builtin_ia32_vcvtph2udq_v4si_mask", IX86_BUILTIN_VCVTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v8si_mask, "__builtin_ia32_vcvtph2udq_v8si_mask", IX86_BUILTIN_VCVTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv4si2_mask, "__builtin_ia32_vcvttph2dq_v4si_mask", IX86_BUILTIN_VCVTTPH2DQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8si2_mask, "__builtin_ia32_vcvttph2dq_v8si_mask", IX86_BUILTIN_VCVTTPH2DQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv4si2_mask, "__builtin_ia32_vcvttph2udq_v4si_mask", IX86_BUILTIN_VCVTTPH2UDQ_V4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V8HF_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8si2_mask, "__builtin_ia32_vcvttph2udq_v8si_mask", IX86_BUILTIN_VCVTTPH2UDQ_V8SI_MASK, UNKNOWN, (int) V8SI_FTYPE_V8HF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v2di_mask, "__builtin_ia32_vcvtph2qq_v2di_mask", IX86_BUILTIN_VCVTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v4di_mask, "__builtin_ia32_vcvtph2qq_v4di_mask", IX86_BUILTIN_VCVTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v2di_mask, "__builtin_ia32_vcvtph2uqq_v2di_mask", IX86_BUILTIN_VCVTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v4di_mask, "__builtin_ia32_vcvtph2uqq_v4di_mask", IX86_BUILTIN_VCVTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv2di2_mask, "__builtin_ia32_vcvttph2qq_v2di_mask", IX86_BUILTIN_VCVTTPH2QQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv4di2_mask, "__builtin_ia32_vcvttph2qq_v4di_mask", IX86_BUILTIN_VCVTTPH2QQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv2di2_mask, "__builtin_ia32_vcvttph2uqq_v2di_mask", IX86_BUILTIN_VCVTTPH2UQQ_V2DI_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv4di2_mask, "__builtin_ia32_vcvttph2uqq_v4di_mask", IX86_BUILTIN_VCVTTPH2UQQ_V4DI_MASK, UNKNOWN, (int) V4DI_FTYPE_V8HF_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v8hi_mask, "__builtin_ia32_vcvtph2w_v8hi_mask", IX86_BUILTIN_VCVTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v16hi_mask, "__builtin_ia32_vcvtph2w_v16hi_mask", IX86_BUILTIN_VCVTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v8hi_mask, "__builtin_ia32_vcvtph2uw_v8hi_mask", IX86_BUILTIN_VCVTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v16hi_mask, "__builtin_ia32_vcvtph2uw_v16hi_mask", IX86_BUILTIN_VCVTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8hi2_mask, "__builtin_ia32_vcvttph2w_v8hi_mask", IX86_BUILTIN_VCVTTPH2W_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16hi2_mask, "__builtin_ia32_vcvttph2w_v16hi_mask", IX86_BUILTIN_VCVTTPH2W_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8hi2_mask, "__builtin_ia32_vcvttph2uw_v8hi_mask", IX86_BUILTIN_VCVTTPH2UW_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HF_V8HI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16hi2_mask, "__builtin_ia32_vcvttph2uw_v16hi_mask", IX86_BUILTIN_VCVTTPH2UW_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HF_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v8hi_mask, "__builtin_ia32_vcvtw2ph_v8hi_mask", IX86_BUILTIN_VCVTW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v16hi_mask, "__builtin_ia32_vcvtw2ph_v16hi_mask", IX86_BUILTIN_VCVTW2PH_V16HI_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HI_V16HF_UHI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v8hi_mask, "__builtin_ia32_vcvtuw2ph_v8hi_mask", IX86_BUILTIN_VCVTUW2PH_V8HI_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HI_V8HF_UQI)
@@ -3084,10 +3096,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_getmantv32hf_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vgetmantv8hf_mask_round, "__builtin_ia32_getmantsh_mask_round", IX86_BUILTIN_GETMANTSH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2dq_v16si_mask_round, "__builtin_ia32_vcvtph2dq_v16si_mask_round", IX86_BUILTIN_VCVTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2udq_v16si_mask_round, "__builtin_ia32_vcvtph2udq_v16si_mask_round", IX86_BUILTIN_VCVTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv16si2_mask_round, "__builtin_ia32_vcvttph2dq_v16si_mask_round", IX86_BUILTIN_VCVTTPH2DQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv16si2_mask_round, "__builtin_ia32_vcvttph2udq_v16si_mask_round", IX86_BUILTIN_VCVTTPH2UDQ_V16SI_MASK_ROUND, UNKNOWN, (int) V16SI_FTYPE_V16HF_V16SI_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2qq_v8di_mask_round, "__builtin_ia32_vcvtph2qq_v8di_mask_round", IX86_BUILTIN_VCVTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uqq_v8di_mask_round, "__builtin_ia32_vcvtph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv8di2_mask_round, "__builtin_ia32_vcvttph2qq_v8di_mask_round", IX86_BUILTIN_VCVTTPH2QQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv8di2_mask_round, "__builtin_ia32_vcvttph2uqq_v8di_mask_round", IX86_BUILTIN_VCVTTPH2UQQ_V8DI_MASK_ROUND, UNKNOWN, (int) V8DI_FTYPE_V8HF_V8DI_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2w_v32hi_mask_round, "__builtin_ia32_vcvtph2w_v32hi_mask_round", IX86_BUILTIN_VCVTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtph2uw_v32hi_mask_round, "__builtin_ia32_vcvtph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2w_v32hi_mask_round", IX86_BUILTIN_VCVTTPH2W_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncv32hi2_mask_round, "__builtin_ia32_vcvttph2uw_v32hi_mask_round", IX86_BUILTIN_VCVTTPH2UW_V32HI_MASK_ROUND, UNKNOWN, (int) V32HI_FTYPE_V32HF_V32HI_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtw2ph_v32hi_mask_round, "__builtin_ia32_vcvtw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuw2ph_v32hi_mask_round, "__builtin_ia32_vcvtuw2ph_v32hi_mask_round", IX86_BUILTIN_VCVTUW2PH_V32HI_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HI_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_round, "__builtin_ia32_vcvtdq2ph_v16si_mask_round", IX86_BUILTIN_VCVTDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b312d26b806..66b4fa61eb5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5636,6 +5636,40 @@ (define_insn "avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "HF")])
 
+(define_insn "avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>"
+  [(set (match_operand:VI2H_AVX512VL 0 "register_operand" "=v")
+	(any_fix:VI2H_AVX512VL
+	  (match_operand:<ssePHmode> 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
+  "TARGET_AVX512FP16"
+  "vcvttph2<fixsuffix><sseintconvert>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>"
+  [(set (match_operand:VI4_128_8_256 0 "register_operand" "=v")
+	(any_fix:VI4_128_8_256
+	  (vec_select:V4HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvttph2<fixsuffix><sseintconvert>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(any_fix:V2DI
+	  (vec_select:V2HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvttph2<fixsuffix>qq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 0aae949097a..4b6cf7e1ed6 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -723,8 +723,14 @@
 #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 997fb733132..2e730d554dd 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -740,8 +740,14 @@
 #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 89a589e0d80..98e38fb025a 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -680,8 +680,14 @@ test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
 test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8)
+test_1 (_mm512_cvtt_roundph_epi16, __m512i, __m512h, 8)
+test_1 (_mm512_cvtt_roundph_epu16, __m512i, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
+test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8)
+test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8)
+test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8)
@@ -732,10 +738,16 @@ test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
@@ -784,10 +796,16 @@ test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index fed12744c6c..3ad10908d49 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -785,10 +785,16 @@ test_1 (_mm512_roundscale_ph, __m512h, __m512h, 123)
 test_1 (_mm512_getexp_round_ph, __m512h, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epi16, __m512i, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epu16, __m512i, __m512h, 8)
+test_1 (_mm512_cvtt_roundph_epi16, __m512i, __m512h, 8)
+test_1 (_mm512_cvtt_roundph_epu16, __m512i, __m512h, 8)
 test_1 (_mm512_cvt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
+test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8)
+test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8)
+test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8)
 test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
@@ -836,10 +842,16 @@ test_2 (_mm512_maskz_getexp_round_ph, __m512h, __mmask32, __m512h, 8)
 test_2 (_mm_getexp_round_sh, __m128h, __m128h, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi16, __m512i, __mmask32, __m512h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu16, __m512i, __mmask32, __m512h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
@@ -887,10 +899,16 @@ test_3 (_mm_maskz_getexp_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3 (_mm512_mask_getexp_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi16, __m512i, __m512i, __mmask32, __m512h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu16, __m512i, __m512i, __mmask32, __m512h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 6e8d8a1833c..6990f93bfce 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -741,8 +741,14 @@
 #define __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtph2udq_v16si_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2qq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uqq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2dq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvttph2udq_v16si_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2qq_v8di_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uqq_v8di_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2w_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtph2uw_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2w_v32hi_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvttph2uw_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, D) __builtin_ia32_vcvtuw2ph_v32hi_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtdq2ph_v16si_mask_round(A, B, C, 8)
-- 
2.18.1



* [PATCH 33/62] AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
  2021-07-01  6:16 ` [PATCH 32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi liuhongt
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvttph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2uw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttph2w-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c: Ditto.
---
 .../i386/avx512fp16-vcvttph2dq-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvttph2dq-1b.c           | 79 +++++++++++++++++
 .../i386/avx512fp16-vcvttph2qq-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvttph2qq-1b.c           | 78 +++++++++++++++++
 .../i386/avx512fp16-vcvttph2udq-1a.c          | 24 ++++++
 .../i386/avx512fp16-vcvttph2udq-1b.c          | 79 +++++++++++++++++
 .../i386/avx512fp16-vcvttph2uqq-1a.c          | 24 ++++++
 .../i386/avx512fp16-vcvttph2uqq-1b.c          | 78 +++++++++++++++++
 .../i386/avx512fp16-vcvttph2uw-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvttph2uw-1b.c           | 84 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvttph2w-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvttph2w-1b.c | 83 ++++++++++++++++++
 .../i386/avx512fp16vl-vcvttph2dq-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvttph2dq-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvttph2qq-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvttph2qq-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvttph2udq-1a.c        | 27 ++++++
 .../i386/avx512fp16vl-vcvttph2udq-1b.c        | 15 ++++
 .../i386/avx512fp16vl-vcvttph2uqq-1a.c        | 27 ++++++
 .../i386/avx512fp16vl-vcvttph2uqq-1b.c        | 15 ++++
 .../i386/avx512fp16vl-vcvttph2uw-1a.c         | 29 +++++++
 .../i386/avx512fp16vl-vcvttph2uw-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvttph2w-1a.c          | 29 +++++++
 .../i386/avx512fp16vl-vcvttph2w-1b.c          | 15 ++++
 24 files changed, 881 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c
new file mode 100644
index 00000000000..0e44aaf1bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epi32 (x1);
+  res1 = _mm512_mask_cvttph_epi32 (res, m16, x2);
+  res2 = _mm512_maskz_cvttph_epi32 (m16, x3);
+  res = _mm512_cvtt_roundph_epi32 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epi32 (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epi32 (m16, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c
new file mode 100644
index 00000000000..c18fefbf206
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2dq-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_d) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epi32) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epi32) (SI(res), HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epi32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epi32) (HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi32);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epi32) (H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epi32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi32) (HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi32);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c
new file mode 100644
index 00000000000..124169467ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epi64 (x1);
+  res1 = _mm512_mask_cvttph_epi64 (res, m8, x2);
+  res2 = _mm512_maskz_cvttph_epi64 (m8, x3);
+  res = _mm512_cvtt_roundph_epi64 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epi64 (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epi64 (m8, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c
new file mode 100644
index 00000000000..2a9a2ca26f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2qq-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_q) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epi64) (src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epi64) (SI(res), 0xcc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epi64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfa, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epi64) (0xfa, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi64);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epi64) (src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epi64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfa, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi64) (0xfa, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi64);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c
new file mode 100644
index 00000000000..0fd60f56777
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epu32 (x1);
+  res1 = _mm512_mask_cvttph_epu32 (res, m16, x2);
+  res2 = _mm512_maskz_cvttph_epu32 (m16, x3);
+  res = _mm512_cvtt_roundph_epu32 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epu32 (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epu32 (m16, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
new file mode 100644
index 00000000000..98bce374753
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2udq-1b.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_d) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epu32) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epu32) (SI(res), HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epu32) (HALF_MASK, H_HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu32);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_d)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epu32) (H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu32);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_d)(&exp, src1, HALF_MASK, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epu32) (SI(res), HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu32);
+
+  EMULATE(cvtph2_d)(&exp, src1,  HALF_MASK, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu32) (HALF_MASK, H_HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu32);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c
new file mode 100644
index 00000000000..04fee2936c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epu64 (x1);
+  res1 = _mm512_mask_cvttph_epu64 (res, m8, x2);
+  res2 = _mm512_maskz_cvttph_epu64 (m8, x3);
+  res = _mm512_cvtt_roundph_epu64 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epu64 (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epu64 (m8, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c
new file mode 100644
index 00000000000..31879ef8983
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_q) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epu64) (src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epu64) (SI(res), 0xcc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfc, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epu64) (0xfc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu64);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_q)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epu64) (src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu64);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_q)(&exp, src1, 0xcc, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epu64) (SI(res), 0xcc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu64);
+
+  EMULATE(cvtph2_q)(&exp, src1,  0xfc, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu64) (0xfc, src1.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu64);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c
new file mode 100644
index 00000000000..b31af8441a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epu16 (x1);
+  res1 = _mm512_mask_cvttph_epu16 (res, m32, x2);
+  res2 = _mm512_maskz_cvttph_epu16 (m32, x3);
+  res = _mm512_cvtt_roundph_epu16 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epu16 (res, m32, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epu16 (m32, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c
new file mode 100644
index 00000000000..34e94e8e549
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2uw-1b.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_w) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	dest->u16[i] = 0;
+      }
+    }
+    else {
+      dest->u16[i] = v1.f32[i];
+
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	dest->u16[i+16] = 0;
+      }
+    }
+    else {
+      dest->u16[i+16] = v2.f32[i];
+    }
+  }
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epu16) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epu16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epu16) (SI(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epu16);
+
+  EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epu16) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epu16);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epu16) (HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epu16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epu16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epu16);
+
+  EMULATE(cvtph2_w)(&exp, src1, ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epu16) (ZMASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epu16);
+#endif
+
+  if (n_errs != 0)
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c
new file mode 100644
index 00000000000..a918594d0d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\{sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512i res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m32;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvttph_epi16 (x1);
+  res1 = _mm512_mask_cvttph_epi16 (res, m32, x2);
+  res2 = _mm512_maskz_cvttph_epi16 (m32, x3);
+  res = _mm512_cvtt_roundph_epi16 (x1, 4);
+  res1 = _mm512_mask_cvtt_roundph_epi16 (res, m32, x2, 8);
+  res2 = _mm512_maskz_cvtt_roundph_epi16 (m32, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c
new file mode 100644
index 00000000000..23bc8e680c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttph2w-1b.c
@@ -0,0 +1,83 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_w) (V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+  m2 = (k >> 16) & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	dest->u16[i] = 0;
+      }
+    }
+    else {
+      dest->u16[i] = v1.f32[i];
+
+    }
+
+    if (((1 << i) & m2) == 0) {
+      if (zero_mask) {
+	dest->u16[i+16] = 0;
+      }
+    }
+    else {
+      dest->u16[i+16] = v2.f32[i];
+    }
+  }
+}
+
+void
+TEST (void)
+{
+  V512 res, exp;
+
+  init_src();
+
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvttph_epi16) (HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvttph_epi16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvttph_epi16) (SI(res), MASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvttph_epi16);
+
+  EMULATE(cvtph2_w)(&exp, src1,  ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvttph_epi16) (ZMASK_VALUE, HF(src1));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvttph_epi16);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_w)(&exp, src1,  NET_MASK, 0);
+  SI(res) = INTRINSIC (_cvtt_roundph_epi16) (HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtt_roundph_epi16);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_w)(&exp, src1, MASK_VALUE, 0);
+  SI(res) = INTRINSIC (_mask_cvtt_roundph_epi16) (SI(res), MASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtt_roundph_epi16);
+
+  EMULATE(cvtph2_w)(&exp, src1,  ZMASK_VALUE, 1);
+  SI(res) = INTRINSIC (_maskz_cvtt_roundph_epi16) (ZMASK_VALUE, HF(src1), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtt_roundph_epi16);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c
new file mode 100644
index 00000000000..b4c084020ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epi32 (x3);
+  res1 = _mm256_mask_cvttph_epi32 (res1, m8, x3);
+  res1 = _mm256_maskz_cvttph_epi32 (m8, x3);
+
+  res2 = _mm_cvttph_epi32 (x3);
+  res2 = _mm_mask_cvttph_epi32 (res2, m8, x3);
+  res2 = _mm_maskz_cvttph_epi32 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c
new file mode 100644
index 00000000000..f9d82f92f4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2dq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2dq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c
new file mode 100644
index 00000000000..421c688ee29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epi64 (x3);
+  res1 = _mm256_mask_cvttph_epi64 (res1, m8, x3);
+  res1 = _mm256_maskz_cvttph_epi64 (m8, x3);
+
+  res2 = _mm_cvttph_epi64 (x3);
+  res2 = _mm_mask_cvttph_epi64 (res2, m8, x3);
+  res2 = _mm_maskz_cvttph_epi64 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c
new file mode 100644
index 00000000000..323ab74fa05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2qq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2qq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c
new file mode 100644
index 00000000000..60f43189d61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epu32 (x3);
+  res1 = _mm256_mask_cvttph_epu32 (res1, m8, x3);
+  res1 = _mm256_maskz_cvttph_epu32 (m8, x3);
+
+  res2 = _mm_cvttph_epu32 (x3);
+  res2 = _mm_mask_cvttph_epu32 (res2, m8, x3);
+  res2 = _mm_maskz_cvttph_epu32 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c
new file mode 100644
index 00000000000..61365d456c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2udq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2udq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c
new file mode 100644
index 00000000000..37008f9d9e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epu64 (x3);
+  res1 = _mm256_mask_cvttph_epu64 (res1, m8, x3);
+  res1 = _mm256_maskz_cvttph_epu64 (m8, x3);
+
+  res2 = _mm_cvttph_epu64 (x3);
+  res2 = _mm_mask_cvttph_epu64 (res2, m8, x3);
+  res2 = _mm_maskz_cvttph_epu64 (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c
new file mode 100644
index 00000000000..6360402e6d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2uqq-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2uqq-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c
new file mode 100644
index 00000000000..eafa31a786b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m256h x3;
+volatile __m128h x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epu16 (x3);
+  res1 = _mm256_mask_cvttph_epu16 (res1, m16, x3);
+  res1 = _mm256_maskz_cvttph_epu16 (m16, x3);
+
+  res2 = _mm_cvttph_epu16 (x4);
+  res2 = _mm_mask_cvttph_epu16 (res2, m8, x4);
+  res2 = _mm_maskz_cvttph_epu16 (m8, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c
new file mode 100644
index 00000000000..dd5ed9d5b38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2uw-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2uw-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c
new file mode 100644
index 00000000000..7476d3c1160
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __m256h x3;
+volatile __m128h x4;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvttph_epi16 (x3);
+  res1 = _mm256_mask_cvttph_epi16 (res1, m16, x3);
+  res1 = _mm256_maskz_cvttph_epi16 (m16, x3);
+
+  res2 = _mm_cvttph_epi16 (x4);
+  res2 = _mm_mask_cvttph_epi16 (res2, m8, x4);
+  res2 = _mm_maskz_cvttph_epi16 (m8, x4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c
new file mode 100644
index 00000000000..7a04a6a8ebc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2w-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvttph2w-1b.c"
+
-- 
2.18.1



* [PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi.
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_cvttsh_i32):
	New intrinsic.
	(_mm_cvttsh_u32): Likewise.
	(_mm_cvtt_roundsh_i32): Likewise.
	(_mm_cvtt_roundsh_u32): Likewise.
	(_mm_cvttsh_i64): Likewise.
	(_mm_cvttsh_u64): Likewise.
	(_mm_cvtt_roundsh_i64): Likewise.
	(_mm_cvtt_roundsh_u64): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/sse.md
	(avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>):
	New.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvttsh2si-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c: Ditto.
	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h            | 81 +++++++++++++++++++
 gcc/config/i386/i386-builtin.def              |  4 +
 gcc/config/i386/sse.md                        | 16 ++++
 gcc/testsuite/gcc.target/i386/avx-1.c         |  4 +
 .../i386/avx512fp16-vcvttsh2si-1a.c           | 16 ++++
 .../i386/avx512fp16-vcvttsh2si-1b.c           | 54 +++++++++++++
 .../i386/avx512fp16-vcvttsh2si64-1a.c         | 16 ++++
 .../i386/avx512fp16-vcvttsh2si64-1b.c         | 52 ++++++++++++
 .../i386/avx512fp16-vcvttsh2usi-1a.c          | 16 ++++
 .../i386/avx512fp16-vcvttsh2usi-1b.c          | 54 +++++++++++++
 .../i386/avx512fp16-vcvttsh2usi64-1a.c        | 16 ++++
 .../i386/avx512fp16-vcvttsh2usi64-1b.c        | 53 ++++++++++++
 gcc/testsuite/gcc.target/i386/sse-13.c        |  4 +
 gcc/testsuite/gcc.target/i386/sse-14.c        |  4 +
 gcc/testsuite/gcc.target/i386/sse-22.c        |  4 +
 gcc/testsuite/gcc.target/i386/sse-23.c        |  4 +
 16 files changed, 398 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 66de5b88927..bcd04f14769 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -4148,6 +4148,87 @@ _mm_cvt_roundsh_u64 (__m128h __A, const int __R)
 #endif /* __OPTIMIZE__ */
 #endif /* __x86_64__ */
 
+/* Intrinsics vcvttsh2si, vcvttsh2usi.  */
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttsh_i32 (__m128h __A)
+{
+  return (int)
+    __builtin_ia32_vcvttsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttsh_u32 (__m128h __A)
+{
+  return (unsigned)
+    __builtin_ia32_vcvttsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtt_roundsh_i32 (__m128h __A, const int __R)
+{
+  return (int) __builtin_ia32_vcvttsh2si32_round (__A, __R);
+}
+
+extern __inline unsigned
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtt_roundsh_u32 (__m128h __A, const int __R)
+{
+  return (unsigned) __builtin_ia32_vcvttsh2usi32_round (__A, __R);
+}
+
+#else
+#define _mm_cvtt_roundsh_i32(A, B)		\
+  ((int)__builtin_ia32_vcvttsh2si32_round ((A), (B)))
+#define _mm_cvtt_roundsh_u32(A, B)		\
  ((unsigned)__builtin_ia32_vcvttsh2usi32_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
+
+#ifdef __x86_64__
+extern __inline long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttsh_i64 (__m128h __A)
+{
+  return (long long)
+    __builtin_ia32_vcvttsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvttsh_u64 (__m128h __A)
+{
+  return (unsigned long long)
+    __builtin_ia32_vcvttsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtt_roundsh_i64 (__m128h __A, const int __R)
+{
+  return (long long) __builtin_ia32_vcvttsh2si64_round (__A, __R);
+}
+
+extern __inline unsigned long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtt_roundsh_u64 (__m128h __A, const int __R)
+{
+  return (unsigned long long) __builtin_ia32_vcvttsh2usi64_round (__A, __R);
+}
+
+#else
+#define _mm_cvtt_roundsh_i64(A, B)			\
+  ((long long)__builtin_ia32_vcvttsh2si64_round ((A), (B)))
+#define _mm_cvtt_roundsh_u64(A, B)			\
  ((unsigned long long)__builtin_ia32_vcvttsh2usi64_round ((A), (B)))
+
+#endif /* __OPTIMIZE__ */
+#endif /* __x86_64__ */
+
 /* Intrinsics vcvtsi2sh, vcvtusi2sh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 17571e3b4c3..4e6d08c2d3f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3116,6 +3116,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__b
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usiq_round, "__builtin_ia32_vcvtsh2usi64_round", IX86_BUILTIN_VCVTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncsi2_round, "__builtin_ia32_vcvttsh2si32_round", IX86_BUILTIN_VCVTTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fix_truncdi2_round, "__builtin_ia32_vcvttsh2si64_round", IX86_BUILTIN_VCVTTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncsi2_round, "__builtin_ia32_vcvttsh2usi32_round", IX86_BUILTIN_VCVTTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fixuns_truncdi2_round, "__builtin_ia32_vcvttsh2usi64_round", IX86_BUILTIN_VCVTTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__builtin_ia32_vcvtsi2sh32_round", IX86_BUILTIN_VCVTSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_INT)
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 66b4fa61eb5..c16e0dc46a7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5670,6 +5670,22 @@ (define_insn "avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>"
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
+	(any_fix:SWI48
+	  (vec_select:HF
+	    (match_operand:V8HF 1 "<round_saeonly_nimm_scalar_predicate>" "v,<round_saeonly_constraint>")
+	    (parallel [(const_int 0)]))))]
+  "TARGET_AVX512FP16"
+  "%vcvttsh2<fixsuffix>si\t{<round_saeonly_op2>%1, %0|%0, %k1<round_saeonly_op2>}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "athlon_decode" "double,vector")
+   (set_attr "amdfam10_decode" "double,double")
+   (set_attr "bdver1_decode" "double,double")
+   (set_attr "prefix_rep" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 4b6cf7e1ed6..595a6ac007a 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -741,6 +741,10 @@
 #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8)
+#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8)
 #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c
new file mode 100644
index 00000000000..80d84fce153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile int res1;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm_cvttsh_i32 (x1);
+  res1 = _mm_cvtt_roundsh_i32 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c
new file mode 100644
index 00000000000..c5b0a64d5f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si-1b.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 2
+
+void NOINLINE
+emulate_cvtph2_d(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_d(&exp, src1,  NET_MASK, 0);
+  res.i32[0] = _mm_cvtt_roundsh_i32(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_i32");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c
new file mode 100644
index 00000000000..76a9053ef89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile long long res2;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm_cvttsh_i64 (x1);
+  res2 = _mm_cvtt_roundsh_i64 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c
new file mode 100644
index 00000000000..4e0fe5bb6bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c
@@ -0,0 +1,52 @@
+/* { dg-do run { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 4
+
+void NOINLINE
+emulate_cvtph2_q(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_q(&exp, src1,  NET_MASK, 0);
+  res.s64[0] = _mm_cvtt_roundsh_i64(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_i64");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c
new file mode 100644
index 00000000000..59564578a4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%eax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile unsigned int res1;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm_cvttsh_u32 (x1);
+  res1 = _mm_cvtt_roundsh_u32 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c
new file mode 100644
index 00000000000..214e3e13db7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c
@@ -0,0 +1,54 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 2
+
+void NOINLINE
+emulate_cvtph2_d(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u32[i] = 0;
+      }
+      else {
+	v5.u32[i] = dest->u32[i];
+      }
+    }
+    else {
+      v5.u32[i] = v1.f32[i];
+
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_d(&exp, src1,  NET_MASK, 0);
+  res.u32[0] = _mm_cvtt_roundsh_u32(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_u32");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c
new file mode 100644
index 00000000000..23e8e70a901
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile  { target { ! ia32 } } } */
+/* { dg-options "-mavx512fp16 -O2 " } */
+/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%rax" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h x1;
+volatile unsigned long long res2;
+
+void extern
+avx512f_test (void)
+{
+  res2 = _mm_cvttsh_u64 (x1);
+  res2 = _mm_cvtt_roundsh_u64 (x1, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c
new file mode 100644
index 00000000000..863fb6e167d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c
@@ -0,0 +1,53 @@
+/* { dg-do run  { target { { ! ia32 } && avx512fp16 } } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 4
+
+void NOINLINE
+emulate_cvtph2_q(V512 * dest, V512 op1,
+		 __mmask32 k, int zero_mask)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  __mmask16 m1, m2;
+
+  m1 = k & 0xffff;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+
+  for (i = 0; i < 8; i++) {
+    if (((1 << i) & m1) == 0) {
+      if (zero_mask) {
+	v5.u64[i] = 0;
+      }
+      else {
+	v5.u64[i] = dest->u64[i];
+      }
+    }
+    else {
+      v5.u64[i] = v1.f32[i];
+    }
+  }
+  *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+  
+  init_src();
+  emulate_cvtph2_q(&exp, src1,  NET_MASK, 0);
+  res.u64[0] = _mm_cvtt_roundsh_u64(src1.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvtt_roundsh_u64");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 2e730d554dd..0d976fb0de4 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -758,6 +758,10 @@
 #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8)
+#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8)
 #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 98e38fb025a..403f3af6067 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -698,9 +698,13 @@ test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
 test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_u32, unsigned, __m128h, 8)
 #ifdef __x86_64__
 test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_i64, long long, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_u64, unsigned long long, __m128h, 8)
 test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
 test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 3ad10908d49..b980ac3cddd 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -803,9 +803,13 @@ test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
 test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_u32, unsigned, __m128h, 8)
 #ifdef __x86_64__
 test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_i64, long long, __m128h, 8)
+test_1 (_mm_cvtt_roundsh_u64, unsigned long long, __m128h, 8)
 test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
 test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 6990f93bfce..1bd734a9352 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -759,6 +759,10 @@
 #define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
 #define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
+#define __builtin_ia32_vcvttsh2si32_round(A, B) __builtin_ia32_vcvttsh2si32_round(A, 8)
+#define __builtin_ia32_vcvttsh2si64_round(A, B) __builtin_ia32_vcvttsh2si64_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi32_round(A, B) __builtin_ia32_vcvttsh2usi32_round(A, 8)
+#define __builtin_ia32_vcvttsh2usi64_round(A, B) __builtin_ia32_vcvttsh2usi64_round(A, 8)
 #define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (33 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx liuhongt
                   ` (26 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_cvtph_pd):
	New intrinsic.
	(_mm512_mask_cvtph_pd): Likewise.
	(_mm512_maskz_cvtph_pd): Likewise.
	(_mm512_cvt_roundph_pd): Likewise.
	(_mm512_mask_cvt_roundph_pd): Likewise.
	(_mm512_maskz_cvt_roundph_pd): Likewise.
	(_mm512_cvtxph_ps): Likewise.
	(_mm512_mask_cvtxph_ps): Likewise.
	(_mm512_maskz_cvtxph_ps): Likewise.
	(_mm512_cvtx_roundph_ps): Likewise.
	(_mm512_mask_cvtx_roundph_ps): Likewise.
	(_mm512_maskz_cvtx_roundph_ps): Likewise.
	(_mm512_cvtxps_ph): Likewise.
	(_mm512_mask_cvtxps_ph): Likewise.
	(_mm512_maskz_cvtxps_ph): Likewise.
	(_mm512_cvtx_roundps_ph): Likewise.
	(_mm512_mask_cvtx_roundps_ph): Likewise.
	(_mm512_maskz_cvtx_roundps_ph): Likewise.
	(_mm512_cvtpd_ph): Likewise.
	(_mm512_mask_cvtpd_ph): Likewise.
	(_mm512_maskz_cvtpd_ph): Likewise.
	(_mm512_cvt_roundpd_ph): Likewise.
	(_mm512_mask_cvt_roundpd_ph): Likewise.
	(_mm512_maskz_cvt_roundpd_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_cvtph_pd):
	New intrinsic.
	(_mm_mask_cvtph_pd): Likewise.
	(_mm_maskz_cvtph_pd): Likewise.
	(_mm256_cvtph_pd): Likewise.
	(_mm256_mask_cvtph_pd): Likewise.
	(_mm256_maskz_cvtph_pd): Likewise.
	(_mm_cvtxph_ps): Likewise.
	(_mm_mask_cvtxph_ps): Likewise.
	(_mm_maskz_cvtxph_ps): Likewise.
	(_mm256_cvtxph_ps): Likewise.
	(_mm256_mask_cvtxph_ps): Likewise.
	(_mm256_maskz_cvtxph_ps): Likewise.
	(_mm_cvtxps_ph): Likewise.
	(_mm_mask_cvtxps_ph): Likewise.
	(_mm_maskz_cvtxps_ph): Likewise.
	(_mm256_cvtxps_ph): Likewise.
	(_mm256_mask_cvtxps_ph): Likewise.
	(_mm256_maskz_cvtxps_ph): Likewise.
	(_mm_cvtpd_ph): Likewise.
	(_mm_mask_cvtpd_ph): Likewise.
	(_mm_maskz_cvtpd_ph): Likewise.
	(_mm256_cvtpd_ph): Likewise.
	(_mm256_mask_cvtpd_ph): Likewise.
	(_mm256_maskz_cvtpd_ph): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-builtin-types.def: Add corresponding builtin types.
	* config/i386/i386-expand.c: Handle new builtin types.
	* config/i386/sse.md
	(VF4_128_8_256): New.
	(VF48H_AVX512VL): Ditto.
	(ssePHmode): Add HF vector modes.
	(castmode): Add new convertible modes.
	(qq2phsuff): Ditto.
	(ph2pssuffix): New.
	(avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>): Ditto.
	(avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
	(*avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
	(avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
	(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
	(*avx512fp16_vcvt<castmode>2ph_<mode>_mask_1): Ditto.
	(avx512fp16_vcvtpd2ph_v2df): Ditto.
	(*avx512fp16_vcvtpd2ph_v2df): Ditto.
	(avx512fp16_vcvtpd2ph_v2df_mask): Ditto.
	(*avx512fp16_vcvtpd2ph_v2df_mask): Ditto.
	(*avx512fp16_vcvtpd2ph_v2df_mask_1): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 297 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 200 +++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |  12 +
 gcc/config/i386/i386-builtin.def       |  12 +
 gcc/config/i386/i386-expand.c          |  12 +
 gcc/config/i386/sse.md                 | 189 +++++++++++++++-
 gcc/testsuite/gcc.target/i386/avx-1.c  |   4 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  12 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  12 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   4 +
 11 files changed, 755 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index bcd04f14769..5a6a0ba83a9 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -4306,6 +4306,303 @@ _mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
 #endif /* __OPTIMIZE__ */
 #endif /* __x86_64__ */
 
+/* Intrinsics vcvtph2pd.  */
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtph_pd (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__A,
+						   _mm512_setzero_pd (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtph_pd (__m512d __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__C, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__B,
+						   _mm512_setzero_pd (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundph_pd (__m128h __A, int __B)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__A,
+						   _mm512_setzero_pd (),
+						   (__mmask8) -1,
+						   __B);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundph_pd (__m512d __A, __mmask8 __B, __m128h __C, int __D)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundph_pd (__mmask8 __A, __m128h __B, int __C)
+{
+  return __builtin_ia32_vcvtph2pd_v8df_mask_round (__B,
+						   _mm512_setzero_pd (),
+						   __A,
+						   __C);
+}
+
+#else
+#define _mm512_cvt_roundph_pd(A, B)					\
+  (__builtin_ia32_vcvtph2pd_v8df_mask_round ((A),			\
+					     _mm512_setzero_pd (),	\
+					     (__mmask8)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvt_roundph_pd(A, B, C, D)				\
+  (__builtin_ia32_vcvtph2pd_v8df_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundph_pd(A, B, C)				\
+  (__builtin_ia32_vcvtph2pd_v8df_mask_round ((B),			\
+					     _mm512_setzero_pd (),	\
+					     (A),			\
+					     (C)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtph2psx.  */
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtxph_ps (__m256h __A)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__A,
+						   _mm512_setzero_ps (),
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtxph_ps (__m512 __A, __mmask16 __B, __m256h __C)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__C, __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtxph_ps (__mmask16 __A, __m256h __B)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__B,
+						   _mm512_setzero_ps (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx_roundph_ps (__m256h __A, int __B)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__A,
+						   _mm512_setzero_ps (),
+						   (__mmask16) -1,
+						   __B);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtx_roundph_ps (__m512 __A, __mmask16 __B, __m256h __C, int __D)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__C, __A, __B, __D);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtx_roundph_ps (__mmask16 __A, __m256h __B, int __C)
+{
+  return __builtin_ia32_vcvtph2ps_v16sf_mask_round (__B,
+						   _mm512_setzero_ps (),
+						   __A,
+						   __C);
+}
+
+#else
+#define _mm512_cvtx_roundph_ps(A, B)					\
+  (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((A),			\
+					     _mm512_setzero_ps (),	\
+					     (__mmask16)-1,		\
+					     (B)))
+
+#define _mm512_mask_cvtx_roundph_ps(A, B, C, D)				\
+  (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((C), (A), (B), (D)))
+
+#define _mm512_maskz_cvtx_roundph_ps(A, B, C)				\
+  (__builtin_ia32_vcvtph2ps_v16sf_mask_round ((B),			\
+					     _mm512_setzero_ps (),	\
+					     (A),			\
+					     (C)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtps2phx.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtxps_ph (__m512 __A)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __A,
+						   _mm256_setzero_ph (),
+						   (__mmask16) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtxps_ph (__m256h __A, __mmask16 __B, __m512 __C)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __C,
+						   __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtxps_ph (__mmask16 __A, __m512 __B)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __B,
+						   _mm256_setzero_ph (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtx_roundps_ph (__m512 __A, int __B)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __A,
+						   _mm256_setzero_ph (),
+						   (__mmask16) -1,
+						   __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtx_roundps_ph (__m256h __A, __mmask16 __B, __m512 __C, int __D)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __C,
+						   __A, __B, __D);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtx_roundps_ph (__mmask16 __A, __m512 __B, int __C)
+{
+  return __builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf) __B,
+						   _mm256_setzero_ph (),
+						   __A, __C);
+}
+
+#else
+#define _mm512_cvtx_roundps_ph(A, B)					\
+  (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(A),		\
+					      _mm256_setzero_ph (),	\
+					     (__mmask16)-1, (B)))
+
+#define _mm512_mask_cvtx_roundps_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(C),	\
+					     (A), (B), (D)))
+
+#define _mm512_maskz_cvtx_roundps_ph(A, B, C)			\
+  (__builtin_ia32_vcvtps2ph_v16sf_mask_round ((__v16sf)(B),	\
+					     _mm256_setzero_ph (),	\
+					     (A), (C)))
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtpd2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtpd_ph (__m512d __A)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __A,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m512d __C)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __C,
+						   __A, __B,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvtpd_ph (__mmask8 __A, __m512d __B)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __B,
+						   _mm_setzero_ph (),
+						   __A,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvt_roundpd_ph (__m512d __A, int __B)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __A,
+						   _mm_setzero_ph (),
+						   (__mmask8) -1,
+						   __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cvt_roundpd_ph (__m128h __A, __mmask8 __B, __m512d __C, int __D)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __C,
+						   __A, __B, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
+{
+  return __builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df) __B,
+						   _mm_setzero_ph (),
+						   __A, __C);
+}
+
+#else
+#define _mm512_cvt_roundpd_ph(A, B)					\
+  (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(A),		\
+					     _mm_setzero_ph (),		\
+					     (__mmask8)-1, (B)))
+
+#define _mm512_mask_cvt_roundpd_ph(A, B, C, D)			\
+  (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(C),	\
+					     (A), (B), (D)))
+
+#define _mm512_maskz_cvt_roundpd_ph(A, B, C)			\
+  (__builtin_ia32_vcvtpd2ph_v8df_mask_round ((__v8df)(B),	\
+					     _mm_setzero_ph (),	\
+					     (A), (C)))
+
+#endif /* __OPTIMIZE__ */
 
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index e1ee37edde6..0124b830dd5 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -1952,6 +1952,206 @@ _mm256_maskz_cvtepu16_ph (__mmask16 __A, __m256i __B)
 					      __A);
 }
 
+/* Intrinsics vcvtph2pd.  */
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_pd (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2pd_v2df_mask (__A,
+					     _mm_setzero_pd (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtph_pd (__m128d __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2pd_v2df_mask (__C, __A, __B);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2pd_v2df_mask (__B, _mm_setzero_pd (), __A);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_pd (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2pd_v4df_mask (__A,
+					     _mm256_setzero_pd (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtph_pd (__m256d __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2pd_v4df_mask (__C, __A, __B);
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtph_pd (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2pd_v4df_mask (__B,
+					     _mm256_setzero_pd (),
+					     __A);
+}
+
+/* Intrinsics vcvtph2psx.  */
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtxph_ps (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2ps_v4sf_mask (__A,
+					     _mm_setzero_ps (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtxph_ps (__m128 __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2ps_v4sf_mask (__C, __A, __B);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtxph_ps (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2ps_v4sf_mask (__B, _mm_setzero_ps (), __A);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtxph_ps (__m128h __A)
+{
+  return __builtin_ia32_vcvtph2ps_v8sf_mask (__A,
+					     _mm256_setzero_ps (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtxph_ps (__m256 __A, __mmask8 __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtph2ps_v8sf_mask (__C, __A, __B);
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtxph_ps (__mmask8 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtph2ps_v8sf_mask (__B,
+					     _mm256_setzero_ps (),
+					     __A);
+}
+
+/* Intrinsics vcvtps2phx.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtxps_ph (__m128 __A)
+{
+  return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtxps_ph (__m128h __A, __mmask8 __B, __m128 __C)
+{
+  return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtxps_ph (__mmask8 __A, __m128 __B)
+{
+  return __builtin_ia32_vcvtps2ph_v4sf_mask ((__v4sf) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtxps_ph (__m256 __A)
+{
+  return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtxps_ph (__m128h __A, __mmask8 __B, __m256 __C)
+{
+  return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtxps_ph (__mmask8 __A, __m256 __B)
+{
+  return __builtin_ia32_vcvtps2ph_v8sf_mask ((__v8sf) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+/* Intrinsics vcvtpd2ph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtpd_ph (__m128d __A)
+{
+  return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m128d __C)
+{
+  return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtpd_ph (__mmask8 __A, __m128d __B)
+{
+  return __builtin_ia32_vcvtpd2ph_v2df_mask ((__v2df) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtpd_ph (__m256d __A)
+{
+  return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __A,
+					     _mm_setzero_ph (),
+					     (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_cvtpd_ph (__m128h __A, __mmask8 __B, __m256d __C)
+{
+  return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __C, __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_cvtpd_ph (__mmask8 __A, __m256d __B)
+{
+  return __builtin_ia32_vcvtpd2ph_v4df_mask ((__v4df) __B,
+					     _mm_setzero_ph (),
+					     __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
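
Note for reviewers: the _mm_mask_* / _mm_maskz_* pairs added above all follow the standard AVX-512 merge-masking and zero-masking conventions. As a rough scalar model of those semantics (mask_merge and mask_zero are illustrative helpers written for this note, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of AVX-512 merge-masking: lane i takes the converted
   value when mask bit i is set, otherwise the pass-through value
   (the _mm_mask_* pattern, e.g. _mm_mask_cvtph_pd).  */
void
mask_merge (double *dst, const double *passthru,
	    const double *converted, unsigned char mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = ((mask >> i) & 1) ? converted[i] : passthru[i];
}

/* Scalar model of zero-masking: unselected lanes become 0.0
   (the _mm_maskz_* pattern, e.g. _mm_maskz_cvtph_pd).  */
void
mask_zero (double *dst, const double *converted,
	   unsigned char mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = ((mask >> i) & 1) ? converted[i] : 0.0;
}
```

In other words, _mm_mask_cvtph_pd keeps lanes of the pass-through operand where the corresponding mask bit is clear, while _mm_maskz_cvtph_pd zeroes them; the same holds for the vcvtph2psx, vcvtps2phx and vcvtpd2ph families below.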
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 74bda59a65e..4123e66f7cd 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1321,13 +1321,21 @@ DEF_FUNCTION_TYPE (V8HF, V8HF, UINT, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, UINT64, INT)
 DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI)
 DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI)
+DEF_FUNCTION_TYPE (V2DF, V8HF, V2DF, UQI)
+DEF_FUNCTION_TYPE (V4DF, V8HF, V4DF, UQI)
 DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI)
+DEF_FUNCTION_TYPE (V4SF, V8HF, V4SF, UQI)
 DEF_FUNCTION_TYPE (V8SI, V8HF, V8SI, UQI)
+DEF_FUNCTION_TYPE (V8SF, V8HF, V8SF, UQI)
 DEF_FUNCTION_TYPE (V8HI, V8HF, V8HI, UQI)
 DEF_FUNCTION_TYPE (V8HF, V4SI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8SI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8SF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V2DI, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V4DI, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V4DF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HI, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
@@ -1336,7 +1344,9 @@ DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
+DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
@@ -1344,9 +1354,11 @@ DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT)
+DEF_FUNCTION_TYPE (V16SF, V16HF, V16SF, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT)
+DEF_FUNCTION_TYPE (V16HF, V16SF, V16HF, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 4e6d08c2d3f..2992bd0383d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2867,6 +2867,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v4di_mask, "__builtin_ia32_vcvtqq2ph_v4di_mask", IX86_BUILTIN_VCVTQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v2di_mask, "__builtin_ia32_vcvtuqq2ph_v2di_mask", IX86_BUILTIN_VCVTUQQ2PH_V2DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DI_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v4di_mask, "__builtin_ia32_vcvtuqq2ph_v4di_mask", IX86_BUILTIN_VCVTUQQ2PH_V4DI_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DI_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv2df2_mask, "__builtin_ia32_vcvtph2pd_v2df_mask", IX86_BUILTIN_VCVTPH2PD_V2DF_MASK, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv4df2_mask, "__builtin_ia32_vcvtph2pd_v4df_mask", IX86_BUILTIN_VCVTPH2PD_V4DF_MASK, UNKNOWN, (int) V4DF_FTYPE_V8HF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv4sf2_mask, "__builtin_ia32_vcvtph2ps_v4sf_mask", IX86_BUILTIN_VCVTPH2PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8sf2_mask, "__builtin_ia32_vcvtph2ps_v8sf_mask", IX86_BUILTIN_VCVTPH2PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8HF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v4sf_mask, "__builtin_ia32_vcvtps2ph_v4sf_mask", IX86_BUILTIN_VCVTPS2PH_V4SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v8sf_mask, "__builtin_ia32_vcvtps2ph_v8sf_mask", IX86_BUILTIN_VCVTPS2PH_V8SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v2df_mask, "__builtin_ia32_vcvtpd2ph_v2df_mask", IX86_BUILTIN_VCVTPD2PH_V2DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v4df_mask, "__builtin_ia32_vcvtpd2ph_v4df_mask", IX86_BUILTIN_VCVTPD2PH_V4DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3124,6 +3132,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__b
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
 BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_mask_round, "__builtin_ia32_vcvtph2pd_v8df_mask_round", IX86_BUILTIN_VCVTPH2PD_V8DF_MASK_ROUND, UNKNOWN, (int) V8DF_FTYPE_V8HF_V8DF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2ps_v16sf_mask_round", IX86_BUILTIN_VCVTPH2PS_V16SF_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph_v8df_mask_round", IX86_BUILTIN_VCVTPD2PH_V8DF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2ph_v16sf_mask_round", IX86_BUILTIN_VCVTPS2PH_V16SF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index b83c6d9a92b..a216f6f2bf3 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9566,9 +9566,11 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8SF_FTYPE_V8HI_V8SF_UQI:
     case V4SF_FTYPE_V8HI_V4SF_UQI:
     case V8SI_FTYPE_V8HF_V8SI_UQI:
+    case V8SF_FTYPE_V8HF_V8SF_UQI:
     case V8SI_FTYPE_V8SF_V8SI_UQI:
     case V4SI_FTYPE_V4SF_V4SI_UQI:
     case V4SI_FTYPE_V8HF_V4SI_UQI:
+    case V4SF_FTYPE_V8HF_V4SF_UQI:
     case V4DI_FTYPE_V8HF_V4DI_UQI:
     case V4DI_FTYPE_V4SF_V4DI_UQI:
     case V2DI_FTYPE_V8HF_V2DI_UQI:
@@ -9576,12 +9578,18 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8HF_FTYPE_V8HF_V8HF_UQI:
     case V8HF_FTYPE_V8HI_V8HF_UQI:
     case V8HF_FTYPE_V8SI_V8HF_UQI:
+    case V8HF_FTYPE_V8SF_V8HF_UQI:
     case V8HF_FTYPE_V4SI_V8HF_UQI:
+    case V8HF_FTYPE_V4SF_V8HF_UQI:
     case V8HF_FTYPE_V4DI_V8HF_UQI:
+    case V8HF_FTYPE_V4DF_V8HF_UQI:
     case V8HF_FTYPE_V2DI_V8HF_UQI:
+    case V8HF_FTYPE_V2DF_V8HF_UQI:
     case V4SF_FTYPE_V4DI_V4SF_UQI:
     case V4SF_FTYPE_V2DI_V4SF_UQI:
     case V4DF_FTYPE_V4DI_V4DF_UQI:
+    case V4DF_FTYPE_V8HF_V4DF_UQI:
+    case V2DF_FTYPE_V8HF_V2DF_UQI:
     case V2DF_FTYPE_V2DI_V2DF_UQI:
     case V16QI_FTYPE_V8HI_V16QI_UQI:
     case V16QI_FTYPE_V16HI_V16QI_UHI:
@@ -10527,6 +10535,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DI_FTYPE_V8DF_V8DI_QI_INT:
     case V8SF_FTYPE_V8DI_V8SF_QI_INT:
     case V8DF_FTYPE_V8DI_V8DF_QI_INT:
+    case V8DF_FTYPE_V8HF_V8DF_UQI_INT:
+    case V16SF_FTYPE_V16HF_V16SF_UHI_INT:
     case V32HF_FTYPE_V32HI_V32HF_USI_INT:
     case V32HF_FTYPE_V32HF_V32HF_USI_INT:
     case V16SF_FTYPE_V16SF_V16SF_HI_INT:
@@ -10540,6 +10550,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V2DF_FTYPE_V2DF_V2DF_V2DF_INT:
     case V4SF_FTYPE_V4SF_V4SF_V4SF_INT:
     case V8HF_FTYPE_V8DI_V8HF_UQI_INT:
+    case V8HF_FTYPE_V8DF_V8HF_UQI_INT:
+    case V16HF_FTYPE_V16SF_V16HF_UHI_INT:
       nargs = 4;
       break;
     case V4SF_FTYPE_V4SF_V4SF_INT_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c16e0dc46a7..7447d6b75b5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -621,6 +621,9 @@ (define_mode_iterator V48_AVX2
    (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2")
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")])
 
+(define_mode_iterator VF4_128_8_256
+  [V4DF V4SF])
+
 (define_mode_iterator VI1_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V32QI  "TARGET_AVX512VL")
 	(V16QI  "TARGET_AVX512VL")])
@@ -783,6 +786,8 @@ (define_mode_iterator VI48F_256_512
   (V4DI  "TARGET_AVX512VL") (V4DF  "TARGET_AVX512VL")])
 (define_mode_iterator VF48_I1248
   [V16SI V16SF V8DI V8DF V32HI V64QI])
+(define_mode_iterator VF48H_AVX512VL
+  [V8DF V16SF (V8SF "TARGET_AVX512VL")])
 (define_mode_iterator VI48F
   [V16SI V16SF V8DI V8DF
    (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
@@ -957,7 +962,8 @@ (define_mode_attr ssehalfvecmodelower
 (define_mode_attr ssePHmode
   [(V32HI "V32HF") (V16HI "V16HF") (V8HI "V8HF")
    (V16SI "V16HF") (V8SI "V8HF") (V4SI "V8HF")
-   (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF")])
+   (V8DI "V8HF") (V4DI "V8HF") (V2DI "V8HF")
+   (V8DF "V8HF") (V16SF "V16HF") (V8SF "V8HF")])
 
 ;; Mapping of vector modes to packed single mode of the same size
 (define_mode_attr ssePSmode
@@ -1101,7 +1107,8 @@ (define_mode_attr sserotatemax
 
 ;; Mapping of mode to cast intrinsic name
 (define_mode_attr castmode
- [(V8SI "si") (V8SF "ps") (V4DF "pd")
+ [(V4SF "ps") (V2DF "pd")
+  (V8SI "si") (V8SF "ps") (V4DF "pd")
   (V16SI "si") (V16SF "ps") (V8DF "pd")])
 
 ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
@@ -5440,7 +5447,9 @@ (define_int_attr sseintconvertsignprefix
 (define_mode_attr qq2phsuff
   [(V32HI "") (V16HI "") (V8HI "")
    (V16SI "") (V8SI "{y}") (V4SI "{x}")
-   (V8DI "{z}") (V4DI "{y}") (V2DI "{x}")])
+   (V8DI "{z}") (V4DI "{y}") (V2DI "{x}")
+   (V16SF "") (V8SF "{y}") (V4SF "{x}")
+   (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")])
 
 (define_insn "avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>"
   [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v")
@@ -5686,6 +5695,180 @@ (define_insn "avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
+(define_mode_attr ph2pssuffix
+  [(V16SF "x") (V8SF "x") (V4SF "x")
+   (V8DF "") (V4DF "") (V2DF "")])
+
+(define_insn "avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>"
+  [(set (match_operand:VF48H_AVX512VL 0 "register_operand" "=v")
+	(float_extend:VF48H_AVX512VL
+	  (match_operand:<ssePHmode> 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
+  "TARGET_AVX512FP16"
+  "vcvtph2<castmode><ph2pssuffix>\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512fp16_float_extend_ph<mode>2<mask_name>"
+  [(set (match_operand:VF4_128_8_256 0 "register_operand" "=v")
+	(float_extend:VF4_128_8_256
+	  (vec_select:V4HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvtph2<castmode><ph2pssuffix>\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "avx512fp16_float_extend_phv2df2<mask_name>"
+  [(set (match_operand:V2DF 0 "register_operand" "=v")
+	(float_extend:V2DF
+	  (vec_select:V2HF
+	    (match_operand:V8HF 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvtph2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>"
+  [(set (match_operand:<ssePHmode> 0 "register_operand" "=v")
+	(float_truncate:<ssePHmode>
+	  (match_operand:VF48H_AVX512VL 1 "<round_nimm_predicate>" "<round_constraint>")))]
+  "TARGET_AVX512FP16"
+  "vcvt<castmode>2ph<ph2pssuffix><round_qq2phsuff>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvt<castmode>2ph_<mode>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm"))
+	    (match_dup 2)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V4HFmode);")
+
+(define_insn "*avx512fp16_vcvt<castmode>2ph_<mode>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm"))
+	    (match_operand:V4HF 2 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<castmode>2ph<ph2pssuffix><qq2phsuff>\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvt<castmode>2ph_<mode>_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V4HF
+	    (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm"))
+            (vec_select:V4HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_dup 4)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[4] = CONST0_RTX (V4HFmode);")
+
+(define_insn "*avx512fp16_vcvt<castmode>2ph_<mode>_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V4HF
+	    (float_truncate:V4HF (match_operand:VF4_128_8_256 1 "vector_operand" "vm"))
+            (vec_select:V4HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_operand:V4HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<castmode>2ph<ph2pssuffix><qq2phsuff>\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "*avx512fp16_vcvt<castmode>2ph_<mode>_mask_1"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+	(vec_merge:V4HF
+		(float_truncate:V4HF (match_operand:VF4_128_8_256 1
+				  "vector_operand" "vm"))
+	    (match_operand:V4HF 3 "const0_operand" "C")
+	    (match_operand:QI 2 "register_operand" "Yk"))
+	    (match_operand:V4HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvt<castmode>2ph<ph2pssuffix><qq2phsuff>\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "avx512fp16_vcvtpd2ph_v2df"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm"))
+	    (match_dup 2)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V6HFmode);")
+
+(define_insn "*avx512fp16_vcvtpd2ph_v2df"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_concat:V8HF
+	    (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm"))
+	    (match_operand:V6HF 2 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvtpd2ph{x}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_expand "avx512fp16_vcvtpd2ph_v2df_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V2HF
+	    (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm"))
+            (vec_select:V2HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_dup 4)))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "operands[4] = CONST0_RTX (V6HFmode);")
+
+(define_insn "*avx512fp16_vcvtpd2ph_v2df_mask"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+        (vec_merge:V2HF
+	    (float_truncate:V2HF (match_operand:V2DF 1 "vector_operand" "vm"))
+            (vec_select:V2HF
+                (match_operand:V8HF 2 "nonimm_or_0_operand" "0C")
+                (parallel [(const_int 0) (const_int 1)]))
+            (match_operand:QI 3 "register_operand" "Yk"))
+	    (match_operand:V6HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvtpd2ph{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*avx512fp16_vcvtpd2ph_v2df_mask_1"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+    (vec_concat:V8HF
+	(vec_merge:V2HF
+		(float_truncate:V2HF (match_operand:V2DF 1
+				  "vector_operand" "vm"))
+	    (match_operand:V2HF 3 "const0_operand" "C")
+	    (match_operand:QI 2 "register_operand" "Yk"))
+	    (match_operand:V6HF 4 "const0_operand" "C")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "vcvtpd2ph{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 595a6ac007a..f186f8c40f3 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -749,6 +749,10 @@
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 0d976fb0de4..0e88174e636 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -766,6 +766,10 @@
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 403f3af6067..5c3e370d4a7 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -687,6 +687,8 @@ test_1 (_mm512_cvt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundph_pd, __m512d, __m128h, 8)
+test_1 (_mm512_cvtx_roundph_ps, __m512, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epi64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
@@ -696,6 +698,8 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
+test_1 (_mm512_cvtx_roundps_ph, __m256h, __m512, 8)
+test_1 (_mm512_cvt_roundpd_ph, __m128h, __m512d, 8)
 test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
 test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8)
@@ -751,6 +755,8 @@ test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_pd, __m512d, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtx_roundph_ps, __m512, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
@@ -758,6 +764,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8)
+test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8)
 test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
@@ -809,6 +817,8 @@ test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundph_pd, __m512d, __m512d, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtx_roundph_ps, __m512, __m512, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
@@ -816,6 +826,8 @@ test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
+test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8)
+test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index b980ac3cddd..5bf94d56ce3 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -794,6 +794,8 @@ test_1 (_mm512_cvt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvtt_roundph_epi32, __m512i, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epu32, __m512i, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epi64, __m512i, __m128h, 8)
+test_1 (_mm512_cvt_roundph_pd, __m512d, __m128h, 8)
+test_1 (_mm512_cvtx_roundph_ps, __m512, __m256h, 8)
 test_1 (_mm512_cvtt_roundph_epu64, __m512i, __m128h, 8)
 test_1 (_mm512_cvt_roundepi16_ph, __m512h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu16_ph, __m512h, __m512i, 8)
@@ -801,6 +803,8 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
 test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
 test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
+test_1 (_mm512_cvtx_roundps_ph, __m256h, __m512, 8)
+test_1 (_mm512_cvt_roundpd_ph, __m128h, __m512d, 8)
 test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
 test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
 test_1 (_mm_cvtt_roundsh_i32, int, __m128h, 8)
@@ -855,6 +859,8 @@ test_2 (_mm512_maskz_cvt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epi32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epu32, __m512i, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epi64, __m512i, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvt_roundph_pd, __m512d, __mmask8, __m128h, 8)
+test_2 (_mm512_maskz_cvtx_roundph_ps, __m512, __mmask16, __m256h, 8)
 test_2 (_mm512_maskz_cvtt_roundph_epu64, __m512i, __mmask8, __m128h, 8)
 test_2 (_mm512_maskz_cvt_roundepi16_ph, __m512h, __mmask32, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu16_ph, __m512h, __mmask32, __m512i, 8)
@@ -862,6 +868,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
+test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8)
+test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8)
 test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
@@ -912,6 +920,8 @@ test_3 (_mm512_mask_cvt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epi32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epu32, __m512i, __m512i, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epi64, __m512i, __m512i, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvt_roundph_pd, __m512d, __m512d, __mmask8, __m128h, 8)
+test_3 (_mm512_mask_cvtx_roundph_ps, __m512, __m512, __mmask16, __m256h, 8)
 test_3 (_mm512_mask_cvtt_roundph_epu64, __m512i, __m512i, __mmask8, __m128h, 8)
 test_3 (_mm512_mask_cvt_roundepi16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu16_ph, __m512h, __m512h, __mmask32, __m512i, 8)
@@ -919,6 +929,8 @@ test_3 (_mm512_mask_cvt_roundepi32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu32_ph, __m256h, __m256h, __mmask16, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
+test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8)
+test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 1bd734a9352..2f27d9a1e87 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -767,6 +767,10 @@
 #define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
 #define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
+#define __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtph2pd_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (34 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh liuhongt
                   ` (25 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h (V512): Add DF contents.
	(src3f): New.
	* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2pd-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtph2psx-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtps2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       | 25 ++++--
 .../gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c | 82 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtph2pd-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtph2pd-1b.c | 78 +++++++++++++++++
 .../i386/avx512fp16-vcvtph2psx-1a.c           | 24 ++++++
 .../i386/avx512fp16-vcvtph2psx-1b.c           | 81 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtps2ph-1a.c | 24 ++++++
 .../gcc.target/i386/avx512fp16-vcvtps2ph-1b.c | 84 +++++++++++++++++++
 .../i386/avx512fp16vl-vcvtpd2ph-1a.c          | 28 +++++++
 .../i386/avx512fp16vl-vcvtpd2ph-1b.c          | 15 ++++
 .../i386/avx512fp16vl-vcvtph2pd-1a.c          | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2pd-1b.c          | 15 ++++
 .../i386/avx512fp16vl-vcvtph2psx-1a.c         | 27 ++++++
 .../i386/avx512fp16vl-vcvtph2psx-1b.c         | 15 ++++
 .../i386/avx512fp16vl-vcvtps2ph-1a.c          | 27 ++++++
 .../i386/avx512fp16vl-vcvtps2ph-1b.c          | 15 ++++
 17 files changed, 609 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index cf1c536d9f7..ce3cfdc3f6b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -26,23 +26,27 @@ typedef union
   __m512          zmm;
   __m512h         zmmh;
   __m512i         zmmi;
+  __m512d         zmmd;
   __m256          ymm[2];
   __m256h         ymmh[2];
   __m256i         ymmi[2];
+  __m256d         ymmd[2];
   __m128h         xmmh[4];
   __m128	  xmm[4];
   __m128i	  xmmi[4];
+  __m128d	  xmmd[4];
   unsigned short  u16[32];
   unsigned int    u32[16];
   int		  i32[16];
   long long	  s64[8];
   unsigned long long u64[8];
+  double          f64[8];
   float           f32[16];
   _Float16        f16[32];
 } V512;
 
 /* Global variables.  */
-V512 src1, src2, src3;
+V512 src1, src2, src3, src3f;
 int n_errs = 0;
 
 /* Helper function for packing/unpacking ph operands. */
@@ -167,12 +171,16 @@ init_src()
     int i;
 
     for (i = 0; i < AVX512F_MAX_ELEM; i++) {
-        v1.f32[i] = i + 1;
-        v2.f32[i] = i * 0.5f;
-        v3.f32[i] = i * 1.5f;
-        v4.f32[i] = i - 0.5f;
+	v1.f32[i] = i + 1;
+	v2.f32[i] = i * 0.5f;
+	v3.f32[i] = i * 1.5f;
+	v4.f32[i] = i - 0.5f;
 
-        src3.u32[i] = (i + 1) * 10;
+	src3.u32[i] = (i + 1) * 10;
+    }
+
+    for (i = 0; i < 8; i++) {
+	src3f.f64[i] = (i + 1) * 7.5;
     }
 
     src1 = pack_twops_2ph(v1, v2);
@@ -223,6 +231,7 @@ init_dest(V512 * res, V512 * exp)
 #undef HF
 #undef SF
 #undef SI
+#undef DF
 #undef H_HF
 #undef NET_MASK 
 #undef MASK_VALUE
@@ -235,10 +244,12 @@ init_dest(V512 * res, V512 * exp)
 #define HF(x) x.ymmh[0]
 #define H_HF(x) x.xmmh[0]
 #define SF(x) x.ymm[0]
+#define DF(x) x.ymmd[0]
 #define SI(x) x.ymmi[0]
 #elif AVX512F_LEN == 128
 #undef HF
 #undef SF
+#undef DF
 #undef SI
 #undef H_HF
 #undef NET_MASK 
@@ -251,6 +262,7 @@ init_dest(V512 * res, V512 * exp)
 #define ZMASK_VALUE 0xc1
 #define HF(x) x.xmmh[0]
 #define SF(x) x.xmm[0]
+#define DF(x) x.xmmd[0]
 #define SI(x) x.xmmi[0]
 #define H_HF(x) x.xmmh[0]
 #else
@@ -260,6 +272,7 @@ init_dest(V512 * res, V512 * exp)
 #define HALF_MASK 0xcccc
 #define HF(x) x.zmmh
 #define SF(x) x.zmm
+#define DF(x) x.zmmd
 #define SI(x) x.zmmi
 #define H_HF(x) x.ymmh[0]
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c
new file mode 100644
index 00000000000..8f74405873f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2ph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2ph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m512d x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtpd_ph (x1);
+  res1 = _mm512_mask_cvtpd_ph (res, m8, x2);
+  res2 = _mm512_maskz_cvtpd_ph (m8, x3);
+  res = _mm512_cvt_roundpd_ph (x1, 4);
+  res1 = _mm512_mask_cvt_roundpd_ph (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundpd_ph (m8, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c
new file mode 100644
index 00000000000..dde364b65ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 64)
+
+void NOINLINE
+EMULATE(cvtpd2_ph) (V512 * dest, V512 op1, int n_el,
+                 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < n_el; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = op1.f64[i];
+        }
+    }
+    *dest = pack_twops_2ph(v5, v5);
+    for (i = n_el; i < 8; i++)
+      dest->u16[i] = 0;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvtpd_ph) (DF(src3f));
+  CHECK_RESULT (&res, &exp, 8, _cvtpd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvtpd_ph) (res.xmmh[0], 0xcc,
+					   DF(src3f));
+  CHECK_RESULT (&res, &exp, 8, _mask_cvtpd_ph);
+
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvtpd_ph) (0xf1, DF(src3f));
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvtpd_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0);
+  res.xmmh[0] = INTRINSIC (_cvt_roundpd_ph) (DF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _cvt_roundpd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0);
+  res.xmmh[0] = INTRINSIC (_mask_cvt_roundpd_ph) (res.xmmh[0], 0xcc,
+					   DF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _mask_cvt_roundpd_ph);
+
+  EMULATE(cvtpd2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1);
+  res.xmmh[0] = INTRINSIC (_maskz_cvt_roundpd_ph) (0xf1, DF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, 8, _maskz_cvt_roundpd_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c
new file mode 100644
index 00000000000..b7bb3b7840f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512d res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtph_pd (x1);
+  res1 = _mm512_mask_cvtph_pd (res, m8, x2);
+  res2 = _mm512_maskz_cvtph_pd (m8, x3);
+  res = _mm512_cvt_roundph_pd (x1, 4);
+  res1 = _mm512_mask_cvt_roundph_pd (res, m8, x2, 8);
+  res2 = _mm512_maskz_cvt_roundph_pd (m8, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c
new file mode 100644
index 00000000000..c20888ba534
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2pd-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtph2_pd) (V512 * dest, V512 op1,
+                 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff; 
+    unpack_ph_2twops(op1, &v1, &v2);
+
+    for (i = 0; i < 8; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.u64[i] = 0;
+            }
+            else {
+               v5.u64[i] = dest->u64[i];
+            }
+        }
+        else {
+           v5.f64[i] = v1.f32[i];
+        }
+    }
+
+    *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtph2_pd)(&exp, src1,  NET_MASK, 0);
+  DF(res) = INTRINSIC (_cvtph_pd) (src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvtph_pd);
+ 
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_pd)(&exp, src1, 0xcc, 0);
+  DF(res) = INTRINSIC (_mask_cvtph_pd) (DF(res), 0xcc, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvtph_pd);
+ 
+  EMULATE(cvtph2_pd)(&exp, src1,  0xc1, 1);
+  DF(res) = INTRINSIC (_maskz_cvtph_pd) (0xc1, src1.xmmh[0]);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvtph_pd);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtph2_pd)(&exp, src1,  NET_MASK, 0);
+  DF(res) = INTRINSIC (_cvt_roundph_pd) (src1.xmmh[0], _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _cvt_roundph_pd);
+ 
+  init_dest(&res, &exp);
+  EMULATE(cvtph2_pd)(&exp, src1, 0xcc, 0);
+  DF(res) = INTRINSIC (_mask_cvt_roundph_pd) (DF(res), 0xcc, src1.xmmh[0], _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_cvt_roundph_pd);
+ 
+  EMULATE(cvtph2_pd)(&exp, src1,  0xc1, 1);
+  DF(res) = INTRINSIC (_maskz_cvt_roundph_pd) (0xc1, src1.xmmh[0], _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_cvt_roundph_pd);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c
new file mode 100644
index 00000000000..c79549f67c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\{sae\}\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512 res, res1, res2;
+volatile __m256h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtxph_ps (x1);
+  res1 = _mm512_mask_cvtxph_ps (res, m16, x2);
+  res2 = _mm512_maskz_cvtxph_ps (m16, x3);
+  res = _mm512_cvtx_roundph_ps (x1, 4);
+  res1 = _mm512_mask_cvtx_roundph_ps (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvtx_roundph_ps (m16, x3, 8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c
new file mode 100644
index 00000000000..a2f20c099b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtph2psx-1b.c
@@ -0,0 +1,81 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 32)
+#define CHECK_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(cvtxph2_ps) (V512 * dest, V512 op1, int n_el,
+		   __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    unpack_ph_2twops(op1, &v1, &v2);
+
+    for (i = 0; i < n_el; i++) {
+      if (((1 << i) & m1) == 0) {
+	if (zero_mask) {
+	  v5.u32[i] = 0;
+	}
+	else {
+	  v5.u32[i] = dest->u32[i];
+	}
+      }
+      else {
+	v5.f32[i] = v1.f32[i];
+      }
+    }
+
+    for (i = n_el; i < 16; i++)
+      v5.u32[i] = 0;
+
+    *dest = v5;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xffff, 0);
+  SF(res) = INTRINSIC (_cvtxph_ps) (H_HF(src1));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtxph_ps);
+ 
+  init_dest(&res, &exp);
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xcc, 0);
+  SF(res) = INTRINSIC (_mask_cvtxph_ps) (SF(res), 0xcc, H_HF(src1));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtxph_ps);
+ 
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xc1, 1);
+  SF(res) = INTRINSIC (_maskz_cvtxph_ps) (0xc1, H_HF(src1));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtxph_ps);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xffff, 0);
+  SF(res) = INTRINSIC (_cvtx_roundph_ps) (H_HF(src1), _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtx_roundph_ps);
+ 
+  init_dest(&res, &exp);
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xcc, 0);
+  SF(res) = INTRINSIC (_mask_cvtx_roundph_ps) (SF(res), 0xcc, H_HF(src1), _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtx_roundph_ps);
+ 
+  EMULATE(cvtxph2_ps)(&exp, src1, N_ELEMS, 0xc1, 1);
+  SF(res) = INTRINSIC (_maskz_cvtx_roundph_ps) (0xc1, H_HF(src1), _ROUND_CUR);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtx_roundph_ps);
+#endif
+
+  if (n_errs != 0) 
+    abort ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c
new file mode 100644
index 00000000000..cb957f86920
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res, res1, res2;
+volatile __m512 x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_cvtxps_ph (x1);
+  res1 = _mm512_mask_cvtxps_ph (res, m16, x2);
+  res2 = _mm512_maskz_cvtxps_ph (m16, x3);
+  res = _mm512_cvtx_roundps_ph (x1, 4);
+  res1 = _mm512_mask_cvtx_roundps_ph (res, m16, x2, 8);
+  res2 = _mm512_maskz_cvtx_roundps_ph (m16, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c
new file mode 100644
index 00000000000..e316e766f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtps2ph-1b.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 32)
+#define CHECK_ELEMS (AVX512F_LEN_HALF / 16)
+
+void NOINLINE
+EMULATE(cvtxps2_ph) (V512 * dest, V512 op1, int n_el,
+                 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < n_el; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+               v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = op1.f32[i];
+        }
+    }
+    *dest = pack_twops_2ph(v5, v5);
+    for (i = n_el; i < 16; i++)
+      dest->u16[i] = 0;
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvtxps_ph) (SF(src3f));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtxps_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0);
+  H_HF(res) = INTRINSIC (_mask_cvtxps_ph) (H_HF(res), 0xcc,
+					   SF(src3f));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtxps_ph);
+
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvtxps_ph) (0xf1, SF(src3f));
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtxps_ph);
+
+#if AVX512F_LEN == 512
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, NET_MASK, 0);
+  H_HF(res) = INTRINSIC (_cvtx_roundps_ph) (SF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _cvtx_roundps_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xcc, 0);
+  H_HF(res) = INTRINSIC (_mask_cvtx_roundps_ph) (H_HF(res), 0xcc,
+					   SF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _mask_cvtx_roundps_ph);
+
+  EMULATE(cvtxps2_ph)(&exp, src3f, N_ELEMS, 0xf1, 1);
+  H_HF(res) = INTRINSIC (_maskz_cvtx_roundps_ph) (0xf1, SF(src3f), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, CHECK_ELEMS, _maskz_cvtx_roundps_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c
new file mode 100644
index 00000000000..57604a91334
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256d x2;
+volatile __m128d x3;
+volatile __mmask16 m16;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtpd_ph (x2);
+  res3 = _mm256_mask_cvtpd_ph (res3, m16, x2);
+  res3 = _mm256_maskz_cvtpd_ph (m16, x2);
+
+  res3 = _mm_cvtpd_ph (x3);
+  res3 = _mm_mask_cvtpd_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtpd_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c
new file mode 100644
index 00000000000..ea4b200803b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtpd2ph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtpd2ph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c
new file mode 100644
index 00000000000..80010c02297
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d res1;
+volatile __m128d res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtph_pd (x3);
+  res1 = _mm256_mask_cvtph_pd (res1, m8, x3);
+  res1 = _mm256_maskz_cvtph_pd (m8, x3);
+
+  res2 = _mm_cvtph_pd (x3);
+  res2 = _mm_mask_cvtph_pd (res2, m8, x3);
+  res2 = _mm_maskz_cvtph_pd (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c
new file mode 100644
index 00000000000..a3849056870
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtph2pd-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtph2pd-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c
new file mode 100644
index 00000000000..e8c4c8c70d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 res1;
+volatile __m128 res2;
+volatile __m128h x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_cvtxph_ps (x3);
+  res1 = _mm256_mask_cvtxph_ps (res1, m8, x3);
+  res1 = _mm256_maskz_cvtxph_ps (m8, x3);
+
+  res2 = _mm_cvtxph_ps (x3);
+  res2 = _mm_mask_cvtxph_ps (res2, m8, x3);
+  res2 = _mm_maskz_cvtxph_ps (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c
new file mode 100644
index 00000000000..ad91de85370
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtph2psx-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vcvtph2psx-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c
new file mode 100644
index 00000000000..a89f8c4fe87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res3;
+volatile __m256 x2;
+volatile __m128 x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res3 = _mm256_cvtxps_ph (x2);
+  res3 = _mm256_mask_cvtxps_ph (res3, m8, x2);
+  res3 = _mm256_maskz_cvtxps_ph (m8, x2);
+
+  res3 = _mm_cvtxps_ph (x3);
+  res3 = _mm_mask_cvtxps_ph (res3, m8, x3);
+  res3 = _mm_maskz_cvtxps_ph (m8, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c
new file mode 100644
index 00000000000..a339d0c933e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtps2ph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vcvtps2ph-1b.c"
+
-- 
2.18.1



* [PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (35 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh liuhongt
                   ` (24 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_cvtsh_ss):
	New intrinsic.
	(_mm_mask_cvtsh_ss): Likewise.
	(_mm_maskz_cvtsh_ss): Likewise.
	(_mm_cvtsh_sd): Likewise.
	(_mm_mask_cvtsh_sd): Likewise.
	(_mm_maskz_cvtsh_sd): Likewise.
	(_mm_cvt_roundsh_ss): Likewise.
	(_mm_mask_cvt_roundsh_ss): Likewise.
	(_mm_maskz_cvt_roundsh_ss): Likewise.
	(_mm_cvt_roundsh_sd): Likewise.
	(_mm_mask_cvt_roundsh_sd): Likewise.
	(_mm_maskz_cvt_roundsh_sd): Likewise.
	(_mm_cvtss_sh): Likewise.
	(_mm_mask_cvtss_sh): Likewise.
	(_mm_maskz_cvtss_sh): Likewise.
	(_mm_cvtsd_sh): Likewise.
	(_mm_mask_cvtsd_sh): Likewise.
	(_mm_maskz_cvtsd_sh): Likewise.
	(_mm_cvt_roundss_sh): Likewise.
	(_mm_mask_cvt_roundss_sh): Likewise.
	(_mm_maskz_cvt_roundss_sh): Likewise.
	(_mm_cvt_roundsd_sh): Likewise.
	(_mm_mask_cvt_roundsd_sh): Likewise.
	(_mm_maskz_cvt_roundsd_sh): Likewise.
	* config/i386/i386-builtin-types.def
	(V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT,
	V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT,
	V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT,
	V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT): Add new builtin types.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/i386-expand.c: Handle new builtin types.
	* config/i386/sse.md (VF48_128): New mode iterator.
	(avx512fp16_vcvtsh2<ssescalarmodesuffix><mask_scalar_name><round_saeonly_scalar_name>):
	New.
	(avx512fp16_vcvt<ssescalarmodesuffix>2sh<mask_scalar_name><round_scalar_name>):
	Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 280 +++++++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   4 +
 gcc/config/i386/i386-builtin.def       |   4 +
 gcc/config/i386/i386-expand.c          |   4 +
 gcc/config/i386/sse.md                 |  36 ++++
 gcc/testsuite/gcc.target/i386/avx-1.c  |   4 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   4 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-22.c |  12 ++
 gcc/testsuite/gcc.target/i386/sse-23.c |   4 +
 10 files changed, 364 insertions(+)
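
For readers unfamiliar with the new scalar conversions: vcvtsh2ss widens the low _Float16 element to single precision, while the upper destination elements are merged from the other source operand. A rough software sketch of the underlying binary16-to-binary32 widening (illustration only, not the patch's implementation; the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): decode an IEEE binary16
   bit pattern into a float, roughly what vcvtsh2ss does for the low
   element.  Handles normals, subnormals, zeros, Inf and NaN.  */
static float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t exp = (h >> 10) & 0x1f;
  uint32_t mant = h & 0x3ff;
  uint32_t bits;

  if (exp == 0x1f)
    bits = sign | 0x7f800000u | (mant << 13);	/* Inf/NaN.  */
  else if (exp == 0 && mant == 0)
    bits = sign;				/* +-0.0.  */
  else if (exp == 0)
    {
      /* Subnormal: renormalize the mantissa, adjusting the exponent.  */
      exp = 127 - 15 + 1;
      while (!(mant & 0x400))
	{
	  mant <<= 1;
	  exp--;
	}
      bits = sign | (exp << 23) | ((mant & 0x3ff) << 13);
    }
  else
    /* Normal: rebias the exponent, widen the mantissa.  */
    bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

The truncating direction (vcvtss2sh/vcvtsd2sh) additionally has to round, which is why those patterns take a full rounding-mode operand (round_scalar) while the widening direction is SAE-only (round_saeonly_scalar) in the sse.md changes below.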

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 5a6a0ba83a9..05efbc5777b 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -4604,6 +4604,286 @@ _mm512_maskz_cvt_roundpd_ph (__mmask8 __A, __m512d __B, int __C)
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vcvtsh2ss, vcvtsh2sd.  */
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_ss (__m128 __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+					      _mm_setzero_ps (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
+			 __m128h __D)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtsh_ss (__mmask8 __A, __m128 __B,
+			  __m128h __C)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+					      _mm_setzero_ps (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_sd (__m128d __A, __m128h __B)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+					      _mm_setzero_pd (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
+			 __m128h __D)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtsh_sd (__mmask8 __A, __m128d __B, __m128h __C)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+					      _mm_setzero_pd (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_ss (__m128 __A, __m128h __B, const int __R)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__B, __A,
+					      _mm_setzero_ps (),
+					      (__mmask8) -1, __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvt_roundsh_ss (__m128 __A, __mmask8 __B, __m128 __C,
+			 __m128h __D, const int __R)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__D, __C, __A, __B, __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvt_roundsh_ss (__mmask8 __A, __m128 __B,
+			  __m128h __C, const int __R)
+{
+  return __builtin_ia32_vcvtsh2ss_mask_round (__C, __B,
+					      _mm_setzero_ps (),
+					      __A, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsh_sd (__m128d __A, __m128h __B, const int __R)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__B, __A,
+					      _mm_setzero_pd (),
+					      (__mmask8) -1, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvt_roundsh_sd (__m128d __A, __mmask8 __B, __m128d __C,
+			 __m128h __D, const int __R)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__D, __C, __A, __B, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvt_roundsh_sd (__mmask8 __A, __m128d __B, __m128h __C, const int __R)
+{
+  return __builtin_ia32_vcvtsh2sd_mask_round (__C, __B,
+					      _mm_setzero_pd (),
+					      __A, __R);
+}
+
+#else
+#define _mm_cvt_roundsh_ss(A, B, R)				\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((B), (A),		\
+					_mm_setzero_ps (),	\
+					(__mmask8) -1, (R)))
+
+#define _mm_mask_cvt_roundsh_ss(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundsh_ss(A, B, C, R)			\
+  (__builtin_ia32_vcvtsh2ss_mask_round ((C), (B),		\
+				       _mm_setzero_ps (),	\
+				       (A), (R)))
+
+#define _mm_cvt_roundsh_sd(A, B, R)				\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((B), (A),		\
+				       _mm_setzero_pd (),	\
+				       (__mmask8) -1, (R)))
+
+#define _mm_mask_cvt_roundsh_sd(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundsh_sd(A, B, C, R)			\
+  (__builtin_ia32_vcvtsh2sd_mask_round ((C), (B),		\
+				       _mm_setzero_pd (),	\
+				       (A), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vcvtss2sh, vcvtsd2sh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtss_sh (__m128h __A, __m128 __B)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtss_sh (__mmask8 __A, __m128h __B, __m128 __C)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsd_sh (__m128h __A, __m128d __B)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvtsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B,
+					      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvtsd_sh (__mmask8 __A, __m128h __B, __m128d __C)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundss_sh (__m128h __A, __m128 __B, const int __R)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvt_roundss_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128 __D,
+			 const int __R)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__D, __C, __A, __B, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvt_roundss_sh (__mmask8 __A, __m128h __B, __m128 __C,
+			  const int __R)
+{
+  return __builtin_ia32_vcvtss2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvt_roundsd_sh (__m128h __A, __m128d __B, const int __R)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__B, __A,
+					      _mm_setzero_ph (),
+					      (__mmask8) -1, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cvt_roundsd_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128d __D,
+			 const int __R)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__D, __C, __A, __B, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
+			  const int __R)
+{
+  return __builtin_ia32_vcvtsd2sh_mask_round (__C, __B,
+					      _mm_setzero_ph (),
+					      __A, __R);
+}
+
+#else
+#define _mm_cvt_roundss_sh(A, B, R)				\
+  (__builtin_ia32_vcvtss2sh_mask_round ((B), (A),		\
+					_mm_setzero_ph (),	\
+					(__mmask8) -1, (R)))
+
+#define _mm_mask_cvt_roundss_sh(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtss2sh_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundss_sh(A, B, C, R)			\
+  (__builtin_ia32_vcvtss2sh_mask_round ((C), (B),		\
+					_mm_setzero_ph (),	\
+					(A), (R)))
+
+#define _mm_cvt_roundsd_sh(A, B, R)				\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((B), (A),		\
+					_mm_setzero_ph (),	\
+					(__mmask8) -1, (R)))
+
+#define _mm_mask_cvt_roundsd_sh(A, B, C, D, R)				\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((D), (C), (A), (B), (R)))
+
+#define _mm_maskz_cvt_roundsd_sh(A, B, C, R)			\
+  (__builtin_ia32_vcvtsd2sh_mask_round ((C), (B),		\
+					_mm_setzero_ph (),	\
+					(A), (R)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 4123e66f7cd..0cdbf1bc0c0 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1348,6 +1348,10 @@ DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V2DF, V8HF, V2DF, V2DF, UQI, INT)
+DEF_FUNCTION_TYPE (V4SF, V8HF, V4SF, V4SF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 2992bd0383d..4bb48bc21dc 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3136,6 +3136,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv8df2_
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_float_extend_phv16sf2_mask_round, "__builtin_ia32_vcvtph2ps_v16sf_mask_round", IX86_BUILTIN_VCVTPH2PS_V16SF_MASK_ROUND, UNKNOWN, (int) V16SF_FTYPE_V16HF_V16SF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v8df_mask_round, "__builtin_ia32_vcvtpd2ph_v8df_mask_round", IX86_BUILTIN_VCVTPD2PH_V8DF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v16sf_mask_round, "__builtin_ia32_vcvtps2ph_v16sf_mask_round", IX86_BUILTIN_VCVTPS2PH_V16SF_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SF_V16HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round, "__builtin_ia32_vcvtsh2ss_mask_round", IX86_BUILTIN_VCVTSH2SS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index a216f6f2bf3..9233c6cd1e8 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10565,8 +10565,10 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT:
     case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT:
+    case V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT:
     case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT:
     case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT:
+    case V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT:
     case V2DF_FTYPE_V2DF_V4SF_V2DF_QI_INT:
     case V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT:
@@ -10574,6 +10576,8 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V4SF_FTYPE_V4SF_V2DF_V4SF_QI_INT:
     case V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT:
     case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT:
+    case V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT:
+    case V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT:
       nargs = 5;
       break;
     case V32HF_FTYPE_V32HF_INT_V32HF_USI_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7447d6b75b5..95f4a82c9cd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -788,6 +788,10 @@ (define_mode_iterator VF48_I1248
   [V16SI V16SF V8DI V8DF V32HI V64QI])
 (define_mode_iterator VF48H_AVX512VL
   [V8DF V16SF (V8SF "TARGET_AVX512VL")])
+
+(define_mode_iterator VF48_128
+  [V2DF V4SF])
+
 (define_mode_iterator VI48F
   [V16SI V16SF V8DI V8DF
    (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
@@ -5869,6 +5873,38 @@ (define_insn "*avx512fp16_vcvtpd2ph_v2df_mask_1"
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
+(define_insn "avx512fp16_vcvtsh2<ssescalarmodesuffix><mask_scalar_name><round_saeonly_scalar_name>"
+  [(set (match_operand:VF48_128 0 "register_operand" "=v")
+     (vec_merge:VF48_128
+       (vec_duplicate:VF48_128
+         (float_extend:<ssescalarmode>
+           (vec_select:HF
+             (match_operand:V8HF 1 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
+	     (parallel [(const_int 0)]))))
+       (match_operand:VF48_128 2 "register_operand" "v")
+       (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vcvtsh2<ssescalarmodesuffix>\t{<round_saeonly_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_saeonly_scalar_mask_op3>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
+(define_insn "avx512fp16_vcvt<ssescalarmodesuffix>2sh<mask_scalar_name><round_scalar_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+     (vec_merge:V8HF
+       (vec_duplicate:V8HF
+         (float_truncate:HF
+           (vec_select:<ssescalarmode>
+             (match_operand:VF48_128 1 "<round_scalar_nimm_predicate>" "<round_scalar_constraint>")
+	     (parallel [(const_int 0)]))))
+       (match_operand:V8HF 2 "register_operand" "v")
+       (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vcvt<ssescalarmodesuffix>2sh\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "TI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index f186f8c40f3..deb25098f25 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -753,6 +753,10 @@
 #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 0e88174e636..dbe206bd1bb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -770,6 +770,10 @@
 #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 5c3e370d4a7..e64321d8afa 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -766,6 +766,10 @@ test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
 test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8)
 test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8)
+test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8)
+test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8)
+test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
+test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
 test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
@@ -828,6 +832,10 @@ test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8)
 test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8)
+test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
+test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
+test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
+test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -856,6 +864,10 @@ test_4 (_mm_mask_scalef_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h,
 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
+test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
+test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
+test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 5bf94d56ce3..d92898fdd11 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -872,6 +872,10 @@ test_2 (_mm512_maskz_cvtx_roundps_ph, __m256h, __mmask16, __m512, 8)
 test_2 (_mm512_maskz_cvt_roundpd_ph, __m128h, __mmask8, __m512d, 8)
 test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
+test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8)
+test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8)
+test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
+test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -931,6 +935,10 @@ test_3 (_mm512_mask_cvt_roundepi64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvt_roundepu64_ph, __m128h, __m128h, __mmask8, __m512i, 8)
 test_3 (_mm512_mask_cvtx_roundps_ph, __m256h, __m256h, __mmask16, __m512, 8)
 test_3 (_mm512_mask_cvt_roundpd_ph, __m128h, __m128h, __mmask8, __m512d, 8)
+test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
+test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
+test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
+test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -958,6 +966,10 @@ test_4 (_mm512_mask_scalef_round_ph, __m512h, __m512h, __mmask32, __m512h, __m51
 test_4 (_mm_mask_reduce_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_roundscale_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123)
 test_4 (_mm_mask_getexp_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
+test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
+test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
+test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 2f27d9a1e87..2f5027ba36f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -771,6 +771,10 @@
 #define __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtph2ps_v16sf_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, D) __builtin_ia32_vcvtpd2ph_v8df_mask_round(A, B, C, 8)
 #define __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, D) __builtin_ia32_vcvtps2ph_v16sf_mask_round(A, B, C, 8)
+#define __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2ss_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1



* [PATCH 38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (36 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer liuhongt
                   ` (23 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtss2sh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vcvtss2sh-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c | 25 ++++++++
 .../gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c | 60 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c | 25 ++++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c | 57 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c | 25 ++++++++
 .../gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c | 59 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vcvtss2sh-1a.c | 25 ++++++++
 .../gcc.target/i386/avx512fp16-vcvtss2sh-1b.c | 60 +++++++++++++++++++
 8 files changed, 336 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c
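
The runtime (-1b) tests compare the hardware result against a software emulation that also models merge- and zero-masking of the low element, as in emulate_vcvtsd2sh below. The element-selection rule being emulated can be sketched as (hypothetical helper, mirroring the k/zero_mask logic of the tests):

```c
#include <assert.h>

/* Hypothetical sketch (mirroring the k/zero_mask handling in the
   emulate_* helpers of the -1b tests): how the low element of a masked
   scalar conversion is selected.  k == 0 models the unmasked intrinsic
   form, where element 0 is always converted.  */
static float
scalar_mask_merge (float converted, float dest_old,
		   unsigned char k, int zero_mask)
{
  if ((k & 1) || !k)
    return converted;		/* Mask bit set (or no mask): convert.  */
  if (zero_mask)
    return 0.0f;		/* Zero-masking: clear the element.  */
  return dest_old;		/* Merge-masking: keep the old value.  */
}
```

Elements 1..7 of the result always come from the first source operand, which is why the emulation below copies v1.f32[i] for i >= 1 unconditionally.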

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c
new file mode 100644
index 00000000000..b663ca507fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1;
+volatile __m128d x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_cvtsd_sh (x1, x2);
+  res = _mm_mask_cvtsd_sh (res, m8, x1, x2);
+  res = _mm_maskz_cvtsd_sh (m8, x1, x2);
+  res = _mm_cvt_roundsd_sh (x1, x2, 8);
+  res = _mm_mask_cvt_roundsd_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_cvt_roundsd_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c
new file mode 100644
index 00000000000..552362058c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c
@@ -0,0 +1,60 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtsd2sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v5, v6, v7, v8;
+    int i;
+    
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = (float)op2.f64[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_vcvtsd2sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_cvt_roundsd_sh(src1.xmmh[0], src2.xmmd[0],
+                                 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsd_sh");
+
+  init_dest(&res, &exp);
+  emulate_vcvtsd2sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_cvt_roundsd_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+                                      src2.xmmd[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_cvt_roundsd_sh");
+
+  emulate_vcvtsd2sh(&exp, src1, src2, 0x2, 1);
+  res.xmmh[0] = _mm_maskz_cvt_roundsd_sh(0x2, src1.xmmh[0],
+                                       src2.xmmd[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_cvt_roundsd_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c
new file mode 100644
index 00000000000..59719ed18e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2sd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+volatile __m128d res;
+volatile __m128d x1;
+volatile __m128h x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_cvtsh_sd (x1, x2);
+  res = _mm_mask_cvtsh_sd (res, m8, x1, x2);
+  res = _mm_maskz_cvtsh_sd (m8, x1, x2);
+  res = _mm_cvt_roundsh_sd (x1, x2, 8);
+  res = _mm_mask_cvt_roundsh_sd (res, m8, x1, x2, 8);
+  res = _mm_maskz_cvt_roundsh_sd (m8, x1, x2, 4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c
new file mode 100644
index 00000000000..e6bdc9580bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtsh2sd(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+
+    unpack_ph_2twops(op2, &v3, &v4);
+
+    if ((k&1) || !k)
+      v5.f64[0] = v3.f32[0];
+    else if (zero_mask)
+      v5.f64[0] = 0;
+    else
+      v5.f64[0] = dest->f64[0];
+
+    v5.f64[1] = op1.f64[1];
+
+    *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_vcvtsh2sd(&exp, src1, src2, 0x1, 0);
+  res.xmmd[0] = _mm_cvt_roundsh_sd(src1.xmmd[0], src2.xmmh[0],
+                                 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_sd");
+
+  init_dest(&res, &exp);
+  emulate_vcvtsh2sd(&exp, src1, src2, 0x1, 0);
+  res.xmmd[0] = _mm_mask_cvt_roundsh_sd(res.xmmd[0], 0x1, src1.xmmd[0],
+                                      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_cvt_roundsh_sd");
+
+  emulate_vcvtsh2sd(&exp, src1, src2, 0x2, 1);
+  res.xmmd[0] = _mm_maskz_cvt_roundsh_sd(0x2, src1.xmmd[0],
+                                       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_cvt_roundsh_sd");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c
new file mode 100644
index 00000000000..e6c369c067f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsh2ss\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+
+#include <immintrin.h>
+
+volatile __m128 res;
+volatile __m128 x1;
+volatile __m128h x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_cvtsh_ss (x1, x2);
+  res = _mm_mask_cvtsh_ss (res, m8, x1, x2);
+  res = _mm_maskz_cvtsh_ss (m8, x1, x2);
+  res = _mm_cvt_roundsh_ss (x1, x2, 8);
+  res = _mm_mask_cvt_roundsh_ss (res, m8, x1, x2, 8);
+  res = _mm_maskz_cvt_roundsh_ss (m8, x1, x2, 4);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c
new file mode 100644
index 00000000000..319598341cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c
@@ -0,0 +1,59 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtsh2ss(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op2, &v3, &v4);
+    if ((k&1) || !k)
+      v5.f32[0] = v3.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = dest->f32[0];
+
+    for (i = 1; i < 4; i++)
+      v5.f32[i] = op1.f32[i];
+
+    *dest = v5;
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_vcvtsh2ss(&exp, src1, src2, 0x1, 0);
+  res.xmm[0] = _mm_cvt_roundsh_ss(src1.xmm[0], src2.xmmh[0],
+                                 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundsh_ss");
+
+  init_dest(&res, &exp);
+  emulate_vcvtsh2ss(&exp, src1, src2, 0x1, 0);
+  res.xmm[0] = _mm_mask_cvt_roundsh_ss(res.xmm[0], 0x1, src1.xmm[0],
+                                      src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_cvt_roundsh_ss");
+
+  emulate_vcvtsh2ss(&exp, src1, src2, 0x2, 1);
+  res.xmm[0] = _mm_maskz_cvt_roundsh_ss(0x2, src1.xmm[0],
+                                       src2.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_cvt_roundsh_ss");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c
new file mode 100644
index 00000000000..63ad0906555
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, x1;
+volatile __m128 x2;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_cvtss_sh (x1, x2);
+  res = _mm_mask_cvtss_sh (res, m8, x1, x2);
+  res = _mm_maskz_cvtss_sh (m8, x1, x2);
+  res = _mm_cvt_roundss_sh (x1, x2, 8);
+  res = _mm_mask_cvt_roundss_sh (res, m8, x1, x2, 8);
+  res = _mm_maskz_cvt_roundss_sh (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c
new file mode 100644
index 00000000000..94981bbb79f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vcvtss2sh-1b.c
@@ -0,0 +1,60 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_vcvtss2sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask)
+{
+    V512 v1, v2, v5, v6, v7, v8;
+    int i;
+    
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = op2.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++)
+      v5.f32[i] = v1.f32[i];
+
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  emulate_vcvtss2sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_cvt_roundss_sh(src1.xmmh[0], src2.xmm[0],
+                                 _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_cvt_roundss_sh");
+
+  init_dest(&res, &exp);
+  emulate_vcvtss2sh(&exp, src1, src2, 0x1, 0);
+  res.xmmh[0] = _mm_mask_cvt_roundss_sh(res.xmmh[0], 0x1, src1.xmmh[0],
+                                      src2.xmm[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_cvt_roundss_sh");
+
+  emulate_vcvtss2sh(&exp, src1, src2, 0x2, 1);
+  res.xmmh[0] = _mm_maskz_cvt_roundss_sh(0x2, src1.xmmh[0],
+                                       src2.xmm[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_cvt_roundss_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
-- 
2.18.1



* [PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (37 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph liuhongt
                   ` (22 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_undefined_ph):
	New intrinsic.
	(_mm256_undefined_ph): Likewise.
	(_mm512_undefined_ph): Likewise.
	(_mm_cvtsh_h): Likewise.
	(_mm256_cvtsh_h): Likewise.
	(_mm512_cvtsh_h): Likewise.
	(_mm512_castph_ps): Likewise.
	(_mm512_castph_pd): Likewise.
	(_mm512_castph_si512): Likewise.
	(_mm512_castph512_ph128): Likewise.
	(_mm512_castph512_ph256): Likewise.
	(_mm512_castph128_ph512): Likewise.
	(_mm512_castph256_ph512): Likewise.
	(_mm512_zextph128_ph512): Likewise.
	(_mm512_zextph256_ph512): Likewise.
	(_mm512_castps_ph): Likewise.
	(_mm512_castpd_ph): Likewise.
	(_mm512_castsi512_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_castph_ps):
	New intrinsic.
	(_mm256_castph_ps): Likewise.
	(_mm_castph_pd): Likewise.
	(_mm256_castph_pd): Likewise.
	(_mm_castph_si128): Likewise.
	(_mm256_castph_si256): Likewise.
	(_mm_castps_ph): Likewise.
	(_mm256_castps_ph): Likewise.
	(_mm_castpd_ph): Likewise.
	(_mm256_castpd_ph): Likewise.
	(_mm_castsi128_ph): Likewise.
	(_mm256_castsi256_ph): Likewise.
	(_mm256_castph256_ph128): Likewise.
	(_mm256_castph128_ph256): Likewise.
	(_mm256_zextph128_ph256): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-typecast-1.c: New test.
	* gcc.target/i386/avx512fp16-typecast-2.c: Ditto.
	* gcc.target/i386/avx512fp16vl-typecast-1.c: Ditto.
	* gcc.target/i386/avx512fp16vl-typecast-2.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h            | 153 ++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 117 ++++++++++++++
 .../gcc.target/i386/avx512fp16-typecast-1.c   |  44 +++++
 .../gcc.target/i386/avx512fp16-typecast-2.c   |  43 +++++
 .../gcc.target/i386/avx512fp16vl-typecast-1.c |  55 +++++++
 .../gcc.target/i386/avx512fp16vl-typecast-2.c |  37 +++++
 6 files changed, 449 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 05efbc5777b..ddb227529fa 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -192,6 +192,159 @@ _mm512_setzero_ph (void)
   return _mm512_set1_ph (0.0f);
 }
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_undefined_ph (void)
+{
+  __m128h __Y = __Y;
+  return __Y;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_undefined_ph (void)
+{
+  __m256h __Y = __Y;
+  return __Y;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_undefined_ph (void)
+{
+  __m512h __Y = __Y;
+  return __Y;
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsh_h (__m128h __A)
+{
+  return __A[0];
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtsh_h (__m256h __A)
+{
+  return __A[0];
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_cvtsh_h (__m512h __A)
+{
+  return __A[0];
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph_ps (__m512h __a)
+{
+  return (__m512) __a;
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph_pd (__m512h __a)
+{
+  return (__m512d) __a;
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph_si512 (__m512h __a)
+{
+  return (__m512i) __a;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph512_ph128 (__m512h __A)
+{
+  union
+  {
+    __m128h a[4];
+    __m512h v;
+  } u = { .v = __A };
+  return u.a[0];
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph512_ph256 (__m512h __A)
+{
+  union
+  {
+    __m256h a[2];
+    __m512h v;
+  } u = { .v = __A };
+  return u.a[0];
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph128_ph512 (__m128h __A)
+{
+  union
+  {
+    __m128h a[4];
+    __m512h v;
+  } u;
+  u.a[0] = __A;
+  return u.v;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castph256_ph512 (__m256h __A)
+{
+  union
+  {
+    __m256h a[2];
+    __m512h v;
+  } u;
+  u.a[0] = __A;
+  return u.v;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph128_ph512 (__m128h __A)
+{
+  return (__m512h) _mm512_insertf32x4 (_mm512_setzero_ps (),
+				       (__m128) __A, 0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_zextph256_ph512 (__m256h __A)
+{
+  return (__m512h) _mm512_insertf64x4 (_mm512_setzero_pd (),
+				       (__m256d) __A, 0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castps_ph (__m512 __a)
+{
+  return (__m512h) __a;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castpd_ph (__m512d __a)
+{
+  return (__m512h) __a;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_castsi512_ph (__m512i __a)
+{
+  return (__m512h) __a;
+}
+
 /* Create a vector with element 0 as F and the rest zero.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 0124b830dd5..bcbe4523357 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -34,6 +34,123 @@
 #define __DISABLE_AVX512FP16VL__
 #endif /* __AVX512FP16VL__ */
 
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castph_ps (__m128h __a)
+{
+  return (__m128) __a;
+}
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castph_ps (__m256h __a)
+{
+  return (__m256) __a;
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castph_pd (__m128h __a)
+{
+  return (__m128d) __a;
+}
+
+extern __inline __m256d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castph_pd (__m256h __a)
+{
+  return (__m256d) __a;
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castph_si128 (__m128h __a)
+{
+  return (__m128i) __a;
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castph_si256 (__m256h __a)
+{
+  return (__m256i) __a;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castps_ph (__m128 __a)
+{
+  return (__m128h) __a;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castps_ph (__m256 __a)
+{
+  return (__m256h) __a;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castpd_ph (__m128d __a)
+{
+  return (__m128h) __a;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castpd_ph (__m256d __a)
+{
+  return (__m256h) __a;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_castsi128_ph (__m128i __a)
+{
+  return (__m128h) __a;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castsi256_ph (__m256i __a)
+{
+  return (__m256h) __a;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castph256_ph128 (__m256h __A)
+{
+  union
+  {
+    __m128h a[2];
+    __m256h v;
+  } u = { .v = __A };
+  return u.a[0];
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_castph128_ph256 (__m128h __A)
+{
+  union
+  {
+    __m128h a[2];
+    __m256h v;
+  } u;
+  u.a[0] = __A;
+  return u.v;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_zextph128_ph256 (__m128h __A)
+{
+  return (__m256h) _mm256_insertf128_ps (_mm256_setzero_ps (),
+					 (__m128) __A, 0);
+}
+
 /* Intrinsics v[add,sub,mul,div]ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c
new file mode 100644
index 00000000000..cf0cc7443c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-1.c
@@ -0,0 +1,44 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+void
+test_512 (void)
+{
+  V512 res;
+
+  res.ymmh[0] = _mm512_castph512_ph256 (src1.zmmh);
+  check_results (&res, &src1, 16, "_mm512_castph512_ph256");
+
+  res.xmmh[0] = _mm512_castph512_ph128 (src1.zmmh);
+  check_results (&res, &src1, 8, "_mm512_castph512_ph128");
+
+  res.zmmh = _mm512_castph256_ph512 (src1.ymmh[0]);
+  check_results (&res, &src1, 16, "_mm512_castph256_ph512");
+
+  res.zmmh = _mm512_castph128_ph512 (src1.xmmh[0]);
+  check_results (&res, &src1, 8, "_mm512_castph128_ph512");
+
+  res.zmm = _mm512_castph_ps (src1.zmmh);
+  check_results (&res, &src1, 32, "_mm512_castph_ps");
+
+  res.zmmd = _mm512_castph_pd (src1.zmmh);
+  check_results (&res, &src1, 32, "_mm512_castph_pd");
+
+  res.zmmi = _mm512_castph_si512 (src1.zmmh);
+  check_results (&res, &src1, 32, "_mm512_castph_si512");
+
+  res.zmmh = _mm512_castps_ph (src1.zmm);
+  check_results (&res, &src1, 32, "_mm512_castps_ph");
+
+  res.zmmh = _mm512_castpd_ph (src1.zmmd);
+  check_results (&res, &src1, 32, "_mm512_castpd_ph");
+
+  res.zmmh = _mm512_castsi512_ph (src1.zmmi);
+  check_results (&res, &src1, 32, "_mm512_castsi512_ph");
+
+  if (n_errs != 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c
new file mode 100644
index 00000000000..a29f1dbd76a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-typecast-2.c
@@ -0,0 +1,43 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512f-check.h"
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+
+void
+do_test (void)
+{
+  union512i_d zero;
+  union512h ad;
+  union256h b,bd;
+  union128h c;
+
+  int i;
+
+  for (i = 0; i < 16; i++)
+    {
+      b.a[i] = 65.43f + i;
+      zero.a[i] = 0;
+    }
+
+  for (i = 0; i < 8; i++)
+    {
+      c.a[i] = 32.01f + i;
+    }
+
+  ad.x = _mm512_zextph256_ph512 (b.x);
+  if (memcmp (ad.a, b.a, 32)
+      || memcmp (&ad.a[16], &zero.a, 32))
+    abort ();
+
+  ad.x = _mm512_zextph128_ph512 (c.x);
+  if (memcmp (ad.a, c.a, 16)
+      || memcmp (&ad.a[8], &zero.a, 48))
+    abort ();
+   
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c
new file mode 100644
index 00000000000..3621bb52f08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-1.c
@@ -0,0 +1,55 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+void
+test_512 (void)
+{
+  V512 res;
+  res.xmm[0] = _mm_castph_ps (src1.xmmh[0]);
+  check_results (&res, &src1, 8, "_mm_castph_ps");
+
+  res.xmmd[0] = _mm_castph_pd (src1.xmmh[0]);
+  check_results (&res, &src1, 8, "_mm_castph_pd");
+
+  res.xmmi[0] = _mm_castph_si128 (src1.xmmh[0]);
+  check_results (&res, &src1, 8, "_mm_castph_si128");
+
+  res.xmmh[0] = _mm_castps_ph (src1.xmm[0]);
+  check_results (&res, &src1, 8, "_mm_castps_ph");
+
+  res.xmmh[0] = _mm_castpd_ph (src1.xmmd[0]);
+  check_results (&res, &src1, 8, "_mm_castpd_ph");
+
+  res.xmmh[0] = _mm_castsi128_ph (src1.xmmi[0]);
+  check_results (&res, &src1, 8, "_mm_castsi128_ph");
+
+  res.ymm[0] = _mm256_castph_ps (src1.ymmh[0]);
+  check_results (&res, &src1, 16, "_mm256_castph_ps");
+
+  res.ymmd[0] = _mm256_castph_pd (src1.ymmh[0]);
+  check_results (&res, &src1, 16, "_mm256_castph_pd");
+
+  res.ymmi[0] = _mm256_castph_si256 (src1.ymmh[0]);
+  check_results (&res, &src1, 16, "_mm256_castph_si256");
+
+  res.ymmh[0] = _mm256_castps_ph (src1.ymm[0]);
+  check_results (&res, &src1, 16, "_mm256_castps_ph");
+
+  res.ymmh[0] = _mm256_castpd_ph (src1.ymmd[0]);
+  check_results (&res, &src1, 16, "_mm256_castpd_ph");
+
+  res.ymmh[0] = _mm256_castsi256_ph (src1.ymmi[0]);
+  check_results (&res, &src1, 16, "_mm256_castsi256_ph");
+
+  res.xmmh[0] = _mm256_castph256_ph128 (src1.ymmh[0]);
+  check_results (&res, &src1, 8, "_mm256_castph256_ph128");
+
+  res.ymmh[0] = _mm256_castph128_ph256 (src1.xmmh[0]);
+  check_results (&res, &src1, 8, "_mm256_castph128_ph256");
+  
+  if (n_errs != 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c
new file mode 100644
index 00000000000..dce387f1fab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-typecast-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512f-check.h"
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+
+void
+do_test (void)
+{
+  union512i_d zero;
+  union512h ad;
+  union256h b,bd;
+  union128h c;
+
+  int i;
+
+  for (i = 0; i < 16; i++)
+    {
+      b.a[i] = 65.43f + i;
+      zero.a[i] = 0;
+    }
+
+  for (i = 0; i < 8; i++)
+    {
+      c.a[i] = 32.01f + i;
+    }
+   
+  bd.x = _mm256_zextph128_ph256 (c.x);
+  if (memcmp (bd.a, c.a, 16)
+      || memcmp (&bd.a[8], &zero.a, 16))
+    abort ();
+}
-- 
2.18.1



* [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (38 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-18  7:04   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 41/62] AVX512FP16: Add testcase for " liuhongt
                   ` (21 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph):
	New intrinsic.
	(_mm512_mask_fmaddsub_ph): Likewise.
	(_mm512_mask3_fmaddsub_ph): Likewise.
	(_mm512_maskz_fmaddsub_ph): Likewise.
	(_mm512_fmaddsub_round_ph): Likewise.
	(_mm512_mask_fmaddsub_round_ph): Likewise.
	(_mm512_mask3_fmaddsub_round_ph): Likewise.
	(_mm512_maskz_fmaddsub_round_ph): Likewise.
	(_mm512_mask_fmsubadd_ph): Likewise.
	(_mm512_mask3_fmsubadd_ph): Likewise.
	(_mm512_maskz_fmsubadd_ph): Likewise.
	(_mm512_fmsubadd_round_ph): Likewise.
	(_mm512_mask_fmsubadd_round_ph): Likewise.
	(_mm512_mask3_fmsubadd_round_ph): Likewise.
	(_mm512_maskz_fmsubadd_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph):
	New intrinsic.
	(_mm256_mask_fmaddsub_ph): Likewise.
	(_mm256_mask3_fmaddsub_ph): Likewise.
	(_mm256_maskz_fmaddsub_ph): Likewise.
	(_mm_fmaddsub_ph): Likewise.
	(_mm_mask_fmaddsub_ph): Likewise.
	(_mm_mask3_fmaddsub_ph): Likewise.
	(_mm_maskz_fmaddsub_ph): Likewise.
	(_mm256_fmsubadd_ph): Likewise.
	(_mm256_mask_fmsubadd_ph): Likewise.
	(_mm256_mask3_fmsubadd_ph): Likewise.
	(_mm256_maskz_fmsubadd_ph): Likewise.
	(_mm_fmsubadd_ph): Likewise.
	(_mm_mask_fmsubadd_ph): Likewise.
	(_mm_mask3_fmsubadd_ph): Likewise.
	(_mm_maskz_fmsubadd_ph): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator.
	(<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander.
	(<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use
	VFH_SF_AVX512VL.
	(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
	Ditto.
	(<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
	(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
	Ditto.
	(<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 228 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 182 ++++++++++++++++++++
 gcc/config/i386/i386-builtin.def       |  18 ++
 gcc/config/i386/sse.md                 | 103 ++++++-----
 gcc/testsuite/gcc.target/i386/avx-1.c  |   6 +
 gcc/testsuite/gcc.target/i386/sse-13.c |   6 +
 gcc/testsuite/gcc.target/i386/sse-14.c |   8 +
 gcc/testsuite/gcc.target/i386/sse-22.c |   8 +
 gcc/testsuite/gcc.target/i386/sse-23.c |   6 +
 9 files changed, 524 insertions(+), 41 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index ddb227529fa..4092663b504 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -5037,6 +5037,234 @@ _mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vfmaddsub[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fmaddsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfmsubadd[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
+			 __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
+			  __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
+			  __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
+			  __m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
+					(__v32hf) __B,
+					(__v32hf) __C,
+					(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
+					 (__v32hf) __B,
+					 (__v32hf) __C,
+					 (__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fmsubadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index bcbe4523357..8825fae52aa 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -2269,6 +2269,188 @@ _mm256_maskz_cvtpd_ph (__mmask8 __A, __m256d __B)
 					     __A);
 }
 
+/* Intrinsics vfmaddsub[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h)__builtin_ia32_vfmaddsubph256_mask ((__v16hf)__A,
+						      (__v16hf)__B,
+						      (__v16hf)__C,
+						      (__mmask16)-1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmaddsub_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmaddsubph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfmaddsubph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmaddsub_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmaddsubph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)__builtin_ia32_vfmaddsubph128_mask ((__v8hf)__A,
+						      (__v8hf)__B,
+						      (__v8hf)__C,
+						      (__mmask8)-1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmaddsub_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmaddsubph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmaddsubph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmaddsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmaddsubph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+/* Intrinsics vfmsubadd[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmsubadd_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfmsubaddph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmsubadd_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubaddph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmsubadd_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmsubaddph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmsubadd_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubaddph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 4bb48bc21dc..42bba719ec3 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2875,6 +2875,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v8sf_mask, "__builtin_ia32_vcvtps2ph_v8sf_mask", IX86_BUILTIN_VCVTPS2PH_V8SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v2df_mask, "__builtin_ia32_vcvtpd2ph_v2df_mask", IX86_BUILTIN_VCVTPD2PH_V2DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v4df_mask, "__builtin_ia32_vcvtpd2ph_v4df_mask", IX86_BUILTIN_VCVTPD2PH_V4DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_mask, "__builtin_ia32_vfmaddsubph256_mask", IX86_BUILTIN_VFMADDSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_mask3, "__builtin_ia32_vfmaddsubph256_mask3", IX86_BUILTIN_VFMADDSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_maskz, "__builtin_ia32_vfmaddsubph256_maskz", IX86_BUILTIN_VFMADDSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask, "__builtin_ia32_vfmaddsubph128_mask", IX86_BUILTIN_VFMADDSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask3, "__builtin_ia32_vfmaddsubph128_mask3", IX86_BUILTIN_VFMADDSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_maskz, "__builtin_ia32_vfmaddsubph128_maskz", IX86_BUILTIN_VFMADDSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask, "__builtin_ia32_vfmsubaddph256_mask", IX86_BUILTIN_VFMSUBADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask3, "__builtin_ia32_vfmsubaddph256_mask3", IX86_BUILTIN_VFMSUBADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_maskz, "__builtin_ia32_vfmsubaddph256_maskz", IX86_BUILTIN_VFMSUBADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask, "__builtin_ia32_vfmsubaddph128_mask", IX86_BUILTIN_VFMSUBADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask3, "__builtin_ia32_vfmsubaddph128_mask3", IX86_BUILTIN_VFMSUBADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_maskz, "__builtin_ia32_vfmsubaddph128_maskz", IX86_BUILTIN_VFMSUBADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3140,6 +3152,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round,
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 95f4a82c9cd..847684e232e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4542,6 +4542,13 @@ (define_mode_iterator VF_SF_AVX512VL
   [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+(define_mode_iterator VFH_SF_AVX512VL
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+
 (define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
   [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
 	(fma:VF_SF_AVX512VL
@@ -4848,10 +4855,10 @@ (define_expand "fmaddsub_<mode>"
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
 (define_expand "<avx512>_fmaddsub_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_AVX512VL 0 "register_operand")
-   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
@@ -4861,6 +4868,20 @@ (define_expand "<avx512>_fmaddsub_<mode>_maskz<round_expand_name>"
   DONE;
 })
 
+(define_expand "<avx512>_fmsubadd_<mode>_maskz<round_expand_name>"
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskmode> 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_fma_fmsubadd_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
 (define_insn "*fma_fmaddsub_<mode>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
 	(unspec:VF_128_256
@@ -4880,11 +4901,11 @@ (define_insn "*fma_fmaddsub_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(unspec:VF_SF_AVX512VL
-	  [(match_operand:VF_SF_AVX512VL 1 "<round_nimm_predicate>" "%0,0,v")
-	   (match_operand:VF_SF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
-	   (match_operand:VF_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0")]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(unspec:VFH_SF_AVX512VL
+	  [(match_operand:VFH_SF_AVX512VL 1 "<round_nimm_predicate>" "%0,0,v")
+	   (match_operand:VFH_SF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
+	   (match_operand:VFH_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0")]
 	  UNSPEC_FMADDSUB))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
@@ -4895,12 +4916,12 @@ (define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmaddsub_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (unspec:VF_AVX512VL
-	    [(match_operand:VF_AVX512VL 1 "register_operand" "0,0")
-	     (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	     (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (unspec:VFH_AVX512VL
+	    [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+	     (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	     (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
@@ -4912,12 +4933,12 @@ (define_insn "<avx512>_fmaddsub_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmaddsub_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (unspec:VF_AVX512VL
-	    [(match_operand:VF_AVX512VL 1 "register_operand" "v")
-	     (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	     (match_operand:VF_AVX512VL 3 "register_operand" "0")]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (unspec:VFH_AVX512VL
+	    [(match_operand:VFH_AVX512VL 1 "register_operand" "v")
+	     (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	     (match_operand:VFH_AVX512VL 3 "register_operand" "0")]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
@@ -4946,12 +4967,12 @@ (define_insn "*fma_fmsubadd_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(unspec:VF_SF_AVX512VL
-	  [(match_operand:VF_SF_AVX512VL   1 "<round_nimm_predicate>" "%0,0,v")
-	   (match_operand:VF_SF_AVX512VL   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
-	   (neg:VF_SF_AVX512VL
-	     (match_operand:VF_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0"))]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(unspec:VFH_SF_AVX512VL
+	  [(match_operand:VFH_SF_AVX512VL   1 "<round_nimm_predicate>" "%0,0,v")
+	   (match_operand:VFH_SF_AVX512VL   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
+	   (neg:VFH_SF_AVX512VL
+	     (match_operand:VFH_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0"))]
 	  UNSPEC_FMADDSUB))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
@@ -4962,13 +4983,13 @@ (define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmsubadd_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (unspec:VF_AVX512VL
-	    [(match_operand:VF_AVX512VL 1 "register_operand" "0,0")
-	     (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	     (neg:VF_AVX512VL
-	       (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (unspec:VFH_AVX512VL
+	    [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+	     (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	     (neg:VFH_AVX512VL
+	       (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
@@ -4980,13 +5001,13 @@ (define_insn "<avx512>_fmsubadd_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmsubadd_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (unspec:VF_AVX512VL
-	    [(match_operand:VF_AVX512VL 1 "register_operand" "v")
-	     (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	     (neg:VF_AVX512VL
-	       (match_operand:VF_AVX512VL 3 "register_operand" "0"))]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (unspec:VFH_AVX512VL
+	    [(match_operand:VFH_AVX512VL 1 "register_operand" "v")
+	     (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	     (neg:VFH_AVX512VL
+	       (match_operand:VFH_AVX512VL 3 "register_operand" "0"))]
 	    UNSPEC_FMADDSUB)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index deb25098f25..51a0cf2fe87 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -757,6 +757,12 @@
 #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index dbe206bd1bb..a53f4653908 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -774,6 +774,12 @@
 #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index e64321d8afa..48895e0dd0d 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -836,6 +836,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
 test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
 test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
 test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
+test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -868,6 +870,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
 test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
 test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
 test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
+test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index d92898fdd11..bc530da388b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -939,6 +939,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
 test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
 test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
 test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
+test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -970,6 +972,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
 test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
 test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
 test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
+test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 2f5027ba36f..df43931ca97 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -775,6 +775,12 @@
 #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
 #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 41/62] AVX512FP16: Add testcase for vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (39 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 42/62] AVX512FP16: Add FP16 fma instructions liuhongt
                   ` (20 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c: Ditto.
---
 .../i386/avx512fp16-vfmaddsubXXXph-1a.c       |  28 +++
 .../i386/avx512fp16-vfmaddsubXXXph-1b.c       | 171 +++++++++++++++++
 .../i386/avx512fp16-vfmsubaddXXXph-1a.c       |  28 +++
 .../i386/avx512fp16-vfmsubaddXXXph-1b.c       | 175 ++++++++++++++++++
 .../i386/avx512fp16vl-vfmaddsubXXXph-1a.c     |  28 +++
 .../i386/avx512fp16vl-vfmaddsubXXXph-1b.c     |  15 ++
 .../i386/avx512fp16vl-vfmsubaddXXXph-1a.c     |  28 +++
 .../i386/avx512fp16vl-vfmsubaddXXXph-1b.c     |  15 ++
 8 files changed, 488 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c
new file mode 100644
index 00000000000..7063646ef58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fmaddsub_ph (x1, x2, x3);
+  x1 = _mm512_mask_fmaddsub_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fmaddsub_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fmaddsub_ph (m, x1, x2, x3);
+  x1 = _mm512_fmaddsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fmaddsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fmaddsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fmaddsub_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c
new file mode 100644
index 00000000000..16cf0af19d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c
@@ -0,0 +1,171 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fmaddsub_ph) (V512 * dest, V512 op1, V512 op2,
+                    __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i];
+            }
+            else {
+                v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i];
+            }
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i];
+            }
+            else {
+                v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i];
+            }
+        }
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fmaddsub_ph) (V512 * dest, V512 op1, V512 op2,
+                    __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v5.f32[i] = v1.f32[i] * v7.f32[i] + v3.f32[i];
+            }
+            else {
+                v5.f32[i] = v1.f32[i] * v7.f32[i] - v3.f32[i];
+            }
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v6.f32[i] = v2.f32[i] * v8.f32[i] + v4.f32[i];
+            }
+            else {
+                v6.f32[i] = v2.f32[i] * v8.f32[i] - v4.f32[i];
+            }
+        }
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmaddsub_ph) (HF(src1), HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmaddsub_ph) (HF(src1), HF(src2),
+				      HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(m_fmaddsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmaddsub_ph) (HF(res), MASK_VALUE,
+				     HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmaddsub_ph) (ZMASK_VALUE, HF(src1),
+				      HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmaddsub_ph);
+
+  init_dest(&res, &exp);
+#if AVX512F_LEN == 512
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmaddsub_round_ph) (HF(src1), HF(src2),
+				      HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmaddsub_round_ph) (HF(src1), HF(src2),
+					    HF(res), MASK_VALUE, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(m_fmaddsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmaddsub_round_ph) (HF(res), MASK_VALUE,
+					   HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmaddsub_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmaddsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmaddsub_round_ph) (ZMASK_VALUE, HF(src1),
+					    HF(src2), HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmaddsub_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c
new file mode 100644
index 00000000000..87087c9fb42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fmsubadd_ph (x1, x2, x3);
+  x1 = _mm512_mask_fmsubadd_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fmsubadd_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fmsubadd_ph (m, x1, x2, x3);
+  x1 = _mm512_fmsubadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fmsubadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fmsubadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fmsubadd_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c
new file mode 100644
index 00000000000..159cae4bb26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c
@@ -0,0 +1,175 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fmsubadd_ph) (V512 * dest, V512 op1, V512 op2,
+                    __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i];
+            }
+            else {
+                v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i];
+            }
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i];
+            }
+            else {
+                v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i];
+            }
+        }
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fmsubadd_ph) (V512 * dest, V512 op1, V512 op2,
+                    __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+               v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v5.f32[i] = v1.f32[i] * v7.f32[i] - v3.f32[i];
+            }
+            else {
+                v5.f32[i] = v1.f32[i] * v7.f32[i] + v3.f32[i];
+            }
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            if (i % 2 == 1) {
+                v6.f32[i] = v2.f32[i] * v8.f32[i] - v4.f32[i];
+            }
+            else {
+                v6.f32[i] = v2.f32[i] * v8.f32[i] + v4.f32[i];
+            }
+        }
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmsubadd_ph) (HF(src1), HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmsubadd_ph) (HF(src1), HF(src2),
+				      HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(m_fmsubadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmsubadd_ph) (HF(res), MASK_VALUE,
+				     HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmsubadd_ph) (ZMASK_VALUE, HF(src1),
+				      HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsubadd_ph);
+
+  init_dest(&res, &exp);
+#if AVX512F_LEN == 512
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmsubadd_round_ph) (HF(src1), HF(src2),
+				      HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmsubadd_round_ph) (HF(src1), HF(src2),
+					    HF(res), MASK_VALUE,
+					    _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(m_fmsubadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmsubadd_round_ph) (HF(res), MASK_VALUE,
+					   HF(src1), HF(src2),
+					   _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsubadd_ph);
+  init_dest(&res, &exp);
+  EMULATE(fmsubadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmsubadd_round_ph) (ZMASK_VALUE, HF(src1),
+					    HF(src2), HF(res),
+					    _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsubadd_ph);
+#endif
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c
new file mode 100644
index 00000000000..963fbb6af90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fmaddsub_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fmaddsub_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fmaddsub_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fmaddsub_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fmaddsub_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fmaddsub_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c
new file mode 100644
index 00000000000..7f9748b7e26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddsubXXXph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddsubXXXph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c
new file mode 100644
index 00000000000..0316b8e0714
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fmsubadd_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fmsubadd_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fmsubadd_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fmsubadd_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fmsubadd_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fmsubadd_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c
new file mode 100644
index 00000000000..c8caca105ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmsubaddXXXph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmsubaddXXXph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 42/62] AVX512FP16: Add FP16 fma instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (40 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 41/62] AVX512FP16: Add testcase for " liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 43/62] AVX512FP16: Add testcase for " liuhongt
                   ` (19 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/
vfnmsub[132,213,231]ph.

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph):
	New intrinsic.
	(_mm512_mask3_fmadd_ph): Likewise.
	(_mm512_maskz_fmadd_ph): Likewise.
	(_mm512_fmadd_round_ph): Likewise.
	(_mm512_mask_fmadd_round_ph): Likewise.
	(_mm512_mask3_fmadd_round_ph): Likewise.
	(_mm512_maskz_fmadd_round_ph): Likewise.
	(_mm512_fnmadd_ph): Likewise.
	(_mm512_mask_fnmadd_ph): Likewise.
	(_mm512_mask3_fnmadd_ph): Likewise.
	(_mm512_maskz_fnmadd_ph): Likewise.
	(_mm512_fnmadd_round_ph): Likewise.
	(_mm512_mask_fnmadd_round_ph): Likewise.
	(_mm512_mask3_fnmadd_round_ph): Likewise.
	(_mm512_maskz_fnmadd_round_ph): Likewise.
	(_mm512_fmsub_ph): Likewise.
	(_mm512_mask_fmsub_ph): Likewise.
	(_mm512_mask3_fmsub_ph): Likewise.
	(_mm512_maskz_fmsub_ph): Likewise.
	(_mm512_fmsub_round_ph): Likewise.
	(_mm512_mask_fmsub_round_ph): Likewise.
	(_mm512_mask3_fmsub_round_ph): Likewise.
	(_mm512_maskz_fmsub_round_ph): Likewise.
	(_mm512_fnmsub_ph): Likewise.
	(_mm512_mask_fnmsub_ph): Likewise.
	(_mm512_mask3_fnmsub_ph): Likewise.
	(_mm512_maskz_fnmsub_ph): Likewise.
	(_mm512_fnmsub_round_ph): Likewise.
	(_mm512_mask_fnmsub_round_ph): Likewise.
	(_mm512_mask3_fnmsub_round_ph): Likewise.
	(_mm512_maskz_fnmsub_round_ph): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph):
	New intrinsic.
	(_mm256_mask_fmadd_ph): Likewise.
	(_mm256_mask3_fmadd_ph): Likewise.
	(_mm256_maskz_fmadd_ph): Likewise.
	(_mm_fmadd_ph): Likewise.
	(_mm_mask_fmadd_ph): Likewise.
	(_mm_mask3_fmadd_ph): Likewise.
	(_mm_maskz_fmadd_ph): Likewise.
	(_mm256_fnmadd_ph): Likewise.
	(_mm256_mask_fnmadd_ph): Likewise.
	(_mm256_mask3_fnmadd_ph): Likewise.
	(_mm256_maskz_fnmadd_ph): Likewise.
	(_mm_fnmadd_ph): Likewise.
	(_mm_mask_fnmadd_ph): Likewise.
	(_mm_mask3_fnmadd_ph): Likewise.
	(_mm_maskz_fnmadd_ph): Likewise.
	(_mm256_fmsub_ph): Likewise.
	(_mm256_mask_fmsub_ph): Likewise.
	(_mm256_mask3_fmsub_ph): Likewise.
	(_mm256_maskz_fmsub_ph): Likewise.
	(_mm_fmsub_ph): Likewise.
	(_mm_mask_fmsub_ph): Likewise.
	(_mm_mask3_fmsub_ph): Likewise.
	(_mm_maskz_fmsub_ph): Likewise.
	(_mm256_fnmsub_ph): Likewise.
	(_mm256_mask_fnmsub_ph): Likewise.
	(_mm256_mask3_fnmsub_ph): Likewise.
	(_mm256_maskz_fnmsub_ph): Likewise.
	(_mm_fnmsub_ph): Likewise.
	(_mm_mask_fnmsub_ph): Likewise.
	(_mm_mask3_fnmsub_ph): Likewise.
	(_mm_maskz_fnmsub_ph): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/sse.md (avx512bcst): Add HF vector modes.
	(<avx512>_fmadd_<mode>_maskz<round_expand_name>): Adjust to
	support HF vector modes.
	(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
	Ditto.
	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
	(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
	(<avx512>_fmadd_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
	(<avx512>_fmsub_<mode>_maskz<round_expand_name>): Ditto.
	(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>):
	Ditto.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
	(<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
	(<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
	(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>):
	Ditto.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
	(<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
	(<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
	(<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Ditto.
	(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>):
	Ditto.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
	(<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
	(<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test fot new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 432 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 364 +++++++++++++++++++++
 gcc/config/i386/i386-builtin.def       |  36 +++
 gcc/config/i386/sse.md                 | 196 +++++------
 gcc/testsuite/gcc.target/i386/avx-1.c  |  12 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  12 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  12 +
 9 files changed, 999 insertions(+), 97 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 4092663b504..f246bab5159 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -5265,6 +5265,438 @@ _mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vfmadd[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+  _mm512_fmadd_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) -1,
+				     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) __U,
+				     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmaddph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmaddph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fmadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmaddph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmadd[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) -1,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmadd_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmadd_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmadd_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmaddph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmaddph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fnmadd_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fnmadd_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fnmadd_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fnmadd_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmaddph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfmsub[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsub_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) -1,
+				     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+				     (__v32hf) __B,
+				     (__v32hf) __C,
+				     (__mmask32) __U,
+				     _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmsubph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfmsubph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fmsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fmsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fmsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fmsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfmsubph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmsub[132,213,231]ph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) -1,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+				      (__v32hf) __B,
+				      (__v32hf) __C,
+				      (__mmask32) __U,
+				      _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+				       (__v32hf) __B,
+				       (__v32hf) __C,
+				       (__mmask32) __U,
+				       _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) -1, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fnmsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
+			       __m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask ((__v32hf) __A,
+						       (__v32hf) __B,
+						       (__v32hf) __C,
+						       (__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fnmsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
+				__mmask32 __U, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_mask3 ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
+				__m512h __C, const int __R)
+{
+  return (__m512h) __builtin_ia32_vfnmsubph512_maskz ((__v32hf) __A,
+							(__v32hf) __B,
+							(__v32hf) __C,
+							(__mmask32) __U, __R);
+}
+
+#else
+#define _mm512_fnmsub_round_ph(A, B, C, R)				\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), -1, (R)))
+
+#define _mm512_mask_fnmsub_round_ph(A, U, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask ((A), (B), (C), (U), (R)))
+
+#define _mm512_mask3_fnmsub_round_ph(A, B, C, U, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_mask3 ((A), (B), (C), (U), (R)))
+
+#define _mm512_maskz_fnmsub_round_ph(U, A, B, C, R)			\
+  ((__m512h)__builtin_ia32_vfnmsubph512_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 8825fae52aa..bba98f105ac 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -2451,6 +2451,370 @@ _mm_maskz_fmsubadd_ph (__mmask8 __U, __m128h __A, __m128h __B,
 							__U);
 }
 
+/* Intrinsics vfmadd[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmadd_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmadd_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmadd_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmadd_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmaddph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmaddph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmaddph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+/* Intrinsics vfnmadd[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmadd_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmadd_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmaddph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmadd_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfnmaddph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmadd_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmaddph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmadd_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmaddph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfnmaddph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmaddph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+/* Intrinsics vfmsub[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmsub_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmsub_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmsub_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfmsubph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmsub_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfmsubph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmsub_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmsubph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfmsubph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+/* Intrinsics vfnmsub[132,213,231]ph.  */
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fnmsub_ph (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmsubph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) -1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fnmsub_ph (__m256h __A, __mmask16 __U, __m256h __B,
+			 __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmsubph256_mask ((__v16hf) __A,
+						       (__v16hf) __B,
+						       (__v16hf) __C,
+						       (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fnmsub_ph (__m256h __A, __m256h __B, __m256h __C,
+			  __mmask16 __U)
+{
+  return (__m256h) __builtin_ia32_vfnmsubph256_mask3 ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fnmsub_ph (__mmask16 __U, __m256h __A, __m256h __B,
+			  __m256h __C)
+{
+  return (__m256h) __builtin_ia32_vfnmsubph256_maskz ((__v16hf) __A,
+							(__v16hf) __B,
+							(__v16hf) __C,
+							(__mmask16)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_ph (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmsubph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) -1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmsub_ph (__m128h __A, __mmask8 __U, __m128h __B,
+		      __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmsubph128_mask ((__v8hf) __A,
+						       (__v8hf) __B,
+						       (__v8hf) __C,
+						       (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmsub_ph (__m128h __A, __m128h __B, __m128h __C,
+		       __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfnmsubph128_mask3 ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
+		       __m128h __C)
+{
+  return (__m128h) __builtin_ia32_vfnmsubph128_maskz ((__v8hf) __A,
+							(__v8hf) __B,
+							(__v8hf) __C,
+							(__mmask8)
+							__U);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 42bba719ec3..cf0259843cc 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2887,6 +2887,30 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask, "__builtin_ia32_vfmsubaddph128_mask", IX86_BUILTIN_VFMSUBADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask3, "__builtin_ia32_vfmsubaddph128_mask3", IX86_BUILTIN_VFMSUBADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_maskz, "__builtin_ia32_vfmsubaddph128_maskz", IX86_BUILTIN_VFMSUBADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_mask, "__builtin_ia32_vfmaddph256_mask", IX86_BUILTIN_VFMADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_mask3, "__builtin_ia32_vfmaddph256_mask3", IX86_BUILTIN_VFMADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmadd_v16hf_maskz, "__builtin_ia32_vfmaddph256_maskz", IX86_BUILTIN_VFMADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_mask, "__builtin_ia32_vfmaddph128_mask", IX86_BUILTIN_VFMADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_mask3, "__builtin_ia32_vfmaddph128_mask3", IX86_BUILTIN_VFMADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmadd_v8hf_maskz, "__builtin_ia32_vfmaddph128_maskz", IX86_BUILTIN_VFMADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_mask, "__builtin_ia32_vfnmaddph256_mask", IX86_BUILTIN_VFNMADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_mask3, "__builtin_ia32_vfnmaddph256_mask3", IX86_BUILTIN_VFNMADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmadd_v16hf_maskz, "__builtin_ia32_vfnmaddph256_maskz", IX86_BUILTIN_VFNMADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_mask, "__builtin_ia32_vfnmaddph128_mask", IX86_BUILTIN_VFNMADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_mask3, "__builtin_ia32_vfnmaddph128_mask3", IX86_BUILTIN_VFNMADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmadd_v8hf_maskz, "__builtin_ia32_vfnmaddph128_maskz", IX86_BUILTIN_VFNMADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_mask, "__builtin_ia32_vfmsubph256_mask", IX86_BUILTIN_VFMSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_mask3, "__builtin_ia32_vfmsubph256_mask3", IX86_BUILTIN_VFMSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsub_v16hf_maskz, "__builtin_ia32_vfmsubph256_maskz", IX86_BUILTIN_VFMSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_mask, "__builtin_ia32_vfmsubph128_mask", IX86_BUILTIN_VFMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_mask3, "__builtin_ia32_vfmsubph128_mask3", IX86_BUILTIN_VFMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsub_v8hf_maskz, "__builtin_ia32_vfmsubph128_maskz", IX86_BUILTIN_VFMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_mask, "__builtin_ia32_vfnmsubph256_mask", IX86_BUILTIN_VFNMSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_mask3, "__builtin_ia32_vfnmsubph256_mask3", IX86_BUILTIN_VFNMSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fnmsub_v16hf_maskz, "__builtin_ia32_vfnmsubph256_maskz", IX86_BUILTIN_VFNMSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask, "__builtin_ia32_vfnmsubph128_mask", IX86_BUILTIN_VFNMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask3, "__builtin_ia32_vfnmsubph128_mask3", IX86_BUILTIN_VFNMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_maskz, "__builtin_ia32_vfnmsubph128_maskz", IX86_BUILTIN_VFNMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3158,6 +3182,18 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_ro
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask_round, "__builtin_ia32_vfmaddph512_mask", IX86_BUILTIN_VFMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_mask3_round, "__builtin_ia32_vfmaddph512_mask3", IX86_BUILTIN_VFMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmadd_v32hf_maskz_round, "__builtin_ia32_vfmaddph512_maskz", IX86_BUILTIN_VFMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask_round, "__builtin_ia32_vfnmaddph512_mask", IX86_BUILTIN_VFNMADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_mask3_round, "__builtin_ia32_vfnmaddph512_mask3", IX86_BUILTIN_VFNMADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmadd_v32hf_maskz_round, "__builtin_ia32_vfnmaddph512_maskz", IX86_BUILTIN_VFNMADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask_round, "__builtin_ia32_vfmsubph512_mask", IX86_BUILTIN_VFMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_mask3_round, "__builtin_ia32_vfmsubph512_mask3", IX86_BUILTIN_VFMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round, "__builtin_ia32_vfmsubph512_maskz", IX86_BUILTIN_VFMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 847684e232e..fdcc0515228 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -825,7 +825,9 @@ (define_mode_attr avx512bcst
    (V16SI "%{1to16%}") (V8DI "%{1to8%}")
    (V4SF "%{1to4%}") (V2DF "%{1to2%}")
    (V8SF "%{1to8%}") (V4DF "%{1to4%}")
-   (V16SF "%{1to16%}") (V8DF "%{1to8%}")])
+   (V16SF "%{1to16%}") (V8DF "%{1to8%}")
+   (V8HF "%{1to8%}") (V16HF "%{1to16%}")
+   (V32HF "%{1to32%}")])
 
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
@@ -4507,10 +4509,10 @@ (define_expand "fma4i_fnmsub_<mode>"
 	    (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))])
 
 (define_expand "<avx512>_fmadd_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_AVX512VL 0 "register_operand")
-   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F && <round_mode512bit_condition>"
 {
@@ -4550,11 +4552,11 @@ (define_mode_iterator VFH_SF_AVX512VL
    DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
 (define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(fma:VF_SF_AVX512VL
-	  (match_operand:VF_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v")
-	  (match_operand:VF_SF_AVX512VL 2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
-	  (match_operand:VF_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0")))]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(fma:VFH_SF_AVX512VL
+	  (match_operand:VFH_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v")
+	  (match_operand:VFH_SF_AVX512VL 2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
+	  (match_operand:VFH_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0")))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    vfmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
@@ -4564,12 +4566,12 @@ (define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (match_operand:VF_AVX512VL 1 "register_operand" "0,0")
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	    (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F && <round_mode512bit_condition>"
@@ -4580,12 +4582,12 @@ (define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmadd_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	    (match_operand:VF_AVX512VL 3 "register_operand" "0"))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (match_operand:VFH_AVX512VL 1 "<round_nimm_predicate>" "%v")
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	    (match_operand:VFH_AVX512VL 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
   "TARGET_AVX512F"
@@ -4612,10 +4614,10 @@ (define_insn "*fma_fmsub_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_expand "<avx512>_fmsub_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_AVX512VL 0 "register_operand")
-   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F && <round_mode512bit_condition>"
 {
@@ -4626,12 +4628,12 @@ (define_expand "<avx512>_fmsub_<mode>_maskz<round_expand_name>"
 })
 
 (define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(fma:VF_SF_AVX512VL
-	  (match_operand:VF_SF_AVX512VL   1 "<bcst_round_nimm_predicate>" "%0,0,v")
-	  (match_operand:VF_SF_AVX512VL   2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
-	  (neg:VF_SF_AVX512VL
-	    (match_operand:VF_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0"))))]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(fma:VFH_SF_AVX512VL
+	  (match_operand:VFH_SF_AVX512VL   1 "<bcst_round_nimm_predicate>" "%0,0,v")
+	  (match_operand:VFH_SF_AVX512VL   2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
+	  (neg:VFH_SF_AVX512VL
+	    (match_operand:VFH_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0"))))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    vfmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
@@ -4641,13 +4643,13 @@ (define_insn "<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmsub_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (match_operand:VF_AVX512VL 1 "register_operand" "0,0")
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F"
@@ -4658,13 +4660,13 @@ (define_insn "<avx512>_fmsub_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fmsub_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v")
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 3 "register_operand" "0")))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (match_operand:VFH_AVX512VL 1 "<round_nimm_predicate>" "%v")
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && <round_mode512bit_condition>"
@@ -4691,10 +4693,10 @@ (define_insn "*fma_fnmadd_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_expand "<avx512>_fnmadd_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_AVX512VL 0 "register_operand")
-   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F && <round_mode512bit_condition>"
 {
@@ -4705,12 +4707,12 @@ (define_expand "<avx512>_fnmadd_<mode>_maskz<round_expand_name>"
 })
 
 (define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(fma:VF_SF_AVX512VL
-	  (neg:VF_SF_AVX512VL
-	    (match_operand:VF_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v"))
-	  (match_operand:VF_SF_AVX512VL   2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
-	  (match_operand:VF_SF_AVX512VL   3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0")))]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(fma:VFH_SF_AVX512VL
+	  (neg:VFH_SF_AVX512VL
+	    (match_operand:VFH_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v"))
+	  (match_operand:VFH_SF_AVX512VL   2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
+	  (match_operand:VFH_SF_AVX512VL   3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0")))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    vfnmadd132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
@@ -4720,13 +4722,13 @@ (define_insn "<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fnmadd_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 1 "register_operand" "0,0"))
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	    (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 1 "register_operand" "0,0"))
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F && <round_mode512bit_condition>"
@@ -4737,13 +4739,13 @@ (define_insn "<avx512>_fnmadd_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fnmadd_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v"))
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	    (match_operand:VF_AVX512VL 3 "register_operand" "0"))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 1 "<round_nimm_predicate>" "%v"))
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	    (match_operand:VFH_AVX512VL 3 "register_operand" "0"))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && <round_mode512bit_condition>"
@@ -4771,10 +4773,10 @@ (define_insn "*fma_fnmsub_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_expand "<avx512>_fnmsub_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_AVX512VL 0 "register_operand")
-   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_AVX512VL 0 "register_operand")
+   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F && <round_mode512bit_condition>"
 {
@@ -4785,13 +4787,13 @@ (define_expand "<avx512>_fnmsub_<mode>_maskz<round_expand_name>"
 })
 
 (define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
-  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
-	(fma:VF_SF_AVX512VL
-	  (neg:VF_SF_AVX512VL
-	    (match_operand:VF_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v"))
-	  (match_operand:VF_SF_AVX512VL 2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
-	  (neg:VF_SF_AVX512VL
-	    (match_operand:VF_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0"))))]
+  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
+	(fma:VFH_SF_AVX512VL
+	  (neg:VFH_SF_AVX512VL
+	    (match_operand:VFH_SF_AVX512VL 1 "<bcst_round_nimm_predicate>" "%0,0,v"))
+	  (match_operand:VFH_SF_AVX512VL 2 "<bcst_round_nimm_predicate>" "<bcst_round_constraint>,v,<bcst_round_constraint>")
+	  (neg:VFH_SF_AVX512VL
+	    (match_operand:VFH_SF_AVX512VL 3 "<bcst_round_nimm_predicate>" "v,<bcst_round_constraint>,0"))))]
   "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
   "@
    vfnmsub132<ssemodesuffix>\t{<round_sd_mask_op4>%2, %3, %0<sd_mask_op4>|%0<sd_mask_op4>, %3, %2<round_sd_mask_op4>}
@@ -4801,14 +4803,14 @@ (define_insn "<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fnmsub_<mode>_mask<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 1 "register_operand" "0,0"))
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 1 "register_operand" "0,0"))
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F && <round_mode512bit_condition>"
@@ -4819,14 +4821,14 @@ (define_insn "<avx512>_fnmsub_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<avx512>_fnmsub_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(vec_merge:VF_AVX512VL
-	  (fma:VF_AVX512VL
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 1 "<round_nimm_predicate>" "%v"))
-	    (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
-	    (neg:VF_AVX512VL
-	      (match_operand:VF_AVX512VL 3 "register_operand" "0")))
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
+	(vec_merge:VFH_AVX512VL
+	  (fma:VFH_AVX512VL
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 1 "<round_nimm_predicate>" "%v"))
+	    (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
+	    (neg:VFH_AVX512VL
+	      (match_operand:VFH_AVX512VL 3 "register_operand" "0")))
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
   "TARGET_AVX512F"
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 51a0cf2fe87..d2ab16538d8 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -763,6 +763,18 @@
 #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index a53f4653908..49c72f6fcef 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -780,6 +780,18 @@
 #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 48895e0dd0d..9151e50afd2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -838,6 +838,10 @@ test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
 test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
 test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -876,6 +880,18 @@ test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __
 test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fnmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fnmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index bc530da388b..892b6334ae2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -941,6 +941,10 @@ test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
 test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
 test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -978,6 +982,18 @@ test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __
 test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fnmadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fnmadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
+test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
+test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index df43931ca97..447b83829f3 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -781,6 +781,18 @@
 #define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


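The `_mask`, `_mask3`, and `_maskz` FMA patterns in the sse.md changes above differ only in what fills the inactive lanes: the `vec_merge` falls back to `(match_dup 1)`, `(match_dup 3)`, or zero respectively. A one-lane scalar C model of that merge behavior (an illustrative sketch, not the actual builtins or their signatures):

```c
#include <assert.h>

/* Merge-masking semantics of the three masked-FMA variants, modeled
   on a single lane with a one-bit mask k.  When k is 0:
     _mask  keeps operand 1 (the accumulator input),
     _mask3 keeps operand 3 (the addend),
     _maskz zeroes the lane.  */
static float fma_mask  (float a, float b, float c, int k)
{ return k ? a * b + c : a; }

static float fma_mask3 (float a, float b, float c, int k)
{ return k ? a * b + c : c; }

static float fma_maskz (float a, float b, float c, int k)
{ return k ? a * b + c : 0.0f; }
```

This is why three separate insn patterns are needed per operation even though the arithmetic is identical.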

* [PATCH 43/62] AVX512FP16: Add testcase for fma instructions
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (41 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 42/62] AVX512FP16: Add FP16 fma instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including liuhongt
                   ` (18 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c: Ditto.
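The four operation families these tests cover differ only in the sign applied to the product and the addend. A scalar C model of the semantics (illustrative only; the real tests exercise the `_Float16` vector intrinsics, not these helpers):

```c
#include <assert.h>

/* Semantics of the four FMA variants:
   fmadd:   a*b + c     fmsub:   a*b - c
   fnmadd: -a*b + c     fnmsub: -a*b - c  */
static float fmadd  (float a, float b, float c) { return  a * b + c; }
static float fmsub  (float a, float b, float c) { return  a * b - c; }
static float fnmadd (float a, float b, float c) { return -a * b + c; }
static float fnmsub (float a, float b, float c) { return -a * b - c; }
```

The 1a tests scan the generated assembly for the expected mnemonics, while the 1b tests compare runtime results against a reference computed in a wider type.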
---
 .../i386/avx512fp16-vfmaddXXXph-1a.c          |  28 +++
 .../i386/avx512fp16-vfmaddXXXph-1b.c          | 160 ++++++++++++++++++
 .../i386/avx512fp16-vfmsubXXXph-1a.c          |  32 ++++
 .../i386/avx512fp16-vfmsubXXXph-1b.c          | 155 +++++++++++++++++
 .../i386/avx512fp16-vfnmaddXXXph-1a.c         |  28 +++
 .../i386/avx512fp16-vfnmaddXXXph-1b.c         | 159 +++++++++++++++++
 .../i386/avx512fp16-vfnmsubXXXph-1a.c         |  32 ++++
 .../i386/avx512fp16-vfnmsubXXXph-1b.c         | 157 +++++++++++++++++
 .../i386/avx512fp16vl-vfmaddXXXph-1a.c        |  28 +++
 .../i386/avx512fp16vl-vfmaddXXXph-1b.c        |  15 ++
 .../i386/avx512fp16vl-vfmsubXXXph-1a.c        |  28 +++
 .../i386/avx512fp16vl-vfmsubXXXph-1b.c        |  15 ++
 .../i386/avx512fp16vl-vfnmaddXXXph-1a.c       |  28 +++
 .../i386/avx512fp16vl-vfnmaddXXXph-1b.c       |  15 ++
 .../i386/avx512fp16vl-vfnmsubXXXph-1a.c       |  28 +++
 .../i386/avx512fp16vl-vfnmsubXXXph-1b.c       |  15 ++
 16 files changed, 923 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c
new file mode 100644
index 00000000000..f9e2777196a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fmadd_ph (x1, x2, x3);
+  x1 = _mm512_mask_fmadd_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fmadd_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fmadd_ph (m, x1, x2, x3);
+  x1 = _mm512_fmadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fmadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fmadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fmadd_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c
new file mode 100644
index 00000000000..71c2b8fb930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c
@@ -0,0 +1,160 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fmadd_ph) (V512 * dest, V512 op1, V512 op2,
+                 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] * v3.f32[i] + v7.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] * v4.f32[i] + v8.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fmadd_ph) (V512 * dest, V512 op1, V512 op2,
+                   __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v7.f32[i] * v1.f32[i] + v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v8.f32[i] * v2.f32[i] + v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fmadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmadd_ph) (HF(src1), HF(src2),
+				   HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmadd_ph) (HF(res), MASK_VALUE,
+					HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmadd_ph) (HF(src1), HF(src2),
+				   HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmadd_ph) (ZMASK_VALUE, HF(src1),
+				   HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_ph);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(fmadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmadd_round_ph) (HF(src1), HF(src2),
+				   HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmadd_round_ph) (HF(res), MASK_VALUE, HF(src1),
+					HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_ph);
+
+  EMULATE(fmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmadd_round_ph) (HF(src1), HF(src2), HF(res),
+					 MASK_VALUE, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmadd_round_ph) (ZMASK_VALUE, HF(src1), HF(src2),
+					 HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c
new file mode 100644
index 00000000000..3b1147a41cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fmsub_ph (x1, x2, x3);
+  x1 = _mm512_mask_fmsub_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fmsub_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fmsub_ph (m, x1, x2, x3);
+  x1 = _mm512_fmsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT
+			      | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fmsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF
+				   | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fmsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF
+				    | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fmsub_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO
+				    | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c
new file mode 100644
index 00000000000..abb9a9bc826
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c
@@ -0,0 +1,155 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fmsub_ph) (V512 * dest, V512 op1, V512 op2,
+                 __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v1.f32[i] * v3.f32[i] - v7.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v2.f32[i] * v4.f32[i] - v8.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fmsub_ph) (V512 * dest, V512 op1, V512 op2,
+                   __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = v7.f32[i] * v1.f32[i] - v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+               v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = v8.f32[i] * v2.f32[i] - v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fmsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmsub_ph) (HF(src1), HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmsub_ph) (HF(res), MASK_VALUE,
+					HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmsub_ph) (HF(src1), HF(src2), HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmsub_ph) (ZMASK_VALUE, HF(src1), HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsub_ph);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(fmsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fmsub_round_ph) (HF(src1), HF(src2), HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fmsub_round_ph) (HF(res), MASK_VALUE,
+					HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmsub_ph);
+
+  EMULATE(fmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fmsub_round_ph) (HF(src1), HF(src2),
+					 HF(res), MASK_VALUE, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fmsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fmsub_round_ph) (ZMASK_VALUE, HF(src1),
+					 HF(src2), HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmsub_ph);
+#endif
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c
new file mode 100644
index 00000000000..20e77ce7398
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fnmadd_ph (x1, x2, x3);
+  x1 = _mm512_mask_fnmadd_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fnmadd_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fnmadd_ph (m, x1, x2, x3);
+  x1 = _mm512_fnmadd_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fnmadd_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fnmadd_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fnmadd_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c
new file mode 100644
index 00000000000..b15b1bd1149
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c
@@ -0,0 +1,159 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fnmadd_ph) (V512 * dest, V512 op1, V512 op2,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = -(v1.f32[i] * v3.f32[i]) + v7.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                 v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = -(v2.f32[i] * v4.f32[i]) + v8.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fnmadd_ph) (V512 * dest, V512 op1, V512 op2,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = -(v1.f32[i] * v7.f32[i]) + v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                 v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = -(v2.f32[i] * v8.f32[i]) + v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fnmadd_ph) (HF(src1), HF(src2),
+				    HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fnmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fnmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fnmadd_ph) (HF(res), MASK_VALUE,
+					 HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fnmadd_ph) (HF(src1), HF(src2),
+				    HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fnmadd_ph) (ZMASK_VALUE, HF(src1),
+				    HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmadd_ph);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fnmadd_round_ph) (HF(src1), HF(src2),
+				    HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fnmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fnmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fnmadd_round_ph) (HF(res), MASK_VALUE,
+					 HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmadd_ph);
+
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fnmadd_round_ph) (HF(src1), HF(src2),
+					  HF(res), MASK_VALUE, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmadd_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmadd_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fnmadd_round_ph) (ZMASK_VALUE, HF(src1),
+					  HF(src2), HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmadd_ph);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c
new file mode 100644
index 00000000000..eb05de46347
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h x1, x2, x3;
+volatile __mmask32 m;
+
+void extern
+avx512f_test (void)
+{
+  x1 = _mm512_fnmsub_ph (x1, x2, x3);
+  x1 = _mm512_mask_fnmsub_ph (x1, m, x2, x3);
+  x3 = _mm512_mask3_fnmsub_ph (x1, x2, x3, m);
+  x1 = _mm512_maskz_fnmsub_ph (m, x1, x2, x3);
+  x1 = _mm512_fnmsub_round_ph (x1, x2, x3, _MM_FROUND_TO_NEAREST_INT
+			       | _MM_FROUND_NO_EXC);
+  x1 = _mm512_mask_fnmsub_round_ph (x1, m, x2, x3, _MM_FROUND_TO_NEG_INF
+				    | _MM_FROUND_NO_EXC);
+  x3 = _mm512_mask3_fnmsub_round_ph (x1, x2, x3, m, _MM_FROUND_TO_POS_INF
+				     | _MM_FROUND_NO_EXC);
+  x1 = _mm512_maskz_fnmsub_round_ph (m, x1, x2, x3, _MM_FROUND_TO_ZERO
+				     | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c
new file mode 100644
index 00000000000..73f0172ca20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c
@@ -0,0 +1,157 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(fnmsub_ph) (V512 * dest, V512 op1, V512 op2,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8; 
+    int i;
+    __mmask16 m1, m2;
+    
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+ 
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = -(v1.f32[i] * v3.f32[i]) - v7.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                 v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = -(v2.f32[i] * v4.f32[i]) - v8.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void NOINLINE
+EMULATE(m_fnmsub_ph) (V512 * dest, V512 op1, V512 op2,
+                  __mmask32 k, int zero_mask)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+    __mmask16 m1, m2;
+
+    m1 = k & 0xffff;
+    m2 = (k >> 16) & 0xffff;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    for (i = 0; i < 16; i++) {
+        if (((1 << i) & m1) == 0) {
+            if (zero_mask) {
+                v5.f32[i] = 0;
+            }
+            else {
+                v5.u32[i] = v7.u32[i];
+            }
+        }
+        else {
+           v5.f32[i] = -(v1.f32[i] * v7.f32[i]) - v3.f32[i];
+        }
+
+        if (((1 << i) & m2) == 0) {
+            if (zero_mask) {
+                 v6.f32[i] = 0;
+            }
+            else {
+                v6.u32[i] = v8.u32[i];
+            }
+        }
+        else {
+            v6.f32[i] = -(v2.f32[i] * v8.f32[i]) - v4.f32[i];
+        }
+
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fnmsub_ph) (HF(src1), HF(src2),
+				    HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fnmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fnmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fnmsub_ph) (HF(res), MASK_VALUE,
+					 HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fnmsub_ph) (HF(src1), HF(src2), HF(res), MASK_VALUE);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fnmsub_ph) (ZMASK_VALUE, HF(src1), HF(src2), HF(res));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmsub_ph);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  NET_MASK, 0);
+  HF(res) = INTRINSIC (_fnmsub_round_ph) (HF(src1), HF(src2),
+				    HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fnmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(m_fnmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask_fnmsub_round_ph) (HF(res), MASK_VALUE,
+					 HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fnmsub_ph);
+
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  MASK_VALUE, 0);
+  HF(res) = INTRINSIC (_mask3_fnmsub_round_ph) (HF(src1), HF(src2),
+					  HF(res), MASK_VALUE, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fnmsub_ph);
+
+  init_dest(&res, &exp);
+  EMULATE(fnmsub_ph)(&exp, src1, src2,  ZMASK_VALUE, 1);
+  HF(res) = INTRINSIC (_maskz_fnmsub_round_ph) (ZMASK_VALUE, HF(src1),
+					  HF(src2), HF(res), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fnmsub_ph);
+#endif
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c
new file mode 100644
index 00000000000..eea38b860ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fmadd_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fmadd_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fmadd_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fmadd_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fmadd_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fmadd_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c
new file mode 100644
index 00000000000..f6e4a9ae128
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddXXXph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddXXXph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c
new file mode 100644
index 00000000000..add1abc2bea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fmsub_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fmsub_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fmsub_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fmsub_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fmsub_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fmsub_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c
new file mode 100644
index 00000000000..b9c2085ecd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmsubXXXph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfmsubXXXph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c
new file mode 100644
index 00000000000..6dad9013581
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fnmadd_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fnmadd_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fnmadd_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fnmadd_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fnmadd_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fnmadd_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c
new file mode 100644
index 00000000000..6c615d6541e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfnmaddXXXph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfnmaddXXXph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c
new file mode 100644
index 00000000000..1a7fd092b73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...ph\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h yy, y2, y3;
+volatile __m128h xx, x2, x3;
+volatile __mmask8 m;
+volatile __mmask16 m16;
+
+void extern
+avx512vl_test (void)
+{
+  yy = _mm256_mask_fnmsub_ph (yy, m16, y2, y3);
+  xx = _mm_mask_fnmsub_ph (xx, m, x2, x3);
+
+  y3 = _mm256_mask3_fnmsub_ph (yy, y2, y3, m16);
+  x3 = _mm_mask3_fnmsub_ph (xx, x2, x3, m);
+
+  yy = _mm256_maskz_fnmsub_ph (m16, yy, y2, y3);
+  xx = _mm_maskz_fnmsub_ph (m, xx, x2, x3);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c
new file mode 100644
index 00000000000..6d72b3dc220
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfnmsubXXXph-1b.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512fp16-vfnmsubXXXph-1b.c"
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (42 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 43/62] AVX512FP16: Add testcase for " liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-23  5:13   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations liuhongt
                   ` (17 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

1. FP16 vector xor/ior/and/andnot/abs/neg
2. FP16 scalar abs/neg/copysign/xorsign

gcc/ChangeLog:

	* config/i386/i386-expand.c (ix86_expand_fp_absneg_operator):
	Handle HFmode.
	(ix86_expand_copysign): Ditto.
	(ix86_expand_xorsign): Ditto.
	* config/i386/i386.c (ix86_build_const_vector): Handle HF vector
	modes.
	(ix86_build_signbit_mask): Ditto.
	(ix86_can_change_mode_class): Ditto.
	* config/i386/i386.md (SSEMODEF): Add HF mode.
	(ssevecmodef): Ditto.
	(<code><mode>2): Use MODEFH.
	(*<code><mode>2_1): Ditto.
	(define_split): Ditto.
	(xorsign<mode>3): Ditto.
	(@xorsign<mode>3_1): Ditto.
	* config/i386/sse.md (VFB): New mode iterator.
	(VFB_128_256): Ditto.
	(VFB_512): Ditto.
	(sseintvecmode2): Support HF vector mode.
	(<code><mode>2): Use new mode iterator.
	(*<code><mode>2): Ditto.
	(copysign<mode>3): Ditto.
	(xorsign<mode>3): Ditto.
	(<code><mode>3<mask_name>): Ditto.
	(<code><mode>3<mask_name>): Ditto.
	(<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode.
	(<sse>_andnot<mode>3<mask_name>): Ditto.
	(*<code><mode>3<mask_name>): Ditto.
	(*<code><mode>3<mask_name>): Ditto.
---
 gcc/config/i386/i386-expand.c |  12 +++-
 gcc/config/i386/i386.c        |  12 +++-
 gcc/config/i386/i386.md       |  40 ++++++-----
 gcc/config/i386/sse.md        | 128 ++++++++++++++++++++--------------
 4 files changed, 118 insertions(+), 74 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 9233c6cd1e8..006f4bec8db 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -1781,6 +1781,8 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode,
 	vmode = V4SFmode;
       else if (mode == DFmode)
 	vmode = V2DFmode;
+      else if (mode == HFmode)
+	vmode = V8HFmode;
     }
 
   dst = operands[0];
@@ -1918,7 +1920,9 @@ ix86_expand_copysign (rtx operands[])
 
   mode = GET_MODE (dest);
 
-  if (mode == SFmode)
+  if (mode == HFmode)
+    vmode = V8HFmode;
+  else if (mode == SFmode)
     vmode = V4SFmode;
   else if (mode == DFmode)
     vmode = V2DFmode;
@@ -1934,7 +1938,7 @@ ix86_expand_copysign (rtx operands[])
       if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0)))
 	op0 = simplify_unary_operation (ABS, mode, op0, mode);
 
-      if (mode == SFmode || mode == DFmode)
+      if (mode == HFmode || mode == SFmode || mode == DFmode)
 	{
 	  if (op0 == CONST0_RTX (mode))
 	    op0 = CONST0_RTX (vmode);
@@ -2073,7 +2077,9 @@ ix86_expand_xorsign (rtx operands[])
 
   mode = GET_MODE (dest);
 
-  if (mode == SFmode)
+  if (mode == HFmode)
+    vmode = V8HFmode;
+  else if (mode == SFmode)
     vmode = V4SFmode;
   else if (mode == DFmode)
     vmode = V2DFmode;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dc0d440061b..17e1b5ea874 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15374,6 +15374,9 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
     case E_V2DImode:
       gcc_assert (vect);
       /* FALLTHRU */
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V8SFmode:
     case E_V4SFmode:
@@ -15412,6 +15415,13 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
 
   switch (mode)
     {
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
+      vec_mode = mode;
+      imode = HImode;
+      break;
+
     case E_V16SImode:
     case E_V16SFmode:
     case E_V8SImode:
@@ -19198,7 +19208,7 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 	 disallow a change to these modes, reload will assume it's ok to
 	 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
 	 the vec_dupv4hi pattern.  */
-      if (GET_MODE_SIZE (from) < 4)
+      if (GET_MODE_SIZE (from) < 4 && from != E_HFmode)
 	return false;
     }
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 014aba187e1..a85c23d74f1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1233,9 +1233,10 @@ (define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
 ;; All x87 floating point modes plus HFmode
 (define_mode_iterator X87MODEFH [HF SF DF XF])
 
-;; All SSE floating point modes
-(define_mode_iterator SSEMODEF [SF DF TF])
-(define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
+;; All SSE floating point modes and HFmode
+(define_mode_iterator SSEMODEF [HF SF DF TF])
+(define_mode_attr ssevecmodef [(HF "V8HF") (SF "V4SF") (DF "V2DF") (TF "TF")])
+
 
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
@@ -10529,8 +10530,8 @@ (define_insn_and_split "*nabstf2_1"
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
 (define_expand "<code><mode>2"
-  [(set (match_operand:X87MODEF 0 "register_operand")
-	(absneg:X87MODEF (match_operand:X87MODEF 1 "register_operand")))]
+  [(set (match_operand:X87MODEFH 0 "register_operand")
+	(absneg:X87MODEFH (match_operand:X87MODEFH 1 "register_operand")))]
   "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
   "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
 
@@ -10559,9 +10560,9 @@ (define_split
   "ix86_split_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>2_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,Yv,f,!r")
-	(absneg:MODEF
-	  (match_operand:MODEF 1 "register_operand" "0,x,Yv,0,0")))
+  [(set (match_operand:MODEFH 0 "register_operand" "=x,x,Yv,f,!r")
+	(absneg:MODEFH
+	  (match_operand:MODEFH 1 "register_operand" "0,x,Yv,0,0")))
    (use (match_operand:<ssevecmode> 2 "vector_operand" "xBm,0,Yvm,X,X"))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
@@ -10572,7 +10573,8 @@ (define_insn "*<code><mode>2_1"
        (match_test ("SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"))
        (if_then_else
 	 (eq_attr "alternative" "3,4")
-	 (symbol_ref "TARGET_MIX_SSE_I387")
+	 (symbol_ref "TARGET_MIX_SSE_I387
+		      && <MODE>mode != HFmode")
 	 (const_string "*"))
        (if_then_else
 	 (eq_attr "alternative" "3,4")
@@ -10580,9 +10582,9 @@ (define_insn "*<code><mode>2_1"
 	 (symbol_ref "false"))))])
 
 (define_split
-  [(set (match_operand:MODEF 0 "sse_reg_operand")
-	(absneg:MODEF
-	  (match_operand:MODEF 1 "sse_reg_operand")))
+  [(set (match_operand:MODEFH 0 "sse_reg_operand")
+	(absneg:MODEFH
+	  (match_operand:MODEFH 1 "sse_reg_operand")))
    (use (match_operand:<ssevecmodef> 2 "vector_operand"))
    (clobber (reg:CC FLAGS_REG))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
@@ -10706,17 +10708,17 @@ (define_split
   "ix86_split_copysign_var (operands); DONE;")
 
 (define_expand "xorsign<mode>3"
-  [(match_operand:MODEF 0 "register_operand")
-   (match_operand:MODEF 1 "register_operand")
-   (match_operand:MODEF 2 "register_operand")]
+  [(match_operand:MODEFH 0 "register_operand")
+   (match_operand:MODEFH 1 "register_operand")
+   (match_operand:MODEFH 2 "register_operand")]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "ix86_expand_xorsign (operands); DONE;")
 
 (define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=Yv")
-	(unspec:MODEF
-	  [(match_operand:MODEF 1 "register_operand" "Yv")
-	   (match_operand:MODEF 2 "register_operand" "0")
+  [(set (match_operand:MODEFH 0 "register_operand" "=Yv")
+	(unspec:MODEFH
+	  [(match_operand:MODEFH 1 "register_operand" "Yv")
+	   (match_operand:MODEFH 2 "register_operand" "0")
 	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm")]
 	  UNSPEC_XORSIGN))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fdcc0515228..7c594babcce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -317,11 +317,26 @@ (define_mode_iterator VFH
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
+;; 128-, 256- and 512-bit float vector modes for bitwise operations
+(define_mode_iterator VFB
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
+   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+
 ;; 128- and 256-bit float vector modes
 (define_mode_iterator VF_128_256
   [(V8SF "TARGET_AVX") V4SF
    (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
+;; 128- and 256-bit float vector modes for bitwise operations
+(define_mode_iterator VFB_128_256
+  [(V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
+   (V8SF "TARGET_AVX") V4SF
+   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
+
 ;; All SFmode vector float modes
 (define_mode_iterator VF1
   [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF])
@@ -374,6 +389,10 @@ (define_mode_iterator VF_256
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+;; All 512bit vector float modes for bitwise operations
+(define_mode_iterator VFB_512
+  [(V32HF "TARGET_AVX512FP16") V16SF V8DF])
+
 (define_mode_iterator VI48_AVX512VL
   [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
    V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
@@ -923,7 +942,8 @@ (define_mode_attr sseintvecmode
 
 (define_mode_attr sseintvecmode2
   [(V8DF "XI") (V4DF "OI") (V2DF "TI")
-   (V8SF "OI") (V4SF "TI")])
+   (V8SF "OI") (V4SF "TI")
+   (V16HF "OI") (V8HF "TI")])
 
 (define_mode_attr sseintvecmodelower
   [(V16SF "v16si") (V8DF "v8di")
@@ -1968,22 +1988,22 @@ (define_insn "kunpckdi"
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_expand "<code><mode>2"
-  [(set (match_operand:VF 0 "register_operand")
-	(absneg:VF
-	  (match_operand:VF 1 "register_operand")))]
+  [(set (match_operand:VFB 0 "register_operand")
+	(absneg:VFB
+	  (match_operand:VFB 1 "register_operand")))]
   "TARGET_SSE"
   "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn_and_split "*<code><mode>2"
-  [(set (match_operand:VF 0 "register_operand" "=x,x,v,v")
-	(absneg:VF
-	  (match_operand:VF 1 "vector_operand" "0,xBm,v,m")))
-   (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))]
+  [(set (match_operand:VFB 0 "register_operand" "=x,x,v,v")
+	(absneg:VFB
+	  (match_operand:VFB 1 "vector_operand" "0,xBm,v,m")))
+   (use (match_operand:VFB 2 "vector_operand" "xBm,0,vm,v"))]
   "TARGET_SSE"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
-	(<absneg_op>:VF (match_dup 1) (match_dup 2)))]
+	(<absneg_op>:VFB (match_dup 1) (match_dup 2)))]
 {
   if (TARGET_AVX)
     {
@@ -3893,11 +3913,11 @@ (define_expand "vcond_mask_<mode><sseintvecmodelower>"
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "<sse>_andnot<mode>3<mask_name>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
-	(and:VF_128_256
-	  (not:VF_128_256
-	    (match_operand:VF_128_256 1 "register_operand" "0,x,v,v"))
-	  (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
+  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
+	(and:VFB_128_256
+	  (not:VFB_128_256
+	    (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v"))
+	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
   "TARGET_SSE && <mask_avx512vl_condition>"
 {
   char buf[128];
@@ -3920,6 +3940,8 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 
   switch (get_attr_mode (insn))
     {
+    case MODE_V16HF:
+    case MODE_V8HF:
     case MODE_V8SF:
     case MODE_V4SF:
       suffix = "ps";
@@ -3958,11 +3980,11 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 	      (const_string "<MODE>")))])
 
 (define_insn "<sse>_andnot<mode>3<mask_name>"
-  [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(and:VF_512
-	  (not:VF_512
-	    (match_operand:VF_512 1 "register_operand" "v"))
-	  (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
+  [(set (match_operand:VFB_512 0 "register_operand" "=v")
+	(and:VFB_512
+	  (not:VFB_512
+	    (match_operand:VFB_512 1 "register_operand" "v"))
+	  (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
 {
   char buf[128];
@@ -3972,8 +3994,9 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
   suffix = "<ssemodesuffix>";
   ops = "";
 
-  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
-  if (!TARGET_AVX512DQ)
+  /* There is no vandnp[sd] without AVX512DQ, nor any vandnph, so
+     use vpandn[qd] instead.  */
+  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
     {
       suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
       ops = "p";
@@ -3993,26 +4016,26 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 		      (const_string "XI")))])
 
 (define_expand "<code><mode>3<mask_name>"
-  [(set (match_operand:VF_128_256 0 "register_operand")
-       (any_logic:VF_128_256
-         (match_operand:VF_128_256 1 "vector_operand")
-         (match_operand:VF_128_256 2 "vector_operand")))]
+  [(set (match_operand:VFB_128_256 0 "register_operand")
+       (any_logic:VFB_128_256
+         (match_operand:VFB_128_256 1 "vector_operand")
+         (match_operand:VFB_128_256 2 "vector_operand")))]
   "TARGET_SSE && <mask_avx512vl_condition>"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_expand "<code><mode>3<mask_name>"
-  [(set (match_operand:VF_512 0 "register_operand")
-       (any_logic:VF_512
-         (match_operand:VF_512 1 "nonimmediate_operand")
-         (match_operand:VF_512 2 "nonimmediate_operand")))]
+  [(set (match_operand:VFB_512 0 "register_operand")
+       (any_logic:VFB_512
+         (match_operand:VFB_512 1 "nonimmediate_operand")
+         (match_operand:VFB_512 2 "nonimmediate_operand")))]
   "TARGET_AVX512F"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*<code><mode>3<mask_name>"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
-	(any_logic:VF_128_256
-	  (match_operand:VF_128_256 1 "vector_operand" "%0,x,v,v")
-	  (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
+  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
+	(any_logic:VFB_128_256
+	  (match_operand:VFB_128_256 1 "vector_operand" "%0,x,v,v")
+	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
   "TARGET_SSE && <mask_avx512vl_condition>
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
@@ -4036,6 +4059,8 @@ (define_insn "*<code><mode>3<mask_name>"
 
   switch (get_attr_mode (insn))
     {
+    case MODE_V16HF:
+    case MODE_V8HF:
     case MODE_V8SF:
     case MODE_V4SF:
       suffix = "ps";
@@ -4074,10 +4099,10 @@ (define_insn "*<code><mode>3<mask_name>"
 	      (const_string "<MODE>")))])
 
 (define_insn "*<code><mode>3<mask_name>"
-  [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(any_logic:VF_512
-	  (match_operand:VF_512 1 "nonimmediate_operand" "%v")
-	  (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
+  [(set (match_operand:VFB_512 0 "register_operand" "=v")
+	(any_logic:VFB_512
+	  (match_operand:VFB_512 1 "nonimmediate_operand" "%v")
+	  (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
   char buf[128];
@@ -4087,8 +4112,9 @@ (define_insn "*<code><mode>3<mask_name>"
   suffix = "<ssemodesuffix>";
   ops = "";
 
-  /* There is no v<logic>p[sd] in avx512f.  Use vp<logic>[dq].  */
-  if (!TARGET_AVX512DQ)
+  /* There is no v<logic>p[sd] without AVX512DQ, nor any v<logic>ph, so
+     use vp<logic>[dq] instead.  */
+  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
     {
       suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
       ops = "p";
@@ -4109,14 +4135,14 @@ (define_insn "*<code><mode>3<mask_name>"
 
 (define_expand "copysign<mode>3"
   [(set (match_dup 4)
-	(and:VF
-	  (not:VF (match_dup 3))
-	  (match_operand:VF 1 "vector_operand")))
+	(and:VFB
+	  (not:VFB (match_dup 3))
+	  (match_operand:VFB 1 "vector_operand")))
    (set (match_dup 5)
-	(and:VF (match_dup 3)
-		(match_operand:VF 2 "vector_operand")))
-   (set (match_operand:VF 0 "register_operand")
-	(ior:VF (match_dup 4) (match_dup 5)))]
+	(and:VFB (match_dup 3)
+		 (match_operand:VFB 2 "vector_operand")))
+   (set (match_operand:VFB 0 "register_operand")
+	(ior:VFB (match_dup 4) (match_dup 5)))]
   "TARGET_SSE"
 {
   operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
@@ -4127,11 +4153,11 @@ (define_expand "copysign<mode>3"
 
 (define_expand "xorsign<mode>3"
   [(set (match_dup 4)
-	(and:VF (match_dup 3)
-		(match_operand:VF 2 "vector_operand")))
-   (set (match_operand:VF 0 "register_operand")
-	(xor:VF (match_dup 4)
-		(match_operand:VF 1 "vector_operand")))]
+	(and:VFB (match_dup 3)
+		(match_operand:VFB 2 "vector_operand")))
+   (set (match_operand:VFB 0 "register_operand")
+	(xor:VFB (match_dup 4)
+		 (match_operand:VFB 1 "vector_operand")))]
   "TARGET_SSE"
 {
   operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (43 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 46/62] AVX512FP16: Enable FP16 mask load/store liuhongt
                   ` (16 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-neg-1a.c: New test.
	* gcc.target/i386/avx512fp16-neg-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-scalar-bitwise-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-scalar-bitwise-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vector-bitwise-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vector-bitwise-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-neg-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-neg-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-neg-1a.c       |  19 +++
 .../gcc.target/i386/avx512fp16-neg-1b.c       |  33 +++++
 .../i386/avx512fp16-scalar-bitwise-1a.c       |  31 +++++
 .../i386/avx512fp16-scalar-bitwise-1b.c       |  82 ++++++++++++
 .../i386/avx512fp16-vector-bitwise-1a.c       | 124 ++++++++++++++++++
 .../i386/avx512fp16-vector-bitwise-1b.c       | 119 +++++++++++++++++
 .../gcc.target/i386/avx512fp16vl-neg-1a.c     |  18 +++
 .../gcc.target/i386/avx512fp16vl-neg-1b.c     |  33 +++++
 8 files changed, 459 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c
new file mode 100644
index 00000000000..bf7693e0b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1a.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+/* { dg-final { scan-assembler-times "vpxord\[ \\t\]+\[^\n\r\]*%zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%xmm0" 1 } } */
+
+#include<immintrin.h>
+
+_Float16
+neghf (_Float16 a)
+{
+  return -a;
+}
+
+__m512h
+neghf512 (__m512h a)
+{
+  return -a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c
new file mode 100644
index 00000000000..770f7b283d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-neg-1b.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+static void
+test_512 (void)
+{
+  V512 v1, v2, v3, v4, exp, res;
+  int i;
+  init_src();
+
+  unpack_ph_2twops(src1, &v1, &v2);
+  v1.f32[0] = -v1.f32[0];
+  exp = pack_twops_2ph(v1, v2);
+  res.zmmh = src1.zmmh;
+  res.f16[0] = -res.f16[0];
+  check_results(&res, &exp, 32, "neg");
+
+  unpack_ph_2twops(src1, &v1, &v2);
+  for (i=0; i<16; i++)
+  {
+    v1.f32[i] = -v1.f32[i];
+    v2.f32[i] = -v2.f32[i];
+  }
+  exp = pack_twops_2ph(v1, v2);
+  res.zmmh = -src1.zmmh;
+  check_results(&res, &exp, 32, "neg");
+  if (n_errs != 0) {
+      abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c
new file mode 100644
index 00000000000..1325c341a33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1a.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+_Float16
+f1 (_Float16 x)
+{
+  return __builtin_fabsf16 (x);
+}
+
+_Float16
+f2 (_Float16 x, _Float16 y)
+{
+  return __builtin_copysignf16 (x, y);
+}
+
+_Float16
+f3 (_Float16 x)
+{
+  return -x;
+}
+
+_Float16
+f4 (_Float16 x, _Float16 y)
+{
+  return x * __builtin_copysignf16 (1, y);
+}
+
+
+/* { dg-final { scan-assembler-times "vandps\[^\n\r\]*xmm\[0-9\]" 4 } } */
+/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c
new file mode 100644
index 00000000000..7a292519a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-scalar-bitwise-1b.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+void NOINLINE
+emulate_absneg_ph (V512 * dest, V512 op1, int abs)
+{
+  V512 v1, v2, v3, v4;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v3, &v4);
+
+  for (i = 0; i != 16; i++) {
+    if (abs) {
+      v3.f32[i] = __builtin_fabsf (v1.f32[i]);
+      v4.f32[i] = __builtin_fabsf (v2.f32[i]);
+    }
+    else {
+      v3.f32[i] = -v1.f32[i];
+      v4.f32[i] = -v2.f32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v3, v4);
+}
+
+void NOINLINE
+emulate_copysign_ph (V512 * dest, V512 op1, V512 op2, int xorsign)
+{
+  V512 v1, v2, v3, v4, v5, v6;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v5, &v6);
+
+  for (i = 0; i != 16; i++) {
+    if (xorsign) {
+      v5.f32[i] = v1.f32[i] * __builtin_copysignf (1, v3.f32[i]);
+      v6.f32[i] = v2.f32[i] * __builtin_copysignf (1, v4.f32[i]);
+    }
+    else {
+      v5.f32[i] = __builtin_copysignf (v1.f32[i], v3.f32[i]);
+      v6.f32[i] = __builtin_copysignf (v2.f32[i], v4.f32[i]);
+    }
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res, exp;
+
+  init_src ();
+
+  /* Abs for float16.  */
+  emulate_absneg_ph (&exp, src1, 1);
+  res.f16[0] = __builtin_fabsf16 (src1.f16[0]);
+  check_results (&res, &exp, 1, "abs_float16");
+
+  /* Neg for float16.  */
+  emulate_absneg_ph (&exp, src1, 0);
+  res.f16[0] = -(src1.f16[0]);
+  check_results (&res, &exp, 1, "neg_float16");
+
+  /* Copysign for float16.  */
+  emulate_copysign_ph (&exp, src1, src2, 0);
+  res.f16[0] = __builtin_copysignf16 (src1.f16[0], src2.f16[0]);
+  check_results (&res, &exp, 1, "copysign_float16");
+
+  /* Xorsign for float16.  */
+  emulate_copysign_ph (&exp, src1, src2, 1);
+  res.f16[0] = src1.f16[0] * __builtin_copysignf16 (1, src2.f16[0]);
+  check_results (&res, &exp, 1, "xorsign_float16");
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c
new file mode 100644
index 00000000000..13c05abc532
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1a.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512vl -mavx512fp16" } */
+
+#include<immintrin.h>
+__m128h
+f1 (__m128h x)
+{
+  int i = 0;
+  __m128h y;
+  for (; i != 8; i++)
+    y[i] = __builtin_fabsf16 (x[i]);
+  return y;
+}
+
+__m256h
+f2 (__m256h x)
+{
+  int i = 0;
+  __m256h y;
+  for (; i != 16; i++)
+    y[i] = __builtin_fabsf16 (x[i]);
+  return y;
+}
+
+__m512h
+f3 (__m512h x)
+{
+  int i = 0;
+  __m512h y;
+  for (; i != 32; i++)
+    y[i] = __builtin_fabsf16 (x[i]);
+  return y;
+}
+
+__m128h
+f4 (__m128h x)
+{
+  return -x;
+}
+
+__m256h
+f5 (__m256h x)
+{
+  return -x;
+}
+
+__m512h
+f6 (__m512h x)
+{
+  return -x;
+}
+
+__m128h
+f7 (__m128h x, __m128h y)
+{
+  int i = 0;
+  __m128h z;
+  for (; i != 8; i++)
+    z[i] = __builtin_copysignf16 (x[i], y[i]);
+  return z;
+}
+
+__m256h
+f8 (__m256h x, __m256h y)
+{
+  int i = 0;
+  __m256h z;
+  for (; i != 16; i++)
+    z[i] = __builtin_copysignf16 (x[i], y[i]);
+  return z;
+}
+
+__m512h
+f9 (__m512h x, __m512h y)
+{
+  int i = 0;
+  __m512h z;
+  for (; i != 32; i++)
+    z[i] = __builtin_copysignf16 (x[i], y[i]);
+  return z;
+}
+
+__m128h
+f10 (__m128h x, __m128h y)
+{
+  int i = 0;
+  __m128h z;
+  for (; i != 8; i++)
+    z[i] = x[i] * __builtin_copysignf16 (1, y[i]);
+  return z;
+}
+
+__m256h
+f11 (__m256h x, __m256h y)
+{
+  int i = 0;
+  __m256h z;
+  for (; i != 16; i++)
+    z[i] = x[i] * __builtin_copysignf16 (1, y[i]);
+  return z;
+}
+
+__m512h
+f12 (__m512h x, __m512h y)
+{
+  int i = 0;
+  __m512h z;
+  for (; i != 32; i++)
+    z[i] = x[i] * __builtin_copysignf16 (1, y[i]);
+  return z;
+}
+
+/* { dg-final { scan-assembler "vandps\[^\n\r\]*xmm0" } } */
+/* { dg-final { scan-assembler "vandps\[^\n\r\]*ymm0" } } */
+/* { dg-final { scan-assembler "vpandd\[^\n\r\]*zmm0" } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm0" 2 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*ymm0" 2 } } */
+/* { dg-final { scan-assembler-times "vpxord\[^\n\r\]*zmm0" 2 } } */
+/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vorps\[^\n\r\]*ymm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpord\[^\n\r\]*zmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vandnps\[^\n\r\]*xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vandnps\[^\n\r\]*ymm0" 1 } } */
+/* { dg-final { scan-assembler-times "vpandnd\[^\n\r\]*zmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c
new file mode 100644
index 00000000000..1398b360064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-bitwise-1b.c
@@ -0,0 +1,119 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+void NOINLINE
+emulate_absneg_ph (V512 * dest, V512 op1, int abs)
+{
+  V512 v1, v2, v3, v4;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(*dest, &v3, &v4);
+
+  for (i = 0; i != 16; i++) {
+    if (abs) {
+      v3.f32[i] = __builtin_fabsf (v1.f32[i]);
+      v4.f32[i] = __builtin_fabsf (v2.f32[i]);
+    }
+    else {
+      v3.f32[i] = -v1.f32[i];
+      v4.f32[i] = -v2.f32[i];
+    }
+  }
+  *dest = pack_twops_2ph(v3, v4);
+}
+
+void NOINLINE
+emulate_copysign_ph (V512 * dest, V512 op1, V512 op2, int xorsign)
+{
+  V512 v1, v2, v3, v4, v5, v6;
+  int i;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v5, &v6);
+
+  for (i = 0; i != 16; i++) {
+    if (xorsign) {
+      v5.f32[i] = v1.f32[i] * __builtin_copysignf (1, v3.f32[i]);
+      v6.f32[i] = v2.f32[i] * __builtin_copysignf (1, v4.f32[i]);
+    }
+    else {
+      v5.f32[i] = __builtin_copysignf (v1.f32[i], v3.f32[i]);
+      v6.f32[i] = __builtin_copysignf (v2.f32[i], v4.f32[i]);
+    }
+  }
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+
+void
+test_512 (void)
+{
+  V512 res, exp;
+
+  init_src ();
+
+  /* Abs for vector float16.  */
+  emulate_absneg_ph (&exp, src1, 1);
+  for (int i = 0; i != 8; i++)
+    res.f16[i] = __builtin_fabsf16 (src1.f16[i]);
+  check_results (&res, &exp, 8, "abs_m128h");
+
+  for (int i = 0; i != 16; i++)
+    res.f16[i] = __builtin_fabsf16 (src1.f16[i]);
+  check_results (&res, &exp, 16, "abs_m256h");
+
+  for (int i = 0; i != 32; i++)
+    res.f16[i] = __builtin_fabsf16 (src1.f16[i]);
+  check_results (&res, &exp, 32, "abs_m512h");
+
+  /* Neg for vector float16.  */
+  emulate_absneg_ph (&exp, src1, 0);
+  for (int i = 0; i != 8; i++)
+    res.f16[i] = -(src1.f16[i]);
+  check_results (&res, &exp, 8, "neg_m128h");
+
+  for (int i = 0; i != 16; i++)
+    res.f16[i] = -(src1.f16[i]);
+  check_results (&res, &exp, 16, "neg_m256h");
+
+  for (int i = 0; i != 32; i++)
+    res.f16[i] = -(src1.f16[i]);
+  check_results (&res, &exp, 32, "neg_m512h");
+
+  /* Copysign for vector float16.  */
+  emulate_copysign_ph (&exp, src1, src2, 0);
+  for (int i = 0; i != 8; i++)
+    res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]);
+  check_results (&res, &exp, 8, "copysign_m128h");
+
+  for (int i = 0; i != 16; i++)
+    res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]);
+  check_results (&res, &exp, 16, "copysign_m256h");
+
+  for (int i = 0; i != 32; i++)
+    res.f16[i] = __builtin_copysignf16 (src1.f16[i], src2.f16[i]);
+  check_results (&res, &exp, 32, "copysign_m512h");
+
+  /* Xorsign for vector float16.  */
+  emulate_copysign_ph (&exp, src1, src2, 1);
+  for (int i = 0; i != 8; i++)
+    res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]);
+  check_results (&res, &exp, 8, "xorsign_m128h");
+
+  for (int i = 0; i != 16; i++)
+    res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]);
+  check_results (&res, &exp, 16, "xorsign_m256h");
+
+  for (int i = 0; i != 32; i++)
+    res.f16[i] = src1.f16[i] * __builtin_copysignf16 (1, src2.f16[i]);
+  check_results (&res, &exp, 32, "xorsign_m512h");
+
+  if (n_errs != 0) {
+    abort ();
+  }
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c
new file mode 100644
index 00000000000..a40a0d88dd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1a.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%xmm0" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\n\r\]*%ymm0" 1 } } */
+#include <immintrin.h>
+
+__m128h
+neghf128 (__m128h a)
+{
+  return -a;
+}
+
+__m256h
+neghf256 (__m256h a)
+{
+  return -a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c
new file mode 100644
index 00000000000..d8f65fb3f60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-neg-1b.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+static void
+test_512 (void)
+{
+  V512 v1, v2, exp, res;
+  int i;
+  init_src();
+
+  unpack_ph_2twops(src1, &v1, &v2);
+  v1.f32[0] = -v1.f32[0];
+  exp = pack_twops_2ph(v1, v2);
+  res.zmmh = src1.zmmh;
+  res.f16[0] = -res.f16[0];
+  check_results(&res, &exp, 32, "neg");
+
+  unpack_ph_2twops(src1, &v1, &v2);
+  for (i = 0; i < 16; i++)
+  {
+    v1.f32[i] = -v1.f32[i];
+    v2.f32[i] = -v2.f32[i];
+  }
+  exp = pack_twops_2ph(v1, v2);
+  res.zmmh = -src1.zmmh;
+  check_results(&res, &exp, 32, "neg");
+  if (n_errs != 0) {
+    abort ();
+  }
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 46/62] AVX512FP16: Enable FP16 mask load/store.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (44 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 47/62] AVX512FP16: Add scalar fma instructions liuhongt
                   ` (15 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

From: "H.J. Lu" <hjl.tools@gmail.com>

gcc/ChangeLog:

	* config/i386/sse.md (avx512fmaskmodelower): Extend to support
	HF modes.
	(maskload<mode><avx512fmaskmodelower>): Ditto.
	(maskstore<mode><avx512fmaskmodelower>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-xorsign-1.c: New test.
---
 gcc/config/i386/sse.md                        | 13 +++---
 .../gcc.target/i386/avx512fp16-xorsign-1.c    | 41 +++++++++++++++++++
 2 files changed, 48 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
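For reference, the semantics the extended maskload/maskstore expanders give the
vectorizer is the usual vec_merge: element i comes from memory (or is stored)
only where mask bit i is set, and untouched elements keep their old value.  A
plain-C model, with made-up helper names and float standing in for _Float16 so
it runs anywhere:

```c
#include <assert.h>

/* Model of maskload's (vec_merge (mem) (dup 0) mask): element i is
   loaded from memory only where mask bit i is set; elements with a
   clear mask bit keep their previous destination value.  */
static void
maskload8 (float *dst, const float *mem, unsigned char k)
{
  for (int i = 0; i < 8; i++)
    if ((k >> i) & 1)
      dst[i] = mem[i];
}

/* maskstore is the mirror image: only the selected elements are
   written back to memory.  */
static void
maskstore8 (float *mem, const float *src, unsigned char k)
{
  for (int i = 0; i < 8; i++)
    if ((k >> i) & 1)
      mem[i] = src[i];
}
```

The real instructions do the whole merge in one masked vector move; the loop
form above is only the scalar reference behavior.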

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7c594babcce..cbf1e75c0b2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -915,6 +915,7 @@ (define_mode_attr avx512fmaskmodelower
    (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
    (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
    (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V32HF "si") (V16HF "hi") (V8HF  "qi")
    (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
    (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
 
@@ -23106,9 +23107,9 @@ (define_expand "maskload<mode><sseintvecmodelower>"
   "TARGET_AVX")
 
 (define_expand "maskload<mode><avx512fmaskmodelower>"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand")
-	(vec_merge:V48_AVX512VL
-	  (match_operand:V48_AVX512VL 1 "memory_operand")
+  [(set (match_operand:V48H_AVX512VL 0 "register_operand")
+	(vec_merge:V48H_AVX512VL
+	  (match_operand:V48H_AVX512VL 1 "memory_operand")
 	  (match_dup 0)
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512F")
@@ -23131,9 +23132,9 @@ (define_expand "maskstore<mode><sseintvecmodelower>"
   "TARGET_AVX")
 
 (define_expand "maskstore<mode><avx512fmaskmodelower>"
-  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
-	(vec_merge:V48_AVX512VL
-	  (match_operand:V48_AVX512VL 1 "register_operand")
+  [(set (match_operand:V48H_AVX512VL 0 "memory_operand")
+	(vec_merge:V48H_AVX512VL
+	  (match_operand:V48H_AVX512VL 1 "register_operand")
 	  (match_dup 0)
 	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
   "TARGET_AVX512F")
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
new file mode 100644
index 00000000000..a22a6ceabff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump-tree-vect-details -save-temps" } */
+
+extern void abort ();
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+#define N 16
+_Float16 a[N] = {-0.1f, -3.2f, -6.3f, -9.4f,
+		 -12.5f, -15.6f, -18.7f, -21.8f,
+		 24.9f, 27.1f, 30.2f, 33.3f,
+		 36.4f, 39.5f, 42.6f, 45.7f};
+_Float16 b[N] = {-1.2f, 3.4f, -5.6f, 7.8f,
+		 -9.0f, 1.0f, -2.0f, 3.0f,
+		 -4.0f, -5.0f, 6.0f, 7.0f,
+		 -8.0f, -9.0f, 10.0f, 11.0f};
+_Float16 r[N];
+
+static void
+__attribute__ ((noinline, noclone))
+do_test (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    r[i] = a[i] * __builtin_copysignf16 (1.0f, b[i]);
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    if (r[i] != a[i] * __builtin_copysignf16 (1.0f, b[i]))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler "\[ \t\]xor" } } */
+/* { dg-final { scan-assembler "\[ \t\]and" } } */
+/* { dg-final { scan-assembler-not "copysign" } } */
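The xorsign idiom the test exercises, a[i] * __builtin_copysignf16 (1.0f, b[i]),
reduces to two bitwise operations: an AND isolates b's sign bit and an XOR
flips a's sign with it, so no multiply is needed; that is what the
scan-assembler directives verify.  In plain C on raw IEEE binary16 bits (the
helper name is made up):

```c
#include <stdint.h>

/* xorsign on raw binary16 bits: flip a's sign bit wherever b's sign
   bit (0x8000) is set.  Matches a * copysign (1, b) for finite a,
   without performing a multiply.  */
static uint16_t
xorsign_f16_bits (uint16_t a, uint16_t b)
{
  return a ^ (b & 0x8000);
}
```

The vectorized form does the same thing lane-wise, which is why the expected
assembly is just an AND of the sign mask followed by an XOR.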
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 47/62] AVX512FP16: Add scalar fma instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (45 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 46/62] AVX512FP16: Enable FP16 mask load/store liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions liuhongt
                   ` (14 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/
vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh.

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_fmadd_sh):
	New intrinsic.
	(_mm_mask_fmadd_sh): Likewise.
	(_mm_mask3_fmadd_sh): Likewise.
	(_mm_maskz_fmadd_sh): Likewise.
	(_mm_fmadd_round_sh): Likewise.
	(_mm_mask_fmadd_round_sh): Likewise.
	(_mm_mask3_fmadd_round_sh): Likewise.
	(_mm_maskz_fmadd_round_sh): Likewise.
	(_mm_fnmadd_sh): Likewise.
	(_mm_mask_fnmadd_sh): Likewise.
	(_mm_mask3_fnmadd_sh): Likewise.
	(_mm_maskz_fnmadd_sh): Likewise.
	(_mm_fnmadd_round_sh): Likewise.
	(_mm_mask_fnmadd_round_sh): Likewise.
	(_mm_mask3_fnmadd_round_sh): Likewise.
	(_mm_maskz_fnmadd_round_sh): Likewise.
	(_mm_fmsub_sh): Likewise.
	(_mm_mask_fmsub_sh): Likewise.
	(_mm_mask3_fmsub_sh): Likewise.
	(_mm_maskz_fmsub_sh): Likewise.
	(_mm_fmsub_round_sh): Likewise.
	(_mm_mask_fmsub_round_sh): Likewise.
	(_mm_mask3_fmsub_round_sh): Likewise.
	(_mm_maskz_fmsub_round_sh): Likewise.
	(_mm_fnmsub_sh): Likewise.
	(_mm_mask_fnmsub_sh): Likewise.
	(_mm_mask3_fnmsub_sh): Likewise.
	(_mm_maskz_fnmsub_sh): Likewise.
	(_mm_fnmsub_round_sh): Likewise.
	(_mm_mask_fnmsub_round_sh): Likewise.
	(_mm_mask3_fnmsub_round_sh): Likewise.
	(_mm_maskz_fnmsub_round_sh): Likewise.
	* config/i386/i386-builtin-types.def
	(V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-expand.c: Handle new builtin type.
	* config/i386/sse.md (fmai_vmfmadd_<mode><round_name>):
	Adjust to support FP16.
	(fmai_vmfmsub_<mode><round_name>): Ditto.
	(fmai_vmfnmadd_<mode><round_name>): Ditto.
	(fmai_vmfnmsub_<mode><round_name>): Ditto.
	(*fmai_fmadd_<mode>): Ditto.
	(*fmai_fmsub_<mode>): Ditto.
	(*fmai_fnmadd_<mode><round_name>): Ditto.
	(*fmai_fnmsub_<mode><round_name>): Ditto.
	(avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
	(avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
	(avx512f_vmfmadd_<mode>_maskz<round_expand_name>): Ditto.
	(avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
	(*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
	(avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
	(*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
	(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
	(*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
	(*avx512f_vmfnmsub_<mode>_maskz_1<round_name>): Ditto.
	(*avx512f_vmfnmadd_<mode>_mask<round_name>): Renamed to ...
	(avx512f_vmfnmadd_<mode>_mask<round_name>) ... this, and
	adjust to support FP16.
	(avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
	(avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
	(avx512f_vmfnmadd_<mode>_maskz<round_expand_name>): New
	expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 412 +++++++++++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   1 +
 gcc/config/i386/i386-builtin.def       |   7 +
 gcc/config/i386/i386-expand.c          |   1 +
 gcc/config/i386/sse.md                 | 340 ++++++++++----------
 gcc/testsuite/gcc.target/i386/avx-1.c  |  12 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  12 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  16 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  12 +
 10 files changed, 666 insertions(+), 163 deletions(-)
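For reference, what the new scalar intrinsics compute: _mm_fmadd_sh (w, a, b)
puts the fused product w0 * a0 + b0 in element 0 and passes elements 1..7
through from the first operand; the 132/213/231 suffixes only name the three
operand orderings of the assembler forms, which the compiler selects
automatically.  A plain-C model (vec8 is a made-up stand-in for __m128h, and
float stands in for _Float16):

```c
/* Made-up scalar model of the __m128h merging behavior.  */
typedef struct { float e[8]; } vec8;

/* _mm_fmadd_sh (w, a, b): element 0 is fma (w0, a0, b0), upper
   elements are copied from the first operand.  */
static vec8
fmadd_sh_model (vec8 w, vec8 a, vec8 b)
{
  vec8 r = w;
  r.e[0] = w.e[0] * a.e[0] + b.e[0];
  return r;
}

/* _mm_fnmsub_sh negates both the product and the addend:
   element 0 is -(w0 * a0) - b0.  */
static vec8
fnmsub_sh_model (vec8 w, vec8 a, vec8 b)
{
  vec8 r = w;
  r.e[0] = -(w.e[0] * a.e[0]) - b.e[0];
  return r;
}
```

The masked and round variants layer the usual merge/zero masking and explicit
rounding-mode argument on top of this core operation.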

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index f246bab5159..5c85ec15b22 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -5697,6 +5697,418 @@ _mm512_maskz_fnmsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vfmadd[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_sh (__m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask  ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+#else
+#define _mm_fmadd_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fmadd_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fmadd_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmadd_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmadd[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmadd_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) -1,
+						   __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmadd_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  (__v8hf) __B,
+						  (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmadd_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmadd_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfnmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+#else
+#define _mm_fnmadd_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (-1), (R)))
+#define _mm_mask_fnmadd_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask ((A), (B), (C), (U), (R)))
+#define _mm_mask3_fnmadd_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fnmadd_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfnmaddsh3_maskz ((A), (B), (C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfmsub[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_sh (__m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  (__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   (__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   (__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+#else
+#define _mm_fmsub_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (-1), (R)))
+#define _mm_mask_fmsub_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), (B), -(C), (U), (R)))
+#define _mm_mask3_fmsub_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), (B), (C), (U), (R)))
+#define _mm_maskz_fmsub_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), (B), -(C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vfnmsub[132,213,231]sh.  */
+extern __inline __m128h
+  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmsub_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmsub_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U)
+{
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmsub_sh (__mmask8 __U, __m128h __W, __m128h __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) -1,
+						  __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fnmsub_round_sh (__m128h __W, __mmask8 __U, __m128h __A, __m128h __B,
+			 const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_mask ((__v8hf) __W,
+						  -(__v8hf) __A,
+						  -(__v8hf) __B,
+						  (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fnmsub_round_sh (__m128h __W, __m128h __A, __m128h __B, __mmask8 __U,
+			  const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmsubsh3_mask3 ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   (__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
+			  __m128h __B, const int __R)
+{
+  return (__m128h) __builtin_ia32_vfmaddsh3_maskz ((__v8hf) __W,
+						   -(__v8hf) __A,
+						   -(__v8hf) __B,
+						   (__mmask8) __U, __R);
+}
+
+#else
+#define _mm_fnmsub_round_sh(A, B, C, R)					\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (-1), (R)))
+#define _mm_mask_fnmsub_round_sh(A, U, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_mask ((A), -(B), -(C), (U), (R)))
+#define _mm_mask3_fnmsub_round_sh(A, B, C, U, R)				\
+  ((__m128h) __builtin_ia32_vfmsubsh3_mask3 ((A), -(B), (C), (U), (R)))
+#define _mm_maskz_fnmsub_round_sh(U, A, B, C, R)				\
+  ((__m128h) __builtin_ia32_vfmaddsh3_maskz ((A), -(B), -(C), (U), (R)))
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 0cdbf1bc0c0..22b924bf98d 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1342,6 +1342,7 @@ DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, INT)
 DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
 DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
 DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index cf0259843cc..f446a6ce5d3 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3194,6 +3194,13 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsub_v32hf_maskz_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask_round, "__builtin_ia32_vfnmsubph512_mask", IX86_BUILTIN_VFNMSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_mask3_round, "__builtin_ia32_vfnmsubph512_mask3", IX86_BUILTIN_VFNMSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fnmsub_v32hf_maskz_round, "__builtin_ia32_vfnmsubph512_maskz", IX86_BUILTIN_VFNMSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask_round, "__builtin_ia32_vfmaddsh3_mask", IX86_BUILTIN_VFMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_mask3_round, "__builtin_ia32_vfmaddsh3_mask3", IX86_BUILTIN_VFMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmadd_v8hf_maskz_round, "__builtin_ia32_vfmaddsh3_maskz", IX86_BUILTIN_VFMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round, "__builtin_ia32_vfnmaddsh3_mask", IX86_BUILTIN_VFNMADDSH3_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 006f4bec8db..f6de05c769a 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10558,6 +10558,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V8HF_FTYPE_V8DI_V8HF_UQI_INT:
     case V8HF_FTYPE_V8DF_V8HF_UQI_INT:
     case V16HF_FTYPE_V16SF_V16HF_UHI_INT:
+    case V8HF_FTYPE_V8HF_V8HF_V8HF_INT:
       nargs = 4;
       break;
     case V4SF_FTYPE_V4SF_V4SF_INT_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cbf1e75c0b2..31f8fc68c65 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5049,60 +5049,60 @@ (define_insn "<avx512>_fmsubadd_<mode>_mask3<round_name>"
 ;; high-order elements from the destination register.
 
 (define_expand "fmai_vmfmadd_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand")
-	(vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128 1 "register_operand")
-	    (match_operand:VF_128 2 "<round_nimm_scalar_predicate>")
-	    (match_operand:VF_128 3 "<round_nimm_scalar_predicate>"))
+  [(set (match_operand:VFH_128 0 "register_operand")
+	(vec_merge:VFH_128
+	  (fma:VFH_128
+	    (match_operand:VFH_128 1 "register_operand")
+	    (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>")
+	    (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
 
 (define_expand "fmai_vmfmsub_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand")
-	(vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128 1 "register_operand")
-	    (match_operand:VF_128 2 "<round_nimm_scalar_predicate>")
-	    (neg:VF_128
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>")))
+  [(set (match_operand:VFH_128 0 "register_operand")
+	(vec_merge:VFH_128
+	  (fma:VFH_128
+	    (match_operand:VFH_128 1 "register_operand")
+	    (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>")
+	    (neg:VFH_128
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
 
 (define_expand "fmai_vmfnmadd_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand")
-	(vec_merge:VF_128
-	  (fma:VF_128
-	    (neg:VF_128
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>"))
-	    (match_operand:VF_128 1 "register_operand")
-	    (match_operand:VF_128 3 "<round_nimm_scalar_predicate>"))
+  [(set (match_operand:VFH_128 0 "register_operand")
+	(vec_merge:VFH_128
+	  (fma:VFH_128
+	    (neg:VFH_128
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>"))
+	    (match_operand:VFH_128 1 "register_operand")
+	    (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
 
 (define_expand "fmai_vmfnmsub_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand")
-	(vec_merge:VF_128
-	  (fma:VF_128
-	    (neg:VF_128
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>"))
-	    (match_operand:VF_128 1 "register_operand")
-	    (neg:VF_128
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>")))
+  [(set (match_operand:VFH_128 0 "register_operand")
+	(vec_merge:VFH_128
+	  (fma:VFH_128
+	    (neg:VFH_128
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>"))
+	    (match_operand:VFH_128 1 "register_operand")
+	    (neg:VFH_128
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
 
 (define_insn "*fmai_fmadd_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-        (vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128 1 "register_operand" "0,0")
-	    (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>, v")
-	    (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+        (vec_merge:VFH_128
+	  (fma:VFH_128
+	    (match_operand:VFH_128 1 "register_operand" "0,0")
+	    (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>, v")
+	    (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -5113,13 +5113,13 @@ (define_insn "*fmai_fmadd_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fmsub_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-        (vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128   1 "register_operand" "0,0")
-	    (match_operand:VF_128   2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
-	    (neg:VF_128
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+        (vec_merge:VFH_128
+	  (fma:VFH_128
+	    (match_operand:VFH_128   1 "register_operand" "0,0")
+	    (match_operand:VFH_128   2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+	    (neg:VFH_128
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -5130,13 +5130,13 @@ (define_insn "*fmai_fmsub_<mode>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fnmadd_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-        (vec_merge:VF_128
-	  (fma:VF_128
-	    (neg:VF_128
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	    (match_operand:VF_128   1 "register_operand" "0,0")
-	    (match_operand:VF_128   3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+        (vec_merge:VFH_128
+	  (fma:VFH_128
+	    (neg:VFH_128
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	    (match_operand:VFH_128   1 "register_operand" "0,0")
+	    (match_operand:VFH_128   3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -5147,14 +5147,14 @@ (define_insn "*fmai_fnmadd_<mode><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fnmsub_<mode><round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-        (vec_merge:VF_128
-	  (fma:VF_128
-	    (neg:VF_128
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	    (match_operand:VF_128   1 "register_operand" "0,0")
-	    (neg:VF_128
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+        (vec_merge:VFH_128
+	  (fma:VFH_128
+	    (neg:VFH_128
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	    (match_operand:VFH_128   1 "register_operand" "0,0")
+	    (neg:VFH_128
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA || TARGET_AVX512F"
@@ -5165,13 +5165,13 @@ (define_insn "*fmai_fnmsub_<mode><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_vmfmadd_<mode>_mask<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
@@ -5184,13 +5184,13 @@ (define_insn "avx512f_vmfmadd_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_vmfmadd_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>")
-	      (match_operand:VF_128 3 "register_operand" "0"))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>")
+	      (match_operand:VFH_128 3 "register_operand" "0"))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "Yk"))
 	  (match_dup 3)
@@ -5201,10 +5201,10 @@ (define_insn "avx512f_vmfmadd_<mode>_mask3<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_expand "avx512f_vmfmadd_<mode>_maskz<round_expand_name>"
-  [(match_operand:VF_128 0 "register_operand")
-   (match_operand:VF_128 1 "<round_expand_nimm_predicate>")
-   (match_operand:VF_128 2 "<round_expand_nimm_predicate>")
-   (match_operand:VF_128 3 "<round_expand_nimm_predicate>")
+  [(match_operand:VFH_128 0 "register_operand")
+   (match_operand:VFH_128 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_128 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_128 3 "<round_expand_nimm_predicate>")
    (match_operand:QI 4 "register_operand")]
   "TARGET_AVX512F"
 {
@@ -5215,14 +5215,14 @@ (define_expand "avx512f_vmfmadd_<mode>_maskz<round_expand_name>"
 })
 
 (define_insn "avx512f_vmfmadd_<mode>_maskz_1<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
-	    (match_operand:VF_128 4 "const0_operand" "C,C")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+	    (match_operand:VFH_128 4 "const0_operand" "C,C")
 	    (match_operand:QI 5 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
 	  (const_int 1)))]
@@ -5234,14 +5234,14 @@ (define_insn "avx512f_vmfmadd_<mode>_maskz_1<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_vmfmsub_<mode>_mask<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
@@ -5254,14 +5254,14 @@ (define_insn "*avx512f_vmfmsub_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "avx512f_vmfmsub_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "register_operand" "0")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "register_operand" "0")))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "Yk"))
 	  (match_dup 3)
@@ -5272,15 +5272,15 @@ (define_insn "avx512f_vmfmsub_<mode>_mask3<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_vmfmsub_<mode>_maskz_1<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
-	    (match_operand:VF_128 4 "const0_operand" "C,C")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+	    (match_operand:VFH_128 4 "const0_operand" "C,C")
 	    (match_operand:QI 5 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
 	  (const_int 1)))]
@@ -5291,15 +5291,15 @@ (define_insn "*avx512f_vmfmsub_<mode>_maskz_1<round_name>"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_vmfnmadd_<mode>_mask<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+(define_insn "avx512f_vmfnmadd_<mode>_mask<round_name>"
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
@@ -5311,15 +5311,15 @@ (define_insn "*avx512f_vmfnmadd_<mode>_mask<round_name>"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_vmfnmadd_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
-	      (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
-	      (match_operand:VF_128 3 "register_operand" "0"))
+(define_insn "avx512f_vmfnmadd_<mode>_mask3<round_name>"
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
+	      (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+	      (match_operand:VFH_128 3 "register_operand" "0"))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "Yk"))
 	  (match_dup 3)
@@ -5329,16 +5329,30 @@ (define_insn "*avx512f_vmfnmadd_<mode>_mask3<round_name>"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
-	    (match_operand:VF_128 4 "const0_operand" "C,C")
+(define_expand "avx512f_vmfnmadd_<mode>_maskz<round_expand_name>"
+  [(match_operand:VFH_128 0 "register_operand")
+   (match_operand:VFH_128 1 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_128 2 "<round_expand_nimm_predicate>")
+   (match_operand:VFH_128 3 "<round_expand_nimm_predicate>")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512F"
+{
+  emit_insn (gen_avx512f_vmfnmadd_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_insn "avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>"))
+	    (match_operand:VFH_128 4 "const0_operand" "C,C")
 	    (match_operand:QI 5 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
 	  (const_int 1)))]
@@ -5350,15 +5364,15 @@ (define_insn "*avx512f_vmfnmadd_<mode>_maskz_1<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_vmfnmsub_<mode>_mask<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
 	    (match_dup 1)
 	    (match_operand:QI 4 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
@@ -5371,15 +5385,15 @@ (define_insn "*avx512f_vmfnmsub_<mode>_mask<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
-	      (match_operand:VF_128 1 "<round_nimm_scalar_predicate>" "%v")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "register_operand" "0")))
+  [(set (match_operand:VFH_128 0 "register_operand" "=v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>"))
+	      (match_operand:VFH_128 1 "<round_nimm_scalar_predicate>" "%v")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "register_operand" "0")))
 	    (match_dup 3)
 	    (match_operand:QI 4 "register_operand" "Yk"))
 	  (match_dup 3)
@@ -5390,16 +5404,16 @@ (define_insn "*avx512f_vmfnmsub_<mode>_mask3<round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*avx512f_vmfnmsub_<mode>_maskz_1<round_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=v,v")
-	(vec_merge:VF_128
-	  (vec_merge:VF_128
-	    (fma:VF_128
-	      (neg:VF_128
-		(match_operand:VF_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
-	      (match_operand:VF_128 1 "register_operand" "0,0")
-	      (neg:VF_128
-		(match_operand:VF_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
-	    (match_operand:VF_128 4 "const0_operand" "C,C")
+  [(set (match_operand:VFH_128 0 "register_operand" "=v,v")
+	(vec_merge:VFH_128
+	  (vec_merge:VFH_128
+	    (fma:VFH_128
+	      (neg:VFH_128
+		(match_operand:VFH_128 2 "<round_nimm_scalar_predicate>" "<round_constraint>,v"))
+	      (match_operand:VFH_128 1 "register_operand" "0,0")
+	      (neg:VFH_128
+		(match_operand:VFH_128 3 "<round_nimm_scalar_predicate>" "v,<round_constraint>")))
+	    (match_operand:VFH_128 4 "const0_operand" "C,C")
 	    (match_operand:QI 5 "register_operand" "Yk,Yk"))
 	  (match_dup 1)
 	  (const_int 1)))]
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index d2ab16538d8..6c2d1dc3df4 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -775,6 +775,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 49c72f6fcef..f16be008909 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -792,6 +792,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 9151e50afd2..01ac4e04173 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -842,6 +842,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -892,6 +896,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51
 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 892b6334ae2..79e3f35ab86 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -945,6 +945,10 @@ test_3 (_mm512_fmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
 test_3 (_mm512_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
+test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -994,6 +998,18 @@ test_4 (_mm512_maskz_fmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m51
 test_4 (_mm512_mask_fnmsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
 test_4 (_mm512_mask3_fnmsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
 test_4 (_mm512_maskz_fnmsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
+test_4 (_mm_mask_fmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmadd_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmadd_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
+test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
+test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 447b83829f3..caf14408b91 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -793,6 +793,18 @@
 #define __builtin_ia32_vfnmsubph512_mask(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubph512_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubph512_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmaddsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
+#define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1



* [PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (46 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 47/62] AVX512FP16: Add scalar fma instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
                   ` (13 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c: Ditto.
---
 .../i386/avx512fp16-vfmaddXXXsh-1a.c          | 28 ++++++
 .../i386/avx512fp16-vfmaddXXXsh-1b.c          | 90 +++++++++++++++++++
 .../i386/avx512fp16-vfmsubXXXsh-1a.c          | 28 ++++++
 .../i386/avx512fp16-vfmsubXXXsh-1b.c          | 89 ++++++++++++++++++
 .../i386/avx512fp16-vfnmaddXXXsh-1a.c         | 32 +++++++
 .../i386/avx512fp16-vfnmaddXXXsh-1b.c         | 90 +++++++++++++++++++
 .../i386/avx512fp16-vfnmsubXXXsh-1a.c         | 28 ++++++
 .../i386/avx512fp16-vfnmsubXXXsh-1b.c         | 90 +++++++++++++++++++
 8 files changed, 475 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c
new file mode 100644
index 00000000000..472454d116d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmadd231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmadd_sh (a, b, c);
+  a = _mm_mask_fmadd_sh (a, m, b, c);
+  c = _mm_mask3_fmadd_sh (a, b, c, m);
+  a = _mm_maskz_fmadd_sh (m, a, b, c);
+  a = _mm_fmadd_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fmadd_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fmadd_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fmadd_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c
new file mode 100644
index 00000000000..a0eca9cde3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c
@@ -0,0 +1,90 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_fmadd_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask, int mask3)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] * v3.f32[0] + v7.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+   
+    for (i = 1; i < 8; i++){
+      if (mask3)
+        v5.f32[i] = v7.f32[i];
+      else
+        v5.f32[i] = v1.f32[i];
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    0x1);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmadd_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                   res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmadd_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                    res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmadd_sh");
+
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fmadd_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmadd_round_sh(src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], 0x1, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmadd_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                         res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fmadd_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmadd_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmadd_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c
new file mode 100644
index 00000000000..335b9e21fcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfmsub231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fmsub_sh (a, b, c);
+  a = _mm_mask_fmsub_sh (a, m, b, c);
+  c = _mm_mask3_fmsub_sh (a, b, c, m);
+  a = _mm_maskz_fmsub_sh (m, a, b, c);
+  a = _mm_fmsub_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fmsub_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fmsub_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fmsub_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c
new file mode 100644
index 00000000000..a2563fa816e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c
@@ -0,0 +1,89 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_fmsub_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask, int mask3)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = v1.f32[0] * v3.f32[0] - v7.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+    for (i = 1; i < 8; i++){
+      if (mask3)
+        v5.f32[i] = v7.f32[i];
+      else
+        v5.f32[i] = v1.f32[i];
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fmsub_sh(src1.xmmh[0],
+                             src2.xmmh[0], res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0],
+                                   0x1);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmsub_sh(src1.xmmh[0], 0x1, src2.xmmh[0], res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmsub_sh(0x3, src1.xmmh[0], src2.xmmh[0],
+                                   res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmsub_sh");
+
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fmsub_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0],
+                                   _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmsub_round_sh(src1.xmmh[0], src2.xmmh[0],
+                                         res.xmmh[0], 0x1, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmsub_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0],
+                                        res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fmsub_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmsub_round_sh(0x3, src1.xmmh[0], src2.xmmh[0],
+                                         res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fmsub_sh");
+
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c
new file mode 100644
index 00000000000..77106aaeecb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmadd_sh (a, b, c);
+  a = _mm_mask_fnmadd_sh (a, m, b, c);
+  c = _mm_mask3_fnmadd_sh (a, b, c, m);
+  a = _mm_maskz_fnmadd_sh (m, a, b, c);
+  a = _mm_fnmadd_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT
+			   | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fnmadd_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF
+				| _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fnmadd_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF
+				 | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fnmadd_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO
+				 | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
new file mode 100644
index 00000000000..92001508424
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c
@@ -0,0 +1,90 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_fnmadd_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask, int mask3)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = -(v1.f32[0] * v3.f32[0]) + v7.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+   
+    for (i = 1; i < 8; i++){
+      if (mask3)
+        v5.f32[i] = v7.f32[i];
+      else
+        v5.f32[i] = v1.f32[i];
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fnmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fnmadd_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    0x1);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fnmadd_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                   res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fnmadd_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                    res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmadd_sh");
+
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fnmadd_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fnmadd_round_sh(src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], 0x1, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fnmadd_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                         res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmadd_sh");
+  init_dest(&res, &exp);
+  emulate_fnmadd_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fnmadd_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmadd_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c
new file mode 100644
index 00000000000..5d1460838e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231sh\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub...sh\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h a, b, c;
+volatile __mmask8 m;
+
+void extern
+avx512f_test (void)
+{
+  a = _mm_fnmsub_sh (a, b, c);
+  a = _mm_mask_fnmsub_sh (a, m, b, c);
+  c = _mm_mask3_fnmsub_sh (a, b, c, m);
+  a = _mm_maskz_fnmsub_sh (m, a, b, c);
+  a = _mm_fnmsub_round_sh (a, b, c, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
+  a = _mm_mask_fnmsub_round_sh (a, m, b, c, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  c = _mm_mask3_fnmsub_round_sh (a, b, c, m, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
+  a = _mm_maskz_fnmsub_round_sh (m, a, b, c, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c
new file mode 100644
index 00000000000..7bdb861425f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c
@@ -0,0 +1,90 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+emulate_fnmsub_sh(V512 * dest, V512 op1, V512 op2,
+                __mmask8 k, int zero_mask, int mask3)
+{
+    V512 v1, v2, v3, v4, v5, v6, v7, v8;
+    int i;
+
+    unpack_ph_2twops(op1, &v1, &v2);
+    unpack_ph_2twops(op2, &v3, &v4);
+    unpack_ph_2twops(*dest, &v7, &v8);
+
+    if ((k&1) || !k)
+      v5.f32[0] = -(v1.f32[0] * v3.f32[0]) - v7.f32[0];
+    else if (zero_mask)
+      v5.f32[0] = 0;
+    else
+      v5.f32[0] = v7.f32[0];
+
+    for (i = 1; i < 8; i++){
+      if (mask3)
+        v5.f32[i] = v7.f32[i];
+      else
+        v5.f32[i] = v1.f32[i];
+    }
+    *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+test_512 (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fnmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fnmsub_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    0x1);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fnmsub_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                   res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fnmsub_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                    res.xmmh[0]);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmsub_sh");
+
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_fnmsub_round_sh(src1.xmmh[0], src2.xmmh[0], res.xmmh[0], 
+                                    _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask3_fnmsub_round_sh(src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], 0x1, _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask3_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fnmsub_round_sh(src1.xmmh[0], 0x1, src2.xmmh[0], 
+                                         res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_mask_fnmsub_sh");
+  init_dest(&res, &exp);
+  emulate_fnmsub_sh(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fnmsub_round_sh(0x3, src1.xmmh[0], src2.xmmh[0], 
+                                          res.xmmh[0], _ROUND_NINT);
+  check_results(&res, &exp, N_ELEMS, "_mm_maskz_fnmsub_sh");
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
-- 
2.18.1



* [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (47 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-22  4:38   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
                   ` (12 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_fcmadd_pch):
	New intrinsic.
	(_mm512_mask_fcmadd_pch): Likewise.
	(_mm512_mask3_fcmadd_pch): Likewise.
	(_mm512_maskz_fcmadd_pch): Likewise.
	(_mm512_fmadd_pch): Likewise.
	(_mm512_mask_fmadd_pch): Likewise.
	(_mm512_mask3_fmadd_pch): Likewise.
	(_mm512_maskz_fmadd_pch): Likewise.
	(_mm512_fcmadd_round_pch): Likewise.
	(_mm512_mask_fcmadd_round_pch): Likewise.
	(_mm512_mask3_fcmadd_round_pch): Likewise.
	(_mm512_maskz_fcmadd_round_pch): Likewise.
	(_mm512_fmadd_round_pch): Likewise.
	(_mm512_mask_fmadd_round_pch): Likewise.
	(_mm512_mask3_fmadd_round_pch): Likewise.
	(_mm512_maskz_fmadd_round_pch): Likewise.
	(_mm512_fcmul_pch): Likewise.
	(_mm512_mask_fcmul_pch): Likewise.
	(_mm512_maskz_fcmul_pch): Likewise.
	(_mm512_fmul_pch): Likewise.
	(_mm512_mask_fmul_pch): Likewise.
	(_mm512_maskz_fmul_pch): Likewise.
	(_mm512_fcmul_round_pch): Likewise.
	(_mm512_mask_fcmul_round_pch): Likewise.
	(_mm512_maskz_fcmul_round_pch): Likewise.
	(_mm512_fmul_round_pch): Likewise.
	(_mm512_mask_fmul_round_pch): Likewise.
	(_mm512_maskz_fmul_round_pch): Likewise.
	* config/i386/avx512fp16vlintrin.h (_mm_fmadd_pch):
	New intrinsic.
	(_mm_mask_fmadd_pch): Likewise.
	(_mm_mask3_fmadd_pch): Likewise.
	(_mm_maskz_fmadd_pch): Likewise.
	(_mm256_fmadd_pch): Likewise.
	(_mm256_mask_fmadd_pch): Likewise.
	(_mm256_mask3_fmadd_pch): Likewise.
	(_mm256_maskz_fmadd_pch): Likewise.
	(_mm_fcmadd_pch): Likewise.
	(_mm_mask_fcmadd_pch): Likewise.
	(_mm_mask3_fcmadd_pch): Likewise.
	(_mm_maskz_fcmadd_pch): Likewise.
	(_mm256_fcmadd_pch): Likewise.
	(_mm256_mask_fcmadd_pch): Likewise.
	(_mm256_mask3_fcmadd_pch): Likewise.
	(_mm256_maskz_fcmadd_pch): Likewise.
	(_mm_fmul_pch): Likewise.
	(_mm_mask_fmul_pch): Likewise.
	(_mm_maskz_fmul_pch): Likewise.
	(_mm256_fmul_pch): Likewise.
	(_mm256_mask_fmul_pch): Likewise.
	(_mm256_maskz_fmul_pch): Likewise.
	(_mm_fcmul_pch): Likewise.
	(_mm_mask_fcmul_pch): Likewise.
	(_mm_maskz_fcmul_pch): Likewise.
	(_mm256_fcmul_pch): Likewise.
	(_mm256_mask_fcmul_pch): Likewise.
	(_mm256_maskz_fcmul_pch): Likewise.
	* config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF,
	V8HF_FTYPE_V16HF_V16HF_V16HF, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI,
	V32HF_FTYPE_V32HF_V32HF_V32HF_INT,
	V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT): Add new builtin types.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-expand.c: Handle new builtin types.
	* config/i386/subst.md (SUBST_CV): New.
	(maskc_name): Ditto.
	(maskc_operand3): Ditto.
	(maskc): Ditto.
	(sdc_maskz_name): Ditto.
	(sdc_mask_op4): Ditto.
	(sdc_mask_op5): Ditto.
	(sdc_mask_mode512bit_condition): Ditto.
	(sdc): Ditto.
	(round_maskc_operand3): Ditto.
	(round_sdc_mask_operand4): Ditto.
	(round_maskc_op3): Ditto.
	(round_sdc_mask_op4): Ditto.
	(round_saeonly_sdc_mask_operand5): Ditto.
	* config/i386/sse.md (unspec): Add complex fma unspecs.
	(avx512fmaskcmode): New.
	(UNSPEC_COMPLEX_F_C_MA): Ditto.
	(UNSPEC_COMPLEX_F_C_MUL): Ditto.
	(complexopname): Ditto.
	(<avx512>_fmaddc_<mode>_maskz<round_expand_name>): New expander.
	(<avx512>_fcmaddc_<mode>_maskz<round_expand_name>): Ditto.
	(fma_<complexopname>_<mode><sdc_maskz_name><round_name>): New
	define insn.
	(<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
	(<avx512>_<complexopname>_<mode><maskc_name><round_name>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h     | 386 +++++++++++++++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h   | 257 ++++++++++++++++
 gcc/config/i386/i386-builtin-types.def |   5 +
 gcc/config/i386/i386-builtin.def       |  30 ++
 gcc/config/i386/i386-expand.c          |   5 +
 gcc/config/i386/sse.md                 |  98 +++++++
 gcc/config/i386/subst.md               |  40 +++
 gcc/testsuite/gcc.target/i386/avx-1.c  |  10 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  10 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  14 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  14 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  10 +
 12 files changed, 879 insertions(+)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 5c85ec15b22..9dd71019972 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -6109,6 +6109,392 @@ _mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vf[,c]maddcph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph_v32hf_round ((__v32hf) __C,
+					   (__v32hf) __A,
+					   (__v32hf) __B,
+					   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask
+    ((__v16sf)
+     __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D,
+						 (__v32hf) __A,
+						 (__v32hf) __C, __B,
+						 _MM_FROUND_CUR_DIRECTION),
+     (__v16sf) __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C,
+						(__v32hf) __A,
+						(__v32hf) __B,
+						__D, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D,
+						(__v32hf) __B,
+						(__v32hf) __C,
+						__A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddcph_v32hf_round((__v32hf) __C,
+					 (__v32hf) __A,
+					 (__v32hf) __B,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask
+    ((__v16sf)
+     __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D,
+						(__v32hf) __A,
+						(__v32hf) __C, __B,
+						_MM_FROUND_CUR_DIRECTION),
+     (__v16sf) __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddcph_v32hf_mask_round((__v32hf) __C,
+					      (__v32hf) __A,
+					      (__v32hf) __B,
+					      __D, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D,
+					       (__v32hf) __B,
+					       (__v32hf) __C,
+					       __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+{
+  return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_round((__v32hf) __C,
+							(__v32hf) __A,
+							(__v32hf) __B,
+							__D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			      __m512h __D, const int __E)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask
+    ((__v16sf)
+     __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D,
+						 (__v32hf) __A,
+						 (__v32hf) __C, __B,
+						 __E),
+     (__v16sf) __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+			       __mmask16 __D, const int __E)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C,
+						(__v32hf) __A,
+						(__v32hf) __B,
+						__D, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+			       __m512h __D, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D,
+							      (__v32hf) __B,
+							      (__v32hf) __C,
+							      __A,
+							      __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddcph_v32hf_round ((__v32hf) __C,
+					  (__v32hf) __A,
+					  (__v32hf) __B,
+					  __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			     __m512h __D, const int __E)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask
+    ((__v16sf)
+     __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D,
+						(__v32hf) __A,
+						(__v32hf) __C, __B,
+						__E),
+     (__v16sf) __A, __B);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
+			      __mmask16 __D, const int __E)
+{
+  return (__m512h)
+    __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __C,
+					       (__v32hf) __A,
+					       (__v32hf) __B,
+					       __D, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
+			      __m512h __D, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D,
+							     (__v32hf) __B,
+							     (__v32hf) __C,
+							     __A, __E);
+}
+
+#else
+#define _mm512_fcmadd_round_pch(A, B, C, D)			\
+  (__m512h) __builtin_ia32_vfcmaddcph_v32hf_round ((C), (A), (B), (D))
+
+#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h) __builtin_ia32_movaps512_mask (				\
+   (__v16sf)								\
+    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) (D),		\
+						(__v32hf) (A),		\
+						(__v32hf) (C),		\
+						(B), (E)),		\
+						(__v16sf) (A), (B)))
+
+#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h)								\
+   __builtin_ia32_vfcmaddcph_v32hf_mask_round ((C), (A), (B), (D), (E)))
+
+#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfcmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E))
+
+#define _mm512_fmadd_round_pch(A, B, C, D)			\
+  (__m512h) __builtin_ia32_vfmaddcph_v32hf_round((C), (A), (B), (D))
+
+#define _mm512_mask_fmadd_round_pch(A, B, C, D, E)			\
+  ((__m512h) __builtin_ia32_movaps512_mask (				\
+   (__v16sf)								\
+    __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) (D),		\
+					       (__v32hf) (A),		\
+					       (__v32hf) (C),		\
+					       (B), (E)),		\
+					       (__v16sf) (A), (B)))
+
+#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfmaddcph_v32hf_mask_round((C), (A), (B), (D), (E))
+
+#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E)			\
+  (__m512h)								\
+   __builtin_ia32_vfmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vf[,c]mulcph.  */
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fcmul_pch (__m512h __A, __m512h __B)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A,
+					 (__v32hf) __B,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C,
+					      (__v32hf) __D,
+					      (__v32hf) __A,
+					      __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B,
+					      (__v32hf) __C,
+					      _mm512_setzero_ph (),
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmul_pch (__m512h __A, __m512h __B)
+{
+  return (__m512h)
+    __builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A,
+					(__v32hf) __B,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
+{
+  return (__m512h)
+    __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C,
+					     (__v32hf) __D,
+					     (__v32hf) __A,
+					     __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
+{
+  return (__m512h)
+    __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B,
+					     (__v32hf) __C,
+					     _mm512_setzero_ph (),
+					     __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
+{
+  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A,
+						       (__v32hf) __B, __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			     __m512h __D, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C,
+							    (__v32hf) __D,
+							    (__v32hf) __A,
+							    __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
+			      __m512h __C, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B,
+							    (__v32hf) __C,
+							    _mm512_setzero_ph (),
+							    __A, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
+{
+  return (__m512h)__builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A,
+						      (__v32hf) __B,
+						      __D);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
+			    __m512h __D, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C,
+							   (__v32hf) __D,
+							   (__v32hf) __A,
+							   __B, __E);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
+			     __m512h __C, const int __E)
+{
+  return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B,
+							   (__v32hf) __C,
+							   _mm512_setzero_ph (),
+							   __A, __E);
+}
+
+#else
+#define _mm512_fcmul_round_pch(A, B, D)				\
+  (__m512h)__builtin_ia32_vfcmulcph_v32hf_round(A, B, D)
+
+#define _mm512_mask_fcmul_round_pch(A, B, C, D, E)			\
+  (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(C, D, A, B, E)
+
+#define _mm512_maskz_fcmul_round_pch(A, B, C, E)			\
+  (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(B, C,		\
+						     _mm512_setzero_ph(), \
+						     A, E)
+
+#define _mm512_fmul_round_pch(A, B, D)			\
+  (__m512h)__builtin_ia32_vfmulcph_v32hf_round(A, B, D)
+
+#define _mm512_mask_fmul_round_pch(A, B, C, D, E)			\
+  (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(C, D, A, B, E)
+
+#define _mm512_maskz_fmul_round_pch(A, B, C, E)				\
+  (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(B, C,		\
+						    _mm512_setzero_ph (), \
+						    A, E)
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index bba98f105ac..c7bdfbc0517 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -2815,6 +2815,263 @@ _mm_maskz_fnmsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
 							__U);
 }
 
+/* Intrinsics vf[,c]maddcph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_pch (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)__builtin_ia32_vfmaddcph_v8hf((__v8hf) __C, (__v8hf) __A,
+						(__v8hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h) __builtin_ia32_movaps128_mask
+    ((__v4sf)
+     __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __D,
+					 (__v8hf) __A,
+					 (__v8hf) __C, __B),
+     (__v4sf) __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_pch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+{
+  return (__m128h) __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __C,
+						       (__v8hf) __A,
+						       (__v8hf) __B, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)__builtin_ia32_vfmaddcph_v8hf_maskz((__v8hf) __D,
+						      (__v8hf) __B,
+						      (__v8hf) __C, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmadd_pch (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h)__builtin_ia32_vfmaddcph_v16hf((__v16hf) __C, (__v16hf) __A,
+						 (__v16hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
+{
+  return (__m256h) __builtin_ia32_movaps256_mask
+    ((__v8sf)
+     __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __D,
+					  (__v16hf) __A,
+					  (__v16hf) __C, __B),
+     (__v8sf) __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fmadd_pch (__m256h __A, __m256h __B, __m256h __C, __mmask8 __D)
+{
+  return (__m256h) __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __C,
+							(__v16hf) __A,
+							(__v16hf) __B, __D);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D)
+{
+  return (__m256h)__builtin_ia32_vfmaddcph_v16hf_maskz((__v16hf) __D,
+						       (__v16hf) __B,
+						       (__v16hf) __C, __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)__builtin_ia32_vfcmaddcph_v8hf ((__v8hf) __C,
+						  (__v8hf) __A, (__v8hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)__builtin_ia32_movaps128_mask
+    ((__v4sf)
+     __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __D,
+					  (__v8hf) __A,
+					  (__v8hf) __C, __B),
+     (__v4sf) __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+{
+  return (__m128h) __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __C,
+							(__v8hf) __A,
+							(__v8hf) __B, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)__builtin_ia32_vfcmaddcph_v8hf_maskz ((__v8hf) __D,
+							(__v8hf) __B,
+							(__v8hf) __C, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C)
+{
+  return (__m256h)__builtin_ia32_vfcmaddcph_v16hf((__v16hf) __C,
+						  (__v16hf) __A, (__v16hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fcmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
+{
+  return (__m256h) __builtin_ia32_movaps256_mask
+    ((__v8sf)
+     __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __D,
+					   (__v16hf) __A,
+					   (__v16hf) __C, __B),
+     (__v8sf) __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask3_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C, __mmask8 __D)
+{
+  return (__m256h) __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __C,
+							 (__v16hf) __A,
+							 (__v16hf) __B, __D);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fcmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D)
+{
+  return (__m256h)__builtin_ia32_vfcmaddcph_v16hf_maskz((__v16hf) __D,
+							(__v16hf) __B,
+							(__v16hf) __C, __A);
+}
+
+/* Intrinsics vf[,c]mulcph.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmul_pch (__m128h __A, __m128h __B)
+{
+  return (__m128h)__builtin_ia32_vfmulcph_v8hf((__v8hf) __A, (__v8hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __C,
+						    (__v8hf) __D,
+						    (__v8hf) __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmul_pch (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __B,
+						    (__v8hf) __C,
+						    _mm_setzero_ph (),
+						    __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fmul_pch (__m256h __A, __m256h __B)
+{
+  return (__m256h)__builtin_ia32_vfmulcph_v16hf((__v16hf) __A, (__v16hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
+{
+  return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __C,
+						     (__v16hf) __D,
+						     (__v16hf) __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fmul_pch (__mmask8 __A, __m256h __B, __m256h __C)
+{
+  return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __B,
+						     (__v16hf) __C,
+						     _mm256_setzero_ph (),
+						     __A);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmul_pch (__m128h __A, __m128h __B)
+{
+  return (__m128h)__builtin_ia32_vfcmulcph_v8hf((__v8hf) __A, (__v8hf) __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __C, (__v8hf) __D,
+						     (__v8hf) __A, __B);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmul_pch (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __B,
+						     (__v8hf) __C,
+						     _mm_setzero_ph (),
+						     __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_fcmul_pch (__m256h __A, __m256h __B)
+{
+  return (__m256h)__builtin_ia32_vfcmulcph_v16hf((__v16hf) __A, (__v16hf) __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_fcmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
+{
+  return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __C,
+						      (__v16hf) __D,
+						      (__v16hf) __A, __B);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_fcmul_pch (__mmask8 __A, __m256h __B, __m256h __C)
+{
+  return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __B,
+						      (__v16hf) __C,
+						      _mm256_setzero_ph (),
+						      __A);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 22b924bf98d..35bcafd14e3 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1348,6 +1348,7 @@ DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
 DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT)
+DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF)
 DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, V8HF, UQI, INT)
 DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, V8HF, UQI, INT)
@@ -1358,12 +1359,14 @@ DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
+DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF)
 DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT)
 DEF_FUNCTION_TYPE (V16SF, V16HF, V16SF, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
 DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
 DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT)
 DEF_FUNCTION_TYPE (V16HF, V16SF, V16HF, UHI, INT)
+DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UQI)
 DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
@@ -1371,7 +1374,9 @@ DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HI, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
 DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
+DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index f446a6ce5d3..448f9f75fa4 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2911,6 +2911,26 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask, "__builtin_ia32_vfnmsubph128_mask", IX86_BUILTIN_VFNMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask3, "__builtin_ia32_vfnmsubph128_mask3", IX86_BUILTIN_VFNMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_maskz, "__builtin_ia32_vfnmsubph128_maskz", IX86_BUILTIN_VFNMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v8hf, "__builtin_ia32_vfmaddcph_v8hf", IX86_BUILTIN_VFMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_mask, "__builtin_ia32_vfmaddcph_v8hf_mask", IX86_BUILTIN_VFMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_maskz, "__builtin_ia32_vfmaddcph_v8hf_maskz", IX86_BUILTIN_VFMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v16hf, "__builtin_ia32_vfmaddcph_v16hf", IX86_BUILTIN_VFMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_mask, "__builtin_ia32_vfmaddcph_v16hf_mask", IX86_BUILTIN_VFMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_maskz, "__builtin_ia32_vfmaddcph_v16hf_maskz", IX86_BUILTIN_VFMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v8hf, "__builtin_ia32_vfcmaddcph_v8hf", IX86_BUILTIN_VFCMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_mask, "__builtin_ia32_vfcmaddcph_v8hf_mask", IX86_BUILTIN_VFCMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_maskz, "__builtin_ia32_vfcmaddcph_v8hf_maskz", IX86_BUILTIN_VFCMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v16hf, "__builtin_ia32_vfcmaddcph_v16hf", IX86_BUILTIN_VFCMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_mask, "__builtin_ia32_vfcmaddcph_v16hf_mask", IX86_BUILTIN_VFCMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_maskz, "__builtin_ia32_vfcmaddcph_v16hf_maskz", IX86_BUILTIN_VFCMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf, "__builtin_ia32_vfcmulcph_v8hf", IX86_BUILTIN_VFCMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf_mask, "__builtin_ia32_vfcmulcph_v8hf_mask", IX86_BUILTIN_VFCMULCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf, "__builtin_ia32_vfcmulcph_v16hf", IX86_BUILTIN_VFCMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf_mask, "__builtin_ia32_vfcmulcph_v16hf_mask", IX86_BUILTIN_VFCMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf, "__builtin_ia32_vfmulcph_v8hf", IX86_BUILTIN_VFMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf_mask, "__builtin_ia32_vfmulcph_v8hf_mask", IX86_BUILTIN_VFMULCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf, "__builtin_ia32_vfmulcph_v16hf", IX86_BUILTIN_VFMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf_mask, "__builtin_ia32_vfmulcph_v16hf_mask", IX86_BUILTIN_VFMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
 
 /* Builtins with rounding support.  */
 BDESC_END (ARGS, ROUND_ARGS)
@@ -3201,6 +3221,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph_v32hf_round", IX86_BUILTIN_VFMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph_v32hf_mask_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph_v32hf_round", IX86_BUILTIN_VFCMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph_v32hf_mask_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph_v32hf_round", IX86_BUILTIN_VFCMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph_v32hf_mask_round", IX86_BUILTIN_VFCMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph_v32hf_round", IX86_BUILTIN_VFMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph_v32hf_mask_round", IX86_BUILTIN_VFMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index f6de05c769a..f6d74549dc2 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -9582,6 +9582,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V2DI_FTYPE_V8HF_V2DI_UQI:
     case V2DI_FTYPE_V4SF_V2DI_UQI:
     case V8HF_FTYPE_V8HF_V8HF_UQI:
+    case V8HF_FTYPE_V8HF_V8HF_V8HF:
     case V8HF_FTYPE_V8HI_V8HF_UQI:
     case V8HF_FTYPE_V8SI_V8HF_UQI:
     case V8HF_FTYPE_V8SF_V8HF_UQI:
@@ -9660,6 +9661,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_V8SF_V16SF_UHI:
     case V16SI_FTYPE_V8SI_V16SI_UHI:
     case V16HF_FTYPE_V16HI_V16HF_UHI:
+    case V16HF_FTYPE_V16HF_V16HF_V16HF:
     case V16HI_FTYPE_V16HF_V16HI_UHI:
     case V16HI_FTYPE_V16HI_V16HI_UHI:
     case V8HI_FTYPE_V16QI_V8HI_UQI:
@@ -9816,6 +9818,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI:
     case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI:
     case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI:
+    case V16HF_FTYPE_V16HF_V16HF_V16HF_UQI:
     case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI:
     case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI:
     case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI:
@@ -10545,6 +10548,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V16SF_FTYPE_V16HF_V16SF_UHI_INT:
     case V32HF_FTYPE_V32HI_V32HF_USI_INT:
     case V32HF_FTYPE_V32HF_V32HF_USI_INT:
+    case V32HF_FTYPE_V32HF_V32HF_V32HF_INT:
     case V16SF_FTYPE_V16SF_V16SF_HI_INT:
     case V8DI_FTYPE_V8SF_V8DI_QI_INT:
     case V16SF_FTYPE_V16SI_V16SF_HI_INT:
@@ -10574,6 +10578,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
     case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT:
     case V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT:
     case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT:
+    case V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT:
     case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT:
     case V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT:
     case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 31f8fc68c65..ddd93f739e3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -194,6 +194,14 @@ (define_c_enum "unspec" [
   UNSPEC_VCVTNE2PS2BF16
   UNSPEC_VCVTNEPS2BF16
   UNSPEC_VDPBF16PS
+
+  ;; For AVX512FP16 support
+  UNSPEC_COMPLEX_FMA
+  UNSPEC_COMPLEX_FCMA
+  UNSPEC_COMPLEX_FMUL
+  UNSPEC_COMPLEX_FCMUL
+  UNSPEC_COMPLEX_MASK
+
 ])
 
 (define_c_enum "unspecv" [
@@ -909,6 +917,10 @@ (define_mode_attr avx512fmaskmode
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding complex mask size
+(define_mode_attr avx512fmaskcmode
+  [(V32HF "HI") (V16HF "QI") (V8HF  "QI")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmodelower
   [(V64QI "di") (V32QI "si") (V16QI "hi")
@@ -5499,6 +5511,92 @@ (define_insn "*fma4i_vmfnmsub_<mode>"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Complex type operations
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_int_iterator UNSPEC_COMPLEX_F_C_MA
+	[UNSPEC_COMPLEX_FMA UNSPEC_COMPLEX_FCMA])
+
+(define_int_iterator UNSPEC_COMPLEX_F_C_MUL
+	[UNSPEC_COMPLEX_FMUL UNSPEC_COMPLEX_FCMUL])
+
+(define_int_attr complexopname
+	[(UNSPEC_COMPLEX_FMA "fmaddc")
+	 (UNSPEC_COMPLEX_FCMA "fcmaddc")
+	 (UNSPEC_COMPLEX_FMUL "fmulc")
+	 (UNSPEC_COMPLEX_FCMUL "fcmulc")])
+
+(define_expand "<avx512>_fmaddc_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (match_operand:VF_AVX512FP16VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512FP16VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512FP16VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskcmode> 4 "register_operand")]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+{
+  emit_insn (gen_fma_fmaddc_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_expand "<avx512>_fcmaddc_<mode>_maskz<round_expand_name>"
+  [(match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (match_operand:VF_AVX512FP16VL 1 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512FP16VL 2 "<round_expand_nimm_predicate>")
+   (match_operand:VF_AVX512FP16VL 3 "<round_expand_nimm_predicate>")
+   (match_operand:<avx512fmaskcmode> 4 "register_operand")]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+{
+  emit_insn (gen_fma_fcmaddc_<mode>_maskz_1<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_insn "fma_<complexopname>_<mode><sdc_maskz_name><round_name>"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
+	(unspec:VF_AVX512FP16VL
+	  [(match_operand:VF_AVX512FP16VL 1 "<round_nimm_predicate>" "0")
+	   (match_operand:VF_AVX512FP16VL 2 "<round_nimm_predicate>" "%v")
+	   (match_operand:VF_AVX512FP16VL 3 "<round_nimm_predicate>" "<round_constraint>")]
+	   UNSPEC_COMPLEX_F_C_MA))]
+  "TARGET_AVX512FP16 && <sdc_mask_mode512bit_condition> && <round_mode512bit_condition>"
+  "v<complexopname><ssemodesuffix>\t{<round_sdc_mask_op4>%3, %2, %0<sdc_mask_op4>|%0<sdc_mask_op4>, %2, %3<round_sdc_mask_op4>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<avx512>_<complexopname>_<mode>_mask<round_name>"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
+	(vec_merge:VF_AVX512FP16VL
+	  (unspec:VF_AVX512FP16VL
+	    [(match_operand:VF_AVX512FP16VL 1 "register_operand" "0")
+	     (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "%v")
+	     (match_operand:VF_AVX512FP16VL 3 "nonimmediate_operand" "<round_constraint>")]
+	     UNSPEC_COMPLEX_F_C_MA)
+	  (match_dup 1)
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:<avx512fmaskcmode> 4 "register_operand" "Yk")]
+	    UNSPEC_COMPLEX_MASK)))]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+  "v<complexopname><ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<avx512>_<complexopname>_<mode><maskc_name><round_name>"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
+	  (unspec:VF_AVX512FP16VL
+	    [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "%v")
+	     (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "<round_constraint>")]
+	     UNSPEC_COMPLEX_F_C_MUL))]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+  "v<complexopname><ssemodesuffix>\t{<round_maskc_op3>%2, %1, %0<maskc_operand3>|%0<maskc_operand3>, %1, %2<round_maskc_op3>}"
+  [(set_attr "type" "ssemul")
+   (set_attr "mode" "<MODE>")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel half-precision floating point conversion operations
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 2e9c2b38e25..3a1f554e9b9 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -28,6 +28,9 @@ (define_mode_iterator SUBST_V
    V16SF V8SF  V4SF
    V8DF  V4DF  V2DF])
 
+(define_mode_iterator SUBST_CV
+  [V32HF V16HF V8HF])
+
 (define_mode_iterator SUBST_S
   [QI HI SI DI])
 
@@ -42,9 +45,11 @@ (define_mode_iterator SUBST_A
    QI HI SI DI SF DF])
 
 (define_subst_attr "mask_name" "mask" "" "_mask")
+(define_subst_attr "maskc_name" "maskc" "" "_mask")
 (define_subst_attr "mask_applied" "mask" "false" "true")
 (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
 (define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3")
+(define_subst_attr "maskc_operand3" "maskc" "" "%{%4%}%N3")
 (define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf
 (define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4")
 (define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6")
@@ -89,6 +94,18 @@ (define_subst "merge_mask"
 	  (match_dup 0)
 	  (match_operand:<avx512fmaskmode> 2 "register_operand" "Yk")))])
 
+(define_subst "maskc"
+  [(set (match_operand:SUBST_CV 0)
+        (match_operand:SUBST_CV 1))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+        (vec_merge:SUBST_CV
+	  (match_dup 1)
+	  (match_operand:SUBST_CV 2 "nonimm_or_0_operand" "0C")
+	  (unspec:<avx512fmaskmode>
+	    [(match_operand:<avx512fmaskcmode> 3 "register_operand" "Yk")]
+	    UNSPEC_COMPLEX_MASK)))])
+
 (define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask")
 (define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}")
 (define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}")
@@ -119,11 +136,31 @@ (define_subst "sd"
 	 (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))
 ])
 
+(define_subst_attr "sdc_maskz_name" "sdc" "" "_maskz_1")
+(define_subst_attr "sdc_mask_op4" "sdc" "" "%{%5%}%N4")
+(define_subst_attr "sdc_mask_op5" "sdc" "" "%{%6%}%N5")
+(define_subst_attr "sdc_mask_mode512bit_condition" "sdc" "1" "(<MODE_SIZE> == 64 || TARGET_AVX512VL)")
+
+(define_subst "sdc"
+ [(set (match_operand:SUBST_CV 0)
+       (match_operand:SUBST_CV 1))]
+ ""
+ [(set (match_dup 0)
+       (vec_merge:SUBST_CV
+	 (match_dup 1)
+	 (match_operand:SUBST_CV 2 "const0_operand" "C")
+	 (unspec:<avx512fmaskmode>
+	   [(match_operand:<avx512fmaskcmode> 3 "register_operand" "Yk")]
+	   UNSPEC_COMPLEX_MASK)))
+])
+
 (define_subst_attr "round_name" "round" "" "_round")
 (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
 (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
+(define_subst_attr "round_maskc_operand3" "maskc" "%R3" "%R5")
 (define_subst_attr "round_mask_operand4" "mask" "%R4" "%R6")
 (define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
+(define_subst_attr "round_sdc_mask_operand4" "sdc" "%R4" "%R6")
 (define_subst_attr "round_op2" "round" "" "%R2")
 (define_subst_attr "round_op3" "round" "" "%R3")
 (define_subst_attr "round_op4" "round" "" "%R4")
@@ -131,8 +168,10 @@ (define_subst_attr "round_op5" "round" "" "%R5")
 (define_subst_attr "round_op6" "round" "" "%R6")
 (define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
 (define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
+(define_subst_attr "round_maskc_op3" "round" "" "<round_maskc_operand3>")
 (define_subst_attr "round_mask_op4" "round" "" "<round_mask_operand4>")
 (define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
+(define_subst_attr "round_sdc_mask_op4" "round" "" "<round_sdc_mask_operand4>")
 (define_subst_attr "round_constraint" "round" "vm" "v")
 (define_subst_attr "round_qq2phsuff" "round" "<qq2phsuff>" "")
 (define_subst_attr "bcst_round_constraint" "round" "vmBr" "v")
@@ -169,6 +208,7 @@ (define_subst_attr "round_saeonly_mask_operand3" "mask" "%r3" "%r5")
 (define_subst_attr "round_saeonly_mask_operand4" "mask" "%r4" "%r6")
 (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%r4" "%r5")
 (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%r5" "%r7")
+(define_subst_attr "round_saeonly_sdc_mask_operand5" "sdc" "%r5" "%r7")
 (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%r2")
 (define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%r3")
 (define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%r4")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 6c2d1dc3df4..56e90d9f9a5 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -787,6 +787,16 @@
 #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index f16be008909..ef9f8aad853 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -804,6 +804,16 @@
 #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 01ac4e04173..f27c73fd4cc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -772,6 +772,8 @@ test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
 test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
 test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
+test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -846,6 +848,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -908,6 +914,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h,
 test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
 test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
 test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
+test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
+test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 79e3f35ab86..ccf8c3a6c03 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -876,6 +876,8 @@ test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8)
 test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8)
 test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
 test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
+test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
+test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -949,6 +951,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
+test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
+test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -1010,6 +1016,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h,
 test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
 test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
 test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
+test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
+test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
+test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index caf14408b91..dc39d7e2012 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -805,6 +805,16 @@
 #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
 #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (48 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
                   ` (11 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-helper.h
	(init_src): Adjust init value.
	(NET_CMASK): New net mask for complex input.
	* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfcmaddcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmulcph-1b.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-helper.h       |   9 +-
 .../i386/avx512fp16-vfcmaddcph-1a.c           |  27 ++++
 .../i386/avx512fp16-vfcmaddcph-1b.c           | 133 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c |  25 ++++
 .../gcc.target/i386/avx512fp16-vfcmulcph-1b.c | 111 +++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmaddcph-1a.c |  27 ++++
 .../gcc.target/i386/avx512fp16-vfmaddcph-1b.c | 131 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmulcph-1a.c  |  25 ++++
 .../gcc.target/i386/avx512fp16-vfmulcph-1b.c  | 115 +++++++++++++++
 .../i386/avx512fp16vl-vfcmaddcph-1a.c         |  30 ++++
 .../i386/avx512fp16vl-vfcmaddcph-1b.c         |  15 ++
 .../i386/avx512fp16vl-vfcmulcph-1a.c          |  28 ++++
 .../i386/avx512fp16vl-vfcmulcph-1b.c          |  15 ++
 .../i386/avx512fp16vl-vfmaddcph-1a.c          |  30 ++++
 .../i386/avx512fp16vl-vfmaddcph-1b.c          |  15 ++
 .../i386/avx512fp16vl-vfmulcph-1a.c           |  28 ++++
 .../i386/avx512fp16vl-vfmulcph-1b.c           |  15 ++
 17 files changed, 777 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
index ce3cfdc3f6b..69948f8ee4f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-helper.h
@@ -172,9 +172,9 @@ init_src()
 
     for (i = 0; i < AVX512F_MAX_ELEM; i++) {
 	v1.f32[i] = i + 1;
-	v2.f32[i] = i * 0.5f;
+	v2.f32[i] = (i + 2) * 0.5f;
 	v3.f32[i] = i * 1.5f;
-	v4.f32[i] = i - 0.5f;
+	v4.f32[i] = i - 1.5f;
 
 	src3.u32[i] = (i + 1) * 10;
     }
@@ -234,10 +234,12 @@ init_dest(V512 * res, V512 * exp)
 #undef DF
 #undef H_HF
 #undef NET_MASK 
+#undef NET_CMASK 
 #undef MASK_VALUE
 #undef HALF_MASK
 #undef ZMASK_VALUE 
 #define NET_MASK 0xffff
+#define NET_CMASK 0xff
 #define MASK_VALUE 0xcccc
 #define ZMASK_VALUE 0xfcc1
 #define HALF_MASK 0xcc
@@ -253,10 +255,12 @@ init_dest(V512 * res, V512 * exp)
 #undef SI
 #undef H_HF
 #undef NET_MASK 
+#undef NET_CMASK 
 #undef MASK_VALUE 
 #undef ZMASK_VALUE 
 #undef HALF_MASK
 #define NET_MASK 0xff
+#define NET_CMASK 0xff
 #define MASK_VALUE 0xcc
 #define HALF_MASK MASK_VALUE
 #define ZMASK_VALUE 0xc1
@@ -267,6 +271,7 @@ init_dest(V512 * res, V512 * exp)
 #define H_HF(x) x.xmmh[0]
 #else
 #define NET_MASK 0xffffffff
+#define NET_CMASK 0xffff
 #define MASK_VALUE 0xcccccccc
 #define ZMASK_VALUE 0xfcc1fcc1
 #define HALF_MASK 0xcccc
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
new file mode 100644
index 00000000000..6c2c34c1731
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_fcmadd_pch (x1, x2, x3);
+  res1 = _mm512_mask_fcmadd_pch (res1, m16, x1, x2);
+  res1 = _mm512_mask3_fcmadd_pch (res1, x1, x2, m16);
+  res2 = _mm512_maskz_fcmadd_pch (m16, x1, x2, x3);
+  res = _mm512_fcmadd_round_pch (x1, x2, x3, 8);
+  res1 = _mm512_mask_fcmadd_round_pch (res1, m16, x1, x2, 8);
+  res1 = _mm512_mask3_fcmadd_round_pch (res1, x1, x2, m16, 8);
+  res2 = _mm512_maskz_fcmadd_round_pch (m16, x1, x2, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c
new file mode 100644
index 00000000000..835699b834d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1b.c
@@ -0,0 +1,133 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(c_fmadd_pch) (V512 * dest, V512 op1, V512 op2,
+		    __mmask16 k, int zero_mask, int c_flag,
+		    int is_mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << (i / 2)) & k) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = is_mask3 ? v3.u32[i] : v7.u32[i];
+      }
+    }
+    else {
+      if ((i % 2) == 0) {
+	v5.f32[i] = v1.f32[i] * v7.f32[i]
+	  - invert * (v1.f32[i+1] * v7.f32[i+1]) + v3.f32[i];
+      }
+      else {
+	v5.f32[i] = v1.f32[i-1] * v7.f32[i]
+	  + invert * (v1.f32[i] * v7.f32[i-1]) + v3.f32[i];
+
+      }
+    }
+    if (((1 << (i / 2 + 8)) & k) == 0) {
+      if (zero_mask) {
+	v6.f32[i] = 0;
+      }
+      else {
+	v6.u32[i] = is_mask3 ? v4.u32[i] : v8.u32[i];
+      }
+    }
+    else {
+      if ((i % 2) == 0) {
+	v6.f32[i] = v2.f32[i] * v8.f32[i]
+	  - invert * (v2.f32[i+1] * v8.f32[i+1]) + v4.f32[i];
+      }
+      else {
+	v6.f32[i] = v2.f32[i-1] * v8.f32[i]
+	  + invert * (v2.f32[i] * v8.f32[i-1]) + v4.f32[i];
+      }
+
+    }
+  }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 1, 0);
+  HF(res) = INTRINSIC (_fcmadd_pch) (HF(res), HF(src1),
+				     HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fcmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 0);
+  HF(res) = INTRINSIC (_mask_fcmadd_pch) (HF(res), HALF_MASK, HF(src1),
+					  HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 1);
+  HF(res) = INTRINSIC (_mask3_fcmadd_pch) (HF(res), HF(src1),
+					   HF(src2), HALF_MASK);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fcmadd_pch);
+
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 1, 0);
+  HF(res) = INTRINSIC (_maskz_fcmadd_pch) (HALF_MASK, HF(res),
+					   HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmadd_pch);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 1, 0);
+  HF(res) = INTRINSIC (_fcmadd_round_pch) (HF(res), HF(src1),
+				     HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fcmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 0);
+  HF(res) = INTRINSIC (_mask_fcmadd_round_pch) (HF(res), HALF_MASK, HF(src1),
+					  HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 1, 1);
+  HF(res) = INTRINSIC (_mask3_fcmadd_round_pch) (HF(res), HF(src1),
+					   HF(src2), HALF_MASK, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fcmadd_pch);
+
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 1, 0);
+  HF(res) = INTRINSIC (_maskz_fcmadd_round_pch) (HALF_MASK, HF(res),
+					   HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmadd_pch);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c
new file mode 100644
index 00000000000..ca2f14072ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_fcmul_pch (x1, x2);
+  res1 = _mm512_mask_fcmul_pch (res1, m16, x1, x2);
+  res2 = _mm512_maskz_fcmul_pch (m16, x1, x2);
+  res = _mm512_fcmul_round_pch (x1, x2, 8);
+  res1 = _mm512_mask_fcmul_round_pch (res1, m16, x1, x2, 8);
+  res2 = _mm512_maskz_fcmul_round_pch (m16, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c
new file mode 100644
index 00000000000..ee41f6c58d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcph-1b.c
@@ -0,0 +1,111 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(c_fmul_pch) (V512 * dest, V512 op1, V512 op2,
+		  __mmask16 k, int zero_mask, int c_flag)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << (i / 2)) & k) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  if ((i % 2) == 0) {
+	      v5.f32[i] = v1.f32[i] * v3.f32[i]
+		- invert * (v1.f32[i+1] * v3.f32[i+1]);
+	  }
+	  else {
+	      v5.f32[i] = v1.f32[i] * v3.f32[i-1]
+		+ invert * (v1.f32[i-1] * v3.f32[i]);
+
+	  }
+      }
+      if (((1 << (i / 2 + 8)) & k) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  if ((i % 2) == 0) {
+	      v6.f32[i] = v2.f32[i] * v4.f32[i]
+		- invert * (v2.f32[i+1] * v4.f32[i+1]);
+	  }
+	  else {
+	      v6.f32[i] = v2.f32[i] * v4.f32[i-1]
+		+ invert * (v2.f32[i-1] * v4.f32[i]);
+	  }
+
+      }
+   }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 1);
+  HF(res) = INTRINSIC (_fcmul_pch) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fcmul_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 1);
+  HF(res) = INTRINSIC (_mask_fcmul_pch) (HF(res) ,HALF_MASK, HF(src1),
+					 HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmul_pch);
+
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 1);
+  HF(res) = INTRINSIC (_maskz_fcmul_pch) ( HALF_MASK, HF(src1),
+					   HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmul_pch);
+
+#if AVX512F_LEN == 512
+  EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 1);
+  HF(res) = INTRINSIC (_fcmul_round_pch) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fcmul_round_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 1);
+  HF(res) = INTRINSIC (_mask_fcmul_round_pch) (HF(res) ,HALF_MASK, HF(src1),
+					 HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fcmul_round_pch);
+
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 1);
+  HF(res) = INTRINSIC (_maskz_fcmul_round_pch) ( HALF_MASK, HF(src1),
+					   HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fcmul_round_pch);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
new file mode 100644
index 00000000000..4dae5f02dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_fmadd_pch (x1, x2, x3);
+  res1 = _mm512_mask_fmadd_pch (res1, m16, x1, x2);
+  res1 = _mm512_mask3_fmadd_pch (res1, x1, x2, m16);
+  res2 = _mm512_maskz_fmadd_pch (m16, x1, x2, x3);
+  res = _mm512_fmadd_round_pch (x1, x2, x3, 8);
+  res1 = _mm512_mask_fmadd_round_pch (res1, m16, x1, x2, 8);
+  res1 = _mm512_mask3_fmadd_round_pch (res1, x1, x2, m16, 8);
+  res2 = _mm512_maskz_fmadd_round_pch (m16, x1, x2, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c
new file mode 100644
index 00000000000..1da6f01e139
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1b.c
@@ -0,0 +1,131 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(c_fmadd_pch) (V512 * dest, V512 op1, V512 op2,
+		    __mmask16 k, int zero_mask, int c_flag,
+		    int is_mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+    if (((1 << (i / 2)) & k) == 0) {
+      if (zero_mask) {
+	v5.f32[i] = 0;
+      }
+      else {
+	v5.u32[i] = is_mask3 ? v3.u32[i] : v7.u32[i];
+      }
+    }
+    else {
+      if ((i % 2) == 0) {
+	v5.f32[i] = v1.f32[i] * v7.f32[i]
+	  - invert * (v1.f32[i+1] * v7.f32[i+1]) + v3.f32[i];
+      }
+      else {
+	v5.f32[i] = v1.f32[i-1] * v7.f32[i]
+	  + invert * (v1.f32[i] * v7.f32[i-1]) + v3.f32[i];
+
+      }
+    }
+    if (((1 << (i / 2 + 8)) & k) == 0) {
+      if (zero_mask) {
+	v6.f32[i] = 0;
+      }
+      else {
+	v6.u32[i] = is_mask3 ? v4.u32[i] : v8.u32[i];
+      }
+    }
+    else {
+      if ((i % 2) == 0) {
+	v6.f32[i] = v2.f32[i] * v8.f32[i]
+	  - invert * (v2.f32[i+1] * v8.f32[i+1]) + v4.f32[i];
+      }
+      else {
+	v6.f32[i] = v2.f32[i-1] * v8.f32[i]
+	  + invert * (v2.f32[i] * v8.f32[i-1]) + v4.f32[i];
+      }
+
+    }
+  }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 0, 0);
+  HF(res) = INTRINSIC (_fmadd_pch) (HF(res), HF(src1),
+				    HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 0);
+  HF(res) = INTRINSIC (_mask_fmadd_pch) (HF(res), HALF_MASK, HF(src1),
+					 HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 1);
+  HF(res) = INTRINSIC (_mask3_fmadd_pch) (HF(res), HF(src1), HF(src2),
+					  HALF_MASK);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 0, 0);
+  HF(res) = INTRINSIC (_maskz_fmadd_pch) (HALF_MASK, HF(res), HF(src1),
+					  HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_pch);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, NET_CMASK, 0, 0, 0);
+  HF(res) = INTRINSIC (_fmadd_round_pch) (HF(res), HF(src1),
+				    HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 0);
+  HF(res) = INTRINSIC (_mask_fmadd_round_pch) (HF(res), HALF_MASK, HF(src1),
+					 HF(src2),  _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 0, 0, 1);
+  HF(res) = INTRINSIC (_mask3_fmadd_round_pch) (HF(res), HF(src1), HF(src2),
+					  HALF_MASK, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask3_fmadd_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_pch)(&exp, src1, src2, HALF_MASK, 1, 0, 0);
+  HF(res) = INTRINSIC (_maskz_fmadd_round_pch) (HALF_MASK, HF(res), HF(src1),
+					  HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmadd_pch);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c
new file mode 100644
index 00000000000..f31cbca368e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m512h res, res1, res2;
+volatile __m512h x1, x2, x3;
+volatile __mmask16 m16;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm512_fmul_pch (x1, x2);
+  res1 = _mm512_mask_fmul_pch (res1, m16, x1, x2);
+  res2 = _mm512_maskz_fmul_pch (m16, x1, x2);
+  res = _mm512_fmul_round_pch (x1, x2, 8);
+  res1 = _mm512_mask_fmul_round_pch (res1, m16, x1, x2, 8);
+  res2 = _mm512_maskz_fmul_round_pch (m16, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c
new file mode 100644
index 00000000000..d9bb1b0ec12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcph-1b.c
@@ -0,0 +1,115 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS (AVX512F_LEN / 16)
+
+void NOINLINE
+EMULATE(c_fmul_pch) (V512 * dest, V512 op1, V512 op2,
+		  __mmask16 k, int zero_mask, int c_flag)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  for (i = 0; i < 16; i++) {
+      if (((1 << (i / 2)) & k) == 0) {
+	  if (zero_mask) {
+	      v5.f32[i] = 0;
+	  }
+	  else {
+	      v5.u32[i] = v7.u32[i];
+	  }
+      }
+      else {
+	  if ((i % 2) == 0) {
+	      v5.f32[i] = v1.f32[i] * v3.f32[i]
+		- invert * (v1.f32[i+1] * v3.f32[i+1]);
+	  }
+	  else {
+	      v5.f32[i] = v1.f32[i-1] * v3.f32[i]
+		+ invert * (v1.f32[i] * v3.f32[i-1]);
+
+	  }
+      }
+      if (((1 << (i / 2 + 8)) & k) == 0) {
+	  if (zero_mask) {
+	      v6.f32[i] = 0;
+	  }
+	  else {
+	      v6.u32[i] = v8.u32[i];
+	  }
+      }
+      else {
+	  if ((i % 2) == 0) {
+	      v6.f32[i] = v2.f32[i] * v4.f32[i]
+		- invert * (v2.f32[i+1] * v4.f32[i+1]);
+	  }
+	  else {
+	      v6.f32[i] = v2.f32[i-1] * v4.f32[i]
+		+ invert * (v2.f32[i] * v4.f32[i-1]);
+	  }
+
+      }
+   }
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 0);
+  HF(res) = INTRINSIC (_fmul_pch) (HF(src1), HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmul_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 0);
+  HF(res) = INTRINSIC (_mask_fmul_pch) (HF(res),HALF_MASK, HF(src1),
+					HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmul_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 0);
+  HF(res) = INTRINSIC (_maskz_fmul_pch) (HALF_MASK, HF(src1),
+					 HF(src2));
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmul_pch);
+
+#if AVX512F_LEN == 512
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, NET_CMASK, 0, 0);
+  HF(res) = INTRINSIC (_fmul_round_pch) (HF(src1), HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _fmul_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 0, 0);
+  HF(res) = INTRINSIC (_mask_fmul_round_pch) (HF(res),HALF_MASK, HF(src1),
+					HF(src2),  _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mask_fmul_pch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_pch)(&exp, src1, src2, HALF_MASK, 1, 0);
+  HF(res) = INTRINSIC (_maskz_fmul_round_pch) (HALF_MASK, HF(src1),
+					 HF(src2), _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _maskz_fmul_pch);
+#endif
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c
new file mode 100644
index 00000000000..eff13812c87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fcmadd_pch (x1, x2, x3);
+  res1 = _mm256_mask_fcmadd_pch (res1, m8, x1, x2);
+  res1 = _mm256_mask3_fcmadd_pch (res1, x1, x2, m8);
+  res1 = _mm256_maskz_fcmadd_pch (m8, x1, x2, x3);
+
+  res2 = _mm_fcmadd_pch (x4, x5, x6);
+  res2 = _mm_mask_fcmadd_pch (res2, m8, x4, x5);
+  res2 = _mm_mask3_fcmadd_pch (res2, x4, x5, m8);
+  res2 = _mm_maskz_fcmadd_pch (m8, x4, x5, x6);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
new file mode 100644
index 00000000000..5e3a54ecaae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfcmaddcph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfcmaddcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
new file mode 100644
index 00000000000..4e48e9c7f85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fcmul_pch (x1, x2);
+  res1 = _mm256_mask_fcmul_pch (res1, m8, x1, x2);
+  res1 = _mm256_maskz_fcmul_pch (m8, x1, x2);
+
+  res2 = _mm_fcmul_pch (x4, x5);
+  res2 = _mm_mask_fcmul_pch (res2, m8, x4, x5);
+  res2 = _mm_maskz_fcmul_pch (m8, x4, x5);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
new file mode 100644
index 00000000000..19564a1955d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmulcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfcmulcph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfcmulcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
new file mode 100644
index 00000000000..b9a24d0b9d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fmadd_pch (x1, x2, x3);
+  res1 = _mm256_mask_fmadd_pch (res1, m8, x1, x2);
+  res1 = _mm256_mask3_fmadd_pch (res1, x1, x2, m8);
+  res1 = _mm256_maskz_fmadd_pch (m8, x1, x2, x3);
+
+  res2 = _mm_fmadd_pch (x4, x5, x6);
+  res2 = _mm_mask_fmadd_pch (res2, m8, x4, x5);
+  res2 = _mm_mask3_fmadd_pch (res2, x4, x5, m8);
+  res2 = _mm_maskz_fmadd_pch (m8, x4, x5, x6);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
new file mode 100644
index 00000000000..bf85fea75ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddcph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmaddcph-1b.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
new file mode 100644
index 00000000000..54e58c66edb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]+%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256h res1;
+volatile __m128h res2;
+volatile __m256h x1, x2, x3;
+volatile __m128h x4, x5, x6;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res1 = _mm256_fmul_pch (x1, x2);
+  res1 = _mm256_mask_fmul_pch (res1, m8, x1, x2);
+  res1 = _mm256_maskz_fmul_pch (m8, x1, x2);
+
+  res2 = _mm_fmul_pch (x4, x5);
+  res2 = _mm_mask_fmul_pch (res2, m8, x4, x5);
+  res2 = _mm_maskz_fmul_pch (m8, x4, x5);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c
new file mode 100644
index 00000000000..f88d8423965
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmulcph-1b.c
@@ -0,0 +1,15 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#define AVX512VL
+#define AVX512F_LEN 256      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmulcph-1b.c"
+                             
+#undef AVX512F_LEN           
+#undef AVX512F_LEN_HALF      
+                             
+#define AVX512F_LEN 128      
+#define AVX512F_LEN_HALF 128 
+#include "avx512fp16-vfmulcph-1b.c"
+
-- 
2.18.1
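The EMULATE(c_fmul_pch) routines in the tests above treat each even/odd pair of half-float lanes as the real/imaginary parts of one complex number. A plain-C sketch of that per-pair arithmetic (the helper name is illustrative, not part of the patch):

```c
#include <assert.h>

/* Per-pair complex multiply as modelled by EMULATE(c_fmul_pch): even
   lanes hold real parts, odd lanes imaginary parts.  With conj_b set
   this computes a * conj(b), matching vfcmulcph; with conj_b clear it
   is the plain product a * b, matching vfmulcph.  */
static void
cmul (float a_re, float a_im, float b_re, float b_im,
      int conj_b, float *r_re, float *r_im)
{
  float s = conj_b ? -1.0f : 1.0f;
  *r_re = a_re * b_re - s * (a_im * b_im);
  *r_im = a_im * b_re + s * (a_re * b_im);
}
```

With conj_b = 0 this reproduces the even/odd-lane formulas in the emulation loop exactly; flipping conj_b flips only the sign of the cross terms, which is all that distinguishes the vfcmulcph tests from the vfmulcph ones.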



* [PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (49 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
                   ` (10 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_mask_fcmadd_sch):
	New intrinsic.
	(_mm_mask3_fcmadd_sch): Likewise.
	(_mm_maskz_fcmadd_sch): Likewise.
	(_mm_fcmadd_sch): Likewise.
	(_mm_mask_fmadd_sch): Likewise.
	(_mm_mask3_fmadd_sch): Likewise.
	(_mm_maskz_fmadd_sch): Likewise.
	(_mm_fmadd_sch): Likewise.
	(_mm_mask_fcmadd_round_sch): Likewise.
	(_mm_mask3_fcmadd_round_sch): Likewise.
	(_mm_maskz_fcmadd_round_sch): Likewise.
	(_mm_fcmadd_round_sch): Likewise.
	(_mm_mask_fmadd_round_sch): Likewise.
	(_mm_mask3_fmadd_round_sch): Likewise.
	(_mm_maskz_fmadd_round_sch): Likewise.
	(_mm_fmadd_round_sch): Likewise.
	(_mm_fcmul_sch): Likewise.
	(_mm_mask_fcmul_sch): Likewise.
	(_mm_maskz_fcmul_sch): Likewise.
	(_mm_fmul_sch): Likewise.
	(_mm_mask_fmul_sch): Likewise.
	(_mm_maskz_fmul_sch): Likewise.
	(_mm_fcmul_round_sch): Likewise.
	(_mm_mask_fcmul_round_sch): Likewise.
	(_mm_maskz_fcmul_round_sch): Likewise.
	(_mm_fmul_round_sch): Likewise.
	(_mm_mask_fmul_round_sch): Likewise.
	(_mm_maskz_fmul_round_sch): Likewise.
	* config/i386/i386-builtin.def: Add corresponding new builtins.
	* config/i386/sse.md
	(avx512fp16_fmaddcsh_v8hf_maskz<round_expand_name>): New expander.
	(avx512fp16_fcmaddcsh_v8hf_maskz<round_expand_name>): Ditto.
	(avx512fp16_fma_<complexopname>sh_v8hf<mask_scalarcz_name><round_scalarcz_name>):
	New define insn.
	(avx512fp16_<complexopname>sh_v8hf_mask<round_name>): Ditto.
	(avx512fp16_<complexopname>sh_v8hf<mask_scalarc_name><round_scalarcz_name>):
	Ditto.
	* config/i386/subst.md (mask_scalarcz_name): New.
	(mask_scalarc_name): Ditto.
	(mask_scalarc_operand3): Ditto.
	(mask_scalarcz_operand4): Ditto.
	(round_scalarcz_name): Ditto.
	(round_scalarc_mask_operand3): Ditto.
	(round_scalarcz_mask_operand4): Ditto.
	(round_scalarc_mask_op3): Ditto.
	(round_scalarcz_mask_op4): Ditto.
	(round_scalarcz_constraint): Ditto.
	(round_scalarcz_nimm_predicate): Ditto.
	(mask_scalarcz): Ditto.
	(mask_scalarc): Ditto.
	(round_scalarcz): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add test for new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/sse-14.c: Add test for new intrinsics.
	* gcc.target/i386/sse-22.c: Ditto.
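The non-AVX512VL fallback of _mm_mask_fcmadd_sch below selects the low lane with __builtin_ia32_blendvps, whose selector is a float built by shifting mask bit 0 into the sign bit. A standalone sketch of just that conversion (the helper name is illustrative; the header itself uses a signed shift):

```c
#include <stdint.h>

/* Sketch of the blend-selector trick: move mask bit 0 into the sign
   bit of a float.  vblendvps takes an element from its second source
   only when the selector element's sign bit is set, so this yields
   "keep the computed lane iff mask bit 0 is set".  */
static int
selector_from_mask (unsigned int mask)
{
  float f = (float) (int32_t) ((uint32_t) mask << 31); /* bit 0 -> sign bit */
  return f < 0.0f; /* here f is either 0.0f or -2^31, so this tests the sign */
}
```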
---
 gcc/config/i386/avx512fp16intrin.h     | 464 +++++++++++++++++++++++++
 gcc/config/i386/i386-builtin.def       |  10 +
 gcc/config/i386/sse.md                 |  76 ++++
 gcc/config/i386/subst.md               |  63 ++++
 gcc/testsuite/gcc.target/i386/avx-1.c  |  10 +
 gcc/testsuite/gcc.target/i386/sse-13.c |  10 +
 gcc/testsuite/gcc.target/i386/sse-14.c |  14 +
 gcc/testsuite/gcc.target/i386/sse-22.c |  14 +
 gcc/testsuite/gcc.target/i386/sse-23.c |  10 +
 9 files changed, 671 insertions(+)
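The mask/mask3/maskz flavours added here differ only in what a zero mask bit yields in the destination. A scalar model of that merge behaviour (hypothetical helper, for illustration only):

```c
/* Scalar model of the three masking flavours of the new intrinsics:
   mask  -> a zero mask bit keeps the accumulator element,
   mask3 -> a zero mask bit keeps the third-operand element,
   maskz -> a zero mask bit zeroes the element.  */
typedef enum { MERGE_MASK, MERGE_MASK3, MERGE_MASKZ } merge_kind;

static float
merge_elem (float computed, float acc, float op3,
            int mask_bit, merge_kind kind)
{
  if (mask_bit)
    return computed;           /* active lane: take the FMA result */
  switch (kind)
    {
    case MERGE_MASK:  return acc;
    case MERGE_MASK3: return op3;
    default:          return 0.0f;   /* MERGE_MASKZ */
    }
}
```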

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 9dd71019972..39c10beb1de 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -6495,6 +6495,470 @@ _mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
 
 #endif /* __OPTIMIZE__ */
 
+/* Intrinsics vf[,c]maddcsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+#ifdef __AVX512VL__
+  return (__m128h) __builtin_ia32_movaps128_mask (
+    (__v4sf)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					       (__v8hf) __A,
+					       (__v8hf) __C, __B,
+					       _MM_FROUND_CUR_DIRECTION),
+    (__v4sf) __A, __B);
+#else
+  return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A,
+    (__v4sf)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					       (__v8hf) __A,
+					       (__v8hf) __C, __B,
+					       _MM_FROUND_CUR_DIRECTION),
+    (__v4sf) _mm_set_ss ((float) ((int) __B << 31)));
+#endif
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+{
+  return (__m128h) _mm_move_ss ((__m128) __C,
+    (__m128)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __C,
+					       (__v8hf) __A,
+					       (__v8hf) __B, __D,
+					       _MM_FROUND_CUR_DIRECTION));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_v8hf_maskz_round((__v8hf) __D,
+					       (__v8hf) __B,
+					       (__v8hf) __C,
+					       __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)
+    __builtin_ia32_vfcmaddcsh_v8hf_round((__v8hf) __C,
+					 (__v8hf) __A,
+					 (__v8hf) __B,
+					 _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+#ifdef __AVX512VL__
+  return (__m128h) __builtin_ia32_movaps128_mask (
+    (__v4sf)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					      (__v8hf) __A,
+					      (__v8hf) __C, __B,
+					      _MM_FROUND_CUR_DIRECTION),
+    (__v4sf) __A, __B);
+#else
+  return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A,
+    (__v4sf)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					      (__v8hf) __A,
+					      (__v8hf) __C, __B,
+					      _MM_FROUND_CUR_DIRECTION),
+    (__v4sf) _mm_set_ss ((float) ((int) __B << 31)));
+#endif
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_sch (__m128h __A, __m128h __B, __m128h __C, __mmask8 __D)
+{
+  return (__m128h) _mm_move_ss ((__m128) __C,
+    (__m128)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __C,
+					      (__v8hf) __A,
+					      (__v8hf) __B, __D,
+					      _MM_FROUND_CUR_DIRECTION));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_sch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_v8hf_maskz_round((__v8hf) __D,
+					      (__v8hf) __B,
+					      (__v8hf) __C,
+					      __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_sch (__m128h __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)
+    __builtin_ia32_vfmaddcsh_v8hf_round((__v8hf) __C,
+					(__v8hf) __A,
+					(__v8hf) __B,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			   __m128h __D, const int __E)
+{
+#ifdef __AVX512VL__
+  return (__m128h) __builtin_ia32_movaps128_mask (
+    (__v4sf)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					       (__v8hf) __A,
+					       (__v8hf) __C,
+					       __B, __E),
+    (__v4sf) __A, __B);
+#else
+  return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A,
+    (__v4sf)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					       (__v8hf) __A,
+					       (__v8hf) __C,
+					       __B, __E),
+    (__v4sf) _mm_set_ss ((float) ((int) __B << 31)));
+#endif
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+			    __mmask8 __D, const int __E)
+{
+  return (__m128h) _mm_move_ss ((__m128) __C,
+    (__m128)
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) __C,
+					       (__v8hf) __A,
+					       (__v8hf) __B,
+					       __D, __E));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			    __m128h __D, const int __E)
+{
+  return (__m128h)__builtin_ia32_vfcmaddcsh_v8hf_maskz_round((__v8hf) __D,
+							     (__v8hf) __B,
+							     (__v8hf) __C,
+							     __A, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+{
+  return (__m128h)__builtin_ia32_vfcmaddcsh_v8hf_round((__v8hf) __C,
+						       (__v8hf) __A,
+						       (__v8hf) __B,
+						       __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmadd_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
+{
+#ifdef __AVX512VL__
+  return (__m128h) __builtin_ia32_movaps128_mask (
+    (__v4sf)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					      (__v8hf) __A,
+					      (__v8hf) __C,
+					      __B, __E),
+    (__v4sf) __A, __B);
+#else
+  return (__m128h) __builtin_ia32_blendvps ((__v4sf) __A,
+    (__v4sf)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __D,
+					      (__v8hf) __A,
+					      (__v8hf) __C,
+					      __B, __E),
+    (__v4sf) _mm_set_ss ((float) ((int) __B << 31)));
+#endif
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask3_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C,
+			    __mmask8 __D, const int __E)
+{
+  return (__m128h) _mm_move_ss ((__m128) __C,
+    (__m128)
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) __C,
+					      (__v8hf) __A,
+					      (__v8hf) __B,
+					      __D, __E));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmadd_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			   __m128h __D, const int __E)
+{
+  return (__m128h)__builtin_ia32_vfmaddcsh_v8hf_maskz_round((__v8hf) __D,
+							    (__v8hf) __B,
+							    (__v8hf) __C,
+							    __A, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmadd_round_sch (__m128h __A, __m128h __B, __m128h __C, const int __D)
+{
+  return (__m128h)__builtin_ia32_vfmaddcsh_v8hf_round((__v8hf) __C,
+						      (__v8hf) __A,
+						      (__v8hf) __B,
+						      __D);
+}
+
+#else
+#ifdef __AVX512VL__
+#define _mm_mask_fcmadd_round_sch(A, B, C, D, E)			\
+    ((__m128h) __builtin_ia32_movaps128_mask (				\
+     (__v4sf)								\
+     __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (D),		\
+						(__v8hf) (A),           \
+						(__v8hf) (C),           \
+						(B), (E)),              \
+						(__v4sf) (A), (B)))
+
+#else
+#define _mm_mask_fcmadd_round_sch(A, B, C, D, E)			\
+  ((__m128h) __builtin_ia32_blendvps ((__v4sf) (A),			\
+   (__v4sf)								\
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (D),		\
+					       (__v8hf) (A),		\
+					       (__v8hf) (C),		\
+					       (B), (E)),		\
+    (__v4sf) _mm_set_ss ((float) ((int) (B) << 31))))
+#endif
+
+#define _mm_mask3_fcmadd_round_sch(A, B, C, D, E)			\
+  ((__m128h) _mm_move_ss ((__m128) (C),					\
+   (__m128)								\
+    __builtin_ia32_vfcmaddcsh_v8hf_mask_round ((__v8hf) (C),		\
+					       (__v8hf) (A),		\
+					       (__v8hf) (B),		\
+					       (D), (E))))
+
+#define _mm_maskz_fcmadd_round_sch(A, B, C, D, E)		\
+  __builtin_ia32_vfcmaddcsh_v8hf_maskz_round ((D), (B), (C), (A), (E))
+
+#define _mm_fcmadd_round_sch(A, B, C, D)		\
+  __builtin_ia32_vfcmaddcsh_v8hf_round ((C), (A), (B), (D))
+
+#ifdef __AVX512VL__
+#define _mm_mask_fmadd_round_sch(A, B, C, D, E)				\
+    ((__m128h) __builtin_ia32_movaps128_mask (				\
+     (__v4sf)								\
+     __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (D),		\
+					       (__v8hf) (A),		\
+					       (__v8hf) (C),		\
+					       (B), (E)),		\
+					       (__v4sf) (A), (B)))
+
+#else
+#define _mm_mask_fmadd_round_sch(A, B, C, D, E)				\
+  ((__m128h) __builtin_ia32_blendvps ((__v4sf) (A),			\
+   (__v4sf)								\
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (D),		\
+					      (__v8hf) (A),		\
+					      (__v8hf) (C),		\
+					      (B), (E)),		\
+    (__v4sf) _mm_set_ss ((float) ((int) (B) << 31))))
+#endif
+
+#define _mm_mask3_fmadd_round_sch(A, B, C, D, E)			\
+  ((__m128h) _mm_move_ss ((__m128) (C),					\
+   (__m128)								\
+    __builtin_ia32_vfmaddcsh_v8hf_mask_round ((__v8hf) (C),		\
+					      (__v8hf) (A),		\
+					      (__v8hf) (B),		\
+					      (D), (E))))
+
+#define _mm_maskz_fmadd_round_sch(A, B, C, D, E)		\
+  __builtin_ia32_vfmaddcsh_v8hf_maskz_round ((D), (B), (C), (A), (E))
+
+#define _mm_fmadd_round_sch(A, B, C, D)		\
+  __builtin_ia32_vfmaddcsh_v8hf_round ((C), (A), (B), (D))
+
+#endif /* __OPTIMIZE__ */
+
+/* Intrinsics vf[,c]mulcsh.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmul_sch (__m128h __A, __m128h __B)
+{
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A,
+					(__v8hf) __B,
+					_MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C,
+					     (__v8hf) __D,
+					     (__v8hf) __A,
+					     __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)
+    __builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B,
+					     (__v8hf) __C,
+					     _mm_setzero_ph (),
+					     __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmul_sch (__m128h __A, __m128h __B)
+{
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A,
+				       (__v8hf) __B,
+				       _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmul_sch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
+{
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C,
+					    (__v8hf) __D,
+					    (__v8hf) __A,
+					    __B, _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmul_sch (__mmask8 __A, __m128h __B, __m128h __C)
+{
+  return (__m128h)
+    __builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B,
+					    (__v8hf) __C,
+					    _mm_setzero_ph (),
+					    __A, _MM_FROUND_CUR_DIRECTION);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fcmul_round_sch (__m128h __A, __m128h __B, const int __D)
+{
+  return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A,
+						      (__v8hf) __B,
+						      __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fcmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			  __m128h __D, const int __E)
+{
+  return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C,
+							   (__v8hf) __D,
+							   (__v8hf) __A,
+							   __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fcmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C,
+			   const int __E)
+{
+  return (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B,
+							   (__v8hf) __C,
+							   _mm_setzero_ph (),
+							   __A, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_fmul_round_sch (__m128h __A, __m128h __B, const int __D)
+{
+  return (__m128h)__builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A,
+						     (__v8hf) __B, __D);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_fmul_round_sch (__m128h __A, __mmask8 __B, __m128h __C,
+			 __m128h __D, const int __E)
+{
+  return (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C,
+							  (__v8hf) __D,
+							  (__v8hf) __A,
+							  __B, __E);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
+{
+  return (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B,
+							  (__v8hf) __C,
+							  _mm_setzero_ph (),
+							  __A, __E);
+}
+
+#else
+#define _mm_fcmul_round_sch(__A, __B, __D)				\
+  (__m128h)__builtin_ia32_vfcmulcsh_v8hf_round((__v8hf) __A,(__v8hf) __B, __D)
+
+#define _mm_mask_fcmul_round_sch(__A, __B, __C, __D, __E)		\
+  (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __C,	\
+						    (__v8hf) __D,	\
+						    (__v8hf) __A,	\
+						    __B, __E)
+
+#define _mm_maskz_fcmul_round_sch(__A, __B, __C, __E)			\
+  (__m128h)__builtin_ia32_vfcmulcsh_v8hf_mask_round((__v8hf) __B,	\
+						    (__v8hf) __C,	\
+						    _mm_setzero_ph(),	\
+						    __A, __E)
+
+#define _mm_fmul_round_sch(__A, __B, __D)				\
+  (__m128h)__builtin_ia32_vfmulcsh_v8hf_round((__v8hf) __A,(__v8hf) __B, __D)
+
+#define _mm_mask_fmul_round_sch(__A, __B, __C, __D, __E)		\
+  (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __C,	\
+						   (__v8hf) __D,	\
+						   (__v8hf) __A,	\
+						   __B, __E)
+
+#define _mm_maskz_fmul_round_sch(__A, __B, __C, __E)			\
+  (__m128h)__builtin_ia32_vfmulcsh_v8hf_mask_round((__v8hf) __B,	\
+						   (__v8hf) __C,	\
+						   _mm_setzero_ph (),	\
+						   __A, __E)
+
+#endif /* __OPTIMIZE__ */
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 448f9f75fa4..8d57413153f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3231,6 +3231,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph_v32hf_mask_round", IX86_BUILTIN_VFCMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph_v32hf_round", IX86_BUILTIN_VFMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
 BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph_v32hf_mask_round", IX86_BUILTIN_VFMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fcmaddcsh_v8hf_round, "__builtin_ia32_vfcmaddcsh_v8hf_round", IX86_BUILTIN_VFCMADDCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_mask_round, "__builtin_ia32_vfcmaddcsh_v8hf_mask_round", IX86_BUILTIN_VFCMADDCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddcsh_v8hf_maskz_round, "__builtin_ia32_vfcmaddcsh_v8hf_maskz_round", IX86_BUILTIN_VFCMADDCSH_V8HF_MASKZ_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fma_fmaddcsh_v8hf_round, "__builtin_ia32_vfmaddcsh_v8hf_round", IX86_BUILTIN_VFMADDCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddcsh_v8hf_mask_round, "__builtin_ia32_vfmaddcsh_v8hf_mask_round", IX86_BUILTIN_VFMADDCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddcsh_v8hf_maskz_round, "__builtin_ia32_vfmaddcsh_v8hf_maskz_round", IX86_BUILTIN_VFMADDCSH_V8HF_MASKZ_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulcsh_v8hf_round, "__builtin_ia32_vfcmulcsh_v8hf_round", IX86_BUILTIN_VFCMULCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulcsh_v8hf_mask_round, "__builtin_ia32_vfcmulcsh_v8hf_mask_round", IX86_BUILTIN_VFCMULCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulcsh_v8hf_round, "__builtin_ia32_vfmulcsh_v8hf_round", IX86_BUILTIN_VFMULCSH_V8HF_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_INT)
+BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulcsh_v8hf_mask_round, "__builtin_ia32_vfmulcsh_v8hf_mask_round", IX86_BUILTIN_VFMULCSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
 
 BDESC_END (ROUND_ARGS, MULTI_ARG)
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ddd93f739e3..2c3dba5bdb0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5597,6 +5597,82 @@ (define_insn "<avx512>_<complexopname>_<mode><maskc_name><round_name>"
   [(set_attr "type" "ssemul")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "avx512fp16_fmaddcsh_v8hf_maskz<round_expand_name>"
+  [(match_operand:V8HF 0 "register_operand")
+   (match_operand:V8HF 1 "<round_expand_nimm_predicate>")
+   (match_operand:V8HF 2 "<round_expand_nimm_predicate>")
+   (match_operand:V8HF 3 "<round_expand_nimm_predicate>")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+{
+  emit_insn (gen_avx512fp16_fma_fmaddcsh_v8hf_maskz<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (V8HFmode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_expand "avx512fp16_fcmaddcsh_v8hf_maskz<round_expand_name>"
+  [(match_operand:V8HF 0 "register_operand")
+   (match_operand:V8HF 1 "<round_expand_nimm_predicate>")
+   (match_operand:V8HF 2 "<round_expand_nimm_predicate>")
+   (match_operand:V8HF 3 "<round_expand_nimm_predicate>")
+   (match_operand:QI 4 "register_operand")]
+  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
+{
+  emit_insn (gen_avx512fp16_fma_fcmaddcsh_v8hf_maskz<round_expand_name> (
+    operands[0], operands[1], operands[2], operands[3],
+    CONST0_RTX (V8HFmode), operands[4]<round_expand_operand>));
+  DONE;
+})
+
+(define_insn "avx512fp16_fma_<complexopname>sh_v8hf<mask_scalarcz_name><round_scalarcz_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (unspec:V8HF
+	    [(match_operand:V8HF 1 "<round_scalarcz_nimm_predicate>" "0")
+	     (match_operand:V8HF 2 "<round_scalarcz_nimm_predicate>" "v")
+	     (match_operand:V8HF 3 "<round_scalarcz_nimm_predicate>" "<round_scalarcz_constraint>")]
+	     UNSPEC_COMPLEX_F_C_MA)
+	   (match_dup 2)
+	   (const_int 3)))]
+  "TARGET_AVX512FP16"
+  "v<complexopname>sh\t{<round_scalarcz_mask_op4>%3, %2, %0<mask_scalarcz_operand4>|%0<mask_scalarcz_operand4>, %2, %3<round_scalarcz_maskcz_mask_op4>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8HF")])
+
+(define_insn "avx512fp16_<complexopname>sh_v8hf_mask<round_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_merge:V8HF
+	    (unspec:V8HF
+	      [(match_operand:V8HF 1 "<round_nimm_predicate>" "0")
+	       (match_operand:V8HF 2 "<round_nimm_predicate>" "v")
+	       (match_operand:V8HF 3 "<round_nimm_predicate>" "<round_constraint>")]
+	       UNSPEC_COMPLEX_F_C_MA)
+	     (match_dup 1)
+	     (unspec:QI [(match_operand:QI 4 "register_operand" "Yk")]
+			UNSPEC_COMPLEX_MASK))
+	   (match_dup 2)
+	   (const_int 3)))]
+  "TARGET_AVX512FP16"
+  "v<complexopname>sh\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8HF")])
+
+(define_insn "avx512fp16_<complexopname>sh_v8hf<mask_scalarc_name><round_scalarcz_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	  (vec_merge:V8HF
+	    (unspec:V8HF
+	      [(match_operand:V8HF 1 "nonimmediate_operand" "v")
+	       (match_operand:V8HF 2 "<round_scalarcz_nimm_predicate>" "<round_scalarcz_constraint>")]
+	       UNSPEC_COMPLEX_F_C_MUL)
+	    (match_dup 1)
+	    (const_int 3)))]
+  "TARGET_AVX512FP16"
+  "v<complexopname>sh\t{<round_scalarc_mask_op3>%2, %1, %0<mask_scalarc_operand3>|%0<mask_scalarc_operand3>, %1, %2<round_scalarc_mask_op3>}"
+  [(set_attr "type" "ssemul")
+   (set_attr "mode" "V8HF")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel half-precision floating point conversion operations
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 3a1f554e9b9..5b14a632111 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -308,8 +308,12 @@ (define_subst "mask_expand4"
     (match_operand:<avx512fmaskmode> 5 "register_operand")])
 
 (define_subst_attr "mask_scalar_name" "mask_scalar" "" "_mask")
+(define_subst_attr "mask_scalarcz_name" "mask_scalarcz" "" "_maskz")
+(define_subst_attr "mask_scalarc_name" "mask_scalarc" "" "_mask")
+(define_subst_attr "mask_scalarc_operand3" "mask_scalarc" "" "%{%4%}%N3")
 (define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N3")
 (define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N4")
+(define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5%}%N4")
 
 (define_subst "mask_scalar"
   [(set (match_operand:SUBST_V 0)
@@ -327,12 +331,55 @@ (define_subst "mask_scalar"
 	  (match_dup 2)
 	  (const_int 1)))])
 
+(define_subst "mask_scalarcz"
+  [(set (match_operand:SUBST_CV 0)
+	(vec_merge:SUBST_CV
+	  (match_operand:SUBST_CV 1)
+	  (match_operand:SUBST_CV 2)
+	  (const_int 3)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+	(vec_merge:SUBST_CV
+	  (vec_merge:SUBST_CV
+	    (match_dup 1)
+	    (match_operand:SUBST_CV 3 "const0_operand" "C")
+	    (unspec:<avx512fmaskmode>
+	      [(match_operand:<avx512fmaskcmode> 4 "register_operand" "Yk")]
+	      UNSPEC_COMPLEX_MASK))
+	  (match_dup 2)
+	  (const_int 3)))])
+
+(define_subst "mask_scalarc"
+  [(set (match_operand:SUBST_CV 0)
+	(vec_merge:SUBST_CV
+	  (match_operand:SUBST_CV 1)
+	  (match_operand:SUBST_CV 2)
+	  (const_int 3)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+	(vec_merge:SUBST_CV
+	  (vec_merge:SUBST_CV
+	    (match_dup 1)
+	    (match_operand:SUBST_CV 3 "nonimm_or_0_operand" "0C")
+	    (unspec:<avx512fmaskmode>
+	      [(match_operand:<avx512fmaskcmode> 4 "register_operand" "Yk")]
+	      UNSPEC_COMPLEX_MASK))
+	  (match_dup 2)
+	  (const_int 3)))])
+
 (define_subst_attr "round_scalar_name" "round_scalar" "" "_round")
+(define_subst_attr "round_scalarcz_name" "round_scalarcz" "" "_round")
 (define_subst_attr "round_scalar_mask_operand3" "mask_scalar" "%R3" "%R5")
+(define_subst_attr "round_scalarc_mask_operand3" "mask_scalarc" "%R3" "%R5")
+(define_subst_attr "round_scalarcz_mask_operand4" "mask_scalarcz" "%R4" "%R6")
 (define_subst_attr "round_scalar_mask_op3" "round_scalar" "" "<round_scalar_mask_operand3>")
+(define_subst_attr "round_scalarc_mask_op3" "round_scalarcz" "" "<round_scalarc_mask_operand3>")
+(define_subst_attr "round_scalarcz_mask_op4" "round_scalarcz" "" "<round_scalarcz_mask_operand4>")
 (define_subst_attr "round_scalar_constraint" "round_scalar" "vm" "v")
+(define_subst_attr "round_scalarcz_constraint" "round_scalarcz" "vm" "v")
 (define_subst_attr "round_scalar_prefix" "round_scalar" "vex" "evex")
 (define_subst_attr "round_scalar_nimm_predicate" "round_scalar" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_scalarcz_nimm_predicate" "round_scalarcz" "vector_operand" "register_operand")
 
 (define_subst "round_scalar"
   [(set (match_operand:SUBST_V 0)
@@ -350,6 +397,22 @@ (define_subst "round_scalar"
 	     (match_operand:SI 3 "const_4_or_8_to_11_operand")]
 		UNSPEC_EMBEDDED_ROUNDING))])
 
+(define_subst "round_scalarcz"
+  [(set (match_operand:SUBST_V 0)
+        (vec_merge:SUBST_V
+          (match_operand:SUBST_V 1)
+          (match_operand:SUBST_V 2)
+          (const_int 3)))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+	(unspec:SUBST_V [
+	     (vec_merge:SUBST_V
+		(match_dup 1)
+		(match_dup 2)
+		(const_int 3))
+	     (match_operand:SI 3 "const_4_or_8_to_11_operand")]
+		UNSPEC_EMBEDDED_ROUNDING))])
+
 (define_subst_attr "round_saeonly_scalar_name" "round_saeonly_scalar" "" "_round")
 (define_subst_attr "round_saeonly_scalar_mask_operand3" "mask_scalar" "%r3" "%r5")
 (define_subst_attr "round_saeonly_scalar_mask_operand4" "mask_scalar" "%r4" "%r6")
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 56e90d9f9a5..69de37a0087 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -797,6 +797,16 @@
 #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index ef9f8aad853..60adfcc1c67 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -814,6 +814,16 @@
 #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index f27c73fd4cc..956a9d16f84 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -774,6 +774,8 @@ test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
 test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
 test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_fmul_round_sch, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_fcmul_round_sch, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -850,8 +852,12 @@ test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
 test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
+test_3 (_mm_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8)
+test_3 (_mm_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
+test_3 (_mm_maskz_fmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_fcmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
 test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
@@ -920,8 +926,16 @@ test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmas
 test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
 test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
 test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm_mask_fmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_fcmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask3_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8)
+test_4 (_mm_mask3_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8)
+test_4 (_mm_maskz_fmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8)
+test_4 (_mm_maskz_fcmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8)
 test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
 test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm_mask_fmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_fcmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index ccf8c3a6c03..31492ef3697 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -878,6 +878,8 @@ test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
 test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
 test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
 test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
+test_2 (_mm_fmul_round_sch, __m128h, __m128h, __m128h, 8)
+test_2 (_mm_fcmul_round_sch, __m128h, __m128h, __m128h, 8)
 test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
 test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
 test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
@@ -954,6 +956,10 @@ test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
 test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
 test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
 test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
+test_3 (_mm_maskz_fmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_maskz_fcmul_round_sch, __m128h, __mmask8, __m128h, __m128h, 8)
+test_3 (_mm_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8)
+test_3 (_mm_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, 8)
 test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
 test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
 test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
@@ -1022,8 +1028,16 @@ test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmas
 test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
 test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
 test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
+test_4 (_mm_mask_fmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_fcmadd_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask3_fmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8)
+test_4 (_mm_mask3_fcmadd_round_sch, __m128h, __m128h, __m128h, __m128h, __mmask8, 8)
+test_4 (_mm_maskz_fmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8)
+test_4 (_mm_maskz_fcmadd_round_sch, __m128h, __mmask8, __m128h, __m128h, __m128h, 8)
 test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
 test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
+test_4 (_mm_mask_fmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
+test_4 (_mm_mask_fcmul_round_sch, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
 test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
 test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index dc39d7e2012..4a110e86855 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -815,6 +815,16 @@
 #define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
 #define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, D) __builtin_ia32_vfcmaddcsh_v8hf_round(A, B, C, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcsh_v8hf_maskz_round(B, C, D, A, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcsh_v8hf_mask_round(A, C, D, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_round(A, B, C) __builtin_ia32_vfcmulcsh_v8hf_round(A, B, 8)
+#define __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcsh_v8hf_mask_round(A, C, D, B, 8)
 
 /* avx512fp16vlintrin.h */
 #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (50 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 53/62] AVX512FP16: Add expander for sqrthf2 liuhongt
                   ` (9 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: New test.
	* gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfcmulcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmaddcsh-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcsh-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-vfmulcsh-1b.c: Ditto.
---
 .../i386/avx512fp16-vfcmaddcsh-1a.c           | 27 +++++++
 .../i386/avx512fp16-vfcmaddcsh-1b.c           | 78 +++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c | 25 ++++++
 .../gcc.target/i386/avx512fp16-vfcmulcsh-1b.c | 71 +++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c | 27 +++++++
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1b.c | 77 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vfmulcsh-1a.c  | 25 ++++++
 .../gcc.target/i386/avx512fp16-vfmulcsh-1b.c  | 71 +++++++++++++++++
 8 files changed, 401 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
new file mode 100644
index 00000000000..8bd8eebd8df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfcmaddcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx128f_test (void)
+{
+  res = _mm_fcmadd_sch (x1, x2, x3);
+  res1 = _mm_mask_fcmadd_sch (res1, m8, x1, x2);
+  res1 = _mm_mask3_fcmadd_sch (res1, x1, x2, m8);
+  res2 = _mm_maskz_fcmadd_sch (m8, x1, x2, x3);
+  res = _mm_fcmadd_round_sch (x1, x2, x3, 8);
+  res1 = _mm_mask_fcmadd_round_sch (res1, m8, x1, x2, 8);
+  res1 = _mm_mask3_fcmadd_round_sch (res1, x1, x2, m8, 8);
+  res2 = _mm_maskz_fcmadd_round_sch (m8, x1, x2, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c
new file mode 100644
index 00000000000..c4790684b66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1b.c
@@ -0,0 +1,78 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmadd_csh) (V512 * dest, V512 op1, V512 op2,
+		    __mmask8 k, int zero_mask, int c_flag,
+		    int is_mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v7.f32[0]
+      - invert * (v1.f32[1] * v7.f32[1]) + v3.f32[0];
+    v5.f32[1] = v1.f32[0] * v7.f32[1]
+      + invert * (v1.f32[1] * v7.f32[0]) + v3.f32[1];
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = is_mask3? v3.f32[i] : v7.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 1, 0);
+  res.xmmh[0] = _mm_fcmadd_round_sch(res.xmmh[0], src1.xmmh[0],
+				     src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fcmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 1, 0);
+  res.xmmh[0] = _mm_mask_fcmadd_round_sch(res.xmmh[0], 0x1,
+					  src1.xmmh[0], src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fcmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 1, 1);
+  res.xmmh[0] = _mm_mask3_fcmadd_round_sch(res.xmmh[0], src1.xmmh[0], src2.xmmh[0],
+					   0x1, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask3_fcmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x3, 1, 1, 0);
+  res.xmmh[0] = _mm_maskz_fcmadd_round_sch(0x3, res.xmmh[0], src1.xmmh[0],
+					   src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fcmadd_sch);
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c
new file mode 100644
index 00000000000..872d91ac257
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_fcmul_sch (x1, x2);
+  res1 = _mm_mask_fcmul_sch (res1, m8, x1, x2);
+  res2 = _mm_maskz_fcmul_sch (m8, x1, x2);
+  res = _mm_fcmul_round_sch (x1, x2, 8);
+  res1 = _mm_mask_fcmul_round_sch (res1, m8, x1, x2, 8);
+  res2 = _mm_maskz_fcmul_round_sch (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c
new file mode 100644
index 00000000000..995df8422f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmulcsh-1b.c
@@ -0,0 +1,71 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmul_csh) (V512 * dest, V512 op1, V512 op2,
+		    __mmask8 k, int zero_mask, int c_flag)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v3.f32[0]
+      - invert * (v1.f32[1] * v3.f32[1]);
+    v5.f32[1] = v1.f32[1] * v3.f32[0]
+      + invert * (v1.f32[0] * v3.f32[1]);
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x1, 0 , 1);
+  res.xmmh[0] = _mm_fcmul_round_sch(src1.xmmh[0], src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fcmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x1, 0, 1);
+  res.xmmh[0] = _mm_mask_fcmul_round_sch(res.xmmh[0], 0x1,
+					 src1.xmmh[0], src2.xmmh[0],
+					 _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fcmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x3, 1, 1);
+  res.xmmh[0] = _mm_maskz_fcmul_round_sch(0x3, src1.xmmh[0],
+					  src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fcmul_sch);
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
new file mode 100644
index 00000000000..1e376b4a2bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmaddcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx128f_test (void)
+{
+  res = _mm_fmadd_sch (x1, x2, x3);
+  res1 = _mm_mask_fmadd_sch (res1, m8, x1, x2);
+  res1 = _mm_mask3_fmadd_sch (res1, x1, x2, m8);
+  res2 = _mm_maskz_fmadd_sch (m8, x1, x2, x3);
+  res = _mm_fmadd_round_sch (x1, x2, x3, 8);
+  res1 = _mm_mask_fmadd_round_sch (res1, m8, x1, x2, 8);
+  res1 = _mm_mask3_fmadd_round_sch (res1, x1, x2, m8, 8);
+  res2 = _mm_maskz_fmadd_round_sch (m8, x1, x2, x3, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
new file mode 100644
index 00000000000..4c74e01d8a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1b.c
@@ -0,0 +1,77 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmadd_csh) (V512 * dest, V512 op1, V512 op2,
+		    __mmask8 k, int zero_mask, int c_flag,
+		    int is_mask3)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v7.f32[0]
+      - invert * (v1.f32[1] * v7.f32[1]) + v3.f32[0];
+    v5.f32[1] = v1.f32[0] * v7.f32[1]
+      + invert * (v1.f32[1] * v7.f32[0]) + v3.f32[1];
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = is_mask3? v3.f32[i] : v7.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 0, 0);
+  res.xmmh[0] = _mm_fmadd_round_sch(res.xmmh[0], src1.xmmh[0],
+				    src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 0, 0);
+  res.xmmh[0] = _mm_mask_fmadd_round_sch(res.xmmh[0], 0x1, src1.xmmh[0],
+					 src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fmadd_sch);
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x1, 0, 0, 1);
+  res.xmmh[0] = _mm_mask3_fmadd_round_sch(res.xmmh[0], src1.xmmh[0], src2.xmmh[0],
+					   0x1, _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask3_fmadd_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmadd_csh)(&exp, src1, src2,  0x3, 1, 0, 0);
+  res.xmmh[0] = _mm_maskz_fmadd_round_sch(0x3, res.xmmh[0], src1.xmmh[0],
+					  src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fmadd_sch);
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
new file mode 100644
index 00000000000..5d48874b760
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcsh\[ \\t\]+\{rz-sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128h res, res1, res2;
+volatile __m128h x1, x2, x3;
+volatile __mmask8 m8;
+
+void extern
+avx512f_test (void)
+{
+  res = _mm_fmul_sch (x1, x2);
+  res1 = _mm_mask_fmul_sch (res1, m8, x1, x2);
+  res2 = _mm_maskz_fmul_sch (m8, x1, x2);
+  res = _mm_fmul_round_sch (x1, x2, 8);
+  res1 = _mm_mask_fmul_round_sch (res1, m8, x1, x2, 8);
+  res2 = _mm_maskz_fmul_round_sch (m8, x1, x2, 11);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c
new file mode 100644
index 00000000000..45840d62f67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmulcsh-1b.c
@@ -0,0 +1,71 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512dq" } */
+
+
+#define AVX512FP16
+#include "avx512fp16-helper.h"
+
+#define N_ELEMS 8
+
+void NOINLINE
+EMULATE(c_fmul_csh) (V512 * dest, V512 op1, V512 op2,
+		    __mmask8 k, int zero_mask, int c_flag)
+{
+  V512 v1, v2, v3, v4, v5, v6, v7, v8;
+  int i;
+  int invert = 1;
+  if (c_flag == 1)
+    invert = -1;
+
+  unpack_ph_2twops(op1, &v1, &v2);
+  unpack_ph_2twops(op2, &v3, &v4);
+  unpack_ph_2twops(*dest, &v7, &v8);
+
+  if ((k&1) || !k) {
+    v5.f32[0] = v1.f32[0] * v3.f32[0]
+      - invert * (v1.f32[1] * v3.f32[1]);
+    v5.f32[1] = v1.f32[0] * v3.f32[1]
+      + invert * (v1.f32[1] * v3.f32[0]);
+  }
+  else if (zero_mask)
+    v5.f32[0] = 0;
+  else
+    v5.f32[0] = v7.f32[0];
+
+  for (i = 2; i < 8; i++)
+    v5.f32[i] = v1.f32[i];
+
+  *dest = pack_twops_2ph(v5, v6);
+}
+
+void
+TEST (void)
+{
+  V512 res;
+  V512 exp;
+
+  init_src();
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x1, 0 , 0);
+  res.xmmh[0] = _mm_fmul_round_sch(src1.xmmh[0], src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_fmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x1, 0, 0);
+  res.xmmh[0] = _mm_mask_fmul_round_sch(res.xmmh[0], 0x1,
+					src1.xmmh[0], src2.xmmh[0],
+					_ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_mask_fmul_sch);
+
+  init_dest(&res, &exp);
+  EMULATE(c_fmul_csh)(&exp, src1, src2,  0x3, 1, 0);
+  res.xmmh[0] = _mm_maskz_fmul_round_sch(0x3, src1.xmmh[0],
+					 src2.xmmh[0], _ROUND_NINT);
+  CHECK_RESULT (&res, &exp, N_ELEMS, _mm_maskz_fmul_sch);
+
+  if (n_errs != 0) {
+      abort ();
+  }
+}
+
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 53/62] AVX512FP16: Add expander for sqrthf2.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (51 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-23  5:12   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven liuhongt
                   ` (8 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/i386-features.c (remove_partial_avx_dependency):
	Handle E_HFmode.
	* config/i386/i386.md (sqrthf2): New expander.
	(*sqrt<mode>2_sse): Extend to MODEFH.
	* config/i386/sse.md
	(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>):
	Extend to VFH_128.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-sqrt-1.c: New test.
	* gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c: New test.
---
 gcc/config/i386/i386-features.c               | 15 +++++++++++----
 gcc/config/i386/i386.md                       | 12 +++++++++---
 gcc/config/i386/sse.md                        |  8 ++++----
 .../i386/avx512fp16-builtin-sqrt-1.c          | 18 ++++++++++++++++++
 .../i386/avx512fp16vl-builtin-sqrt-1.c        | 19 +++++++++++++++++++
 5 files changed, 61 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index a25769ae478..0b5a1a3af53 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2238,15 +2238,22 @@ remove_partial_avx_dependency (void)
 
 	  rtx zero;
 	  machine_mode dest_vecmode;
-	  if (dest_mode == E_SFmode)
+	  switch (dest_mode)
 	    {
+	    case E_HFmode:
+	      dest_vecmode = V8HFmode;
+	      zero = gen_rtx_SUBREG (V8HFmode, v4sf_const0, 0);
+	      break;
+	    case E_SFmode:
 	      dest_vecmode = V4SFmode;
 	      zero = v4sf_const0;
-	    }
-	  else
-	    {
+	      break;
+	    case E_DFmode:
 	      dest_vecmode = V2DFmode;
 	      zero = gen_rtx_SUBREG (V2DFmode, v4sf_const0, 0);
+	      break;
+	    default:
+	      gcc_unreachable ();
 	    }
 
 	  /* Change source to vector mode.  */
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a85c23d74f1..81c893c60de 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16561,9 +16561,9 @@ (define_expand "rsqrtsf2"
 })
 
 (define_insn "*sqrt<mode>2_sse"
-  [(set (match_operand:MODEF 0 "register_operand" "=v,v,v")
-	(sqrt:MODEF
-	  (match_operand:MODEF 1 "nonimmediate_operand" "0,v,m")))]
+  [(set (match_operand:MODEFH 0 "register_operand" "=v,v,v")
+	(sqrt:MODEFH
+	  (match_operand:MODEFH 1 "nonimmediate_operand" "0,v,m")))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "@
    %vsqrt<ssemodesuffix>\t{%d1, %0|%0, %d1}
@@ -16583,6 +16583,12 @@ (define_insn "*sqrt<mode>2_sse"
 	    ]
 	    (symbol_ref "true")))])
 
+(define_expand "sqrthf2"
+  [(set (match_operand:HF 0 "register_operand")
+	(sqrt:HF
+	  (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "sqrt<mode>2"
   [(set (match_operand:MODEF 0 "register_operand")
 	(sqrt:MODEF
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2c3dba5bdb0..b47e7f0b82a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2389,12 +2389,12 @@ (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (sqrt:<ssescalarmode>
 	      (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm,<round_scalar_constraint>")))
-	  (match_operand:VF_128 2 "register_operand" "0,v")
+	  (match_operand:VFH_128 2 "register_operand" "0,v")
 	  (const_int 1)))]
   "TARGET_SSE"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
new file mode 100644
index 00000000000..38cdf23fef7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+_Float16
+f1 (_Float16 x)
+{
+  return __builtin_sqrtf16 (x);
+}
+
+void
+f2 (_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 32; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*zmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
new file mode 100644
index 00000000000..08deb3ea470
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+void
+f1 (_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 8; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+void
+f2 (_Float16* __restrict psrc, _Float16* __restrict pdst)
+{
+  for (int i = 0; i != 16; i++)
+    pdst[i] = __builtin_sqrtf16 (psrc[i]);
+}
+
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*ymm\[0-9\]" 1 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (52 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 53/62] AVX512FP16: Add expander for sqrthf2 liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 55/62] AVX512FP16: Add expander for cstorehf4 liuhongt
                   ` (7 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/i386.md (<rounding_insn>hf2): New expander.
	(sse4_1_round<mode>2): Extend from MODEF to MODEFH.
	* config/i386/sse.md (*sse4_1_round<ssescalarmodesuffix>):
	Extend from VF_128 to VFH_128.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-round-1.c: New test.
---
 gcc/config/i386/i386.md                       | 19 ++++++++++--
 gcc/config/i386/sse.md                        |  8 ++---
 .../i386/avx512fp16-builtin-round-1.c         | 31 +++++++++++++++++++
 3 files changed, 51 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 81c893c60de..247a6e489ef 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17721,9 +17721,9 @@ (define_expand "significand<mode>2"
 \f
 
 (define_insn "sse4_1_round<mode>2"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
-	(unspec:MODEF
-	  [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+  [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
+	(unspec:MODEFH
+	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
 	   (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -17980,6 +17980,19 @@ (define_expand "<rounding_insn>xf2"
   "TARGET_USE_FANCY_MATH_387
    && (flag_fp_int_builtin_inexact || !flag_trapping_math)")
 
+(define_expand "<rounding_insn>hf2"
+  [(parallel [(set (match_operand:HF 0 "register_operand")
+		   (unspec:HF [(match_operand:HF 1 "register_operand")]
+				FRNDINT_ROUNDING))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "TARGET_AVX512FP16"
+{
+  emit_insn (gen_sse4_1_roundhf2
+  	       (operands[0], operands[1],
+	       GEN_INT (ROUND_<ROUNDING> | ROUND_NO_EXC)));
+  DONE;
+})
+
 (define_expand "<rounding_insn><mode>2"
   [(parallel [(set (match_operand:MODEF 0 "register_operand")
 		   (unspec:MODEF [(match_operand:MODEF 1 "register_operand")]
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b47e7f0b82a..a76c30c75cb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20202,14 +20202,14 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*sse4_1_round<ssescalarmodesuffix>"
-  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=Yr,*x,x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
 	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
 	       (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
 	      UNSPEC_ROUND))
-	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+	  (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c
new file mode 100644
index 00000000000..3cab1526967
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+_Float16
+f1 (_Float16 x)
+{
+  return __builtin_truncf16 (x);
+}
+
+_Float16
+f2 (_Float16 x)
+{
+  return __builtin_floorf16 (x);
+}
+
+_Float16
+f3 (_Float16 x)
+{
+  return __builtin_ceilf16 (x);
+}
+
+_Float16
+f4 (_Float16 x)
+{
+  return __builtin_roundevenf16 (x);
+}
+
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$11\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$10\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$9\[^\n\r\]*xmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[ \\t\]+\\\$8\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 55/62] AVX512FP16: Add expander for cstorehf4.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (53 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16) liuhongt
                   ` (6 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/i386.md (cstore<mode>4): Extend from MODEF to
	MODEFH.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-fpcompare-1.c: New test.
	* gcc.target/i386/avx512fp16-builtin-fpcompare-2.c: New test.
---
 gcc/config/i386/i386.md                       |  4 +-
 .../i386/avx512fp16-builtin-fpcompare-1.c     | 40 +++++++++++++++++++
 .../i386/avx512fp16-builtin-fpcompare-2.c     | 29 ++++++++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 247a6e489ef..5f45c4ff583 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1524,8 +1524,8 @@ (define_expand "cbranch<mode>4"
 
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
-	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
-		    (match_operand:MODEF 3 "cmp_fp_expander_operand")))
+	(compare:CC (match_operand:MODEFH 2 "cmp_fp_expander_operand")
+		    (match_operand:MODEFH 3 "cmp_fp_expander_operand")))
    (set (match_operand:QI 0 "register_operand")
               (match_operator 1 "ix86_fp_comparison_operator"
                [(reg:CC FLAGS_REG)
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
new file mode 100644
index 00000000000..62115f15f30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+int
+f1 (_Float16 x, _Float16 y)
+{
+  return x > y;
+}
+
+int
+f2 (_Float16 x, _Float16 y)
+{
+  return x < y;
+}
+
+/* { dg-final { scan-assembler-times "seta" 2 } } */
+
+int
+f3 (_Float16 x, _Float16 y)
+{
+  return x >= y;
+}
+
+int
+f4 (_Float16 x, _Float16 y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler-times "setnb" 2 } } */
+
+int
+f5 (_Float16 x, _Float16 y)
+{
+  return __builtin_isunordered (x, y);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } }  */
+/* { dg-final { scan-assembler-times "xorl" 5 } } */
+/* { dg-final { scan-assembler-times "vcomish\[^\n\r\]*xmm\[0-9\]" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c
new file mode 100644
index 00000000000..150c351e784
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-fpcompare-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpmath=sse -mavx512fp16" } */
+
+int
+foo (_Float16 y)
+{
+  return __builtin_isinf (y);
+}
+
+int
+foo2 (_Float16 y)
+{
+  return __builtin_isfinite (y);
+}
+
+int
+foo3 (_Float16 y)
+{
+  return __builtin_signbit(y);
+}
+
+int
+foo4 (_Float16 y)
+{
+  return __builtin_isnormal (y);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } }  */
+/* { dg-final { scan-assembler-times "vucomish\[^\n\r\]*xmm\[0-9\]" 4 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread
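[Editorial note: for illustration, the comparison semantics the new cstore<mode>4 expander gives _Float16 can be sketched in plain C. Ordinary float stands in for _Float16 so this compiles without AVX512FP16; the helper names are mine, not part of the patch.]

```c
#include <assert.h>
#include <math.h>

/* Ordered comparisons (>, >=, <, <=) are false whenever either operand
   is a NaN; only isunordered reports the NaN case.  The expander lowers
   these through vcomish plus seta/setnb, as the scan-assembler patterns
   in avx512fp16-builtin-fpcompare-1.c check.  */
static int gt (float x, float y) { return x > y; }
static int ge (float x, float y) { return x >= y; }
static int unord (float x, float y) { return __builtin_isunordered (x, y); }
```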

* [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (54 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 55/62] AVX512FP16: Add expander for cstorehf4 liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  9:50   ` Richard Biener
  2021-07-01 21:17   ` Joseph Myers
  2021-07-01  6:16 ` [PATCH 57/62] AVX512FP16: Add expander for fmahf4 liuhongt
                   ` (5 subsequent siblings)
  61 siblings, 2 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/i386.md (*sqrthf2): New define_insn.
	* config/i386/sse.md
	(*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>):
	Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-sqrt-2.c: New test.
---
 gcc/config/i386/i386.md                        | 18 ++++++++++++++++++
 gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
 .../i386/avx512fp16-builtin-sqrt-2.c           | 18 ++++++++++++++++++
 3 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5f45c4ff583..684b2080a93 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16583,6 +16583,24 @@ (define_insn "*sqrt<mode>2_sse"
 	    ]
 	    (symbol_ref "true")))])
 
+/* Optimize code like (_Float16) __builtin_sqrtf ((float) f16),
+   since it's not handled in the frontend.  */
+(define_insn "*sqrthf2"
+  [(set (match_operand:HF 0 "register_operand" "=v,v")
+  	(float_truncate:HF
+	  (sqrt:MODEF
+	    (float_extend:MODEF
+	      (match_operand:HF 1 "nonimmediate_operand" "v,m")))))]
+  "TARGET_AVX512FP16"
+  "@
+   vsqrtsh\t{%d1, %0|%0, %d1}
+   vsqrtsh\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "atom_sse_attr" "sqrt")
+   (set_attr "prefix" "evex")
+   (set_attr "avx_partial_xmm_update" "false,true")
+   (set_attr "mode" "HF")])
+
 (define_expand "sqrthf2"
   [(set (match_operand:HF 0 "register_operand")
 	(sqrt:HF
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a76c30c75cb..f87f6893835 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2407,6 +2407,24 @@ (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "mode" "<ssescalarmode>")])
 
+(define_insn "*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (float_truncate:HF
+	      (sqrt:MODEF
+	        (float_extend:MODEF
+		  (match_operand:HF 1 "nonimmediate_operand" "<round_scalar_constraint>")))))
+	  (match_operand:VFH_128 2 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vsqrtsh\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
+  [(set_attr "type" "sse")
+   (set_attr "atom_sse_attr" "sqrt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+
 (define_expand "rsqrt<mode>2"
   [(set (match_operand:VF1_AVX512ER_128_256 0 "register_operand")
 	(unspec:VF1_AVX512ER_128_256
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
new file mode 100644
index 00000000000..4fefee179af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+#include <math.h>
+_Float16
+foo (_Float16 f16)
+{
+  return sqrtf (f16);
+}
+
+_Float16
+foo1 (_Float16 f16)
+{
+  return sqrt (f16);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 2 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 57/62] AVX512FP16: Add expander for fmahf4
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (55 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16) liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceilf ((float) f16) liuhongt
                   ` (4 subsequent siblings)
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/sse.md (FMAMODEM): Extend to handle FP16.
	(VFH_SF_AVX512VL): Extend to handle HFmode.
	(VF_SF_AVX512VL): Deleted.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-fma-1.c: New test.
	* gcc.target/i386/avx512fp16vl-fma-1.c: New test.
	* gcc.target/i386/avx512fp16vl-fma-vectorize-1.c: New test.
---
 gcc/config/i386/sse.md                        | 11 +--
 .../gcc.target/i386/avx512fp16-fma-1.c        | 69 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16vl-fma-1.c      | 70 +++++++++++++++++++
 .../i386/avx512fp16vl-fma-vectorize-1.c       | 45 ++++++++++++
 4 files changed, 190 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f87f6893835..2b8d12086f4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4489,7 +4489,11 @@ (define_mode_iterator FMAMODEM
    (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V16SF "TARGET_AVX512F")
-   (V8DF "TARGET_AVX512F")])
+   (V8DF "TARGET_AVX512F")
+   (HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")])
 
 (define_expand "fma<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -4597,14 +4601,11 @@ (define_insn "*fma_fmadd_<mode>"
    (set_attr "mode" "<MODE>")])
 
 ;; Suppose AVX-512F as baseline
-(define_mode_iterator VF_SF_AVX512VL
-  [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
-
 (define_mode_iterator VFH_SF_AVX512VL
   [(V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (HF "TARGET_AVX512FP16")
    SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
new file mode 100644
index 00000000000..d78d7629838
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+typedef _Float16 v32hf __attribute__ ((__vector_size__ (64)));
+
+_Float16
+foo1 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo2 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo3 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo4 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v32hf
+foo5 (v32hf a, v32hf b, v32hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo6 (v32hf a, v32hf b, v32hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo7 (v32hf a, v32hf b, v32hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo8 (v32hf a, v32hf b, v32hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
new file mode 100644
index 00000000000..1a832f37d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+typedef _Float16 v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 v16hf __attribute__ ((__vector_size__ (32)));
+
+v8hf
+foo1 (v8hf a, v8hf b, v8hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo2 (v8hf a, v8hf b, v8hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo3 (v8hf a, v8hf b, v8hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo4 (v8hf a, v8hf b, v8hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v16hf
+foo5 (v16hf a, v16hf b, v16hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo6 (v16hf a, v16hf b, v16hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo7 (v16hf a, v16hf b, v16hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo8 (v16hf a, v16hf b, v16hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c
new file mode 100644
index 00000000000..d0b8bec34f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+typedef _Float16 v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 v16hf __attribute__ ((__vector_size__ (32)));
+
+void
+foo1 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = pa[i] * pb[i] + pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo2 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+    for (int i = 0; i != 8; i++)
+    pd[i] = -pa[i] * pb[i] + pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo3 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = pa[i] * pb[i] - pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo4 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = -pa[i] * pb[i] - pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread
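[Editorial note: the four scalar patterns the new fmahf4 expander matches can be written out in portable C, with float standing in for _Float16 so the sketch compiles anywhere; the helper names are mine. Under -Ofast (where contraction is allowed), each body may contract to the single fused instruction noted in the comment.]

```c
/* Sign variants of fused multiply-add and the AVX512FP16 scalar
   instruction each one maps to in the avx512fp16-fma-1.c testcase.  */
static float fmadd  (float a, float b, float c) { return  a * b + c; } /* vfmadd132sh  */
static float fnmadd (float a, float b, float c) { return -a * b + c; } /* vfnmadd132sh */
static float fmsub  (float a, float b, float c) { return  a * b - c; } /* vfmsub132sh  */
static float fnmsub (float a, float b, float c) { return -a * b - c; } /* vfnmsub132sh */
```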

* [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceilf ((float) f16).
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (56 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 57/62] AVX512FP16: Add expander for fmahf4 liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  9:52   ` Richard Biener
  2021-07-01 21:26   ` Joseph Myers
  2021-07-01  6:16 ` [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics liuhongt
                   ` (3 subsequent siblings)
  61 siblings, 2 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub

gcc/ChangeLog:

	* config/i386/i386.md (*avx512fp16_1_roundhf2): New define_insn.
	* config/i386/sse.md (*avx512fp16_1_roundhf): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-builtin-round-2.c: New test.
---
 gcc/config/i386/i386.md                       | 22 ++++++++++++++
 gcc/config/i386/sse.md                        | 20 +++++++++++++
 .../i386/avx512fp16-builtin-round-2.c         | 29 +++++++++++++++++++
 3 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 684b2080a93..457f37dcb61 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17738,6 +17738,28 @@ (define_expand "significand<mode>2"
 })
 \f
 
+/* Optimize code like (_Float16) __builtin_ceilf ((float) f16),
+   since it's not handled in the frontend.  */
+
+(define_insn "*avx512fp16_1_roundhf2"
+  [(set (match_operand:HF 0 "register_operand" "=v,v")
+	(float_truncate:HF
+	  (unspec:MODEF
+	    [(float_extend:MODEF
+	  	(match_operand:HF 1 "nonimmediate_operand" "v,m"))
+	     (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
+	    UNSPEC_ROUND)))]
+  "TARGET_AVX512FP16"
+  "@
+   vrndscalesh\t{%2, %d1, %0|%0, %d1, %2}
+   vrndscalesh\t{%2, %1, %d0|%d0, %1, %2}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "length_immediate" "1,1")
+   (set_attr "prefix" "evex")
+   (set_attr "avx_partial_xmm_update" "false,true")
+   (set_attr "mode" "HF")])
+
+
 (define_insn "sse4_1_round<mode>2"
   [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
 	(unspec:MODEFH
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2b8d12086f4..b3d8ffb4f8e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20220,6 +20220,26 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
    (set_attr "prefix" "orig,orig,vex,evex")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*avx512fp16_1_roundhf"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (float_truncate:HF
+	      (unspec:MODEF
+	        [(float_extend:MODEF
+		   (match_operand:HF 2 "nonimmediate_operand" "vm"))
+	       	 (match_operand:SI 3 "const_0_to_15_operand" "n")]
+	      UNSPEC_ROUND)))
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vrndscalesh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*sse4_1_round<ssescalarmodesuffix>"
   [(set (match_operand:VFH_128 0 "register_operand" "=Yr,*x,x,v")
 	(vec_merge:VFH_128
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
new file mode 100644
index 00000000000..bcd41929637
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo1 (_Float16 a)
+{
+  return __builtin_roundeven (a);
+}
+
+_Float16
+foo2 (_Float16 a)
+{
+  return __builtin_trunc (a);
+}
+
+_Float16
+foo3 (_Float16 a)
+{
+  return __builtin_ceil (a);
+}
+
+_Float16
+foo4 (_Float16 a)
+{
+  return __builtin_floor (a);
+}
+
+/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
+/* { dg-final { scan-assembler-times "vrndscalesh\[^\n\r\]*xmm\[0-9\]" 4 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread
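[Editorial note: the immediates scanned for in the round testcases follow the standard SSE4.1/AVX-512 ROUND* encoding: bits 1:0 select the rounding mode and bit 3 suppresses precision exceptions. A minimal C sketch of the encoding, with names of my own choosing:]

```c
#include <assert.h>

/* vrndscalesh immediate = rounding mode | no-exception bit:
   trunc -> 11, ceil -> 10, floor -> 9, roundeven -> 8,
   matching the \$11/\$10/\$9/\$8 scan-assembler patterns.  */
enum
{
  ROUND_NEAREST_EVEN = 0,
  ROUND_DOWN = 1,		/* floor */
  ROUND_UP = 2,			/* ceil  */
  ROUND_TO_ZERO = 3,		/* trunc */
  ROUND_NO_EXC = 8
};

static int rndscale_imm (int mode) { return mode | ROUND_NO_EXC; }
```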

* [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (57 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceilf ((float) f16) liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-09-22 10:30   ` Hongtao Liu
  2021-07-01  6:16 ` [PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max) liuhongt
                   ` (2 subsequent siblings)
  61 siblings, 1 reply; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub, dianhong xu

From: dianhong xu <dianhong.xu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (__m512h_u, __m256h_u,
	__m128h_u): New typedef.
	(_mm512_load_ph): New intrinsic.
	(_mm256_load_ph): Ditto.
	(_mm_load_ph): Ditto.
	(_mm512_loadu_ph): Ditto.
	(_mm256_loadu_ph): Ditto.
	(_mm_loadu_ph): Ditto.
	(_mm512_store_ph): Ditto.
	(_mm256_store_ph): Ditto.
	(_mm_store_ph): Ditto.
	(_mm512_storeu_ph): Ditto.
	(_mm256_storeu_ph): Ditto.
	(_mm_storeu_ph): Ditto.
	(_mm512_abs_ph): Ditto.
	* config/i386/avx512fp16vlintrin.h
	(_mm_abs_ph): Ditto.
	(_mm256_abs_ph): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-13.c: New test.
---
 gcc/config/i386/avx512fp16intrin.h            |  97 ++++++++++++
 gcc/config/i386/avx512fp16vlintrin.h          |  16 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-13.c | 143 ++++++++++++++++++
 3 files changed, 256 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-13.c

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 39c10beb1de..b8ca9201828 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -45,6 +45,11 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
 typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
+/* Unaligned versions of the same types.  */
+typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
+typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), __may_alias__, __aligned__ (1)));
+typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
+
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
@@ -362,6 +367,48 @@ _mm_load_sh (void const *__P)
 		     *(_Float16 const *) __P);
 }
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_load_ph (void const *__P)
+{
+  return *(const __m512h *) __P;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_load_ph (void const *__P)
+{
+  return *(const __m256h *) __P;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_load_ph (void const *__P)
+{
+  return *(const __m128h *) __P;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_loadu_ph (void const *__P)
+{
+  return *(const __m512h_u *) __P;
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_loadu_ph (void const *__P)
+{
+  return *(const __m256h_u *) __P;
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_loadu_ph (void const *__P)
+{
+  return *(const __m128h_u *) __P;
+}
+
 /* Stores the lower _Float16 value.  */
 extern __inline void
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -370,6 +417,56 @@ _mm_store_sh (void *__P, __m128h __A)
   *(_Float16 *) __P = ((__v8hf)__A)[0];
 }
 
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_store_ph (void *__P, __m512h __A)
+{
+  *(__m512h *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_store_ph (void *__P, __m256h __A)
+{
+  *(__m256h *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_ph (void *__P, __m128h __A)
+{
+  *(__m128h *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_storeu_ph (void *__P, __m512h __A)
+{
+  *(__m512h_u *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_storeu_ph (void *__P, __m256h __A)
+{
+  *(__m256h_u *) __P = __A;
+}
+
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_storeu_ph (void *__P, __m128h __A)
+{
+  *(__m128h_u *) __P = __A;
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_abs_ph (__m512h __A)
+{
+  return (__m512h) _mm512_and_epi32 (_mm512_set1_epi32 (0x7FFF7FFF),
+				     (__m512i) __A);
+}
+
 /* Intrinsics v[add,sub,mul,div]ph.  */
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index c7bdfbc0517..d4aa9928406 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -425,6 +425,22 @@ _mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C)
 					   _mm256_setzero_ph (), __A);
 }
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_abs_ph (__m128h __A)
+{
+  return (__m128h) _mm_and_si128 (_mm_set1_epi32 (0x7FFF7FFF),
+				  (__m128i) __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_abs_ph (__m256h __A)
+{
+  return (__m256h) _mm256_and_si256 (_mm256_set1_epi32 (0x7FFF7FFF),
+				     (__m256i) __A);
+}
+
 /* vcmpph */
 #ifdef __OPTIMIZE
 extern __inline __mmask8
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
new file mode 100644
index 00000000000..3b6219e493f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
@@ -0,0 +1,143 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+#include <immintrin.h>
+void
+__attribute__ ((noinline, noclone))
+store512_ph (void *p, __m512h a)
+{
+  _mm512_store_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+void
+__attribute__ ((noinline, noclone))
+store256_ph (void *p, __m256h a)
+{
+  _mm256_store_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+void
+__attribute__ ((noinline, noclone))
+store_ph (void *p, __m128h a)
+{
+  _mm_store_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+load512_ph (void const *p)
+{
+  return _mm512_load_ph (p);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+load256_ph (void const *p)
+{
+  return _mm256_load_ph (p);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+load_ph (void const *p)
+{
+  return _mm_load_ph (p);
+}
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+load512u_ph (void const *p)
+{
+  return _mm512_loadu_ph (p);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%zmm\[0-9\]" 1 } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+load256u_ph (void const *p)
+{
+  return _mm256_loadu_ph (p);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%ymm\[0-9\]" 1 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+load128u_ph (void const *p)
+{
+  return _mm_loadu_ph (p);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%xmm\[0-9\]" 1 } } */
+
+void
+__attribute__ ((noinline, noclone))
+store512u_ph (void *p, __m512h a)
+{
+  return _mm512_storeu_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%zmm\[0-9\], *\[^,\]*" 1 } } */
+
+void
+__attribute__ ((noinline, noclone))
+store256u_ph (void *p, __m256h a)
+{
+  return _mm256_storeu_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%ymm\[0-9\], *\[^,\]*" 1 } } */
+
+void
+__attribute__ ((noinline, noclone))
+storeu_ph (void *p, __m128h a)
+{
+  return _mm_storeu_ph (p, a);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%xmm\[0-9\], *\[^,\]*" 1 } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+abs512_ph (__m512h a)
+{
+  return _mm512_abs_ph (a);
+}
+
+/* { dg-final { scan-assembler-times "vpandd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpandd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+abs256_ph (__m256h a)
+{
+  return _mm256_abs_ph (a);
+}
+
+/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-4\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+abs_ph (__m128h a)
+{
+  return _mm_abs_ph (a);
+}
+
+/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-2\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 { target {! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread
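[Editorial note: _mm512_abs_ph above is implemented as a bitwise AND with a broadcast 0x7FFF7FFF constant, clearing the sign bit of two half-precision values per dword lane. The scalar equivalent on a raw IEEE binary16 bit pattern is a one-line mask; the helper name below is mine.]

```c
#include <assert.h>
#include <stdint.h>

/* IEEE binary16 layout: bit 15 sign, bits 14..10 exponent,
   bits 9..0 mantissa.  Clearing bit 15 yields the absolute value,
   which is all the vpand in _mm512_abs_ph does, lane by lane.  */
static uint16_t half_abs_bits (uint16_t bits)
{
  return bits & 0x7FFF;
}
```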

* [PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max).
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (58 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions liuhongt
  2021-07-01  6:16 ` [PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics liuhongt
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub, dianhong xu

From: dianhong xu <dianhong.xu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_MM512_REDUCE_OP): New macro
	(_mm512_reduce_add_ph): New intrinsic.
	(_mm512_reduce_mul_ph): Ditto.
	(_mm512_reduce_min_ph): Ditto.
	(_mm512_reduce_max_ph): Ditto.
	* config/i386/avx512fp16vlintrin.h
	(_MM256_REDUCE_OP/_MM_REDUCE_OP): New macro.
	(_mm256_reduce_add_ph): New intrinsic.
	(_mm256_reduce_mul_ph): Ditto.
	(_mm256_reduce_min_ph): Ditto.
	(_mm256_reduce_max_ph): Ditto.
	(_mm_reduce_add_ph): Ditto.
	(_mm_reduce_mul_ph): Ditto.
	(_mm_reduce_min_ph): Ditto.
	(_mm_reduce_max_ph): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-reduce-op-1.c: New test.
	* gcc.target/i386/avx512fp16vl-reduce-op-1.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h            |  69 +++++
 gcc/config/i386/avx512fp16vlintrin.h          | 105 ++++++++
 .../gcc.target/i386/avx512fp16-reduce-op-1.c  | 132 ++++++++++
 .../i386/avx512fp16vl-reduce-op-1.c           | 244 ++++++++++++++++++
 4 files changed, 550 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index b8ca9201828..6e0f3a80e54 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -7056,6 +7056,75 @@ _mm_maskz_fmul_round_sch (__mmask8 __A, __m128h __B, __m128h __C, const int __E)
 
 #endif /* __OPTIMIZE__ */
 
+#define _MM512_REDUCE_OP(op) 						\
+  __m256h __T1 = (__m256h) _mm512_extractf64x4_pd ((__m512d) __A, 0);	\
+  __m256h __T2 = (__m256h) _mm512_extractf64x4_pd ((__m512d) __A, 1);	\
+  __m256h __T3 = (__T1 op __T2);					\
+  __m128h __T4 = (__m128h) _mm256_extractf128_pd ((__m256d) __T3, 0);	\
+  __m128h __T5 = (__m128h) _mm256_extractf128_pd ((__m256d) __T3, 1);	\
+  __m128h __T6 = (__T4 op __T5);					\
+  __m128h __T7 = (__m128h) __builtin_shuffle ((__m128h)__T6,		\
+		 (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3});			\
+  __m128h __T8 = (__T6 op __T7);					\
+  __m128h __T9 = (__m128h) __builtin_shuffle ((__m128h)__T8,		\
+		 (__v8hi) {2, 3, 0, 1, 4, 5, 6, 7});			\
+  __m128h __T10 = __T8 op __T9;						\
+  return __T10[0] op __T10[1]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_add_ph (__m512h __A)
+{
+   _MM512_REDUCE_OP(+);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_mul_ph (__m512h __A)
+{
+   _MM512_REDUCE_OP(*);
+}
+
+#undef _MM512_REDUCE_OP
+#define _MM512_REDUCE_OP(op) 						\
+  __m512h __T1 = (__m512h) __builtin_shuffle ((__m512d) __A,		\
+		 (__v8di) {4,5,6,7,0,0,0,0});				\
+  __m512h __T2 = _mm512_##op(__A, __T1);				\
+  __m512h __T3 = (__m512h) __builtin_shuffle ((__m512d) __T2,		\
+		 (__v8di) {2,3,0,0,0,0,0,0});				\
+  __m512h __T4 = _mm512_##op(__T2, __T3);				\
+  __m512h __T5 = (__m512h) __builtin_shuffle ((__m512d) __T4,		\
+		 (__v8di) {1,0,0,0,0,0,0,0});				\
+  __m512h __T6 = _mm512_##op(__T4, __T5);				\
+  __m512h __T7 = (__m512h) __builtin_shuffle ((__m512) __T6,		\
+		 (__v16si) {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0});		\
+  __m512h __T8 = _mm512_##op(__T6, __T7);				\
+  __m512h __T9 = (__m512h) __builtin_shuffle (__T8,			\
+					     (__v32hi) {1,0,0,0,0,0,0,0,\
+							0,0,0,0,0,0,0,0,\
+							0,0,0,0,0,0,0,0,\
+							0,0,0,0,0,0,0,0}\
+							);		\
+  __m512h __T10 = _mm512_##op(__T8, __T9);				\
+  return __T10[0]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_min_ph (__m512h __A)
+{
+  _MM512_REDUCE_OP(min_ph);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_reduce_max_ph (__m512h __A)
+{
+  _MM512_REDUCE_OP(max_ph);
+}
+
+#undef _MM512_REDUCE_OP
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index d4aa9928406..eea1941617f 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -3088,6 +3088,111 @@ _mm256_maskz_fcmul_pch (__mmask8 __A, __m256h __B, __m256h __C)
 						      __A);
 }
 
+#define _MM256_REDUCE_OP(op) 						\
+  __m128h __T1 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 0);	\
+  __m128h __T2 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 1);	\
+  __m128h __T3 = (__T1 op __T2);					\
+  __m128h __T4 = (__m128h) __builtin_shuffle (__T3,			\
+		 (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3});			\
+  __m128h __T5 = (__T3) op (__T4);					\
+  __m128h __T6 = (__m128h) __builtin_shuffle (__T5,			\
+		 (__v8hi) {2, 3, 0, 1, 4, 5, 6, 7});			\
+  __m128h __T7 = __T5 op __T6;						\
+  return __T7[0] op __T7[1]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_add_ph (__m256h __A)
+{
+  _MM256_REDUCE_OP(+);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_mul_ph (__m256h __A)
+{
+  _MM256_REDUCE_OP(*);
+}
+
+#undef _MM256_REDUCE_OP
+#define _MM256_REDUCE_OP(op) 						\
+  __m128h __T1 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 0);	\
+  __m128h __T2 = (__m128h) _mm256_extractf128_pd ((__m256d) __A, 1);	\
+  __m128h __T3 = _mm_##op (__T1, __T2);				\
+  __m128h __T4 = (__m128h) __builtin_shuffle (__T3,			\
+		 (__v8hi) {2, 3, 0, 1, 6, 7, 4, 5});			\
+  __m128h __T5 = _mm_##op (__T3, __T4);				\
+  __m128h __T6 = (__m128h) __builtin_shuffle (__T5, (__v8hi) {4, 5});	\
+  __m128h __T7 = _mm_##op (__T5, __T6);				\
+  __m128h __T8 = (__m128h) __builtin_shuffle (__T7, (__v8hi) {1, 0});	\
+  __m128h __T9 = _mm_##op (__T7, __T8);				\
+  return __T9[0]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_min_ph (__m256h __A)
+{
+  _MM256_REDUCE_OP(min_ph);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_reduce_max_ph (__m256h __A)
+{
+  _MM256_REDUCE_OP(max_ph);
+}
+
+#define _MM_REDUCE_OP(op) 						\
+  __m128h __T1 = (__m128h) __builtin_shuffle (__A,			\
+		 (__v8hi) {4, 5, 6, 7, 0, 1, 2, 3});			\
+  __m128h __T2 = (__A) op (__T1);					\
+  __m128h __T3 = (__m128h) __builtin_shuffle (__T2,			\
+		 (__v8hi){2, 3, 0, 1, 4, 5, 6, 7});			\
+  __m128h __T4 = __T2 op __T3;						\
+  return __T4[0] op __T4[1]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_add_ph (__m128h __A)
+{
+  _MM_REDUCE_OP(+);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_mul_ph (__m128h __A)
+{
+  _MM_REDUCE_OP(*);
+}
+
+#undef _MM_REDUCE_OP
+#define _MM_REDUCE_OP(op) 						\
+  __m128h __T1 = (__m128h) __builtin_shuffle (__A,			\
+		 (__v8hi) {2, 3, 0, 1, 6, 7, 4, 5});			\
+  __m128h __T2 = _mm_##op (__A, __T1);					\
+  __m128h __T3 = (__m128h) __builtin_shuffle (__T2, (__v8hi){4, 5});	\
+  __m128h __T4 = _mm_##op (__T2, __T3);				\
+  __m128h __T5 = (__m128h) __builtin_shuffle (__T4, (__v8hi){1, 0});	\
+  __m128h __T6 = _mm_##op (__T4, __T5);				\
+  return __T6[0]
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_min_ph (__m128h __A)
+{
+  _MM_REDUCE_OP(min_ph);
+}
+
+extern __inline _Float16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_max_ph (__m128h __A)
+{
+  _MM_REDUCE_OP(max_ph);
+}
+
+#undef _MM256_REDUCE_OP
+#undef _MM_REDUCE_OP
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c
new file mode 100644
index 00000000000..35563166536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-reduce-op-1.c
@@ -0,0 +1,132 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+
+#include <immintrin.h>
+#include "avx512-check.h"
+
+__m512h a1 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+
+__m512h a2 = { 1.25f16, 2.25f16, -0.25f16, 4.0f16, -2.0f16, 4.0f16, -3.0f16, 2.0f16,
+	       -0.5f16, -1.0f16, 1.0f16, -1.0f16, 1.0f16, 1.0f16, 2.0f16, 4.0f16,
+	       1.25f16, 2.25f16, -4.25f16, 4.0f16, -2.4f16, 4.0f16, -3.0f16, 2.0f16,
+	       -4.5f16, 7.6f16, 0.7f16, -8.2f16, 2.1f16, 2.4f16, -2.0f16, 19.4f16 };
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_add_ph (__m512h a)
+{
+  return _mm512_reduce_add_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_mul_ph (__m512h a)
+{
+  return _mm512_reduce_mul_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_max_ph (__m512h a)
+{
+  return _mm512_reduce_max_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_min_ph (__m512h a)
+{
+  return _mm512_reduce_min_ph (a);
+}
+
+#define SIZE 32
+#define REF_ADDMUL(op, a)					\
+  __m256h __a1 = _mm256_setzero_ph ();				\
+  for (int i = 0; i < 16; i++) {				\
+    __a1[i] = (_Float16) a[i] op (_Float16) a[i + 16];		\
+  }								\
+  __m128h __a2 = _mm_setzero_ph ();				\
+  for (int i = 0; i < 8; i++) {					\
+    __a2[i] = (_Float16) __a1[i] op (_Float16) __a1[i + 8];	\
+  }								\
+  _Float16 __c0 = __a2[0] op __a2[4];				\
+  _Float16 __c1 = __a2[1] op __a2[5];				\
+  _Float16 __c2 = __a2[2] op __a2[6];				\
+  _Float16 __c3 = __a2[3] op __a2[7];				\
+  _Float16 __d0 = __c0 op __c2;					\
+  _Float16 __d1 = __c1 op __c3;					\
+  _Float16 __e0 = __d0 op __d1;					\
+  r3 = __e0
+
+#define TESTOP(opname, op, a)				\
+  do {							\
+    _Float16 r1 = _mm512_reduce_##opname##_ph (a);	\
+    _Float16 r2 = test_reduce_##opname##_ph (a);	\
+    _Float16 r3 = a[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    REF_ADDMUL (op, a);					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#define TEST_ADDMUL_PH(a)			\
+  do {						\
+    TESTOP (add, +, a);				\
+    TESTOP (mul, *, a);				\
+  } while (0)
+
+static void
+test_512_addmul_ph (void)
+{
+  TEST_ADDMUL_PH (a1);
+  TEST_ADDMUL_PH (a2);
+}
+
+#undef TESTOP
+#define TESTOP(opname, op, a)				\
+  do {							\
+    _Float16 r1 = _mm512_reduce_##opname##_ph (a);	\
+    _Float16 r2 = test_reduce_##opname##_ph (a);	\
+    _Float16 r3 = a[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    for (int i = 1; i < SIZE; i++)			\
+      r3 = r3 op a[i];					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#define TEST_MINMAX_PH(a)			\
+  do {						\
+    TESTOP (min, < a[i] ? r3 :, a);		\
+    TESTOP (max, > a[i] ? r3 :, a);		\
+  } while (0)
+
+static void
+test_512_minmax_ph (void)
+{
+  TEST_MINMAX_PH (a1);
+  TEST_MINMAX_PH (a2);
+}
+
+static void
+do_test (void)
+{
+  test_512_addmul_ph ();
+  test_512_minmax_ph ();
+}
+
+#undef SIZE
+#undef REF_ADDMUL
+#undef TESTOP
+#undef TEST_ADDMUL_PH
+#undef TEST_MINMAX_PH
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c
new file mode 100644
index 00000000000..70485d89720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-reduce-op-1.c
@@ -0,0 +1,244 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+
+#include <immintrin.h>
+#include "avx512-check.h"
+
+__m256h a1 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+	       238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16 };
+__m256h a2 = { 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+	       23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+
+__m128h b1 = { 1.25f16, 2.25f16, -0.25f16, 4.0f16, -2.0f16, 4.0f16, -3.0f16, 2.0f16 };
+__m128h b2 = { -0.5f16, -1.0f16, 1.0f16, -1.0f16, 1.0f16, 1.0f16, 2.0f16, 4.0f16 };
+__m128h b3 = { 1.25f16, 2.25f16, -4.25f16, 4.0f16, -2.4f16, 4.0f16, -3.0f16, 2.0f16 };
+__m128h b4 = { -4.5f16, 7.6f16, 0.7f16, -8.2f16, 2.1f16, 2.4f16, -2.0f16, 1.4f16 };
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_256_add_ph (__m256h a)
+{
+  return _mm256_reduce_add_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_256_mul_ph (__m256h a)
+{
+  return _mm256_reduce_mul_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_256_max_ph (__m256h a)
+{
+  return _mm256_reduce_max_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_256_min_ph (__m256h a)
+{
+  return _mm256_reduce_min_ph (a);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_add_ph (__m128h b)
+{
+  return _mm_reduce_add_ph (b);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_mul_ph (__m128h b)
+{
+  return _mm_reduce_mul_ph (b);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_max_ph (__m128h b)
+{
+  return _mm_reduce_max_ph (b);
+}
+
+__attribute__((noinline, noclone)) _Float16
+test_reduce_min_ph (__m128h b)
+{
+  return _mm_reduce_min_ph (b);
+}
+
+#define SIZE 16
+#define REF_ADDMUL(op, a)				\
+  __m128h __a1 = _mm_setzero_ph ();			\
+  for (int i = 0; i < 8; i++) {				\
+    __a1[i] = (_Float16) a[i] op (_Float16) a[i + 8];	\
+  }							\
+  _Float16 __c0 = __a1[0] op __a1[4];			\
+  _Float16 __c1 = __a1[1] op __a1[5];			\
+  _Float16 __c2 = __a1[2] op __a1[6];			\
+  _Float16 __c3 = __a1[3] op __a1[7];			\
+  _Float16 __d0 = __c0 op __c2;				\
+  _Float16 __d1 = __c1 op __c3;				\
+  _Float16 __e0 = __d0 op __d1;				\
+  r3 = __e0
+
+#define TESTOP(opname, op, a)				\
+  do {							\
+    _Float16 r1 = _mm256_reduce_##opname##_ph (a);	\
+    _Float16 r2 = test_reduce_256_##opname##_ph (a);	\
+    _Float16 r3 = a[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    REF_ADDMUL (op, a);					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#define TEST_ADDMUL_PH(a)			\
+  do {						\
+    TESTOP (add, +, a);				\
+    TESTOP (mul, *, a);				\
+  } while (0)
+
+static void
+test_256_addmul_ph (void)
+{
+  TEST_ADDMUL_PH (a1);
+  TEST_ADDMUL_PH (a2);
+}
+
+#undef TESTOP
+#define TESTOP(opname, op, a)				\
+  do {							\
+    _Float16 r1 = _mm256_reduce_##opname##_ph (a);	\
+    _Float16 r2 = test_reduce_256_##opname##_ph (a);	\
+    _Float16 r3 = a[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    for (int i = 1; i < SIZE; i++)			\
+      r3 = r3 op a[i];					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#define TEST_MINMAX_PH(a)			\
+  do {						\
+    TESTOP (min, < a[i] ? r3 :, a);		\
+    TESTOP (max, > a[i] ? r3 :, a);		\
+  } while (0)
+
+static void
+test_256_minmax_ph (void)
+{
+  TEST_MINMAX_PH (a1);
+  TEST_MINMAX_PH (a2);
+}
+
+static void
+test_256_ph (void)
+{
+  test_256_addmul_ph ();
+  test_256_minmax_ph ();
+}
+
+#undef SIZE
+#define SIZE 8
+
+#undef REF_ADDMUL
+#define REF_ADDMUL(op, a)			\
+  _Float16 __c0 = a[0] op a[4];			\
+  _Float16 __c1 = a[1] op a[5];			\
+  _Float16 __c2 = a[2] op a[6];			\
+  _Float16 __c3 = a[3] op a[7];			\
+  _Float16 __d0 = __c0 op __c2;			\
+  _Float16 __d1 = __c1 op __c3;			\
+  _Float16 __e0 = __d0 op __d1;			\
+  r3 = __e0
+
+#undef TESTOP
+#define TESTOP(opname, op, a)				\
+  do {							\
+    _Float16 r1 = _mm_reduce_##opname##_ph (a);		\
+    _Float16 r2 = test_reduce_##opname##_ph (a);	\
+    _Float16 r3 = a[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    REF_ADDMUL (op, a);					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#undef TEST_ADDMUL_PH
+#define TEST_ADDMUL_PH(a)			\
+  do {						\
+    TESTOP (add, +, a);				\
+    TESTOP (mul, *, a);				\
+  } while (0)
+
+static void
+test_128_addmul_ph (void)
+{
+  TEST_ADDMUL_PH (b1);
+  TEST_ADDMUL_PH (b2);
+  TEST_ADDMUL_PH (b3);
+  TEST_ADDMUL_PH (b4);
+}
+
+#undef TESTOP
+#define TESTOP(opname, op, b)				\
+  do {							\
+    _Float16 r1 = _mm_reduce_##opname##_ph (b);		\
+    _Float16 r2 = test_reduce_##opname##_ph (b);	\
+    _Float16 r3 = b[0];					\
+    if (r1 != r2) {					\
+      __builtin_abort ();				\
+    }							\
+    for (int i = 1; i < SIZE; i++)			\
+      r3 = r3 op b[i];					\
+    if (r1 != r3) {					\
+      __builtin_abort ();				\
+    }							\
+  } while (0)
+
+#undef TEST_MINMAX_PH
+#define TEST_MINMAX_PH(b)			\
+  do {						\
+    TESTOP (min, < b[i] ? r3 :, b);		\
+    TESTOP (max, > b[i] ? r3 :, b);		\
+  } while (0)
+
+static void
+test_128_minmax_ph (void)
+{
+  TEST_MINMAX_PH (b1);
+  TEST_MINMAX_PH (b2);
+  TEST_MINMAX_PH (b3);
+  TEST_MINMAX_PH (b4);
+}
+
+static void
+test_128_ph (void)
+{
+  test_128_addmul_ph ();
+  test_128_minmax_ph ();
+}
+
+static void
+do_test (void)
+{
+  test_256_ph ();
+  test_128_ph ();
+}
+
+#undef SIZE
+#undef REF_ADDMUL
+#undef TESTOP
+#undef TEST_ADDMUL_PH
+#undef TEST_MINMAX_PH
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (59 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max) liuhongt
@ 2021-07-01  6:16 ` liuhongt
  2021-07-01  6:16 ` [PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics liuhongt
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub, dianhong xu

From: dianhong xu <dianhong.xu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_conj_pch): New intrinsic.
	(_mm512_mask_conj_pch): Ditto.
	(_mm512_maskz_conj_pch): Ditto.
	* config/i386/avx512fp16vlintrin.h (_mm256_conj_pch): New intrinsic.
	(_mm256_mask_conj_pch): Ditto.
	(_mm256_maskz_conj_pch): Ditto.
	(_mm_conj_pch): Ditto.
	(_mm_mask_conj_pch): Ditto.
	(_mm_maskz_conj_pch): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-conjugation-1.c: New test.
	* gcc.target/i386/avx512fp16vl-conjugation-1.c: New test.
---
 gcc/config/i386/avx512fp16intrin.h            | 25 +++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 53 +++++++++++++++
 .../i386/avx512fp16-conjugation-1.c           | 34 ++++++++++
 .../i386/avx512fp16vl-conjugation-1.c         | 65 +++++++++++++++++++
 4 files changed, 177 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c

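For context, the `conj_pch` intrinsics in this patch conjugate packed FP16 complex values by XORing each 32-bit lane with `1<<31`: bit 31 of a lane is the sign bit of its upper `_Float16`, i.e. the imaginary component, so a single `vpxord` negates every imaginary part at once. A scalar sketch of the same sign-bit trick (demonstrated on `float`, which has the same sign-bit-flip property; the helper name is illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: negate a float by flipping its IEEE sign bit, the
   per-component operation that the vpxord-based conjugation applies
   to the imaginary half of every 32-bit complex-FP16 lane.  */
static float
negate_via_sign_bit (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);   /* type-pun safely */
  bits ^= UINT32_C (1) << 31;        /* flip only the sign bit */
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

Because only the sign bit changes, the operation is exact for every value, including NaNs and infinities, which is why no floating-point instruction is needed.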
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 6e0f3a80e54..38767ef270b 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -718,6 +718,31 @@ _mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
 						   (A), (D)))
 #endif  /* __OPTIMIZE__  */
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_conj_pch (__m512h __A)
+{
+  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31));
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+						  (__v16sf) __W,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+{
+  return (__m512h) __builtin_ia32_movaps512_mask ((__v16sf) _mm512_conj_pch (__A),
+						  (__v16sf) _mm512_setzero_ps (),
+						  (__mmask16) __U);
+}
+
 /* Intrinsics of v[add,sub,mul,div]sh.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index eea1941617f..9bbd5c5a5f4 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -151,6 +151,59 @@ _mm256_zextph128_ph256 (__m128h __A)
 					 (__m128) __A, 0);
 }
 
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_conj_pch (__m256h __A)
+{
+  return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_set1_epi32 (1<<31));
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_conj_pch (__m256h __W, __mmask8 __U, __m256h __A)
+{
+  return (__m256h) __builtin_ia32_movaps256_mask ((__v8sf)
+						   _mm256_conj_pch (__A),
+						  (__v8sf) __W,
+						  (__mmask8) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maskz_conj_pch (__mmask8 __U, __m256h __A)
+{
+  return (__m256h) __builtin_ia32_movaps256_mask ((__v8sf)
+						   _mm256_conj_pch (__A),
+						  (__v8sf)
+						   _mm256_setzero_ps (),
+						  (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_conj_pch (__m128h __A)
+{
+  return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_set1_epi32 (1<<31));
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_conj_pch (__m128h __W, __mmask8 __U, __m128h __A)
+{
+  return (__m128h) __builtin_ia32_movaps128_mask ((__v4sf) _mm_conj_pch (__A),
+						  (__v4sf) __W,
+						  (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_conj_pch (__mmask8 __U, __m128h __A)
+{
+  return (__m128h) __builtin_ia32_movaps128_mask ((__v4sf) _mm_conj_pch (__A),
+						  (__v4sf) _mm_setzero_ps (),
+						  (__mmask8) __U);
+}
+
 /* Intrinsics v[add,sub,mul,div]ph.  */
 extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c
new file mode 100644
index 00000000000..662b23ca43d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-conjugation-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_conj_pch (__m512h __A)
+{
+  return _mm512_conj_pch (__A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_mask_conj_pch (__m512h __W, __mmask16 __U, __m512h __A)
+{
+  return _mm512_mask_conj_pch (__W, __U, __A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "kmovw\[^\n\]*%k\[1-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "vmovaps\[^\n]" 2 } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_maskz_conj_pch (__mmask16 __U, __m512h __A)
+{
+  return _mm512_maskz_conj_pch (__U, __A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%zmm\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "kmovw\[^\n\]*%k\[1-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c
new file mode 100644
index 00000000000..0bce99790c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-conjugation-1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+
+#include <immintrin.h>
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_conj_pch (__m256h __A)
+{
+  return _mm256_conj_pch (__A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%ymm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3  {target ia32} } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_conj_pch (__m128h __A)
+{
+  return _mm_conj_pch (__A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3  {target ia32} } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_mask_conj_pch (__m256h __W, __mmask8 __U, __m256h __A)
+{
+  return _mm256_mask_conj_pch (__W, __U, __A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%ymm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3  {target ia32} } } */
+/* { dg-final { scan-assembler-times "vmovaps\[^\n\]*%ymm\[0-9\]+" 2 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_mask_conj_pch (__m128h __W, __mmask8 __U, __m128h __A)
+{
+  return _mm_mask_conj_pch (__W, __U, __A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3  {target ia32} } } */
+/* { dg-final { scan-assembler-times "vmovaps\[^\n\]*%xmm\[0-9\]+" 2 } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_maskz_conj_pch (__mmask8 __U, __m256h __A)
+{
+  return _mm256_maskz_conj_pch (__U, __A);
+}
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%ymm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%ymm\[0-9\]+" 3  {target ia32} } } */
+/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_maskz_conj_pch (__mmask8 __U, __m128h __A)
+{
+  return _mm_maskz_conj_pch (__U, __A);
+}
+
+/* { dg-final { scan-assembler-times "vpxord\[^\n\]*%xmm\[0-9\]+" 3  {target { ! ia32} } } } */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%xmm\[0-9\]+" 3  {target ia32} } } */
+/* { dg-final { scan-assembler-times "vmovaps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics.
  2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
                   ` (60 preceding siblings ...)
  2021-07-01  6:16 ` [PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions liuhongt
@ 2021-07-01  6:16 ` liuhongt
  61 siblings, 0 replies; 85+ messages in thread
From: liuhongt @ 2021-07-01  6:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools, ubizjak, jakub, dianhong xu

From: dianhong xu <dianhong.xu@intel.com>

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm512_mask_blend_ph): New intrinsic.
	(_mm512_permutex2var_ph): Ditto.
	(_mm512_permutexvar_ph): Ditto.
	* config/i386/avx512fp16vlintrin.h (_mm256_mask_blend_ph): New intrinsic.
	(_mm256_permutex2var_ph): Ditto.
	(_mm256_permutexvar_ph): Ditto.
	(_mm_mask_blend_ph): Ditto.
	(_mm_permutex2var_ph): Ditto.
	(_mm_permutexvar_ph): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-14.c: New test.
---
 gcc/config/i386/avx512fp16intrin.h            | 31 +++++++
 gcc/config/i386/avx512fp16vlintrin.h          | 62 +++++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-14.c | 91 +++++++++++++++++++
 3 files changed, 184 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-14.c

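For reference, the per-lane semantics of the blend and permute intrinsics added here can be modeled in scalar code: blend selects lane i from the second operand when mask bit i is set, and permutexvar gathers lane `idx[i]` (indices taken modulo the lane count, as vpermw does with its low index bits). A sketch for the 8-lane `__m128h` case, with 16-bit lanes modeled as `short` (the function names and fixed lane count are illustrative only):

```c
enum { LANES = 8 };  /* 8 x 16-bit lanes, as in the __m128h variants */

/* Sketch only: scalar model of _mm_mask_blend_ph lane selection.  */
static void
blend (unsigned mask, const short *a, const short *w, short *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = (mask >> i) & 1 ? w[i] : a[i];
}

/* Sketch only: scalar model of _mm_permutexvar_ph lane gathering.  */
static void
permutexvar (const short *idx, const short *b, short *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = b[idx[i] % LANES];
}
```

The `permutex2var` variant extends the same idea to a 16-lane index space spanning two source vectors, which is why the backend can emit either vpermt2w or vpermi2w depending on which operand register is reused.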
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38767ef270b..2a2cb7b6348 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -7150,6 +7150,37 @@ _mm512_reduce_max_ph (__m512h __A)
 
 #undef _MM512_REDUCE_OP
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_blend_ph (__mmask32 __U, __m512h __A, __m512h __W)
+{
+  return (__m512h) __builtin_ia32_movdquhi512_mask ((__v32hi) __W,
+						    (__v32hi) __A,
+						    (__mmask32) __U);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutex2var_ph (__m512h __A, __m512i __I, __m512h __B)
+{
+  return (__m512h) __builtin_ia32_vpermi2varhi512_mask ((__v32hi) __A,
+						       (__v32hi) __I,
+						       (__v32hi) __B,
+						       (__mmask32)-1);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_permutexvar_ph (__m512i __A, __m512h __B)
+{
+  return (__m512h) __builtin_ia32_permvarhi512_mask ((__v32hi) __B,
+						     (__v32hi) __A,
+						     (__v32hi)
+						     (_mm512_setzero_ph ()),
+						     (__mmask32)-1);
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index 9bbd5c5a5f4..bc691ee61b7 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -3246,6 +3246,68 @@ _mm_reduce_max_ph (__m128h __A)
 #undef _MM256_REDUCE_OP
 #undef _MM_REDUCE_OP
 
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_mask_blend_ph (__mmask16 __U, __m256h __A, __m256h __W)
+{
+  return (__m256h) __builtin_ia32_movdquhi256_mask ((__v16hi) __W,
+						    (__v16hi) __A,
+						    (__mmask16) __U);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_permutex2var_ph (__m256h __A, __m256i __I, __m256h __B)
+{
+  return (__m256h) __builtin_ia32_vpermi2varhi256_mask ((__v16hi) __A,
+						       (__v16hi) __I,
+						       (__v16hi) __B,
+						       (__mmask16)-1);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_permutexvar_ph (__m256i __A, __m256h __B)
+{
+  return (__m256h) __builtin_ia32_permvarhi256_mask ((__v16hi) __B,
+						     (__v16hi) __A,
+						     (__v16hi)
+						     (_mm256_setzero_ph ()),
+						     (__mmask16)-1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_blend_ph (__mmask8 __U, __m128h __A, __m128h __W)
+{
+  return (__m128h) __builtin_ia32_movdquhi128_mask ((__v8hi) __W,
+						    (__v8hi) __A,
+						    (__mmask8) __U);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_permutex2var_ph (__m128h __A, __m128i __I, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_vpermi2varhi128_mask ((__v8hi) __A,
+						       (__v8hi) __I,
+						       (__v8hi) __B,
+						       (__mmask8)-1);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_permutexvar_ph (__m128i __A, __m128h __B)
+{
+  return (__m128h) __builtin_ia32_permvarhi128_mask ((__v8hi) __B,
+						     (__v8hi) __A,
+						     (__v8hi)
+						     (_mm_setzero_ph ()),
+						     (__mmask8)-1);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-14.c b/gcc/testsuite/gcc.target/i386/avx512fp16-14.c
new file mode 100644
index 00000000000..b2321fbcbab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-14.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512bw" } */
+
+#include <immintrin.h>
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_mask_blend_ph (__mmask32 U, __m512h A, __m512h B )
+{
+  return _mm512_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_permutex2var_ph (__m512h A, __m512i I, __m512h B)
+{
+  return _mm512_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_permutexvar_ph (__m512i A, __m512h B)
+{
+  return _mm512_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+" 1 } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_mask_blend_ph (__mmask16 U, __m256h A, __m256h B )
+{
+  return _mm256_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_permutex2var_ph (__m256h A, __m256i I, __m256h B)
+{
+  return _mm256_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_permutexvar_ph (__m256i A, __m256h B)
+{
+  return _mm256_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+" 1 } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_mask_blend_ph (__mmask8 U, __m128h A, __m128h B )
+{
+  return _mm_mask_blend_ph (U, A, B);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpblendmw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { target ia32 } } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_permutex2var_ph (__m128h A, __m128i I, __m128h B)
+{
+  return _mm_permutex2var_ph (A, I, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermt2w\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpermi2w\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_permutexvar_ph (__m128i A, __m128h B)
+{
+  return _mm_permutexvar_ph (A, B);
+}
+
+/* { dg-final { scan-assembler-times "vpermw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+" 1 } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01  6:16 ` [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16) liuhongt
@ 2021-07-01  9:50   ` Richard Biener
  2021-07-01 10:23     ` Hongtao Liu
  2021-07-01 21:17   ` Joseph Myers
  1 sibling, 1 reply; 85+ messages in thread
From: Richard Biener @ 2021-07-01  9:50 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Jakub Jelinek

On Thu, Jul 1, 2021 at 9:20 AM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

How does this look on GIMPLE and why's it not better handled there?

Richard.

> gcc/ChangeLog:
>
>         * config/i386/i386.md (*sqrthf2): New define_insn.
>         * config/i386/sse.md
>         (*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>):
>         Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fp16-builtin-sqrt-2.c: New test.
> ---
>  gcc/config/i386/i386.md                        | 18 ++++++++++++++++++
>  gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
>  .../i386/avx512fp16-builtin-sqrt-2.c           | 18 ++++++++++++++++++
>  3 files changed, 54 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 5f45c4ff583..684b2080a93 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -16583,6 +16583,24 @@ (define_insn "*sqrt<mode>2_sse"
>             ]
>             (symbol_ref "true")))])
>
> +/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
> +   since it's not handled in frontend.  */
> +(define_insn "*sqrthf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v,v")
> +       (float_truncate:HF
> +         (sqrt:MODEF
> +           (float_extend:MODEF
> +             (match_operand:HF 1 "nonimmediate_operand" "v,m")))))]
> +  "TARGET_AVX512FP16"
> +  "@
> +   vsqrtsh\t{%d1, %0|%0, %d1}
> +   vsqrtsh\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "sse")
> +   (set_attr "atom_sse_attr" "sqrt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "avx_partial_xmm_update" "false,true")
> +   (set_attr "mode" "HF")])
> +
>  (define_expand "sqrthf2"
>    [(set (match_operand:HF 0 "register_operand")
>         (sqrt:HF
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a76c30c75cb..f87f6893835 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2407,6 +2407,24 @@ (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
>     (set_attr "btver2_sse_attr" "sqrt")
>     (set_attr "mode" "<ssescalarmode>")])
>
> +(define_insn "*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (vec_duplicate:V8HF
> +           (float_truncate:HF
> +             (sqrt:MODEF
> +               (float_extend:MODEF
> +                 (match_operand:HF 1 "nonimmediate_operand" "<round_scalar_constraint>")))))
> +         (match_operand:VFH_128 2 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vsqrtsh\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
> +  [(set_attr "type" "sse")
> +   (set_attr "atom_sse_attr" "sqrt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
> +
>  (define_expand "rsqrt<mode>2"
>    [(set (match_operand:VF1_AVX512ER_128_256 0 "register_operand")
>         (unspec:VF1_AVX512ER_128_256
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> new file mode 100644
> index 00000000000..4fefee179af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx512fp16" } */
> +
> +#include<math.h>
> +_Float16
> +foo (_Float16 f16)
> +{
> +  return sqrtf (f16);
> +}
> +
> +_Float16
> +foo1 (_Float16 f16)
> +{
> +  return sqrt (f16);
> +}
> +
> +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 2 } } */
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).
  2021-07-01  6:16 ` [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16) liuhongt
@ 2021-07-01  9:52   ` Richard Biener
  2021-07-01 21:26   ` Joseph Myers
  1 sibling, 0 replies; 85+ messages in thread
From: Richard Biener @ 2021-07-01  9:52 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Jakub Jelinek

On Thu, Jul 1, 2021 at 9:22 AM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gcc/ChangeLog:

Same question.  There's maybe no direct optab for ceil but the foldings
could emit .CEIL () internal fns based on availability.

>         * config/i386/i386.md (*avx512fp16_1_roundhf2): New define_insn.
>         * config/i386/sse.md (*avx512fp16_1_roundhf): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fp16-builtin-round-2.c: New test.
> ---
>  gcc/config/i386/i386.md                       | 22 ++++++++++++++
>  gcc/config/i386/sse.md                        | 20 +++++++++++++
>  .../i386/avx512fp16-builtin-round-2.c         | 29 +++++++++++++++++++
>  3 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 684b2080a93..457f37dcb61 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -17738,6 +17738,28 @@ (define_expand "significand<mode>2"
>  })
>
>
> +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
> +   since it's not handled in frontend.  */
> +
> +(define_insn "*avx512fp16_1_roundhf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v,v")
> +       (float_truncate:HF
> +         (unspec:MODEF
> +           [(float_extend:MODEF
> +               (match_operand:HF 1 "nonimmediate_operand" "v,m"))
> +            (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
> +           UNSPEC_ROUND)))]
> +  "TARGET_AVX512FP16"
> +  "@
> +   vrndscalesh\t{%2, %d1, %0|%0, %d1, %2}
> +   vrndscalesh\t{%2, %1, %d0|%d0, %1, %2}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "length_immediate" "1,1")
> +   (set_attr "prefix" "evex")
> +   (set_attr "avx_partial_xmm_update" "false,true")
> +   (set_attr "mode" "HF")])
> +
> +
>  (define_insn "sse4_1_round<mode>2"
>    [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
>         (unspec:MODEFH
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2b8d12086f4..b3d8ffb4f8e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -20220,6 +20220,26 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
>     (set_attr "prefix" "orig,orig,vex,evex")
>     (set_attr "mode" "<MODE>")])
>
> +(define_insn "*avx512fp16_1_roundhf"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (vec_duplicate:V8HF
> +           (float_truncate:HF
> +             (unspec:MODEF
> +               [(float_extend:MODEF
> +                  (match_operand:HF 2 "nonimmediate_operand" "vm"))
> +                (match_operand:SI 3 "const_0_to_15_operand" "n")]
> +             UNSPEC_ROUND)))
> +         (match_operand:V8HF 1 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vrndscalesh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "length_immediate" "1")
> +   (set_attr "prefix_extra" "1")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*sse4_1_round<ssescalarmodesuffix>"
>    [(set (match_operand:VFH_128 0 "register_operand" "=Yr,*x,x,v")
>         (vec_merge:VFH_128
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
> new file mode 100644
> index 00000000000..bcd41929637
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-round-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo1 (_Float16 a)
> +{
> +  return __builtin_roundeven (a);
> +}
> +
> +_Float16
> +foo2 (_Float16 a)
> +{
> +  return __builtin_trunc (a);
> +}
> +
> +_Float16
> +foo3 (_Float16 a)
> +{
> +  return __builtin_ceil (a);
> +}
> +
> +_Float16
> +foo4 (_Float16 a)
> +{
> +  return __builtin_floor (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> +/* { dg-final { scan-assembler-times "vrndscalesh\[^\n\r\]*xmm\[0-9\]" 4 } } */
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01  9:50   ` Richard Biener
@ 2021-07-01 10:23     ` Hongtao Liu
  2021-07-01 12:43       ` Richard Biener
  0 siblings, 1 reply; 85+ messages in thread
From: Hongtao Liu @ 2021-07-01 10:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, Jakub Jelinek, GCC Patches

On Thu, Jul 1, 2021 at 5:51 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jul 1, 2021 at 9:20 AM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> How does this look on GIMPLE and why's it not better handled there?
Do you mean in match.pd? I'll try that.

C++ FE doesn't support _Float16, and the place where float/double are
handled is convert.c (which is GENERIC?), which is why I decided to do
it in the backend.

  /* Disable until we figure out how to decide whether the functions are
     present in runtime.  */
  /* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
  if (optimize
      && (TYPE_MODE (type) == TYPE_MODE (double_type_node)
          || TYPE_MODE (type) == TYPE_MODE (float_type_node)))

>
> Richard.
>
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.md (*sqrthf2): New define_insn.
> >         * config/i386/sse.md
> >         (*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>):
> >         Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx512fp16-builtin-sqrt-2.c: New test.
> > ---
> >  gcc/config/i386/i386.md                        | 18 ++++++++++++++++++
> >  gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
> >  .../i386/avx512fp16-builtin-sqrt-2.c           | 18 ++++++++++++++++++
> >  3 files changed, 54 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 5f45c4ff583..684b2080a93 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -16583,6 +16583,24 @@ (define_insn "*sqrt<mode>2_sse"
> >             ]
> >             (symbol_ref "true")))])
> >
> > +/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
> > +   since it's not handled in frontend.  */
> > +(define_insn "*sqrthf2"
> > +  [(set (match_operand:HF 0 "register_operand" "=v,v")
> > +       (float_truncate:HF
> > +         (sqrt:MODEF
> > +           (float_extend:MODEF
> > +             (match_operand:HF 1 "nonimmediate_operand" "v,m")))))]
> > +  "TARGET_AVX512FP16"
> > +  "@
> > +   vsqrtsh\t{%d1, %0|%0, %d1}
> > +   vsqrtsh\t{%1, %d0|%d0, %1}"
> > +  [(set_attr "type" "sse")
> > +   (set_attr "atom_sse_attr" "sqrt")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "avx_partial_xmm_update" "false,true")
> > +   (set_attr "mode" "HF")])
> > +
> >  (define_expand "sqrthf2"
> >    [(set (match_operand:HF 0 "register_operand")
> >         (sqrt:HF
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index a76c30c75cb..f87f6893835 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -2407,6 +2407,24 @@ (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> >     (set_attr "btver2_sse_attr" "sqrt")
> >     (set_attr "mode" "<ssescalarmode>")])
> >
> > +(define_insn "*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>"
> > +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> > +       (vec_merge:V8HF
> > +         (vec_duplicate:V8HF
> > +           (float_truncate:HF
> > +             (sqrt:MODEF
> > +               (float_extend:MODEF
> > +                 (match_operand:HF 1 "nonimmediate_operand" "<round_scalar_constraint>")))))
> > +         (match_operand:VFH_128 2 "register_operand" "v")
> > +         (const_int 1)))]
> > +  "TARGET_AVX512FP16"
> > +  "vsqrtsh\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
> > +  [(set_attr "type" "sse")
> > +   (set_attr "atom_sse_attr" "sqrt")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> > +
> > +
> >  (define_expand "rsqrt<mode>2"
> >    [(set (match_operand:VF1_AVX512ER_128_256 0 "register_operand")
> >         (unspec:VF1_AVX512ER_128_256
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> > new file mode 100644
> > index 00000000000..4fefee179af
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast -mavx512fp16" } */
> > +
> > +#include<math.h>
> > +_Float16
> > +foo (_Float16 f16)
> > +{
> > +  return sqrtf (f16);
> > +}
> > +
> > +_Float16
> > +foo1 (_Float16 f16)
> > +{
> > +  return sqrt (f16);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> > +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 2 } } */
> > --
> > 2.18.1
> >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01 10:23     ` Hongtao Liu
@ 2021-07-01 12:43       ` Richard Biener
  2021-07-01 21:48         ` Joseph Myers
  0 siblings, 1 reply; 85+ messages in thread
From: Richard Biener @ 2021-07-01 12:43 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, Jakub Jelinek, GCC Patches

On Thu, Jul 1, 2021 at 12:18 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 5:51 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:20 AM liuhongt via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >
> > How does this look on GIMPLE and why's it not better handled there?
> Do  you mean in match.pd, i'll try that.
>
> C++ FE doesn't support _FLoat16, and the place float/double are
> handled is in convert.c(which is GENERIC?), that's why I decided to do
> it in the backend.
>
>   /* Disable until we figure out how to decide whether the functions are
>      present in runtime.  */
>   /* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
>   if (optimize
>       && (TYPE_MODE (type) == TYPE_MODE (double_type_node)
>           || TYPE_MODE (type) == TYPE_MODE (float_type_node)))

Yes, but we can easily add a pattern to match.pd, sth like

(for sq (SQRT)
 (simplify
  (convert (sq@1 (convert @0)))
  (if (types_match (type, TREE_TYPE (@0))
       && TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))
       && direct_internal_fn_supported_p (IFN_SQRT, type, OPTIMIZE_FOR_BOTH))
   (IFN_SQRT @0)))

or so.

> >
> > Richard.
> >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386.md (*sqrthf2): New define_insn.
> > >         * config/i386/sse.md
> > >         (*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>):
> > >         Ditto.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/avx512fp16-builtin-sqrt-2.c: New test.
> > > ---
> > >  gcc/config/i386/i386.md                        | 18 ++++++++++++++++++
> > >  gcc/config/i386/sse.md                         | 18 ++++++++++++++++++
> > >  .../i386/avx512fp16-builtin-sqrt-2.c           | 18 ++++++++++++++++++
> > >  3 files changed, 54 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> > >
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 5f45c4ff583..684b2080a93 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -16583,6 +16583,24 @@ (define_insn "*sqrt<mode>2_sse"
> > >             ]
> > >             (symbol_ref "true")))])
> > >
> > > +/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
> > > +   since it's not handled in frontend.  */
> > > +(define_insn "*sqrthf2"
> > > +  [(set (match_operand:HF 0 "register_operand" "=v,v")
> > > +       (float_truncate:HF
> > > +         (sqrt:MODEF
> > > +           (float_extend:MODEF
> > > +             (match_operand:HF 1 "nonimmediate_operand" "v,m")))))]
> > > +  "TARGET_AVX512FP16"
> > > +  "@
> > > +   vsqrtsh\t{%d1, %0|%0, %d1}
> > > +   vsqrtsh\t{%1, %d0|%d0, %1}"
> > > +  [(set_attr "type" "sse")
> > > +   (set_attr "atom_sse_attr" "sqrt")
> > > +   (set_attr "prefix" "evex")
> > > +   (set_attr "avx_partial_xmm_update" "false,true")
> > > +   (set_attr "mode" "HF")])
> > > +
> > >  (define_expand "sqrthf2"
> > >    [(set (match_operand:HF 0 "register_operand")
> > >         (sqrt:HF
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index a76c30c75cb..f87f6893835 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -2407,6 +2407,24 @@ (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> > >     (set_attr "btver2_sse_attr" "sqrt")
> > >     (set_attr "mode" "<ssescalarmode>")])
> > >
> > > +(define_insn "*avx512fp16_vmsqrthf2<mask_scalar_name><round_scalar_name>"
> > > +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> > > +       (vec_merge:V8HF
> > > +         (vec_duplicate:V8HF
> > > +           (float_truncate:HF
> > > +             (sqrt:MODEF
> > > +               (float_extend:MODEF
> > > +                 (match_operand:HF 1 "nonimmediate_operand" "<round_scalar_constraint>")))))
> > > +         (match_operand:VFH_128 2 "register_operand" "v")
> > > +         (const_int 1)))]
> > > +  "TARGET_AVX512FP16"
> > > +  "vsqrtsh\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
> > > +  [(set_attr "type" "sse")
> > > +   (set_attr "atom_sse_attr" "sqrt")
> > > +   (set_attr "prefix" "evex")
> > > +   (set_attr "mode" "HF")])
> > > +
> > > +
> > >  (define_expand "rsqrt<mode>2"
> > >    [(set (match_operand:VF1_AVX512ER_128_256 0 "register_operand")
> > >         (unspec:VF1_AVX512ER_128_256
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> > > new file mode 100644
> > > index 00000000000..4fefee179af
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-2.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast -mavx512fp16" } */
> > > +
> > > +#include<math.h>
> > > +_Float16
> > > +foo (_Float16 f16)
> > > +{
> > > +  return sqrtf (f16);
> > > +}
> > > +
> > > +_Float16
> > > +foo1 (_Float16 f16)
> > > +{
> > > +  return sqrt (f16);
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> > > +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 2 } } */
> > > --
> > > 2.18.1
> > >
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01  6:16 ` [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16) liuhongt
  2021-07-01  9:50   ` Richard Biener
@ 2021-07-01 21:17   ` Joseph Myers
  1 sibling, 0 replies; 85+ messages in thread
From: Joseph Myers @ 2021-07-01 21:17 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, jakub

On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:

> +/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
> +   since it's not handled in frontend.  */

If correct, it *should* be handled in front end (well, middle-end).  See 
what convert.c:convert_to_real_1 does, with a long comment about when it's 
safe for sqrt (the comment says it's safe when P1 >= P2*2+2, which is true 
for SFmode and HFmode).

The issue (apart from convert_to_real_1 being earlier than this really 
ought to be done - something based on match.pd would be better - but you 
can ignore that for now) would be the limitation earlier in that code to 
the modes of float and double:

  /* Disable until we figure out how to decide whether the functions are
     present in runtime.  */
  /* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
  if (optimize
      && (TYPE_MODE (type) == TYPE_MODE (double_type_node)
          || TYPE_MODE (type) == TYPE_MODE (float_type_node)))

In this case, you *don't* have the sqrtf16 function in the runtime library 
(adding _Float16 support to glibc would be good, but runs into various 
other complications that would need considering, especially questions of 
how if at all it can be added on an architecture before the minimum GCC 
version for building glibc for that architecture is recent enough to 
support _Float16 for that architecture).  So effectively what you'd need 
is some way of saying "__builtin_sqrtf16 is available", where "available" 
for now means "will be expanded inline", i.e. some combination of 
!flag_math_errno and instruction set features.  That's not really within 
the scope of what the libc_has_function hook does, but it could maybe be 
extended to take information about the exact function in question, or 
another similar hook could be added.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).
  2021-07-01  6:16 ` [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16) liuhongt
  2021-07-01  9:52   ` Richard Biener
@ 2021-07-01 21:26   ` Joseph Myers
  2021-07-02  7:36     ` Richard Biener
  1 sibling, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2021-07-01 21:26 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, jakub

On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:

> +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
> +   since it's not handled in frontend.  */

Much the same comments apply as for sqrt.  But in this case, the 
conversion code is in match.pd - right now, specific to pairs of types 
(float, double) and (float, long double).  And it's logically valid for 
any pair of same-radix floating-point types, the values of one of which 
are a subset of the values of the other (a strict subset, for it actually 
to be an interesting optimization).  (So when making it apply to more 
general types, take care that it does *not* apply to the __ibm128 / 
_Float128 pair on powerpc64le, in either order, because neither of those 
types has values a subset of the other.)

(Also, the match.pd code isn't handling roundeven at present, but that 
should be a trivial addition to it.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01 12:43       ` Richard Biener
@ 2021-07-01 21:48         ` Joseph Myers
  2021-07-02  7:38           ` Richard Biener
  0 siblings, 1 reply; 85+ messages in thread
From: Joseph Myers @ 2021-07-01 21:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongtao Liu, Jakub Jelinek, liuhongt, GCC Patches

On Thu, 1 Jul 2021, Richard Biener via Gcc-patches wrote:

> > C++ FE doesn't support _FLoat16, and the place float/double are
> > handled is in convert.c(which is GENERIC?), that's why I decided to do
> > it in the backend.

I think there ought to be a preliminary patch series adding whatever 
_FloatN support is relevant to the C++ front end - covering at least those 
types that have modes different from float / double / long double, even if 
you don't cover all the _FloatN / _FloatNx types (e.g. _Float32 as 
distinct from float), and ensuring the corresponding constant suffixes are 
also accepted in C++ in whatever way makes sense for that language.  (As I 
noted in bug 85518, there are ICEs in name mangling when such types escape 
into C++ code at present.)

When this was discussed on the gcc list in March, Jonathan Wakely at least 
supported making _Float16 available in C++ 
<https://gcc.gnu.org/pipermail/gcc/2021-March/234982.html> 
<https://gcc.gnu.org/pipermail/gcc/2021-March/235008.html>, even if no C++ 
front-end maintainers contributed to that discussion.

> Yes, but we can easily add a pattern to match.pd, sth like
> 
> (for sq (SQRT)
>  (simplify
>   (convert (sq@1 (convert @0)))
>   (if (types_match (type, TREE_TYPE (@0))
>        && TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))

(With a more complicated precision condition, see convert.c for details.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).
  2021-07-01 21:26   ` Joseph Myers
@ 2021-07-02  7:36     ` Richard Biener
  2021-07-02 11:46       ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 85+ messages in thread
From: Richard Biener @ 2021-07-02  7:36 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, Jakub Jelinek, GCC Patches

On Thu, Jul 1, 2021 at 11:26 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:
>
> > +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
> > +   since it's not handled in frontend.  */
>
> Much the same comments apply as for sqrt.  But in this case, the
> conversion code is in match.pd - right now, specific to pairs of types
> (float, double) and (float, long double).  And it's logically valid for
> any pair of same-radix floating-point types, the values of one of which
> are a subset of the values of the other (a strict subset, for it actually
> to be an interesting optimization).  (So when making it apply to more
> general types, take care that it does *not* apply to the __ibm128 /
> _Float128 pair on powerpc64le, in either order, because neither of those
> types has values a subset of the other.)
>
> (Also, the match.pd code isn't handling roundeven at present, but that
> should be a trivial addition to it.)

Note that for non-"standard" float formats emitting builtins would require
guaranteed libm support (IIRC we don't have rounding functions in
libgcc), so guarding on availability of a machine instruction (optab)
for the operation should be tested instead - see my proposed match
rule for sqrt and using internal-functions and the direct optab machinery.

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).
  2021-07-01 21:48         ` Joseph Myers
@ 2021-07-02  7:38           ` Richard Biener
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Biener @ 2021-07-02  7:38 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Hongtao Liu, Jakub Jelinek, liuhongt, GCC Patches

On Thu, Jul 1, 2021 at 11:49 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 1 Jul 2021, Richard Biener via Gcc-patches wrote:
>
> > > C++ FE doesn't support _FLoat16, and the place float/double are
> > > handled is in convert.c(which is GENERIC?), that's why I decided to do
> > > it in the backend.
>
> I think there ought to be a preliminary patch series adding whatever
> _FloatN support is relevant to the C++ front end - covering at least those
> types that have modes different from float / double / long double, even if
> you don't cover all the _FloatN / _FloatNx types (e.g. _Float32 as
> distinct from float), and ensuring the corresponding constant suffixes are
> also accepted in C++ in whatever way makes sense for that language.  (As I
> noted in bug 85518, there are ICEs in name mangling when such types escape
> into C++ code at present.)
>
> When this was discussed on the gcc list in March, Jonathan Wakely at least
> supported making _Float16 available in C++
> <https://gcc.gnu.org/pipermail/gcc/2021-March/234982.html>
> <https://gcc.gnu.org/pipermail/gcc/2021-March/235008.html>, even if no C++
> front-end maintainers contributed to that discussion.

Agreed.

> > Yes, but we can easily add a pattern to match.pd, sth like
> >
> > (for sq (SQRT)
> >  (simplify
> >   (convert (sq@1 (convert @0)))
> >   (if (types_match (type, TREE_TYPE (@0))
> >        && TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))
>
> (With a more complicated precision condition, see convert.c for details.)
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).
  2021-07-02  7:36     ` Richard Biener
@ 2021-07-02 11:46       ` Bernhard Reutner-Fischer
  2021-07-04  5:17         ` Hongtao Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Bernhard Reutner-Fischer @ 2021-07-02 11:46 UTC (permalink / raw)
  To: Richard Biener, Richard Biener via Gcc-patches, Joseph Myers
  Cc: Jakub Jelinek, liuhongt, GCC Patches

On 2 July 2021 09:36:54 CEST, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>On Thu, Jul 1, 2021 at 11:26 PM Joseph Myers <joseph@codesourcery.com>
>wrote:
>>
>> On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:
>>
>> > +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
>> > +   since it's not handled in frontend.  */

In addition to what others said, I suppose you mean ceil with an 'l'?

thanks,

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).
  2021-07-02 11:46       ` Bernhard Reutner-Fischer
@ 2021-07-04  5:17         ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-07-04  5:17 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Richard Biener, Richard Biener via Gcc-patches, Joseph Myers,
	Jakub Jelinek, liuhongt

On Fri, Jul 2, 2021 at 7:48 PM Bernhard Reutner-Fischer via
Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On 2 July 2021 09:36:54 CEST, Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >On Thu, Jul 1, 2021 at 11:26 PM Joseph Myers <joseph@codesourcery.com>
> >wrote:
> >>
> >> On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:
> >>
> >> > +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
> >> > +   since it's not handled in frontend.  */
>
> In addition to what others said, I suppose you mean ceil with an 'l'?
Yes, __builtin_ceil.
>
> thanks,



-- 
BR,
Hongtao


* Re: [PATCH 53/62] AVX512FP16: Add expander for sqrthf2.
  2021-07-01  6:16 ` [PATCH 53/62] AVX512FP16: Add expander for sqrthf2 liuhongt
@ 2021-07-23  5:12   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-07-23  5:12 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

On Thu, Jul 1, 2021 at 2:18 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/i386-features.c (i386-features.c): Handle
>         E_HFmode.
>         * config/i386/i386.md (sqrthf2): New expander.
>         (*sqrt<mode>2_sse): Extend to MODEFH.
>         * config/i386/sse.md
>         (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>):
>         Extend to VFH_128.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fp16-builtin-sqrt-1.c: New test.
>         * gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c: New test.
> ---
>  gcc/config/i386/i386-features.c               | 15 +++++++++++----
>  gcc/config/i386/i386.md                       | 12 +++++++++---
>  gcc/config/i386/sse.md                        |  8 ++++----
>  .../i386/avx512fp16-builtin-sqrt-1.c          | 18 ++++++++++++++++++
>  .../i386/avx512fp16vl-builtin-sqrt-1.c        | 19 +++++++++++++++++++
>  5 files changed, 61 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
>
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index a25769ae478..0b5a1a3af53 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -2238,15 +2238,22 @@ remove_partial_avx_dependency (void)
>
>           rtx zero;
>           machine_mode dest_vecmode;
> -         if (dest_mode == E_SFmode)
> +         switch (dest_mode)
>             {
> +           case E_HFmode:
> +             dest_vecmode = V8HFmode;
> +             zero = gen_rtx_SUBREG (V8HFmode, v4sf_const0, 0);
> +             break;
> +           case E_SFmode:
>               dest_vecmode = V4SFmode;
>               zero = v4sf_const0;
> -           }
> -         else
> -           {
> +             break;
> +           case E_DFmode:
>               dest_vecmode = V2DFmode;
>               zero = gen_rtx_SUBREG (V2DFmode, v4sf_const0, 0);
> +             break;
> +           default:
> +             gcc_unreachable ();
>             }
>
>           /* Change source to vector mode.  */
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index a85c23d74f1..81c893c60de 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -16561,9 +16561,9 @@ (define_expand "rsqrtsf2"
>  })
>
>  (define_insn "*sqrt<mode>2_sse"
> -  [(set (match_operand:MODEF 0 "register_operand" "=v,v,v")
> -       (sqrt:MODEF
> -         (match_operand:MODEF 1 "nonimmediate_operand" "0,v,m")))]
> +  [(set (match_operand:MODEFH 0 "register_operand" "=v,v,v")
> +       (sqrt:MODEFH
> +         (match_operand:MODEFH 1 "nonimmediate_operand" "0,v,m")))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>    "@
>     %vsqrt<ssemodesuffix>\t{%d1, %0|%0, %d1}
> @@ -16583,6 +16583,12 @@ (define_insn "*sqrt<mode>2_sse"
>             ]
>             (symbol_ref "true")))])
>
As mentioned by Uros, I think this would also be better with a separate pattern for HF.
> +(define_expand "sqrthf2"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (sqrt:HF
> +         (match_operand:HF 1 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "sqrt<mode>2"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (sqrt:MODEF
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2c3dba5bdb0..b47e7f0b82a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2389,12 +2389,12 @@ (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
>     (set_attr "mode" "<ssescalarmode>")])
>
>  (define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (sqrt:<ssescalarmode>
>               (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm,<round_scalar_constraint>")))
> -         (match_operand:VF_128 2 "register_operand" "0,v")
> +         (match_operand:VFH_128 2 "register_operand" "0,v")
>           (const_int 1)))]
>    "TARGET_SSE"
>    "@
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
> new file mode 100644
> index 00000000000..38cdf23fef7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-builtin-sqrt-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx512fp16" } */
> +
> +_Float16
> +f1 (_Float16 x)
> +{
> +  return __builtin_sqrtf16 (x);
> +}
> +
> +void
> +f2 (_Float16* __restrict psrc, _Float16* __restrict pdst)
> +{
> +  for (int i = 0; i != 32; i++)
> +    pdst[i] = __builtin_sqrtf16 (psrc[i]);
> +}
> +
> +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*zmm\[0-9\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
> new file mode 100644
> index 00000000000..08deb3ea470
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
> +
> +void
> +f1 (_Float16* __restrict psrc, _Float16* __restrict pdst)
> +{
> +  for (int i = 0; i != 8; i++)
> +    pdst[i] = __builtin_sqrtf16 (psrc[i]);
> +}
> +
> +void
> +f2 (_Float16* __restrict psrc, _Float16* __restrict pdst)
> +{
> +  for (int i = 0; i != 16; i++)
> +    pdst[i] = __builtin_sqrtf16 (psrc[i]);
> +}
> +
> +/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*ymm\[0-9\]" 1 } } */
> --
> 2.18.1
>


-- 
BR,
Hongtao


* Re: [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including
  2021-07-01  6:16 ` [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including liuhongt
@ 2021-07-23  5:13   ` Hongtao Liu
  2021-07-26  2:25     ` Hongtao Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Hongtao Liu @ 2021-07-23  5:13 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

On Thu, Jul 1, 2021 at 2:18 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "H.J. Lu" <hjl.tools@gmail.com>
>
> 1. FP16 vector xor/ior/and/andnot/abs/neg
> 2. FP16 scalar abs/neg/copysign/xorsign
>
> gcc/ChangeLog:
>
>         * config/i386/i386-expand.c (ix86_expand_fp_absneg_operator):
>         Handle HFmode.
>         (ix86_expand_copysign): Ditto.
>         (ix86_expand_xorsign): Ditto.
>         * config/i386/i386.c (ix86_build_const_vector): Handle HF vector
>         modes.
>         (ix86_build_signbit_mask): Ditto.
>         (ix86_can_change_mode_class): Ditto.
>         * config/i386/i386.md (SSEMODEF): Add HF mode.
>         (ssevecmodef): Ditto.
>         (<code><mode>2): Use MODEFH.
>         (*<code><mode>2_1): Ditto.
>         (define_split): Ditto.
>         (xorsign<mode>3): Ditto.
>         (@xorsign<mode>3_1): Ditto.
As mentioned by Uros, I think these would also be better with separate patterns for HF.
>         * config/i386/sse.md (VFB): New mode iterator.
>         (VFB_128_256): Ditto.
>         (VFB_512): Ditto.
>         (sseintvecmode2): Support HF vector mode.
>         (<code><mode>2): Use new mode iterator.
>         (*<code><mode>2): Ditto.
>         (copysign<mode>3): Ditto.
>         (xorsign<mode>3): Ditto.
>         (<code><mode>3<mask_name>): Ditto.
>         (<code><mode>3<mask_name>): Ditto.
>         (<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode.
>         (<sse>_andnot<mode>3<mask_name>): Ditto.
>         (*<code><mode>3<mask_name>): Ditto.
>         (*<code><mode>3<mask_name>): Ditto.
> ---
>  gcc/config/i386/i386-expand.c |  12 +++-
>  gcc/config/i386/i386.c        |  12 +++-
>  gcc/config/i386/i386.md       |  40 ++++++-----
>  gcc/config/i386/sse.md        | 128 ++++++++++++++++++++--------------
>  4 files changed, 118 insertions(+), 74 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 9233c6cd1e8..006f4bec8db 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -1781,6 +1781,8 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode,
>         vmode = V4SFmode;
>        else if (mode == DFmode)
>         vmode = V2DFmode;
> +      else if (mode == HFmode)
> +       vmode = V8HFmode;
>      }
>
>    dst = operands[0];
> @@ -1918,7 +1920,9 @@ ix86_expand_copysign (rtx operands[])
>
>    mode = GET_MODE (dest);
>
> -  if (mode == SFmode)
> +  if (mode == HFmode)
> +    vmode = V8HFmode;
> +  else if (mode == SFmode)
>      vmode = V4SFmode;
>    else if (mode == DFmode)
>      vmode = V2DFmode;
> @@ -1934,7 +1938,7 @@ ix86_expand_copysign (rtx operands[])
>        if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0)))
>         op0 = simplify_unary_operation (ABS, mode, op0, mode);
>
> -      if (mode == SFmode || mode == DFmode)
> +      if (mode == HFmode || mode == SFmode || mode == DFmode)
>         {
>           if (op0 == CONST0_RTX (mode))
>             op0 = CONST0_RTX (vmode);
> @@ -2073,7 +2077,9 @@ ix86_expand_xorsign (rtx operands[])
>
>    mode = GET_MODE (dest);
>
> -  if (mode == SFmode)
> +  if (mode == HFmode)
> +    vmode = V8HFmode;
> +  else if (mode == SFmode)
>      vmode = V4SFmode;
>    else if (mode == DFmode)
>      vmode = V2DFmode;
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index dc0d440061b..17e1b5ea874 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -15374,6 +15374,9 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
>      case E_V2DImode:
>        gcc_assert (vect);
>        /* FALLTHRU */
> +    case E_V8HFmode:
> +    case E_V16HFmode:
> +    case E_V32HFmode:
>      case E_V16SFmode:
>      case E_V8SFmode:
>      case E_V4SFmode:
> @@ -15412,6 +15415,13 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
>
>    switch (mode)
>      {
> +    case E_V8HFmode:
> +    case E_V16HFmode:
> +    case E_V32HFmode:
> +      vec_mode = mode;
> +      imode = HImode;
> +      break;
> +
>      case E_V16SImode:
>      case E_V16SFmode:
>      case E_V8SImode:
> @@ -19198,7 +19208,7 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>          disallow a change to these modes, reload will assume it's ok to
>          drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>          the vec_dupv4hi pattern.  */
> -      if (GET_MODE_SIZE (from) < 4)
> +      if (GET_MODE_SIZE (from) < 4 && from != E_HFmode)
>         return false;
>      }
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 014aba187e1..a85c23d74f1 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1233,9 +1233,10 @@ (define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
>  ;; All x87 floating point modes plus HFmode
>  (define_mode_iterator X87MODEFH [HF SF DF XF])
>
> -;; All SSE floating point modes
> -(define_mode_iterator SSEMODEF [SF DF TF])
> -(define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> +;; All SSE floating point modes and HFmode
> +(define_mode_iterator SSEMODEF [HF SF DF TF])
> +(define_mode_attr ssevecmodef [(HF "V8HF") (SF "V4SF") (DF "V2DF") (TF "TF")])
> +
>
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
> @@ -10529,8 +10530,8 @@ (define_insn_and_split "*nabstf2_1"
>    [(set_attr "isa" "noavx,noavx,avx,avx")])
>
>  (define_expand "<code><mode>2"
> -  [(set (match_operand:X87MODEF 0 "register_operand")
> -       (absneg:X87MODEF (match_operand:X87MODEF 1 "register_operand")))]
> +  [(set (match_operand:X87MODEFH 0 "register_operand")
> +       (absneg:X87MODEFH (match_operand:X87MODEFH 1 "register_operand")))]
>    "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
>    "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
>
> @@ -10559,9 +10560,9 @@ (define_split
>    "ix86_split_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
>
>  (define_insn "*<code><mode>2_1"
> -  [(set (match_operand:MODEF 0 "register_operand" "=x,x,Yv,f,!r")
> -       (absneg:MODEF
> -         (match_operand:MODEF 1 "register_operand" "0,x,Yv,0,0")))
> +  [(set (match_operand:MODEFH 0 "register_operand" "=x,x,Yv,f,!r")
> +       (absneg:MODEFH
> +         (match_operand:MODEFH 1 "register_operand" "0,x,Yv,0,0")))
>     (use (match_operand:<ssevecmode> 2 "vector_operand" "xBm,0,Yvm,X,X"))
>     (clobber (reg:CC FLAGS_REG))]
>    "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> @@ -10572,7 +10573,8 @@ (define_insn "*<code><mode>2_1"
>         (match_test ("SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"))
>         (if_then_else
>          (eq_attr "alternative" "3,4")
> -        (symbol_ref "TARGET_MIX_SSE_I387")
> +        (symbol_ref "TARGET_MIX_SSE_I387
> +                     && <MODE>mode != HFmode")
>          (const_string "*"))
>         (if_then_else
>          (eq_attr "alternative" "3,4")
> @@ -10580,9 +10582,9 @@ (define_insn "*<code><mode>2_1"
>          (symbol_ref "false"))))])
>
>  (define_split
> -  [(set (match_operand:MODEF 0 "sse_reg_operand")
> -       (absneg:MODEF
> -         (match_operand:MODEF 1 "sse_reg_operand")))
> +  [(set (match_operand:MODEFH 0 "sse_reg_operand")
> +       (absneg:MODEFH
> +         (match_operand:MODEFH 1 "sse_reg_operand")))
>     (use (match_operand:<ssevecmodef> 2 "vector_operand"))
>     (clobber (reg:CC FLAGS_REG))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
> @@ -10706,17 +10708,17 @@ (define_split
>    "ix86_split_copysign_var (operands); DONE;")
>
>  (define_expand "xorsign<mode>3"
> -  [(match_operand:MODEF 0 "register_operand")
> -   (match_operand:MODEF 1 "register_operand")
> -   (match_operand:MODEF 2 "register_operand")]
> +  [(match_operand:MODEFH 0 "register_operand")
> +   (match_operand:MODEFH 1 "register_operand")
> +   (match_operand:MODEFH 2 "register_operand")]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>    "ix86_expand_xorsign (operands); DONE;")
>
>  (define_insn_and_split "@xorsign<mode>3_1"
> -  [(set (match_operand:MODEF 0 "register_operand" "=Yv")
> -       (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "Yv")
> -          (match_operand:MODEF 2 "register_operand" "0")
> +  [(set (match_operand:MODEFH 0 "register_operand" "=Yv")
> +       (unspec:MODEFH
> +         [(match_operand:MODEFH 1 "register_operand" "Yv")
> +          (match_operand:MODEFH 2 "register_operand" "0")
>            (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm")]
>           UNSPEC_XORSIGN))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index fdcc0515228..7c594babcce 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -317,11 +317,26 @@ (define_mode_iterator VFH
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> +;; 128-, 256- and 512-bit float vector modes for bitwise operations
> +(define_mode_iterator VFB
> +  [(V32HF "TARGET_AVX512FP16")
> +   (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
> +   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> +
>  ;; 128- and 256-bit float vector modes
>  (define_mode_iterator VF_128_256
>    [(V8SF "TARGET_AVX") V4SF
>     (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> +;; 128- and 256-bit float vector modes for bitwise operations
> +(define_mode_iterator VFB_128_256
> +  [(V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
> +   (V8SF "TARGET_AVX") V4SF
> +   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> +
>  ;; All SFmode vector float modes
>  (define_mode_iterator VF1
>    [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF])
> @@ -374,6 +389,10 @@ (define_mode_iterator VF_256
>  (define_mode_iterator VF_512
>    [V16SF V8DF])
>
> +;; All 512bit vector float modes for bitwise operations
> +(define_mode_iterator VFB_512
> +  [(V32HF "TARGET_AVX512FP16") V16SF V8DF])
> +
>  (define_mode_iterator VI48_AVX512VL
>    [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
>     V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
> @@ -923,7 +942,8 @@ (define_mode_attr sseintvecmode
>
>  (define_mode_attr sseintvecmode2
>    [(V8DF "XI") (V4DF "OI") (V2DF "TI")
> -   (V8SF "OI") (V4SF "TI")])
> +   (V8SF "OI") (V4SF "TI")
> +   (V16HF "OI") (V8HF "TI")])
>
>  (define_mode_attr sseintvecmodelower
>    [(V16SF "v16si") (V8DF "v8di")
> @@ -1968,22 +1988,22 @@ (define_insn "kunpckdi"
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>  (define_expand "<code><mode>2"
> -  [(set (match_operand:VF 0 "register_operand")
> -       (absneg:VF
> -         (match_operand:VF 1 "register_operand")))]
> +  [(set (match_operand:VFB 0 "register_operand")
> +       (absneg:VFB
> +         (match_operand:VFB 1 "register_operand")))]
>    "TARGET_SSE"
>    "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
>
>  (define_insn_and_split "*<code><mode>2"
> -  [(set (match_operand:VF 0 "register_operand" "=x,x,v,v")
> -       (absneg:VF
> -         (match_operand:VF 1 "vector_operand" "0,xBm,v,m")))
> -   (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))]
> +  [(set (match_operand:VFB 0 "register_operand" "=x,x,v,v")
> +       (absneg:VFB
> +         (match_operand:VFB 1 "vector_operand" "0,xBm,v,m")))
> +   (use (match_operand:VFB 2 "vector_operand" "xBm,0,vm,v"))]
>    "TARGET_SSE"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0)
> -       (<absneg_op>:VF (match_dup 1) (match_dup 2)))]
> +       (<absneg_op>:VFB (match_dup 1) (match_dup 2)))]
>  {
>    if (TARGET_AVX)
>      {
> @@ -3893,11 +3913,11 @@ (define_expand "vcond_mask_<mode><sseintvecmodelower>"
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>  (define_insn "<sse>_andnot<mode>3<mask_name>"
> -  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
> -       (and:VF_128_256
> -         (not:VF_128_256
> -           (match_operand:VF_128_256 1 "register_operand" "0,x,v,v"))
> -         (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> +  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
> +       (and:VFB_128_256
> +         (not:VFB_128_256
> +           (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v"))
> +         (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
>    "TARGET_SSE && <mask_avx512vl_condition>"
>  {
>    char buf[128];
> @@ -3920,6 +3940,8 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
>
>    switch (get_attr_mode (insn))
>      {
> +    case MODE_V16HF:
> +    case MODE_V8HF:
>      case MODE_V8SF:
>      case MODE_V4SF:
>        suffix = "ps";
> @@ -3958,11 +3980,11 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
>               (const_string "<MODE>")))])
>
>  (define_insn "<sse>_andnot<mode>3<mask_name>"
> -  [(set (match_operand:VF_512 0 "register_operand" "=v")
> -       (and:VF_512
> -         (not:VF_512
> -           (match_operand:VF_512 1 "register_operand" "v"))
> -         (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
> +  [(set (match_operand:VFB_512 0 "register_operand" "=v")
> +       (and:VFB_512
> +         (not:VFB_512
> +           (match_operand:VFB_512 1 "register_operand" "v"))
> +         (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
>    "TARGET_AVX512F"
>  {
>    char buf[128];
> @@ -3972,8 +3994,9 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
>    suffix = "<ssemodesuffix>";
>    ops = "";
>
> -  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
> -  if (!TARGET_AVX512DQ)
> +  /* Since there are no vandnp[sd] without AVX512DQ nor vandnph,
> +     use vp<logic>[dq].  */
> +  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
>      {
>        suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
>        ops = "p";
> @@ -3993,26 +4016,26 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
>                       (const_string "XI")))])
>
>  (define_expand "<code><mode>3<mask_name>"
> -  [(set (match_operand:VF_128_256 0 "register_operand")
> -       (any_logic:VF_128_256
> -         (match_operand:VF_128_256 1 "vector_operand")
> -         (match_operand:VF_128_256 2 "vector_operand")))]
> +  [(set (match_operand:VFB_128_256 0 "register_operand")
> +       (any_logic:VFB_128_256
> +         (match_operand:VFB_128_256 1 "vector_operand")
> +         (match_operand:VFB_128_256 2 "vector_operand")))]
>    "TARGET_SSE && <mask_avx512vl_condition>"
>    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
>
>  (define_expand "<code><mode>3<mask_name>"
> -  [(set (match_operand:VF_512 0 "register_operand")
> -       (any_logic:VF_512
> -         (match_operand:VF_512 1 "nonimmediate_operand")
> -         (match_operand:VF_512 2 "nonimmediate_operand")))]
> +  [(set (match_operand:VFB_512 0 "register_operand")
> +       (any_logic:VFB_512
> +         (match_operand:VFB_512 1 "nonimmediate_operand")
> +         (match_operand:VFB_512 2 "nonimmediate_operand")))]
>    "TARGET_AVX512F"
>    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
>
>  (define_insn "*<code><mode>3<mask_name>"
> -  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
> -       (any_logic:VF_128_256
> -         (match_operand:VF_128_256 1 "vector_operand" "%0,x,v,v")
> -         (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> +  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
> +       (any_logic:VFB_128_256
> +         (match_operand:VFB_128_256 1 "vector_operand" "%0,x,v,v")
> +         (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
>    "TARGET_SSE && <mask_avx512vl_condition>
>     && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>  {
> @@ -4036,6 +4059,8 @@ (define_insn "*<code><mode>3<mask_name>"
>
>    switch (get_attr_mode (insn))
>      {
> +    case MODE_V16HF:
> +    case MODE_V8HF:
>      case MODE_V8SF:
>      case MODE_V4SF:
>        suffix = "ps";
> @@ -4074,10 +4099,10 @@ (define_insn "*<code><mode>3<mask_name>"
>               (const_string "<MODE>")))])
>
>  (define_insn "*<code><mode>3<mask_name>"
> -  [(set (match_operand:VF_512 0 "register_operand" "=v")
> -       (any_logic:VF_512
> -         (match_operand:VF_512 1 "nonimmediate_operand" "%v")
> -         (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
> +  [(set (match_operand:VFB_512 0 "register_operand" "=v")
> +       (any_logic:VFB_512
> +         (match_operand:VFB_512 1 "nonimmediate_operand" "%v")
> +         (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
>    "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>  {
>    char buf[128];
> @@ -4087,8 +4112,9 @@ (define_insn "*<code><mode>3<mask_name>"
>    suffix = "<ssemodesuffix>";
>    ops = "";
>
> -  /* There is no v<logic>p[sd] in avx512f.  Use vp<logic>[dq].  */
> -  if (!TARGET_AVX512DQ)
> +  /* Since there are no v<logic>p[sd] without AVX512DQ nor v<logic>ph,
> +     use vp<logic>[dq].  */
> +  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
>      {
>        suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
>        ops = "p";
> @@ -4109,14 +4135,14 @@ (define_insn "*<code><mode>3<mask_name>"
>
>  (define_expand "copysign<mode>3"
>    [(set (match_dup 4)
> -       (and:VF
> -         (not:VF (match_dup 3))
> -         (match_operand:VF 1 "vector_operand")))
> +       (and:VFB
> +         (not:VFB (match_dup 3))
> +         (match_operand:VFB 1 "vector_operand")))
>     (set (match_dup 5)
> -       (and:VF (match_dup 3)
> -               (match_operand:VF 2 "vector_operand")))
> -   (set (match_operand:VF 0 "register_operand")
> -       (ior:VF (match_dup 4) (match_dup 5)))]
> +       (and:VFB (match_dup 3)
> +                (match_operand:VFB 2 "vector_operand")))
> +   (set (match_operand:VFB 0 "register_operand")
> +       (ior:VFB (match_dup 4) (match_dup 5)))]
>    "TARGET_SSE"
>  {
>    operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
> @@ -4127,11 +4153,11 @@ (define_expand "copysign<mode>3"
>
>  (define_expand "xorsign<mode>3"
>    [(set (match_dup 4)
> -       (and:VF (match_dup 3)
> -               (match_operand:VF 2 "vector_operand")))
> -   (set (match_operand:VF 0 "register_operand")
> -       (xor:VF (match_dup 4)
> -               (match_operand:VF 1 "vector_operand")))]
> +       (and:VFB (match_dup 3)
> +               (match_operand:VFB 2 "vector_operand")))
> +   (set (match_operand:VFB 0 "register_operand")
> +       (xor:VFB (match_dup 4)
> +                (match_operand:VFB 1 "vector_operand")))]
>    "TARGET_SSE"
>  {
>    operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
> --
> 2.18.1
>


-- 
BR,
Hongtao


* Re: [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including
  2021-07-23  5:13   ` Hongtao Liu
@ 2021-07-26  2:25     ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-07-26  2:25 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

On Fri, Jul 23, 2021 at 1:13 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 2:18 PM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > From: "H.J. Lu" <hjl.tools@gmail.com>
> >
> > 1. FP16 vector xor/ior/and/andnot/abs/neg
> > 2. FP16 scalar abs/neg/copysign/xorsign
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-expand.c (ix86_expand_fp_absneg_operator):
> >         Handle HFmode.
> >         (ix86_expand_copysign): Ditto.
> >         (ix86_expand_xorsign): Ditto.
> >         * config/i386/i386.c (ix86_build_const_vector): Handle HF vector
> >         modes.
> >         (ix86_build_signbit_mask): Ditto.
> >         (ix86_can_change_mode_class): Ditto.
> >         * config/i386/i386.md (SSEMODEF): Add HF mode.
> >         (ssevecmodef): Ditto.
> >         (<code><mode>2): Use MODEFH.
> >         (*<code><mode>2_1): Ditto.
> >         (define_split): Ditto.
> >         (xorsign<mode>3): Ditto.
> >         (@xorsign<mode>3_1): Ditto.
As mentioned by Uros, I think these would also be better with separate patterns for HF.
I realized the define_insn and define_insn_and_split patterns here are named,
and those names are used by the xorsign/copysign expanders in
i386-expand.c, so for simplicity I'd like to keep the macroization
of the HF patterns in this patch.

> >         * config/i386/sse.md (VFB): New mode iterator.
> >         (VFB_128_256): Ditto.
> >         (VFB_512): Ditto.
> >         (sseintvecmode2): Support HF vector mode.
> >         (<code><mode>2): Use new mode iterator.
> >         (*<code><mode>2): Ditto.
> >         (copysign<mode>3): Ditto.
> >         (xorsign<mode>3): Ditto.
> >         (<code><mode>3<mask_name>): Ditto.
> >         (<code><mode>3<mask_name>): Ditto.
> >         (<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode.
> >         (<sse>_andnot<mode>3<mask_name>): Ditto.
> >         (*<code><mode>3<mask_name>): Ditto.
> >         (*<code><mode>3<mask_name>): Ditto.
> > ---
> >  gcc/config/i386/i386-expand.c |  12 +++-
> >  gcc/config/i386/i386.c        |  12 +++-
> >  gcc/config/i386/i386.md       |  40 ++++++-----
> >  gcc/config/i386/sse.md        | 128 ++++++++++++++++++++--------------
> >  4 files changed, 118 insertions(+), 74 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 9233c6cd1e8..006f4bec8db 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -1781,6 +1781,8 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode,
> >         vmode = V4SFmode;
> >        else if (mode == DFmode)
> >         vmode = V2DFmode;
> > +      else if (mode == HFmode)
> > +       vmode = V8HFmode;
> >      }
> >
> >    dst = operands[0];
> > @@ -1918,7 +1920,9 @@ ix86_expand_copysign (rtx operands[])
> >
> >    mode = GET_MODE (dest);
> >
> > -  if (mode == SFmode)
> > +  if (mode == HFmode)
> > +    vmode = V8HFmode;
> > +  else if (mode == SFmode)
> >      vmode = V4SFmode;
> >    else if (mode == DFmode)
> >      vmode = V2DFmode;
> > @@ -1934,7 +1938,7 @@ ix86_expand_copysign (rtx operands[])
> >        if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0)))
> >         op0 = simplify_unary_operation (ABS, mode, op0, mode);
> >
> > -      if (mode == SFmode || mode == DFmode)
> > +      if (mode == HFmode || mode == SFmode || mode == DFmode)
> >         {
> >           if (op0 == CONST0_RTX (mode))
> >             op0 = CONST0_RTX (vmode);
> > @@ -2073,7 +2077,9 @@ ix86_expand_xorsign (rtx operands[])
> >
> >    mode = GET_MODE (dest);
> >
> > -  if (mode == SFmode)
> > +  if (mode == HFmode)
> > +    vmode = V8HFmode;
> > +  else if (mode == SFmode)
> >      vmode = V4SFmode;
> >    else if (mode == DFmode)
> >      vmode = V2DFmode;
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index dc0d440061b..17e1b5ea874 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -15374,6 +15374,9 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
> >      case E_V2DImode:
> >        gcc_assert (vect);
> >        /* FALLTHRU */
> > +    case E_V8HFmode:
> > +    case E_V16HFmode:
> > +    case E_V32HFmode:
> >      case E_V16SFmode:
> >      case E_V8SFmode:
> >      case E_V4SFmode:
> > @@ -15412,6 +15415,13 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
> >
> >    switch (mode)
> >      {
> > +    case E_V8HFmode:
> > +    case E_V16HFmode:
> > +    case E_V32HFmode:
> > +      vec_mode = mode;
> > +      imode = HImode;
> > +      break;
> > +
> >      case E_V16SImode:
> >      case E_V16SFmode:
> >      case E_V8SImode:
> > @@ -19198,7 +19208,7 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> >          disallow a change to these modes, reload will assume it's ok to
> >          drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
> >          the vec_dupv4hi pattern.  */
> > -      if (GET_MODE_SIZE (from) < 4)
> > +      if (GET_MODE_SIZE (from) < 4 && from != E_HFmode)
> >         return false;
> >      }
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 014aba187e1..a85c23d74f1 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1233,9 +1233,10 @@ (define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
> >  ;; All x87 floating point modes plus HFmode
> >  (define_mode_iterator X87MODEFH [HF SF DF XF])
> >
> > -;; All SSE floating point modes
> > -(define_mode_iterator SSEMODEF [SF DF TF])
> > -(define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> > +;; All SSE floating point modes and HFmode
> > +(define_mode_iterator SSEMODEF [HF SF DF TF])
> > +(define_mode_attr ssevecmodef [(HF "V8HF") (SF "V4SF") (DF "V2DF") (TF "TF")])
> > +
> >
> >  ;; SSE instruction suffix for various modes
> >  (define_mode_attr ssemodesuffix
> > @@ -10529,8 +10530,8 @@ (define_insn_and_split "*nabstf2_1"
> >    [(set_attr "isa" "noavx,noavx,avx,avx")])
> >
> >  (define_expand "<code><mode>2"
> > -  [(set (match_operand:X87MODEF 0 "register_operand")
> > -       (absneg:X87MODEF (match_operand:X87MODEF 1 "register_operand")))]
> > +  [(set (match_operand:X87MODEFH 0 "register_operand")
> > +       (absneg:X87MODEFH (match_operand:X87MODEFH 1 "register_operand")))]
> >    "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> >    "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
> >
> > @@ -10559,9 +10560,9 @@ (define_split
> >    "ix86_split_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
> >
> >  (define_insn "*<code><mode>2_1"
> > -  [(set (match_operand:MODEF 0 "register_operand" "=x,x,Yv,f,!r")
> > -       (absneg:MODEF
> > -         (match_operand:MODEF 1 "register_operand" "0,x,Yv,0,0")))
> > +  [(set (match_operand:MODEFH 0 "register_operand" "=x,x,Yv,f,!r")
> > +       (absneg:MODEFH
> > +         (match_operand:MODEFH 1 "register_operand" "0,x,Yv,0,0")))
> >     (use (match_operand:<ssevecmode> 2 "vector_operand" "xBm,0,Yvm,X,X"))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> > @@ -10572,7 +10573,8 @@ (define_insn "*<code><mode>2_1"
> >         (match_test ("SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"))
> >         (if_then_else
> >          (eq_attr "alternative" "3,4")
> > -        (symbol_ref "TARGET_MIX_SSE_I387")
> > +        (symbol_ref "TARGET_MIX_SSE_I387
> > +                     && <MODE>mode != HFmode")
> >          (const_string "*"))
> >         (if_then_else
> >          (eq_attr "alternative" "3,4")
> > @@ -10580,9 +10582,9 @@ (define_insn "*<code><mode>2_1"
> >          (symbol_ref "false"))))])
> >
> >  (define_split
> > -  [(set (match_operand:MODEF 0 "sse_reg_operand")
> > -       (absneg:MODEF
> > -         (match_operand:MODEF 1 "sse_reg_operand")))
> > +  [(set (match_operand:MODEFH 0 "sse_reg_operand")
> > +       (absneg:MODEFH
> > +         (match_operand:MODEFH 1 "sse_reg_operand")))
> >     (use (match_operand:<ssevecmodef> 2 "vector_operand"))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
> > @@ -10706,17 +10708,17 @@ (define_split
> >    "ix86_split_copysign_var (operands); DONE;")
> >
> >  (define_expand "xorsign<mode>3"
> > -  [(match_operand:MODEF 0 "register_operand")
> > -   (match_operand:MODEF 1 "register_operand")
> > -   (match_operand:MODEF 2 "register_operand")]
> > +  [(match_operand:MODEFH 0 "register_operand")
> > +   (match_operand:MODEFH 1 "register_operand")
> > +   (match_operand:MODEFH 2 "register_operand")]
> >    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> >    "ix86_expand_xorsign (operands); DONE;")
> >
> >  (define_insn_and_split "@xorsign<mode>3_1"
> > -  [(set (match_operand:MODEF 0 "register_operand" "=Yv")
> > -       (unspec:MODEF
> > -         [(match_operand:MODEF 1 "register_operand" "Yv")
> > -          (match_operand:MODEF 2 "register_operand" "0")
> > +  [(set (match_operand:MODEFH 0 "register_operand" "=Yv")
> > +       (unspec:MODEFH
> > +         [(match_operand:MODEFH 1 "register_operand" "Yv")
> > +          (match_operand:MODEFH 2 "register_operand" "0")
> >            (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm")]
> >           UNSPEC_XORSIGN))]
> >    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index fdcc0515228..7c594babcce 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -317,11 +317,26 @@ (define_mode_iterator VFH
> >     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> >     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> >
> > +;; 128-, 256- and 512-bit float vector modes for bitwise operations
> > +(define_mode_iterator VFB
> > +  [(V32HF "TARGET_AVX512FP16")
> > +   (V16HF "TARGET_AVX512FP16")
> > +   (V8HF "TARGET_AVX512FP16")
> > +   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> > +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> > +
> >  ;; 128- and 256-bit float vector modes
> >  (define_mode_iterator VF_128_256
> >    [(V8SF "TARGET_AVX") V4SF
> >     (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> >
> > +;; 128- and 256-bit float vector modes for bitwise operations
> > +(define_mode_iterator VFB_128_256
> > +  [(V16HF "TARGET_AVX512FP16")
> > +   (V8HF "TARGET_AVX512FP16")
> > +   (V8SF "TARGET_AVX") V4SF
> > +   (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> > +
> >  ;; All SFmode vector float modes
> >  (define_mode_iterator VF1
> >    [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF])
> > @@ -374,6 +389,10 @@ (define_mode_iterator VF_256
> >  (define_mode_iterator VF_512
> >    [V16SF V8DF])
> >
> > +;; All 512bit vector float modes for bitwise operations
> > +(define_mode_iterator VFB_512
> > +  [(V32HF "TARGET_AVX512FP16") V16SF V8DF])
> > +
> >  (define_mode_iterator VI48_AVX512VL
> >    [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
> >     V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
> > @@ -923,7 +942,8 @@ (define_mode_attr sseintvecmode
> >
> >  (define_mode_attr sseintvecmode2
> >    [(V8DF "XI") (V4DF "OI") (V2DF "TI")
> > -   (V8SF "OI") (V4SF "TI")])
> > +   (V8SF "OI") (V4SF "TI")
> > +   (V16HF "OI") (V8HF "TI")])
> >
> >  (define_mode_attr sseintvecmodelower
> >    [(V16SF "v16si") (V8DF "v8di")
> > @@ -1968,22 +1988,22 @@ (define_insn "kunpckdi"
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >
> >  (define_expand "<code><mode>2"
> > -  [(set (match_operand:VF 0 "register_operand")
> > -       (absneg:VF
> > -         (match_operand:VF 1 "register_operand")))]
> > +  [(set (match_operand:VFB 0 "register_operand")
> > +       (absneg:VFB
> > +         (match_operand:VFB 1 "register_operand")))]
> >    "TARGET_SSE"
> >    "ix86_expand_fp_absneg_operator (<CODE>, <MODE>mode, operands); DONE;")
> >
> >  (define_insn_and_split "*<code><mode>2"
> > -  [(set (match_operand:VF 0 "register_operand" "=x,x,v,v")
> > -       (absneg:VF
> > -         (match_operand:VF 1 "vector_operand" "0,xBm,v,m")))
> > -   (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))]
> > +  [(set (match_operand:VFB 0 "register_operand" "=x,x,v,v")
> > +       (absneg:VFB
> > +         (match_operand:VFB 1 "vector_operand" "0,xBm,v,m")))
> > +   (use (match_operand:VFB 2 "vector_operand" "xBm,0,vm,v"))]
> >    "TARGET_SSE"
> >    "#"
> >    "&& reload_completed"
> >    [(set (match_dup 0)
> > -       (<absneg_op>:VF (match_dup 1) (match_dup 2)))]
> > +       (<absneg_op>:VFB (match_dup 1) (match_dup 2)))]
> >  {
> >    if (TARGET_AVX)
> >      {
> > @@ -3893,11 +3913,11 @@ (define_expand "vcond_mask_<mode><sseintvecmodelower>"
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >
> >  (define_insn "<sse>_andnot<mode>3<mask_name>"
> > -  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
> > -       (and:VF_128_256
> > -         (not:VF_128_256
> > -           (match_operand:VF_128_256 1 "register_operand" "0,x,v,v"))
> > -         (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> > +  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
> > +       (and:VFB_128_256
> > +         (not:VFB_128_256
> > +           (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v"))
> > +         (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> >    "TARGET_SSE && <mask_avx512vl_condition>"
> >  {
> >    char buf[128];
> > @@ -3920,6 +3940,8 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
> >
> >    switch (get_attr_mode (insn))
> >      {
> > +    case MODE_V16HF:
> > +    case MODE_V8HF:
> >      case MODE_V8SF:
> >      case MODE_V4SF:
> >        suffix = "ps";
> > @@ -3958,11 +3980,11 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
> >               (const_string "<MODE>")))])
> >
> >  (define_insn "<sse>_andnot<mode>3<mask_name>"
> > -  [(set (match_operand:VF_512 0 "register_operand" "=v")
> > -       (and:VF_512
> > -         (not:VF_512
> > -           (match_operand:VF_512 1 "register_operand" "v"))
> > -         (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
> > +  [(set (match_operand:VFB_512 0 "register_operand" "=v")
> > +       (and:VFB_512
> > +         (not:VFB_512
> > +           (match_operand:VFB_512 1 "register_operand" "v"))
> > +         (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
> >    "TARGET_AVX512F"
> >  {
> >    char buf[128];
> > @@ -3972,8 +3994,9 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
> >    suffix = "<ssemodesuffix>";
> >    ops = "";
> >
> > -  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
> > -  if (!TARGET_AVX512DQ)
> > +  /* Since there are no vandnp[sd] without AVX512DQ nor vandnph,
> > +     use vp<logic>[dq].  */
> > +  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
> >      {
> >        suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
> >        ops = "p";
> > @@ -3993,26 +4016,26 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
> >                       (const_string "XI")))])
> >
> >  (define_expand "<code><mode>3<mask_name>"
> > -  [(set (match_operand:VF_128_256 0 "register_operand")
> > -       (any_logic:VF_128_256
> > -         (match_operand:VF_128_256 1 "vector_operand")
> > -         (match_operand:VF_128_256 2 "vector_operand")))]
> > +  [(set (match_operand:VFB_128_256 0 "register_operand")
> > +       (any_logic:VFB_128_256
> > +         (match_operand:VFB_128_256 1 "vector_operand")
> > +         (match_operand:VFB_128_256 2 "vector_operand")))]
> >    "TARGET_SSE && <mask_avx512vl_condition>"
> >    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
> >
> >  (define_expand "<code><mode>3<mask_name>"
> > -  [(set (match_operand:VF_512 0 "register_operand")
> > -       (any_logic:VF_512
> > -         (match_operand:VF_512 1 "nonimmediate_operand")
> > -         (match_operand:VF_512 2 "nonimmediate_operand")))]
> > +  [(set (match_operand:VFB_512 0 "register_operand")
> > +       (any_logic:VFB_512
> > +         (match_operand:VFB_512 1 "nonimmediate_operand")
> > +         (match_operand:VFB_512 2 "nonimmediate_operand")))]
> >    "TARGET_AVX512F"
> >    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
> >
> >  (define_insn "*<code><mode>3<mask_name>"
> > -  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
> > -       (any_logic:VF_128_256
> > -         (match_operand:VF_128_256 1 "vector_operand" "%0,x,v,v")
> > -         (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> > +  [(set (match_operand:VFB_128_256 0 "register_operand" "=x,x,v,v")
> > +       (any_logic:VFB_128_256
> > +         (match_operand:VFB_128_256 1 "vector_operand" "%0,x,v,v")
> > +         (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
> >    "TARGET_SSE && <mask_avx512vl_condition>
> >     && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> >  {
> > @@ -4036,6 +4059,8 @@ (define_insn "*<code><mode>3<mask_name>"
> >
> >    switch (get_attr_mode (insn))
> >      {
> > +    case MODE_V16HF:
> > +    case MODE_V8HF:
> >      case MODE_V8SF:
> >      case MODE_V4SF:
> >        suffix = "ps";
> > @@ -4074,10 +4099,10 @@ (define_insn "*<code><mode>3<mask_name>"
> >               (const_string "<MODE>")))])
> >
> >  (define_insn "*<code><mode>3<mask_name>"
> > -  [(set (match_operand:VF_512 0 "register_operand" "=v")
> > -       (any_logic:VF_512
> > -         (match_operand:VF_512 1 "nonimmediate_operand" "%v")
> > -         (match_operand:VF_512 2 "nonimmediate_operand" "vm")))]
> > +  [(set (match_operand:VFB_512 0 "register_operand" "=v")
> > +       (any_logic:VFB_512
> > +         (match_operand:VFB_512 1 "nonimmediate_operand" "%v")
> > +         (match_operand:VFB_512 2 "nonimmediate_operand" "vm")))]
> >    "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> >  {
> >    char buf[128];
> > @@ -4087,8 +4112,9 @@ (define_insn "*<code><mode>3<mask_name>"
> >    suffix = "<ssemodesuffix>";
> >    ops = "";
> >
> > -  /* There is no v<logic>p[sd] in avx512f.  Use vp<logic>[dq].  */
> > -  if (!TARGET_AVX512DQ)
> > +  /* Since there are no v<logic>p[sd] without AVX512DQ nor v<logic>ph,
> > +     use vp<logic>[dq].  */
> > +  if (!TARGET_AVX512DQ || <MODE>mode == V32HFmode)
> >      {
> >        suffix = GET_MODE_INNER (<MODE>mode) == DFmode ? "q" : "d";
> >        ops = "p";
> > @@ -4109,14 +4135,14 @@ (define_insn "*<code><mode>3<mask_name>"
> >
> >  (define_expand "copysign<mode>3"
> >    [(set (match_dup 4)
> > -       (and:VF
> > -         (not:VF (match_dup 3))
> > -         (match_operand:VF 1 "vector_operand")))
> > +       (and:VFB
> > +         (not:VFB (match_dup 3))
> > +         (match_operand:VFB 1 "vector_operand")))
> >     (set (match_dup 5)
> > -       (and:VF (match_dup 3)
> > -               (match_operand:VF 2 "vector_operand")))
> > -   (set (match_operand:VF 0 "register_operand")
> > -       (ior:VF (match_dup 4) (match_dup 5)))]
> > +       (and:VFB (match_dup 3)
> > +                (match_operand:VFB 2 "vector_operand")))
> > +   (set (match_operand:VFB 0 "register_operand")
> > +       (ior:VFB (match_dup 4) (match_dup 5)))]
> >    "TARGET_SSE"
> >  {
> >    operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
> > @@ -4127,11 +4153,11 @@ (define_expand "copysign<mode>3"
> >
> >  (define_expand "xorsign<mode>3"
> >    [(set (match_dup 4)
> > -       (and:VF (match_dup 3)
> > -               (match_operand:VF 2 "vector_operand")))
> > -   (set (match_operand:VF 0 "register_operand")
> > -       (xor:VF (match_dup 4)
> > -               (match_operand:VF 1 "vector_operand")))]
> > +       (and:VFB (match_dup 3)
> > +               (match_operand:VFB 2 "vector_operand")))
> > +   (set (match_operand:VFB 0 "register_operand")
> > +       (xor:VFB (match_dup 4)
> > +                (match_operand:VFB 1 "vector_operand")))]
> >    "TARGET_SSE"
> >  {
> >    operands[3] = ix86_build_signbit_mask (<MODE>mode, 1, 0);
> > --
> > 2.18.1
> >
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph.
  2021-07-01  6:15 ` [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph liuhongt
@ 2021-09-09  7:48   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-09  7:48 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config.gcc: Add avx512fp16vlintrin.h.
>         * config/i386/avx512fp16intrin.h: (_mm512_add_ph): New intrinsic.
>         (_mm512_mask_add_ph): Likewise.
>         (_mm512_maskz_add_ph): Likewise.
>         (_mm512_sub_ph): Likewise.
>         (_mm512_mask_sub_ph): Likewise.
>         (_mm512_maskz_sub_ph): Likewise.
>         (_mm512_mul_ph): Likewise.
>         (_mm512_mask_mul_ph): Likewise.
>         (_mm512_maskz_mul_ph): Likewise.
>         (_mm512_div_ph): Likewise.
>         (_mm512_mask_div_ph): Likewise.
>         (_mm512_maskz_div_ph): Likewise.
>         (_mm512_add_round_ph): Likewise.
>         (_mm512_mask_add_round_ph): Likewise.
>         (_mm512_maskz_add_round_ph): Likewise.
>         (_mm512_sub_round_ph): Likewise.
>         (_mm512_mask_sub_round_ph): Likewise.
>         (_mm512_maskz_sub_round_ph): Likewise.
>         (_mm512_mul_round_ph): Likewise.
>         (_mm512_mask_mul_round_ph): Likewise.
>         (_mm512_maskz_mul_round_ph): Likewise.
>         (_mm512_div_round_ph): Likewise.
>         (_mm512_mask_div_round_ph): Likewise.
>         (_mm512_maskz_div_round_ph): Likewise.
>         * config/i386/avx512fp16vlintrin.h: New header.
>         * config/i386/i386-builtin-types.def (V16HF, V8HF, V32HF):
>         Add new builtin types.
>         * config/i386/i386-builtin.def: Add corresponding builtins.
>         * config/i386/i386-expand.c
>         (ix86_expand_args_builtin): Handle new builtin types.
>         (ix86_expand_round_builtin): Likewise.
>         * config/i386/immintrin.h: Include avx512fp16vlintrin.h
>         * config/i386/sse.md (VFH): New mode_iterator.
>         (VF2H): Likewise.
>         (avx512fmaskmode): Add HF vector modes.
>         (avx512fmaskhalfmode): Likewise.
>         (<plusminus_insn><mode>3<mask_name><round_name>): Adjust to for
>         HF vector modes.
>         (*<plusminus_insn><mode>3<mask_name><round_name>): Likewise.
>         (mul<mode>3<mask_name><round_name>): Likewise.
>         (*mul<mode>3<mask_name><round_name>): Likewise.
>         (div<mode>3): Likewise.
>         (<sse>_div<mode>3<mask_name><round_name>): Likewise.
>         * config/i386/subst.md (SUBST_V): Add HF vector modes.
>         (SUBST_A): Likewise.
>         (round_mode512bit_condition): Adjust for V32HFmode.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
>         * gcc.target/i386/avx-2.c: Add -mavx512vl.
>         * gcc.target/i386/avx512fp16-11a.c: New test.
>         * gcc.target/i386/avx512fp16-11b.c: Ditto.
>         * gcc.target/i386/avx512vlfp16-11a.c: Ditto.
>         * gcc.target/i386/avx512vlfp16-11b.c: Ditto.
>         * gcc.target/i386/sse-13.c: Add test for new builtins.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add test for new intrinsics.
>         * gcc.target/i386/sse-22.c: Ditto.
I'm going to check in 2 patches: this patch and [1], which contains
the testcases for this patch.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Newly added runtime tests passed under sde.

[1]https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574125.html

> ---
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            | 251 ++++++++++++++++++
>  gcc/config/i386/avx512fp16vlintrin.h          | 219 +++++++++++++++
>  gcc/config/i386/i386-builtin-types.def        |   7 +
>  gcc/config/i386/i386-builtin.def              |  20 ++
>  gcc/config/i386/i386-expand.c                 |   5 +
>  gcc/config/i386/immintrin.h                   |   2 +
>  gcc/config/i386/sse.md                        |  62 +++--
>  gcc/config/i386/subst.md                      |   6 +-
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   8 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  .../gcc.target/i386/avx512fp16-11a.c          |  36 +++
>  .../gcc.target/i386/avx512fp16-11b.c          |  75 ++++++
>  .../gcc.target/i386/avx512vlfp16-11a.c        |  68 +++++
>  .../gcc.target/i386/avx512vlfp16-11b.c        |  96 +++++++
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   6 +
>  gcc/testsuite/gcc.target/i386/sse-14.c        |  14 +
>  gcc/testsuite/gcc.target/i386/sse-22.c        |  14 +
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   6 +
>  19 files changed, 872 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16vlintrin.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5b4f894185a..d64a8b9407e 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
>                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
>                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
>                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> -                      mwaitintrin.h avx512fp16intrin.h"
> +                      mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h"
>         ;;
>  ia64-*-*)
>         extra_headers=ia64intrin.h
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 3fc0770986e..3e9d676dc39 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -217,6 +217,257 @@ _mm_store_sh (void *__P, __m128h __A)
>    *(_Float16 *) __P = ((__v8hf)__A)[0];
>  }
>
> +/* Intrinsics v[add,sub,mul,div]ph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_add_ph (__m512h __A, __m512h __B)
> +{
> +  return (__m512h) ((__v32hf) __A + (__v32hf) __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_add_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
> +{
> +  return __builtin_ia32_vaddph_v32hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_add_ph (__mmask32 __A, __m512h __B, __m512h __C)
> +{
> +  return __builtin_ia32_vaddph_v32hf_mask (__B, __C,
> +                                          _mm512_setzero_ph (), __A);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_sub_ph (__m512h __A, __m512h __B)
> +{
> +  return (__m512h) ((__v32hf) __A - (__v32hf) __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_sub_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
> +{
> +  return __builtin_ia32_vsubph_v32hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_sub_ph (__mmask32 __A, __m512h __B, __m512h __C)
> +{
> +  return __builtin_ia32_vsubph_v32hf_mask (__B, __C,
> +                                          _mm512_setzero_ph (), __A);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mul_ph (__m512h __A, __m512h __B)
> +{
> +  return (__m512h) ((__v32hf) __A * (__v32hf) __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_mul_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
> +{
> +  return __builtin_ia32_vmulph_v32hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_mul_ph (__mmask32 __A, __m512h __B, __m512h __C)
> +{
> +  return __builtin_ia32_vmulph_v32hf_mask (__B, __C,
> +                                          _mm512_setzero_ph (), __A);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_div_ph (__m512h __A, __m512h __B)
> +{
> +  return (__m512h) ((__v32hf) __A / (__v32hf) __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_div_ph (__m512h __A, __mmask32 __B, __m512h __C, __m512h __D)
> +{
> +  return __builtin_ia32_vdivph_v32hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_div_ph (__mmask32 __A, __m512h __B, __m512h __C)
> +{
> +  return __builtin_ia32_vdivph_v32hf_mask (__B, __C,
> +                                          _mm512_setzero_ph (), __A);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_add_round_ph (__m512h __A, __m512h __B, const int __C)
> +{
> +  return __builtin_ia32_vaddph_v32hf_mask_round (__A, __B,
> +                                                _mm512_setzero_ph (),
> +                                                (__mmask32) -1, __C);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_add_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
> +                         __m512h __D, const int __E)
> +{
> +  return __builtin_ia32_vaddph_v32hf_mask_round (__C, __D, __A, __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_add_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
> +                          const int __D)
> +{
> +  return __builtin_ia32_vaddph_v32hf_mask_round (__B, __C,
> +                                                _mm512_setzero_ph (),
> +                                                __A, __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_sub_round_ph (__m512h __A, __m512h __B, const int __C)
> +{
> +  return __builtin_ia32_vsubph_v32hf_mask_round (__A, __B,
> +                                                _mm512_setzero_ph (),
> +                                                (__mmask32) -1, __C);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_sub_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
> +                         __m512h __D, const int __E)
> +{
> +  return __builtin_ia32_vsubph_v32hf_mask_round (__C, __D, __A, __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_sub_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
> +                          const int __D)
> +{
> +  return __builtin_ia32_vsubph_v32hf_mask_round (__B, __C,
> +                                                _mm512_setzero_ph (),
> +                                                __A, __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mul_round_ph (__m512h __A, __m512h __B, const int __C)
> +{
> +  return __builtin_ia32_vmulph_v32hf_mask_round (__A, __B,
> +                                                _mm512_setzero_ph (),
> +                                                (__mmask32) -1, __C);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_mul_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
> +                         __m512h __D, const int __E)
> +{
> +  return __builtin_ia32_vmulph_v32hf_mask_round (__C, __D, __A, __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_mul_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
> +                          const int __D)
> +{
> +  return __builtin_ia32_vmulph_v32hf_mask_round (__B, __C,
> +                                                _mm512_setzero_ph (),
> +                                                __A, __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_div_round_ph (__m512h __A, __m512h __B, const int __C)
> +{
> +  return __builtin_ia32_vdivph_v32hf_mask_round (__A, __B,
> +                                                _mm512_setzero_ph (),
> +                                                (__mmask32) -1, __C);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_div_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
> +                         __m512h __D, const int __E)
> +{
> +  return __builtin_ia32_vdivph_v32hf_mask_round (__C, __D, __A, __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_div_round_ph (__mmask32 __A, __m512h __B, __m512h __C,
> +                          const int __D)
> +{
> +  return __builtin_ia32_vdivph_v32hf_mask_round (__B, __C,
> +                                                _mm512_setzero_ph (),
> +                                                __A, __D);
> +}
> +#else
> +#define _mm512_add_round_ph(A, B, C)                                   \
> +  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((A), (B),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (__mmask32)-1, (C)))
> +
> +#define _mm512_mask_add_round_ph(A, B, C, D, E)                        \
> +  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((C), (D), (A), (B), (E)))
> +
> +#define _mm512_maskz_add_round_ph(A, B, C, D)                          \
> +  ((__m512h)__builtin_ia32_vaddph_v32hf_mask_round((B), (C),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (A), (D)))
> +
> +#define _mm512_sub_round_ph(A, B, C)                                   \
> +  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((A), (B),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (__mmask32)-1, (C)))
> +
> +#define _mm512_mask_sub_round_ph(A, B, C, D, E)                        \
> +  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((C), (D), (A), (B), (E)))
> +
> +#define _mm512_maskz_sub_round_ph(A, B, C, D)                          \
> +  ((__m512h)__builtin_ia32_vsubph_v32hf_mask_round((B), (C),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (A), (D)))
> +
> +#define _mm512_mul_round_ph(A, B, C)                                   \
> +  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((A), (B),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (__mmask32)-1, (C)))
> +
> +#define _mm512_mask_mul_round_ph(A, B, C, D, E)                        \
> +  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((C), (D), (A), (B), (E)))
> +
> +#define _mm512_maskz_mul_round_ph(A, B, C, D)                          \
> +  ((__m512h)__builtin_ia32_vmulph_v32hf_mask_round((B), (C),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (A), (D)))
> +
> +#define _mm512_div_round_ph(A, B, C)                                   \
> +  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((A), (B),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (__mmask32)-1, (C)))
> +
> +#define _mm512_mask_div_round_ph(A, B, C, D, E)                        \
> +  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((C), (D), (A), (B), (E)))
> +
> +#define _mm512_maskz_div_round_ph(A, B, C, D)                          \
> +  ((__m512h)__builtin_ia32_vdivph_v32hf_mask_round((B), (C),           \
> +                                                  _mm512_setzero_ph (),\
> +                                                  (A), (D)))
> +#endif  /* __OPTIMIZE__  */
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
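For readers less familiar with the mask/maskz convention these round macros follow: the `_mask` forms merge masked-off lanes from the first (source) operand, while the `_maskz` forms zero them — which is why the `_maskz` macros pass `_mm512_setzero_ph ()` as the merge source. A minimal scalar sketch of that semantics, with plain `float` lanes standing in for `_Float16` (the helper names here are illustrative only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of AVX512 merge- vs zero-masking over 8 lanes.  */
#define NLANES 8

static void
mask_add (float *dst, uint8_t mask, const float *src,
	  const float *a, const float *b)
{
  /* Merge-masking: lanes whose mask bit is 0 keep SRC's value.  */
  for (int i = 0; i < NLANES; i++)
    dst[i] = (mask >> i) & 1 ? a[i] + b[i] : src[i];
}

static void
maskz_add (float *dst, uint8_t mask, const float *a, const float *b)
{
  /* Zero-masking: lanes whose mask bit is 0 become 0.0f, mirroring the
     _mm512_setzero_ph () merge source in the _maskz_ macros above.  */
  for (int i = 0; i < NLANES; i++)
    dst[i] = (mask >> i) & 1 ? a[i] + b[i] : 0.0f;
}
```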
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
> new file mode 100644
> index 00000000000..75fa9eb29e7
> --- /dev/null
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -0,0 +1,219 @@
> +/* Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _IMMINTRIN_H_INCLUDED
> +#error "Never use <avx512fp16vlintrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef __AVX512FP16VLINTRIN_H_INCLUDED
> +#define __AVX512FP16VLINTRIN_H_INCLUDED
> +
> +#if !defined(__AVX512VL__) || !defined(__AVX512FP16__)
> +#pragma GCC push_options
> +#pragma GCC target("avx512fp16,avx512vl")
> +#define __DISABLE_AVX512FP16VL__
> +#endif /* __AVX512FP16VL__ */
> +
> +/* Intrinsics v[add,sub,mul,div]ph.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_add_ph (__m128h __A, __m128h __B)
> +{
> +  return (__m128h) ((__v8hf) __A + (__v8hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_add_ph (__m256h __A, __m256h __B)
> +{
> +  return (__m256h) ((__v16hf) __A + (__v16hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_add_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vaddph_v8hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_add_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
> +{
> +  return __builtin_ia32_vaddph_v16hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_add_ph (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vaddph_v8hf_mask (__B, __C, _mm_setzero_ph (),
> +                                         __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_add_ph (__mmask16 __A, __m256h __B, __m256h __C)
> +{
> +  return __builtin_ia32_vaddph_v16hf_mask (__B, __C,
> +                                          _mm256_setzero_ph (), __A);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_sub_ph (__m128h __A, __m128h __B)
> +{
> +  return (__m128h) ((__v8hf) __A - (__v8hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_sub_ph (__m256h __A, __m256h __B)
> +{
> +  return (__m256h) ((__v16hf) __A - (__v16hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_sub_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vsubph_v8hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_sub_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
> +{
> +  return __builtin_ia32_vsubph_v16hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_sub_ph (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vsubph_v8hf_mask (__B, __C, _mm_setzero_ph (),
> +                                         __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_sub_ph (__mmask16 __A, __m256h __B, __m256h __C)
> +{
> +  return __builtin_ia32_vsubph_v16hf_mask (__B, __C,
> +                                          _mm256_setzero_ph (), __A);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mul_ph (__m128h __A, __m128h __B)
> +{
> +  return (__m128h) ((__v8hf) __A * (__v8hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mul_ph (__m256h __A, __m256h __B)
> +{
> +  return (__m256h) ((__v16hf) __A * (__v16hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_mul_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vmulph_v8hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_mul_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
> +{
> +  return __builtin_ia32_vmulph_v16hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_mul_ph (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vmulph_v8hf_mask (__B, __C, _mm_setzero_ph (),
> +                                         __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_mul_ph (__mmask16 __A, __m256h __B, __m256h __C)
> +{
> +  return __builtin_ia32_vmulph_v16hf_mask (__B, __C,
> +                                          _mm256_setzero_ph (), __A);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_div_ph (__m128h __A, __m128h __B)
> +{
> +  return (__m128h) ((__v8hf) __A / (__v8hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_div_ph (__m256h __A, __m256h __B)
> +{
> +  return (__m256h) ((__v16hf) __A / (__v16hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_div_ph (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vdivph_v8hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_div_ph (__m256h __A, __mmask16 __B, __m256h __C, __m256h __D)
> +{
> +  return __builtin_ia32_vdivph_v16hf_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_div_ph (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vdivph_v8hf_mask (__B, __C, _mm_setzero_ph (),
> +                                         __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_div_ph (__mmask16 __A, __m256h __B, __m256h __C)
> +{
> +  return __builtin_ia32_vdivph_v16hf_mask (__B, __C,
> +                                          _mm256_setzero_ph (), __A);
> +}
> +
> +#ifdef __DISABLE_AVX512FP16VL__
> +#undef __DISABLE_AVX512FP16VL__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX512FP16VL__ */
> +
> +#endif /* __AVX512FP16VLINTRIN_H_INCLUDED */
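Note that the unmasked forms above (`_mm_add_ph` and friends) lower to GNU C generic vector arithmetic rather than a builtin: `(__m128h) ((__v8hf) __A + (__v8hf) __B)` is one element-wise operation on a `vector_size` type. A tiny self-contained sketch of that mechanism, using a hypothetical `v4sf` type with `float` lanes in place of `_Float16`:

```c
#include <assert.h>

/* GNU generic-vector arithmetic: arithmetic operators on
   __attribute__ ((vector_size)) types apply element-wise.  */
typedef float v4sf __attribute__ ((vector_size (16)));

static v4sf
v4sf_add (v4sf a, v4sf b)
{
  /* One element-wise add across all four lanes.  */
  return a + b;
}
```

The same pattern, applied to `__v8hf`, is what the new header's unmasked add/sub/mul/div bodies rely on.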
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index eb5153002ae..ee3b8c30589 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -98,6 +98,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI)
>  # AVX vectors
>  DEF_VECTOR_TYPE (V4DF, DOUBLE)
>  DEF_VECTOR_TYPE (V8SF, FLOAT)
> +DEF_VECTOR_TYPE (V16HF, FLOAT16)
>  DEF_VECTOR_TYPE (V4DI, DI)
>  DEF_VECTOR_TYPE (V8SI, SI)
>  DEF_VECTOR_TYPE (V16HI, HI)
> @@ -108,6 +109,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI)
>
>  # AVX512F vectors
>  DEF_VECTOR_TYPE (V32SF, FLOAT)
> +DEF_VECTOR_TYPE (V32HF, FLOAT16)
>  DEF_VECTOR_TYPE (V16SF, FLOAT)
>  DEF_VECTOR_TYPE (V8DF, DOUBLE)
>  DEF_VECTOR_TYPE (V8DI, DI)
> @@ -1302,3 +1304,8 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
>
>  # FP16 builtins
>  DEF_FUNCTION_TYPE (V8HF, V8HI)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
> +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 1cc0cc6968c..b783d266dd8 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -2774,6 +2774,20 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builti
>  BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
>  BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
>
> +/* AVX512FP16.  */
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_vaddph_v8hf_mask", IX86_BUILTIN_VADDPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_vaddph_v16hf_mask", IX86_BUILTIN_VADDPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask, "__builtin_ia32_vaddph_v32hf_mask", IX86_BUILTIN_VADDPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv8hf3_mask, "__builtin_ia32_vsubph_v8hf_mask", IX86_BUILTIN_VSUBPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv16hf3_mask, "__builtin_ia32_vsubph_v16hf_mask", IX86_BUILTIN_VSUBPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask, "__builtin_ia32_vsubph_v32hf_mask", IX86_BUILTIN_VSUBPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv8hf3_mask, "__builtin_ia32_vmulph_v8hf_mask", IX86_BUILTIN_VMULPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv16hf3_mask, "__builtin_ia32_vmulph_v16hf_mask", IX86_BUILTIN_VMULPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask, "__builtin_ia32_vmulph_v32hf_mask", IX86_BUILTIN_VMULPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv8hf3_mask, "__builtin_ia32_vdivph_v8hf_mask", IX86_BUILTIN_VDIVPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv16hf3_mask, "__builtin_ia32_vdivph_v16hf_mask", IX86_BUILTIN_VDIVPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask, "__builtin_ia32_vdivph_v32hf_mask", IX86_BUILTIN_VDIVPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI)
> +
>  /* Builtins with rounding support.  */
>  BDESC_END (ARGS, ROUND_ARGS)
>
> @@ -2973,6 +2987,12 @@ BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_fixuns_truncv8dfv8di2_mask_round, "
>  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv16sf_mask_round, "__builtin_ia32_rangeps512_mask", IX86_BUILTIN_RANGEPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_INT_V16SF_HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_rangepv8df_mask_round, "__builtin_ia32_rangepd512_mask", IX86_BUILTIN_RANGEPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_INT_V8DF_QI_INT)
>
> +/* AVX512FP16.  */
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv32hf3_mask_round, "__builtin_ia32_vaddph_v32hf_mask_round", IX86_BUILTIN_VADDPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_subv32hf3_mask_round, "__builtin_ia32_vsubph_v32hf_mask_round", IX86_BUILTIN_VSUBPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_mulv32hf3_mask_round, "__builtin_ia32_vmulph_v32hf_mask_round", IX86_BUILTIN_VMULPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_divv32hf3_mask_round, "__builtin_ia32_vdivph_v32hf_mask_round", IX86_BUILTIN_VDIVPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +
>  BDESC_END (ROUND_ARGS, MULTI_ARG)
>
>  /* FMA4 and XOP.  */
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 5ce7163b241..39647eb2cf1 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -9760,6 +9760,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V16HI_FTYPE_V8SI_V8SI_V16HI_UHI:
>      case V8HI_FTYPE_V4SI_V4SI_V8HI_UQI:
>      case V4DF_FTYPE_V4DF_V4DI_V4DF_UQI:
> +    case V32HF_FTYPE_V32HF_V32HF_V32HF_USI:
>      case V8SF_FTYPE_V8SF_V8SI_V8SF_UQI:
>      case V4SF_FTYPE_V4SF_V4SI_V4SF_UQI:
>      case V2DF_FTYPE_V2DF_V2DI_V2DF_UQI:
> @@ -9777,6 +9778,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI:
>      case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI:
>      case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI:
> +    case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI:
>      case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI:
>      case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI:
>      case V16HI_FTYPE_V16HI_V16HI_V16HI_UHI:
> @@ -9784,6 +9786,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI:
>      case V4DI_FTYPE_V4DI_V4DI_V4DI_UQI:
>      case V4DF_FTYPE_V4DF_V4DF_V4DF_UQI:
> +    case V8HF_FTYPE_V8HF_V8HF_V8HF_UQI:
>      case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI:
>      case V8DF_FTYPE_V8DF_V8DF_V8DF_UQI:
>      case V8DF_FTYPE_V8DF_V8DI_V8DF_UQI:
> @@ -10460,6 +10463,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      case INT_FTYPE_V4SF_INT:
>        nargs = 2;
>        break;
> +    case V32HF_FTYPE_V32HF_V32HF_INT:
>      case V4SF_FTYPE_V4SF_UINT_INT:
>      case V4SF_FTYPE_V4SF_UINT64_INT:
>      case V2DF_FTYPE_V2DF_UINT64_INT:
> @@ -10500,6 +10504,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      case V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT:
>      case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT:
>      case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT:
> +    case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT:
>      case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT:
>      case V2DF_FTYPE_V2DF_V4SF_V2DF_QI_INT:
>      case V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT:
> diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> index 5344e22c9c8..e08efb9dff3 100644
> --- a/gcc/config/i386/immintrin.h
> +++ b/gcc/config/i386/immintrin.h
> @@ -96,6 +96,8 @@
>
>  #include <avx512fp16intrin.h>
>
> +#include <avx512fp16vlintrin.h>
> +
>  #include <shaintrin.h>
>
>  #include <fmaintrin.h>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 1009d656cbb..2c1b6fbcd86 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -295,6 +295,13 @@ (define_mode_iterator VF
>    [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> +(define_mode_iterator VFH
> +  [(V32HF "TARGET_AVX512FP16")
> +   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
> +
>  ;; 128- and 256-bit float vector modes
>  (define_mode_iterator VF_128_256
>    [(V8SF "TARGET_AVX") V4SF
> @@ -318,6 +325,13 @@ (define_mode_iterator VF1_128_256VL
>  (define_mode_iterator VF2
>    [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
>
> +;; All DFmode & HFmode vector float modes
> +(define_mode_iterator VF2H
> +  [(V32HF "TARGET_AVX512FP16")
> +   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
> +
>  ;; 128- and 256-bit DF vector modes
>  (define_mode_iterator VF2_128_256
>    [(V4DF "TARGET_AVX") V2DF])
> @@ -824,6 +838,7 @@ (define_mode_attr avx512fmaskmode
>     (V32HI "SI") (V16HI "HI") (V8HI  "QI") (V4HI "QI")
>     (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
>     (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
> +   (V32HF "SI") (V16HF "HI") (V8HF  "QI")
>     (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
>     (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
>
> @@ -842,6 +857,7 @@ (define_mode_attr avx512fmaskhalfmode
>     (V32HI "HI") (V16HI "QI") (V8HI  "QI") (V4HI "QI")
>     (V16SI "QI") (V8SI  "QI") (V4SI  "QI")
>     (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
> +   (V32HF "HI") (V16HF "QI") (V8HF  "QI")
>     (V16SF "QI") (V8SF  "QI") (V4SF  "QI")
>     (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
>
> @@ -1940,18 +1956,18 @@ (define_insn_and_split "*nabs<mode>2"
>    [(set_attr "isa" "noavx,noavx,avx,avx")])
>
>  (define_expand "<insn><mode>3<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand")
> -       (plusminus:VF
> -         (match_operand:VF 1 "<round_nimm_predicate>")
> -         (match_operand:VF 2 "<round_nimm_predicate>")))]
> +  [(set (match_operand:VFH 0 "register_operand")
> +       (plusminus:VFH
> +         (match_operand:VFH 1 "<round_nimm_predicate>")
> +         (match_operand:VFH 2 "<round_nimm_predicate>")))]
>    "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
>
>  (define_insn "*<insn><mode>3<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand" "=x,v")
> -       (plusminus:VF
> -         (match_operand:VF 1 "<bcst_round_nimm_predicate>" "<comm>0,v")
> -         (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
> +  [(set (match_operand:VFH 0 "register_operand" "=x,v")
> +       (plusminus:VFH
> +         (match_operand:VFH 1 "<bcst_round_nimm_predicate>" "<comm>0,v")
> +         (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
>    "TARGET_SSE && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
>     && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
> @@ -2002,18 +2018,18 @@ (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
>     (set_attr "mode" "<ssescalarmode>")])
>
>  (define_expand "mul<mode>3<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand")
> -       (mult:VF
> -         (match_operand:VF 1 "<round_nimm_predicate>")
> -         (match_operand:VF 2 "<round_nimm_predicate>")))]
> +  [(set (match_operand:VFH 0 "register_operand")
> +       (mult:VFH
> +         (match_operand:VFH 1 "<round_nimm_predicate>")
> +         (match_operand:VFH 2 "<round_nimm_predicate>")))]
>    "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
>
>  (define_insn "*mul<mode>3<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand" "=x,v")
> -       (mult:VF
> -         (match_operand:VF 1 "<bcst_round_nimm_predicate>" "%0,v")
> -         (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
> +  [(set (match_operand:VFH 0 "register_operand" "=x,v")
> +       (mult:VFH
> +         (match_operand:VFH 1 "<bcst_round_nimm_predicate>" "%0,v")
> +         (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
>    "TARGET_SSE && ix86_binary_operator_ok (MULT, <MODE>mode, operands)
>     && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
> @@ -2067,9 +2083,9 @@ (define_insn "<sse>_vm<multdiv_mnemonic><mode>3<mask_scalar_name><round_scalar_n
>     (set_attr "mode" "<ssescalarmode>")])
>
>  (define_expand "div<mode>3"
> -  [(set (match_operand:VF2 0 "register_operand")
> -       (div:VF2 (match_operand:VF2 1 "register_operand")
> -                (match_operand:VF2 2 "vector_operand")))]
> +  [(set (match_operand:VF2H 0 "register_operand")
> +       (div:VF2H (match_operand:VF2H 1 "register_operand")
> +                 (match_operand:VF2H 2 "vector_operand")))]
>    "TARGET_SSE2")
>
>  (define_expand "div<mode>3"
> @@ -2090,10 +2106,10 @@ (define_expand "div<mode>3"
>  })
>
>  (define_insn "<sse>_div<mode>3<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand" "=x,v")
> -       (div:VF
> -         (match_operand:VF 1 "register_operand" "0,v")
> -         (match_operand:VF 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
> +  [(set (match_operand:VFH 0 "register_operand" "=x,v")
> +       (div:VFH
> +         (match_operand:VFH 1 "register_operand" "0,v")
> +         (match_operand:VFH 2 "<bcst_round_nimm_predicate>" "xBm,<bcst_round_constraint>")))]
>    "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
>     div<ssemodesuffix>\t{%2, %0|%0, %2}
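For context on the machine-description side: a `define_mode_iterator` such as the new `VFH` stamps a single pattern template out once per listed mode, each instance guarded by that mode's enabling condition in addition to the pattern's own condition. A rough Python model of that expansion (deliberately simplified — the real genmodes/gensupport machinery handles far more substitutions):

```python
# Mode/condition pairs mirroring the VFH iterator added above; None
# means "no extra condition beyond the pattern's own TARGET_SSE".
VFH = [
    ("V32HF", "TARGET_AVX512FP16"),
    ("V16HF", "TARGET_AVX512FP16 && TARGET_AVX512VL"),
    ("V8HF",  "TARGET_AVX512FP16 && TARGET_AVX512VL"),
    ("V16SF", "TARGET_AVX512F"),
    ("V8SF",  "TARGET_AVX"),
    ("V4SF",  None),
    ("V8DF",  "TARGET_AVX512F"),
    ("V4DF",  "TARGET_AVX"),
    ("V2DF",  "TARGET_SSE2"),
]

def expand(template, iterator):
    """Stamp TEMPLATE out once per mode, ANDing in the mode condition."""
    insns = []
    for mode, cond in iterator:
        body = template.replace("VFH", mode).replace("<MODE>", mode)
        enable = "TARGET_SSE" if cond is None else f"TARGET_SSE && ({cond})"
        insns.append((mode, body, enable))
    return insns

patterns = expand("(plusminus:VFH ...)", VFH)
```

This is why adding the three HF modes to `VFH` is enough to get masked add/sub patterns for all three vector lengths from the one `plusminus` template.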
> diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
> index 477a89803fa..762383bfd11 100644
> --- a/gcc/config/i386/subst.md
> +++ b/gcc/config/i386/subst.md
> @@ -24,6 +24,7 @@ (define_mode_iterator SUBST_V
>     V32HI V16HI V8HI
>     V16SI V8SI  V4SI
>     V8DI  V4DI  V2DI
> +   V32HF V16HF V8HF
>     V16SF V8SF  V4SF
>     V8DF  V4DF  V2DF])
>
> @@ -35,6 +36,7 @@ (define_mode_iterator SUBST_A
>     V32HI V16HI V8HI
>     V16SI V8SI  V4SI
>     V8DI  V4DI  V2DI
> +   V32HF V16HF V8HF
>     V16SF V8SF  V4SF
>     V8DF  V4DF  V2DF
>     QI HI SI DI SF DF])
> @@ -142,7 +144,9 @@ (define_subst_attr "round_prefix" "round" "vex" "evex")
>  (define_subst_attr "round_mode512bit_condition" "round" "1" "(<MODE>mode == V16SFmode
>                                                               || <MODE>mode == V8DFmode
>                                                               || <MODE>mode == V8DImode
> -                                                             || <MODE>mode == V16SImode)")
> +                                                             || <MODE>mode == V16SImode
> +                                                             || <MODE>mode == V32HFmode)")
> +
>  (define_subst_attr "round_modev8sf_condition" "round" "1" "(<MODE>mode == V8SFmode)")
>  (define_subst_attr "round_modev4sf_condition" "round" "1" "(<MODE>mode == V4SFmode)")
>  (define_subst_attr "round_codefor" "round" "*" "")
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index f3676077743..1eaee861141 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16 -mavx512vl" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> @@ -685,6 +685,12 @@
>  #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
>  #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
>
> +/* avx512fp16intrin.h */
> +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
> +
>  /* vpclmulqdqintrin.h */
>  #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> index 1751c52565c..642ae4d7bfb 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16 -mavx512vl" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
> new file mode 100644
> index 00000000000..28492fa3f7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11a.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <immintrin.h>
> +__m512h
> +__attribute__ ((noinline, noclone))
> +vadd512 (__m512h a, __m512h b)
> +{
> +  return a + b;
> +}
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +vsub512 (__m512h a, __m512h b)
> +{
> +  return a - b;
> +}
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +vmul512 (__m512h a, __m512h b)
> +{
> +  return a * b;
> +}
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +vdiv512 (__m512h a, __m512h b)
> +{
> +  return a / b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%zmm\[01\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
> new file mode 100644
> index 00000000000..fc105152d2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-11b.c
> @@ -0,0 +1,75 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512fp16-11a.c"
> +
> +/* Get a random float16 between -50.x and 50.x.  */
> +_Float16
> +get_float16_noround()
> +{
> +  return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50)
> +    + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0));
> +}
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x[32];
> +  _Float16 y[32];
> +  _Float16 res_add[32];
> +  _Float16 res_sub[32];
> +  _Float16 res_mul[32];
> +  _Float16 res_div[32];
> +  for (int i = 0 ; i != 32; i++)
> +    {
> +      x[i] = get_float16_noround ();
> +      y[i] = get_float16_noround ();
> +      if (y[i] == 0)
> +       y[i] = 1.0f;
> +      res_add[i] = x[i] + y[i];
> +      res_sub[i] = x[i] - y[i];
> +      res_mul[i] = x[i] * y[i];
> +      res_div[i] = x[i] / y[i];
> +
> +    }
> +
> +  union512h u512 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +      x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15],
> +      x[16], x[17], x[18], x[19], x[20], x[21], x[22], x[23],
> +      x[24], x[25], x[26], x[27], x[28], x[29], x[30], x[31] };
> +  union512h u512_1 = {y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7],
> +      y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15],
> +      y[16], y[17], y[18], y[19], y[20], y[21], y[22], y[23],
> +      y[24], y[25], y[26], y[27], y[28], y[29], y[30], y[31] };
> +
> +  __m512h v512;
> +  union512h a512;
> +
> +  memset (&v512, -1, sizeof (v512));
> +  v512 = vadd512 (u512.x, u512_1.x);
> +  a512.x = v512;
> +  if (check_union512h (a512, res_add))
> +    abort ();
> +  memset (&v512, -1, sizeof (v512));
> +  v512 = vsub512 (u512.x, u512_1.x);
> +  a512.x = v512;
> +  if (check_union512h (a512, res_sub))
> +    abort ();
> +  memset (&v512, -1, sizeof (v512));
> +  v512 = vmul512 (u512.x, u512_1.x);
> +  a512.x = v512;
> +  if (check_union512h (a512, res_mul))
> +    abort ();
> +  memset (&v512, -1, sizeof (v512));
> +  v512 = vdiv512 (u512.x, u512_1.x);
> +  a512.x = v512;
> +  if (check_union512h (a512, res_div))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
> new file mode 100644
> index 00000000000..a8c6296f504
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11a.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> +
> +#include <immintrin.h>
> +__m128h
> +__attribute__ ((noinline, noclone))
> +vadd128 (__m128h a, __m128h b)
> +{
> +  return a + b;
> +}
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +vadd256 (__m256h a, __m256h b)
> +{
> +  return a + b;
> +}
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +vsub128 (__m128h a, __m128h b)
> +{
> +  return a - b;
> +}
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +vsub256 (__m256h a, __m256h b)
> +{
> +  return a - b;
> +}
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +vmul128 (__m128h a, __m128h b)
> +{
> +  return a * b;
> +}
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +vmul256 (__m256h a, __m256h b)
> +{
> +  return a * b;
> +}
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +vdiv128 (__m128h a, __m128h b)
> +{
> +  return a / b;
> +}
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +vdiv256 (__m256h a, __m256h b)
> +{
> +  return a / b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vaddph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vsubph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vmulph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%xmm\[01\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vdivph\[ \\t\]+\[^\n\r\]*%ymm\[01\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
> new file mode 100644
> index 00000000000..b8d3e8a4e96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vlfp16-11b.c
> @@ -0,0 +1,96 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512vlfp16-11a.c"
> +
> +/* Get random float16 between -50.x to 50.x.  */
> +_Float16
> +get_float16_noround()
> +{
> +  return ((int) (100.0 * rand ()/ (RAND_MAX + 1.0)) - 50)
> +    + 0.1f * (int) (10 * rand() / (RAND_MAX + 1.0));
> +}
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x[16];
> +  _Float16 y[16];
> +  _Float16 res_add[16];
> +  _Float16 res_sub[16];
> +  _Float16 res_mul[16];
> +  _Float16 res_div[16];
> +  for (int i = 0 ; i != 16; i++)
> +    {
> +      x[i] = get_float16_noround ();
> +      y[i] = get_float16_noround ();
> +      if (y[i] == 0)
> +       y[i] = 1.0f;
> +      res_add[i] = x[i] + y[i];
> +      res_sub[i] = x[i] - y[i];
> +      res_mul[i] = x[i] * y[i];
> +      res_div[i] = x[i] / y[i];
> +
> +    }
> +
> +  union128h u128 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7] };
> +  union128h u128_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7] };
> +  union256h u256 = { x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> +      x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15] };
> +  union256h u256_1 = { y[0], y[1], y[2], y[3], y[4], y[5], y[6], y[7],
> +      y[8], y[9], y[10], y[11], y[12], y[13], y[14], y[15]};
> +
> +  __m128h v128;
> +  __m256h v256;
> +  union128h a128;
> +  union256h a256;
> +
> +  memset (&v128, -1, sizeof (v128));
> +  v128 = vadd128 (u128.x, u128_1.x);
> +  a128.x = v128;
> +  if (check_union128h (a128, res_add))
> +    abort ();
> +  memset (&v128, -1, sizeof (v128));
> +  v128 = vsub128 (u128.x, u128_1.x);
> +  a128.x = v128;
> +  if (check_union128h (a128, res_sub))
> +    abort ();
> +  memset (&v128, -1, sizeof (v128));
> +  v128 = vmul128 (u128.x, u128_1.x);
> +  a128.x = v128;
> +  if (check_union128h (a128, res_mul))
> +    abort ();
> +  memset (&v128, -1, sizeof (v128));
> +  v128 = vdiv128 (u128.x, u128_1.x);
> +  a128.x = v128;
> +  if (check_union128h (a128, res_div))
> +    abort ();
> +
> +  memset (&v256, -1, sizeof (v256));
> +  v256 = vadd256 (u256.x, u256_1.x);
> +  a256.x = v256;
> +  if (check_union256h (a256, res_add))
> +    abort ();
> +  memset (&v256, -1, sizeof (v256));
> +  v256 = vsub256 (u256.x, u256_1.x);
> +  a256.x = v256;
> +  if (check_union256h (a256, res_sub))
> +    abort ();
> +  memset (&v256, -1, sizeof (v256));
> +  v256 = vmul256 (u256.x, u256_1.x);
> +  a256.x = v256;
> +  if (check_union256h (a256, res_mul))
> +    abort ();
> +  memset (&v256, -1, sizeof (v256));
> +  v256 = vdiv256 (u256.x, u256_1.x);
> +  a256.x = v256;
> +  if (check_union256h (a256, res_div))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index f5f5c113612..50ed74cd6d6 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -702,6 +702,12 @@
>  #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
>  #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
>
> +/* avx512fp16intrin.h */
> +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
> +
>  /* vpclmulqdqintrin.h */
>  #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 747d504cedb..26a5e94c7ca 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -667,6 +667,20 @@ test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 8)
>  test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
>  test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
>
> +/* avx512fp16intrin.h */
> +test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +
>  /* shaintrin.h */
>  test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 33411969901..8d25effd724 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -772,6 +772,20 @@ test_2 (_mm_rcp28_round_ss, __m128, __m128, __m128, 8)
>  test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8)
>  test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
>
> +/* avx512fp16intrin.h */
> +test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_div_round_ph, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_add_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_sub_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_mul_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_div_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_sub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_mul_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_div_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> +
>  /* shaintrin.h */
>  test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1)
>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 86590ca5ffb..f7dd5d7495c 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -703,6 +703,12 @@
>  #define __builtin_ia32_vpshld_v2di(A, B, C) __builtin_ia32_vpshld_v2di(A, B, 1)
>  #define __builtin_ia32_vpshld_v2di_mask(A, B, C, D, E)  __builtin_ia32_vpshld_v2di_mask(A, B, 1, D, E)
>
> +/* avx512fp16intrin.h */
> +#define __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vaddph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vsubph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vmulph_v32hf_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vdivph_v32hf_mask_round(A, B, C, D, 8)
> +
>  /* vpclmulqdqintrin.h */
>  #define __builtin_ia32_vpclmulqdq_v4di(A, B, C)  __builtin_ia32_vpclmulqdq_v4di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
> --
> 2.18.1
>


--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization
  2021-07-01  6:15 ` [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization liuhongt
@ 2021-09-10  7:03   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-10  7:03 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 14767 bytes --]

On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "H.J. Lu" <hjl.tools@gmail.com>
>
> gcc/ChangeLog:
>
>         * config/i386/i386-expand.c
>         (ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
>         * config/i386/i386.c
>         (ix86_preferred_simd_mode): Handle HF mode.
>         * config/i386/sse.md (V_256H): New mode iterator.
>         (avx_vextractf128<mode>): Use it.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/vect-float16-1.c: New test.
>         * gcc.target/i386/vect-float16-10.c: Ditto.
>         * gcc.target/i386/vect-float16-11.c: Ditto.
>         * gcc.target/i386/vect-float16-12.c: Ditto.
>         * gcc.target/i386/vect-float16-2.c: Ditto.
>         * gcc.target/i386/vect-float16-3.c: Ditto.
>         * gcc.target/i386/vect-float16-4.c: Ditto.
>         * gcc.target/i386/vect-float16-5.c: Ditto.
>         * gcc.target/i386/vect-float16-6.c: Ditto.
>         * gcc.target/i386/vect-float16-7.c: Ditto.
>         * gcc.target/i386/vect-float16-8.c: Ditto.
>         * gcc.target/i386/vect-float16-9.c: Ditto.
I'm going to check in this patch with a slight change: removing
TARGET_AVX512FP16 for vector HFmodes when vpinsrw/../vpextrw
instructions are used for V*HFmode vector_init and
vector_extract{,_lo/hi}.
An updated patch is attached.
I will also check in 6 patches, [PATCH 10/62] through [PATCH 15/62]:

[PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh.
[PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh.
[PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh.
[PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh.
[PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish.
[PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish.

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  The newly added runtime testcases were also run on SDE/SPR.

[10] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574128.html
[11] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574127.html
[12] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574129.html
[13] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574130.html
[14] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574131.html
[15] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574132.html

> ---
>  gcc/config/i386/i386-expand.c                   |  4 ++++
>  gcc/config/i386/i386.c                          | 14 ++++++++++++++
>  gcc/config/i386/sse.md                          |  7 ++++++-
>  gcc/testsuite/gcc.target/i386/vect-float16-1.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-10.c | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-11.c | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-12.c | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-2.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-3.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-4.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-5.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-6.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-7.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-8.c  | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/i386/vect-float16-9.c  | 14 ++++++++++++++
>  15 files changed, 192 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 39647eb2cf1..df50c72ab16 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -498,6 +498,10 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
>        extract = gen_avx_vextractf128v32qi;
>        mode = V16QImode;
>        break;
> +    case E_V16HFmode:
> +      extract = gen_avx_vextractf128v16hf;
> +      mode = V8HFmode;
> +      break;
>      case E_V8SFmode:
>        extract = gen_avx_vextractf128v8sf;
>        mode = V4SFmode;
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 79e6880d9dd..dc0d440061b 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22360,6 +22360,20 @@ ix86_preferred_simd_mode (scalar_mode mode)
>        else
>         return V2DImode;
>
> +    case E_HFmode:
> +      if (TARGET_AVX512FP16)
> +       {
> +         if (TARGET_AVX512VL)
> +           {
> +             if (TARGET_PREFER_AVX128)
> +               return V8HFmode;
> +             else if (TARGET_PREFER_AVX256)
> +               return V16HFmode;
> +           }
> +         return V32HFmode;
> +       }
> +      return word_mode;
> +
>      case E_SFmode:
>        if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
>         return V16SFmode;
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2c1b6fbcd86..a0cfd611006 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -276,6 +276,11 @@ (define_mode_iterator V_128
>  (define_mode_iterator V_256
>    [V32QI V16HI V8SI V4DI V8SF V4DF])
>
> +;; All 256bit vector modes including HF vector mode
> +(define_mode_iterator V_256H
> +  [V32QI V16HI V8SI V4DI V8SF V4DF
> +   (V16HF "TARGET_AVX512F && TARGET_AVX512VL")])
> +
>  ;; All 128bit and 256bit vector modes
>  (define_mode_iterator V_128_256
>    [V32QI V16QI V16HI V8HI V8SI V4SI V4DI V2DI V8SF V4SF V4DF V2DF])
> @@ -9045,7 +9050,7 @@ (define_expand "avx512vl_vextractf128<mode>"
>
>  (define_expand "avx_vextractf128<mode>"
>    [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
> -   (match_operand:V_256 1 "register_operand")
> +   (match_operand:V_256H 1 "register_operand")
>     (match_operand:SI 2 "const_0_to_1_operand")]
>    "TARGET_AVX"
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-1.c b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
> new file mode 100644
> index 00000000000..0f82cf94932
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vaddph" 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-10.c b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
> new file mode 100644
> index 00000000000..217645692ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] / c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vdivph" 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-11.c b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
> new file mode 100644
> index 00000000000..e0409ce9d3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 128; i++)
> +    a[i] = b[i] / c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vdivph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-12.c b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
> new file mode 100644
> index 00000000000..d92a25dc255
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] / c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vdivph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-2.c b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
> new file mode 100644
> index 00000000000..974fca4ce09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 128; i++)
> +    a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vaddph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-3.c b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
> new file mode 100644
> index 00000000000..9bca9142df7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vaddph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-4.c b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
> new file mode 100644
> index 00000000000..e6f26f0aa40
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] - c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vsubph" 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-5.c b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
> new file mode 100644
> index 00000000000..38f287b1dc0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 128; i++)
> +    a[i] = b[i] - c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vsubph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-6.c b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
> new file mode 100644
> index 00000000000..bc9f7870061
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] - c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vsubph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-7.c b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
> new file mode 100644
> index 00000000000..b4849cf77c7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] * c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vmulph" 8 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-8.c b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
> new file mode 100644
> index 00000000000..71631b17cc3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 128; i++)
> +    a[i] = b[i] * c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vmulph" 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-9.c b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
> new file mode 100644
> index 00000000000..1be5c7f022f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
> +
> +void
> +foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> +     _Float16 *__restrict__ c)
> +{
> +  for (int i = 0; i < 256; i++)
> +    a[i] = b[i] * c[i];
> +}
> +
> +/* { dg-final { scan-assembler-times "vmulph" 16 } } */
> --
> 2.18.1
>


-- 
BR,
Hongtao

[-- Attachment #2: 0001-AVX512FP16-Enable-_Float16-autovectorization.patch --]
[-- Type: text/x-patch, Size: 15779 bytes --]

From 02399fddf24a2d7db60feaa8027b9cf95687024b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 27 Jan 2019 19:38:02 -0800
Subject: [PATCH 1/7] AVX512FP16: Enable _Float16 autovectorization

gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
	* config/i386/i386.c
	(ix86_preferred_simd_mode): Handle HF mode.
	* config/i386/sse.md (V_256H): New mode iterator.
	(avx_vextractf128<mode>): Use it.
	(VEC_INIT_MODE): Align vector HFmode condition to vector
	HImodes since there're no real HF instruction used.
	(VEC_INIT_HALF_MODE): Ditto.
	(VIHF): Ditto.
	(VIHF_AVX512BW): Ditto.
	(*vec_extracthf): Ditto.
	(VEC_EXTRACT_MODE): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-float16-1.c: New test.
	* gcc.target/i386/vect-float16-10.c: Ditto.
	* gcc.target/i386/vect-float16-11.c: Ditto.
	* gcc.target/i386/vect-float16-12.c: Ditto.
	* gcc.target/i386/vect-float16-2.c: Ditto.
	* gcc.target/i386/vect-float16-3.c: Ditto.
	* gcc.target/i386/vect-float16-4.c: Ditto.
	* gcc.target/i386/vect-float16-5.c: Ditto.
	* gcc.target/i386/vect-float16-6.c: Ditto.
	* gcc.target/i386/vect-float16-7.c: Ditto.
	* gcc.target/i386/vect-float16-8.c: Ditto.
	* gcc.target/i386/vect-float16-9.c: Ditto.
---
 gcc/config/i386/i386-expand.c                 |  4 ++++
 gcc/config/i386/i386.c                        | 14 +++++++++++
 gcc/config/i386/sse.md                        | 24 +++++++++----------
 .../gcc.target/i386/vect-float16-1.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-10.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-11.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-12.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-2.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-3.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-4.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-5.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-6.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-7.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-8.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-9.c          | 14 +++++++++++
 15 files changed, 198 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 0c1aec585fe..cac8354a067 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -678,6 +678,10 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
       extract = gen_avx_vextractf128v32qi;
       mode = V16QImode;
       break;
+    case E_V16HFmode:
+      extract = gen_avx_vextractf128v16hf;
+      mode = V8HFmode;
+      break;
     case E_V8SFmode:
       extract = gen_avx_vextractf128v8sf;
       mode = V4SFmode;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dc649f96d0d..7b173bc0beb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22641,6 +22641,20 @@ ix86_preferred_simd_mode (scalar_mode mode)
       else
 	return V2DImode;
 
+    case E_HFmode:
+      if (TARGET_AVX512FP16)
+	{
+	  if (TARGET_AVX512VL)
+	    {
+	      if (TARGET_PREFER_AVX128)
+		return V8HFmode;
+	      else if (TARGET_PREFER_AVX256)
+		return V16HFmode;
+	    }
+	  return V32HFmode;
+	}
+      return word_mode;
+
     case E_SFmode:
       if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
 	return V16SFmode;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 06339163bc5..26024609e2b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -279,6 +279,10 @@ (define_mode_iterator V_128
 (define_mode_iterator V_256
   [V32QI V16HI V8SI V4DI V8SF V4DF])
 
+;; All 256bit vector modes including HF vector mode
+(define_mode_iterator V_256H
+  [V32QI V16HI V8SI V4DI V8SF V4DF V16HF])
+
 ;; All 128bit and 256bit vector modes
 (define_mode_iterator V_128_256
   [V32QI V16QI V16HI V8HI V8SI V4SI V4DI V2DI V8SF V4SF V4DF V2DF])
@@ -406,8 +410,7 @@ (define_mode_iterator VIHF
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF])
 
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
@@ -752,7 +755,7 @@ (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 (define_mode_iterator VIHF_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
-  (V32HF "TARGET_AVX512FP16")])
+  (V32HF "TARGET_AVX512BW")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -9381,7 +9384,7 @@ (define_expand "avx512vl_vextractf128<mode>"
 
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
-   (match_operand:V_256 1 "register_operand")
+   (match_operand:V_256H 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"
 {
@@ -9868,7 +9871,7 @@ (define_insn "*vec_extracthf"
 	  (match_operand:V8HF 1 "register_operand" "v,v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
-  "TARGET_AVX512FP16"
+  "TARGET_SSE2"
   "@
    vpextrw\t{%2, %1, %k0|%k0, %1, %2}
    vpextrw\t{%2, %1, %0|%0, %1, %2}"
@@ -9882,8 +9885,7 @@ (define_mode_iterator VEC_EXTRACT_MODE
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -15615,7 +15617,7 @@ (define_expand "vec_interleave_low<mode>"
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
+  [(V16QI "TARGET_SSE4_1") V8HI V8HF
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
@@ -23723,8 +23725,7 @@ (define_mode_iterator VEC_INIT_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -23736,8 +23737,7 @@ (define_mode_iterator VEC_INIT_HALF_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-1.c b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
new file mode 100644
index 00000000000..0f82cf94932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-10.c b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
new file mode 100644
index 00000000000..217645692ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-11.c b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
new file mode 100644
index 00000000000..e0409ce9d3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-12.c b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
new file mode 100644
index 00000000000..d92a25dc255
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-2.c b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
new file mode 100644
index 00000000000..974fca4ce09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-3.c b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
new file mode 100644
index 00000000000..9bca9142df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-4.c b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
new file mode 100644
index 00000000000..e6f26f0aa40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-5.c b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
new file mode 100644
index 00000000000..38f287b1dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-6.c b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
new file mode 100644
index 00000000000..bc9f7870061
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-7.c b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
new file mode 100644
index 00000000000..b4849cf77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-8.c b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
new file mode 100644
index 00000000000..71631b17cc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-9.c b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
new file mode 100644
index 00000000000..1be5c7f022f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
-- 
2.27.0
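
An aside on the tests above (not part of the patch): the element type is IEEE binary16, so each vaddph/vsubph/vmulph/vdivph lane rounds its result to half precision. A minimal Python sketch of that per-lane rounding, using the standard struct module's 'e' half-float format:

```python
import struct

def to_f16(x):
    """Round a Python float to the nearest IEEE binary16 value,
    i.e. the value a _Float16 vector lane would store."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 0.1 and 0.2 are not exactly representable in binary16, so the
# half-precision sum is visibly coarser than the double-precision one.
s = to_f16(to_f16(0.1) + to_f16(0.2))
print(s)
```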


[-- Attachment #3: 0001-AVX512FP16-Enable-_Float16-autovectorization.patch --]
[-- Type: text/x-patch, Size: 15779 bytes --]

From 02399fddf24a2d7db60feaa8027b9cf95687024b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 27 Jan 2019 19:38:02 -0800
Subject: [PATCH 1/7] AVX512FP16: Enable _Float16 autovectorization

gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_avx256_split_vector_move_misalign): Handle V16HF mode.
	* config/i386/i386.c
	(ix86_preferred_simd_mode): Handle HF mode.
	* config/i386/sse.md (V_256H): New mode iterator.
	(avx_vextractf128<mode>): Use it.
	(VEC_INIT_MODE): Align the vector HFmode condition with the
	vector HImode one, since no real HF instructions are used.
	(VEC_INIT_HALF_MODE): Ditto.
	(VIHF): Ditto.
	(VIHF_AVX512BW): Ditto.
	(*vec_extracthf): Ditto.
	(VEC_EXTRACT_MODE): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-float16-1.c: New test.
	* gcc.target/i386/vect-float16-10.c: Ditto.
	* gcc.target/i386/vect-float16-11.c: Ditto.
	* gcc.target/i386/vect-float16-12.c: Ditto.
	* gcc.target/i386/vect-float16-2.c: Ditto.
	* gcc.target/i386/vect-float16-3.c: Ditto.
	* gcc.target/i386/vect-float16-4.c: Ditto.
	* gcc.target/i386/vect-float16-5.c: Ditto.
	* gcc.target/i386/vect-float16-6.c: Ditto.
	* gcc.target/i386/vect-float16-7.c: Ditto.
	* gcc.target/i386/vect-float16-8.c: Ditto.
	* gcc.target/i386/vect-float16-9.c: Ditto.
---
 gcc/config/i386/i386-expand.c                 |  4 ++++
 gcc/config/i386/i386.c                        | 14 +++++++++++
 gcc/config/i386/sse.md                        | 24 +++++++++----------
 .../gcc.target/i386/vect-float16-1.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-10.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-11.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-12.c         | 14 +++++++++++
 .../gcc.target/i386/vect-float16-2.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-3.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-4.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-5.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-6.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-7.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-8.c          | 14 +++++++++++
 .../gcc.target/i386/vect-float16-9.c          | 14 +++++++++++
 15 files changed, 198 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-float16-9.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 0c1aec585fe..cac8354a067 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -678,6 +678,10 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
       extract = gen_avx_vextractf128v32qi;
       mode = V16QImode;
       break;
+    case E_V16HFmode:
+      extract = gen_avx_vextractf128v16hf;
+      mode = V8HFmode;
+      break;
     case E_V8SFmode:
       extract = gen_avx_vextractf128v8sf;
       mode = V4SFmode;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dc649f96d0d..7b173bc0beb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22641,6 +22641,20 @@ ix86_preferred_simd_mode (scalar_mode mode)
       else
 	return V2DImode;
 
+    case E_HFmode:
+      if (TARGET_AVX512FP16)
+	{
+	  if (TARGET_AVX512VL)
+	    {
+	      if (TARGET_PREFER_AVX128)
+		return V8HFmode;
+	      else if (TARGET_PREFER_AVX256)
+		return V16HFmode;
+	    }
+	  return V32HFmode;
+	}
+      return word_mode;
+
     case E_SFmode:
       if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
 	return V16SFmode;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 06339163bc5..26024609e2b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -279,6 +279,10 @@ (define_mode_iterator V_128
 (define_mode_iterator V_256
   [V32QI V16HI V8SI V4DI V8SF V4DF])
 
+;; All 256bit vector modes including HF vector mode
+(define_mode_iterator V_256H
+  [V32QI V16HI V8SI V4DI V8SF V4DF V16HF])
+
 ;; All 128bit and 256bit vector modes
 (define_mode_iterator V_128_256
   [V32QI V16QI V16HI V8HI V8SI V4SI V4DI V2DI V8SF V4SF V4DF V2DF])
@@ -406,8 +410,7 @@ (define_mode_iterator VIHF
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")])
+   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF])
 
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
@@ -752,7 +755,7 @@ (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 (define_mode_iterator VIHF_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
-  (V32HF "TARGET_AVX512FP16")])
+  (V32HF "TARGET_AVX512BW")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -9381,7 +9384,7 @@ (define_expand "avx512vl_vextractf128<mode>"
 
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
-   (match_operand:V_256 1 "register_operand")
+   (match_operand:V_256H 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"
 {
@@ -9868,7 +9871,7 @@ (define_insn "*vec_extracthf"
 	  (match_operand:V8HF 1 "register_operand" "v,v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
-  "TARGET_AVX512FP16"
+  "TARGET_SSE2"
   "@
    vpextrw\t{%2, %1, %k0|%k0, %1, %2}
    vpextrw\t{%2, %1, %0|%0, %1, %2}"
@@ -9882,8 +9885,7 @@ (define_mode_iterator VEC_EXTRACT_MODE
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512BW") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -15615,7 +15617,7 @@ (define_expand "vec_interleave_low<mode>"
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
+  [(V16QI "TARGET_SSE4_1") V8HI V8HF
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
@@ -23723,8 +23725,7 @@ (define_mode_iterator VEC_INIT_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -23736,8 +23737,7 @@ (define_mode_iterator VEC_INIT_HALF_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
-   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
-   (V8HF "TARGET_AVX512FP16")
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-1.c b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
new file mode 100644
index 00000000000..0f82cf94932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-10.c b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
new file mode 100644
index 00000000000..217645692ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-10.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-11.c b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
new file mode 100644
index 00000000000..e0409ce9d3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-11.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-12.c b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
new file mode 100644
index 00000000000..d92a25dc255
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] / c[i];
+}
+
+/* { dg-final { scan-assembler-times "vdivph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-2.c b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
new file mode 100644
index 00000000000..974fca4ce09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-3.c b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
new file mode 100644
index 00000000000..9bca9142df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-assembler-times "vaddph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-4.c b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
new file mode 100644
index 00000000000..e6f26f0aa40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-5.c b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
new file mode 100644
index 00000000000..38f287b1dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-6.c b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
new file mode 100644
index 00000000000..bc9f7870061
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] - c[i];
+}
+
+/* { dg-final { scan-assembler-times "vsubph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-7.c b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
new file mode 100644
index 00000000000..b4849cf77c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mno-avx512vl" } */
+
+/* Check that we vectorize to a full 512-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 8 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-8.c b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
new file mode 100644
index 00000000000..71631b17cc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=128" } */
+
+/* Check that we vectorize to a full 128-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 128; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-float16-9.c b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
new file mode 100644
index 00000000000..1be5c7f022f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-float16-9.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512fp16 -mavx512vl -mprefer-vector-width=256" } */
+
+/* Check that we vectorize to a full 256-bit vector for _Float16 types.  */
+
+void
+foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
+     _Float16 *__restrict__ c)
+{
+  for (int i = 0; i < 256; i++)
+    a[i] = b[i] * c[i];
+}
+
+/* { dg-final { scan-assembler-times "vmulph" 16 } } */
-- 
2.27.0



* Re: [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.
  2021-07-01  6:16 ` [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh liuhongt
@ 2021-09-14  3:50   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-14  3:50 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

I'm going to commit 8 patches:

[PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh.
[PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh.
[PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh.
[PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh.
[PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh.
[PATCH 21/62] AVX512FP16: Add testcase for
vreduceph/vreducesh/vrndscaleph/vrndscalesh.
[PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions.
[PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions.

 Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
 Newly added tests passed on SPR.
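
(An aside, not part of the series: hardware reciprocal-square-root instructions such as vrsqrtph are fast approximations rather than correctly rounded results, and when a caller needs more accuracy than the estimate provides, the conventional fix is one Newton-Raphson refinement step. A generic sketch of that step follows, with no claim about vrsqrtph's actual error bound:)

```python
import math

def newton_rsqrt_step(x, y0):
    # One Newton-Raphson refinement of an approximate 1/sqrt(x):
    #   y1 = y0 * (1.5 - 0.5 * x * y0**2)
    # The error shrinks roughly quadratically per step.
    return y0 * (1.5 - 0.5 * x * y0 * y0)

# Start from a deliberately rough estimate (~1% off) and refine once.
x = 2.0
exact = 1.0 / math.sqrt(x)
y0 = exact * 1.01
y1 = newton_rsqrt_step(x, y0)
print(abs(y0 - exact), abs(y1 - exact))
```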

On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm512_sqrt_ph): New
>         intrinsic.
>         (_mm512_mask_sqrt_ph): Likewise.
>         (_mm512_maskz_sqrt_ph): Likewise.
>         (_mm512_sqrt_round_ph): Likewise.
>         (_mm512_mask_sqrt_round_ph): Likewise.
>         (_mm512_maskz_sqrt_round_ph): Likewise.
>         (_mm512_rsqrt_ph): Likewise.
>         (_mm512_mask_rsqrt_ph): Likewise.
>         (_mm512_maskz_rsqrt_ph): Likewise.
>         (_mm_rsqrt_sh): Likewise.
>         (_mm_mask_rsqrt_sh): Likewise.
>         (_mm_maskz_rsqrt_sh): Likewise.
>         (_mm_sqrt_sh): Likewise.
>         (_mm_mask_sqrt_sh): Likewise.
>         (_mm_maskz_sqrt_sh): Likewise.
>         (_mm_sqrt_round_sh): Likewise.
>         (_mm_mask_sqrt_round_sh): Likewise.
>         (_mm_maskz_sqrt_round_sh): Likewise.
>         * config/i386/avx512fp16vlintrin.h (_mm_sqrt_ph): New intrinsic.
>         (_mm256_sqrt_ph): Likewise.
>         (_mm_mask_sqrt_ph): Likewise.
>         (_mm256_mask_sqrt_ph): Likewise.
>         (_mm_maskz_sqrt_ph): Likewise.
>         (_mm256_maskz_sqrt_ph): Likewise.
>         (_mm_rsqrt_ph): Likewise.
>         (_mm256_rsqrt_ph): Likewise.
>         (_mm_mask_rsqrt_ph): Likewise.
>         (_mm256_mask_rsqrt_ph): Likewise.
>         (_mm_maskz_rsqrt_ph): Likewise.
>         (_mm256_maskz_rsqrt_ph): Likewise.
>         * config/i386/i386-builtin-types.def: Add corresponding builtin types.
>         * config/i386/i386-builtin.def: Add corresponding new builtins.
>         * config/i386/i386-expand.c
>         (ix86_expand_args_builtin): Handle new builtins.
>         (ix86_expand_round_builtin): Ditto.
>         * config/i386/sse.md (VF_AVX512FP16VL): New.
>         (sqrt<mode>2): Adjust for HF vector modes.
>         (<sse>_sqrt<mode>2<mask_name><round_name>): Likewise.
>         (<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>):
>         Likewise.
>         (<sse>_rsqrt<mode>2<mask_name>): New.
>         (avx512fp16_vmrsqrtv8hf2<mask_scalar_name>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add test for new builtins.
>         * gcc.target/i386/sse-13.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add test for new intrinsics.
>         * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 193 +++++++++++++++++++++++++
>  gcc/config/i386/avx512fp16vlintrin.h   |  93 ++++++++++++
>  gcc/config/i386/i386-builtin-types.def |   4 +
>  gcc/config/i386/i386-builtin.def       |   8 +
>  gcc/config/i386/i386-expand.c          |   4 +
>  gcc/config/i386/sse.md                 |  44 ++++--
>  gcc/testsuite/gcc.target/i386/avx-1.c  |   2 +
>  gcc/testsuite/gcc.target/i386/sse-13.c |   2 +
>  gcc/testsuite/gcc.target/i386/sse-14.c |   6 +
>  gcc/testsuite/gcc.target/i386/sse-22.c |   6 +
>  gcc/testsuite/gcc.target/i386/sse-23.c |   2 +
>  11 files changed, 355 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index ed8ad84a105..50db5d12140 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -1235,6 +1235,199 @@ _mm_comi_round_sh (__m128h __A, __m128h __B, const int __P, const int __R)
>
>  #endif /* __OPTIMIZE__  */
>
> +/* Intrinsics vsqrtph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_sqrt_ph (__m512h __A)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__A,
> +                                                  _mm512_setzero_ph (),
> +                                                 (__mmask32) -1,
> +                                                 _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_sqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B,
> +                                                 _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_sqrt_ph (__mmask32 __A, __m512h __B)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__B,
> +                                                 _mm512_setzero_ph (),
> +                                                 __A,
> +                                                 _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_sqrt_round_ph (__m512h __A, const int __B)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__A,
> +                                                  _mm512_setzero_ph (),
> +                                                 (__mmask32) -1, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_sqrt_round_ph (__m512h __A, __mmask32 __B, __m512h __C,
> +                          const int __D)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__C, __A, __B, __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_sqrt_round_ph (__mmask32 __A, __m512h __B, const int __C)
> +{
> +  return __builtin_ia32_vsqrtph_v32hf_mask_round (__B,
> +                                                 _mm512_setzero_ph (),
> +                                                 __A, __C);
> +}
> +
> +#else
> +#define _mm512_sqrt_round_ph(A, B)                                     \
> +  (__builtin_ia32_vsqrtph_v32hf_mask_round ((A),                       \
> +                                           _mm512_setzero_ph (),       \
> +                                           (__mmask32)-1, (B)))
> +
> +#define _mm512_mask_sqrt_round_ph(A, B, C, D)                          \
> +  (__builtin_ia32_vsqrtph_v32hf_mask_round ((C), (A), (B), (D)))
> +
> +#define _mm512_maskz_sqrt_round_ph(A, B, C)                            \
> +  (__builtin_ia32_vsqrtph_v32hf_mask_round ((B),                       \
> +                                           _mm512_setzero_ph (),       \
> +                                           (A), (C)))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +/* Intrinsics vrsqrtph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_rsqrt_ph (__m512h __A)
> +{
> +  return __builtin_ia32_vrsqrtph_v32hf_mask (__A, _mm512_setzero_ph (),
> +                                            (__mmask32) -1);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_rsqrt_ph (__m512h __A, __mmask32 __B, __m512h __C)
> +{
> +  return __builtin_ia32_vrsqrtph_v32hf_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_rsqrt_ph (__mmask32 __A, __m512h __B)
> +{
> +  return __builtin_ia32_vrsqrtph_v32hf_mask (__B, _mm512_setzero_ph (),
> +                                            __A);
> +}
> +
> +/* Intrinsics vrsqrtsh.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_rsqrt_sh (__m128h __A, __m128h __B)
> +{
> +  return __builtin_ia32_vrsqrtsh_v8hf_mask (__B, __A, _mm_setzero_ph (),
> +                                           (__mmask8) -1);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_rsqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vrsqrtsh_v8hf_mask (__D, __C, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_rsqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vrsqrtsh_v8hf_mask (__C, __B, _mm_setzero_ph (),
> +                                           __A);
> +}
> +
> +/* Intrinsics vsqrtsh.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_sqrt_sh (__m128h __A, __m128h __B)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A,
> +                                                _mm_setzero_ph (),
> +                                                (__mmask8) -1,
> +                                                _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_sqrt_sh (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B,
> +                                                _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_sqrt_sh (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B,
> +                                                _mm_setzero_ph (),
> +                                                __A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_sqrt_round_sh (__m128h __A, __m128h __B, const int __C)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__B, __A,
> +                                                _mm_setzero_ph (),
> +                                                (__mmask8) -1, __C);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_sqrt_round_sh (__m128h __A, __mmask8 __B, __m128h __C,
> +                       __m128h __D, const int __E)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__D, __C, __A, __B,
> +                                                __E);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_sqrt_round_sh (__mmask8 __A, __m128h __B, __m128h __C,
> +                        const int __D)
> +{
> +  return __builtin_ia32_vsqrtsh_v8hf_mask_round (__C, __B,
> +                                                _mm_setzero_ph (),
> +                                                __A, __D);
> +}
> +
> +#else
> +#define _mm_sqrt_round_sh(A, B, C)                             \
> +  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((B), (A),           \
> +                                          _mm_setzero_ph (),   \
> +                                          (__mmask8)-1, (C)))
> +
> +#define _mm_mask_sqrt_round_sh(A, B, C, D, E)                  \
> +  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((D), (C), (A), (B), (E)))
> +
> +#define _mm_maskz_sqrt_round_sh(A, B, C, D)                    \
> +  (__builtin_ia32_vsqrtsh_v8hf_mask_round ((C), (B),           \
> +                                          _mm_setzero_ph (),   \
> +                                          (A), (D)))
> +
> +#endif /* __OPTIMIZE__ */
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
> index 1787ed5f4ff..aaed85203c9 100644
> --- a/gcc/config/i386/avx512fp16vlintrin.h
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -358,6 +358,99 @@ _mm_mask_cmp_ph_mask (__mmask16 __A, __m256h __B, __m256h __C,
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vsqrtph.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_sqrt_ph (__m128h __A)
> +{
> +  return __builtin_ia32_vsqrtph_v8hf_mask (__A, _mm_setzero_ph (),
> +                                          (__mmask8) -1);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_sqrt_ph (__m256h __A)
> +{
> +  return __builtin_ia32_vsqrtph_v16hf_mask (__A, _mm256_setzero_ph (),
> +                                           (__mmask16) -1);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_sqrt_ph (__m128h __A, __mmask8 __B, __m128h __C)
> +{
> +  return __builtin_ia32_vsqrtph_v8hf_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_sqrt_ph (__m256h __A, __mmask16 __B, __m256h __C)
> +{
> +  return __builtin_ia32_vsqrtph_v16hf_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_sqrt_ph (__mmask8 __A, __m128h __B)
> +{
> +  return __builtin_ia32_vsqrtph_v8hf_mask (__B, _mm_setzero_ph (),
> +                                          __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_sqrt_ph (__mmask16 __A, __m256h __B)
> +{
> +  return __builtin_ia32_vsqrtph_v16hf_mask (__B, _mm256_setzero_ph (),
> +                                           __A);
> +}
> +
> +/* Intrinsics vrsqrtph.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_rsqrt_ph (__m128h __A)
> +{
> +  return __builtin_ia32_vrsqrtph_v8hf_mask (__A, _mm_setzero_ph (),
> +                                           (__mmask8) -1);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_rsqrt_ph (__m256h __A)
> +{
> +  return __builtin_ia32_vrsqrtph_v16hf_mask (__A, _mm256_setzero_ph (),
> +                                            (__mmask16) -1);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_rsqrt_ph (__m128h __A, __mmask8 __B, __m128h __C)
> +{
> +  return __builtin_ia32_vrsqrtph_v8hf_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_rsqrt_ph (__m256h __A, __mmask16 __B, __m256h __C)
> +{
> +  return __builtin_ia32_vrsqrtph_v16hf_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_rsqrt_ph (__mmask8 __A, __m128h __B)
> +{
> +  return __builtin_ia32_vrsqrtph_v8hf_mask (__B, _mm_setzero_ph (), __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_rsqrt_ph (__mmask16 __A, __m256h __B)
> +{
> +  return __builtin_ia32_vrsqrtph_v16hf_mask (__B, _mm256_setzero_ph (),
> +                                            __A);
> +}
> +
>  #ifdef __DISABLE_AVX512FP16VL__
>  #undef __DISABLE_AVX512FP16VL__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index e3070ad00bd..9ebad6b5f49 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -1305,16 +1305,20 @@ DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
>  # FP16 builtins
>  DEF_FUNCTION_TYPE (V8HF, V8HI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
>  DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI)
>  DEF_FUNCTION_TYPE (UQI, V8HF, V8HF, INT, UQI, INT)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
>  DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
> +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
>  DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
>  DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
>  DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
>  DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 045cf561ec7..999b2e1abb5 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -2802,6 +2802,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask, "__
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_cmpv8hf3_mask, "__builtin_ia32_vcmpph_v8hf_mask", IX86_BUILTIN_VCMPPH_V8HF_MASK, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_cmpv16hf3_mask, "__builtin_ia32_vcmpph_v16hf_mask", IX86_BUILTIN_VCMPPH_V16HF_MASK, UNKNOWN, (int) UHI_FTYPE_V16HF_V16HF_INT_UHI)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask, "__builtin_ia32_vcmpph_v32hf_mask", IX86_BUILTIN_VCMPPH_V32HF_MASK, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv8hf2_mask, "__builtin_ia32_vsqrtph_v8hf_mask", IX86_BUILTIN_VSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv16hf2_mask, "__builtin_ia32_vsqrtph_v16hf_mask", IX86_BUILTIN_VSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv8hf2_mask, "__builtin_ia32_vrsqrtph_v8hf_mask", IX86_BUILTIN_VRSQRTPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv16hf2_mask, "__builtin_ia32_vrsqrtph_v16hf_mask", IX86_BUILTIN_VRSQRTPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_UHI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_rsqrtv32hf2_mask, "__builtin_ia32_vrsqrtph_v32hf_mask", IX86_BUILTIN_VRSQRTPH_V32HF_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmrsqrtv8hf2_mask, "__builtin_ia32_vrsqrtsh_v8hf_mask", IX86_BUILTIN_VRSQRTSH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
>
>  /* Builtins with rounding support.  */
>  BDESC_END (ARGS, ROUND_ARGS)
> @@ -3017,6 +3023,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsmaxv8hf3_mask_roun
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsminv8hf3_mask_round, "__builtin_ia32_vminsh_v8hf_mask_round", IX86_BUILTIN_VMINSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_cmpv32hf3_mask_round, "__builtin_ia32_vcmpph_v32hf_mask_round", IX86_BUILTIN_VCMPPH_V32HF_MASK_ROUND, UNKNOWN, (int) USI_FTYPE_V32HF_V32HF_INT_USI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmcmpv8hf3_mask_round, "__builtin_ia32_vcmpsh_v8hf_mask_round", IX86_BUILTIN_VCMPSH_V8HF_MASK_ROUND, UNKNOWN, (int) UQI_FTYPE_V8HF_V8HF_INT_UQI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_sqrtv32hf2_mask_round, "__builtin_ia32_vsqrtph_v32hf_mask_round", IX86_BUILTIN_VSQRTPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vmsqrtv8hf2_mask_round, "__builtin_ia32_vsqrtsh_v8hf_mask_round", IX86_BUILTIN_VSQRTSH_V8HF_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
>
>  BDESC_END (ROUND_ARGS, MULTI_ARG)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index a79cc324ceb..d76e4405413 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -9532,6 +9532,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V16HI_FTYPE_V16SI_V16HI_UHI:
>      case V16QI_FTYPE_V16SI_V16QI_UHI:
>      case V16QI_FTYPE_V8DI_V16QI_UQI:
> +    case V32HF_FTYPE_V32HF_V32HF_USI:
>      case V16SF_FTYPE_V16SF_V16SF_UHI:
>      case V16SF_FTYPE_V4SF_V16SF_UHI:
>      case V16SI_FTYPE_SI_V16SI_UHI:
> @@ -9561,12 +9562,14 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V16HI_FTYPE_HI_V16HI_UHI:
>      case V8HI_FTYPE_V8HI_V8HI_UQI:
>      case V8HI_FTYPE_HI_V8HI_UQI:
> +    case V16HF_FTYPE_V16HF_V16HF_UHI:
>      case V8SF_FTYPE_V8HI_V8SF_UQI:
>      case V4SF_FTYPE_V8HI_V4SF_UQI:
>      case V8SI_FTYPE_V8SF_V8SI_UQI:
>      case V4SI_FTYPE_V4SF_V4SI_UQI:
>      case V4DI_FTYPE_V4SF_V4DI_UQI:
>      case V2DI_FTYPE_V4SF_V2DI_UQI:
> +    case V8HF_FTYPE_V8HF_V8HF_UQI:
>      case V4SF_FTYPE_V4DI_V4SF_UQI:
>      case V4SF_FTYPE_V2DI_V4SF_UQI:
>      case V4DF_FTYPE_V4DI_V4DF_UQI:
> @@ -10495,6 +10498,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      case V8DI_FTYPE_V8DF_V8DI_QI_INT:
>      case V8SF_FTYPE_V8DI_V8SF_QI_INT:
>      case V8DF_FTYPE_V8DI_V8DF_QI_INT:
> +    case V32HF_FTYPE_V32HF_V32HF_USI_INT:
>      case V16SF_FTYPE_V16SF_V16SF_HI_INT:
>      case V8DI_FTYPE_V8SF_V8DI_QI_INT:
>      case V16SF_FTYPE_V16SI_V16SF_HI_INT:
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b7e22e0ec80..4763fd0558d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -395,6 +395,9 @@ (define_mode_iterator VF1_AVX512VL
>  (define_mode_iterator VF_AVX512FP16
>    [V32HF V16HF V8HF])
>
> +(define_mode_iterator VF_AVX512FP16VL
> +  [V32HF (V16HF "TARGET_AVX512VL") (V8HF "TARGET_AVX512VL")])
> +
>  ;; All vector integer modes
>  (define_mode_iterator VI
>    [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> @@ -2238,8 +2241,8 @@ (define_insn "srcp14<mode>_mask"
>     (set_attr "mode" "<MODE>")])
>
>  (define_expand "sqrt<mode>2"
> -  [(set (match_operand:VF2 0 "register_operand")
> -       (sqrt:VF2 (match_operand:VF2 1 "vector_operand")))]
> +  [(set (match_operand:VF2H 0 "register_operand")
> +       (sqrt:VF2H (match_operand:VF2H 1 "vector_operand")))]
>    "TARGET_SSE2")
>
>  (define_expand "sqrt<mode>2"
> @@ -2259,8 +2262,8 @@ (define_expand "sqrt<mode>2"
>  })
>
>  (define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
> -  [(set (match_operand:VF 0 "register_operand" "=x,v")
> -       (sqrt:VF (match_operand:VF 1 "<round_nimm_predicate>" "xBm,<round_constraint>")))]
> +  [(set (match_operand:VFH 0 "register_operand" "=x,v")
> +       (sqrt:VFH (match_operand:VFH 1 "<round_nimm_predicate>" "xBm,<round_constraint>")))]
>    "TARGET_SSE && <mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
>     sqrt<ssemodesuffix>\t{%1, %0|%0, %1}
> @@ -2273,11 +2276,11 @@ (define_insn "<sse>_sqrt<mode>2<mask_name><round_name>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (sqrt:VF_128
> -           (match_operand:VF_128 1 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
> -         (match_operand:VF_128 2 "register_operand" "0,v")
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (sqrt:VFH_128
> +           (match_operand:VFH_128 1 "nonimmediate_operand" "xm,<round_scalar_constraint>"))
> +         (match_operand:VFH_128 2 "register_operand" "0,v")
>           (const_int 1)))]
>    "TARGET_SSE"
>    "@
> @@ -2330,6 +2333,16 @@ (define_insn "<sse>_rsqrt<mode>2"
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "<MODE>")])
>
> +(define_insn "<sse>_rsqrt<mode>2<mask_name>"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
> +       (unspec:VF_AVX512FP16VL
> +         [(match_operand:VF_AVX512FP16VL 1 "vector_operand" "vBm")] UNSPEC_RSQRT))]
> +  "TARGET_AVX512FP16"
> +  "vrsqrtph\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
> +  [(set_attr "type" "sse")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
>  (define_insn "<mask_codefor>rsqrt14<mode><mask_name>"
>    [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
>         (unspec:VF_AVX512VL
> @@ -2405,6 +2418,19 @@ (define_insn "*sse_vmrsqrtv4sf2"
>     (set_attr "prefix" "orig,vex")
>     (set_attr "mode" "SF")])
>
> +(define_insn "avx512fp16_vmrsqrtv8hf2<mask_scalar_name>"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (unspec:V8HF [(match_operand:V8HF 1 "nonimmediate_operand" "vm")]
> +                      UNSPEC_RSQRT)
> +         (match_operand:V8HF 2 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vrsqrtsh\t{%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %w1}"
> +  [(set_attr "type" "sse")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_expand "<code><mode>3<mask_name><round_saeonly_name>"
>    [(set (match_operand:VFH 0 "register_operand")
>         (smaxmin:VFH
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index d9aa8a70e35..651cb1c80fb 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -701,6 +701,8 @@
>  #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
>  #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
>  #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
> +#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
> +#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 9a2833d78f2..94553dec9e7 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -718,6 +718,8 @@
>  #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
>  #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
>  #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
> +#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
> +#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index ce0ad71f190..7281bffdf2b 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -670,6 +670,7 @@ test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 8)
>  test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 8)
>
>  /* avx512fp16intrin.h */
> +test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
>  test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
>  test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
>  test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
> @@ -684,6 +685,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
>  test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
> +test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
> +test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> @@ -700,6 +703,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
>  test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
> +test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
> +test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> @@ -714,6 +719,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
>  test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
>  test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
> +test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
>
>  /* avx512fp16vlintrin.h */
>  test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 439346490bd..04326e0e37d 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -775,6 +775,7 @@ test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 8)
>  test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 8)
>
>  /* avx512fp16intrin.h */
> +test_1 (_mm512_sqrt_round_ph, __m512h, __m512h, 8)
>  test_2 (_mm512_add_round_ph, __m512h, __m512h, __m512h, 8)
>  test_2 (_mm512_sub_round_ph, __m512h, __m512h, __m512h, 8)
>  test_2 (_mm512_mul_round_ph, __m512h, __m512h, __m512h, 8)
> @@ -789,6 +790,8 @@ test_2 (_mm_max_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2 (_mm_min_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2 (_mm512_cmp_ph_mask, __mmask32, __m512h, __m512h, 1)
>  test_2 (_mm_comi_sh, int, __m128h, __m128h, 1)
> +test_2 (_mm512_maskz_sqrt_round_ph, __m512h, __mmask32, __m512h, 8)
> +test_2 (_mm_sqrt_round_sh, __m128h, __m128h, __m128h, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> @@ -805,6 +808,8 @@ test_3 (_mm512_maskz_min_round_ph, __m512h, __mmask32, __m512h, __m512h, 8)
>  test_3 (_mm_maskz_max_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3 (_mm_maskz_min_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3 (_mm512_mask_cmp_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1)
> +test_3 (_mm512_mask_sqrt_round_ph, __m512h, __m512h, __mmask32, __m512h, 8)
> +test_3 (_mm_maskz_sqrt_round_sh, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_4 (_mm512_mask_add_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
> @@ -819,6 +824,7 @@ test_4 (_mm512_mask_max_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h,
>  test_4 (_mm512_mask_min_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 8)
>  test_4 (_mm_mask_max_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
>  test_4 (_mm_mask_min_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
> +test_4 (_mm_mask_sqrt_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 8)
>
>  /* avx512fp16vlintrin.h */
>  test_2 (_mm_cmp_ph_mask, __mmask8, __m128h, __m128h, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index f6768bac345..7559d335dbc 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -719,6 +719,8 @@
>  #define __builtin_ia32_vcmpph_v32hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v32hf_mask(A, B, 1, D)
>  #define __builtin_ia32_vcmpph_v32hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpph_v32hf_mask_round(A, B, 1, D, 8)
>  #define __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, C, D, E) __builtin_ia32_vcmpsh_v8hf_mask_round(A, B, 1, D, 8)
> +#define __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, D) __builtin_ia32_vsqrtph_v32hf_mask_round(C, A, B, 8)
> +#define __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, E) __builtin_ia32_vsqrtsh_v8hf_mask_round(D, C, A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> --
> 2.18.1
>
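Not part of the patch — for reviewers less familiar with the AVX-512 masking convention the new vsqrtph/vrsqrtph intrinsics above follow, here is a hypothetical plain-C model (names are mine, not GCC's; float stands in for _Float16 and a 32-element array for __m512h). Bit i of the mask selects between the computed lane and either the pass-through operand (merge-masking, `_mm512_mask_*`) or zero (zero-masking, `_mm512_maskz_*`):

```c
#include <assert.h>

/* Hypothetical scalar model of AVX-512 merge/zero masking; not the
   actual intrinsic implementation.  float stands in for _Float16.  */
#define LANES 32

/* Self-contained sqrt so the example needs no libm: a few Newton
   iterations, exact for the 0.0f and 1.0f inputs used below.  */
static float
scalar_sqrtf (float x)
{
  if (x <= 0.0f)
    return 0.0f;
  float y = x > 1.0f ? x : 1.0f;
  for (int i = 0; i < 30; i++)
    y = 0.5f * (y + x / y);
  return y;
}

/* Model of _mm512_mask_sqrt_ph (passthru, mask, src): each masked-off
   lane keeps the pass-through value.  */
static void
model_mask_sqrt_ph (float *dst, const float *passthru,
                    unsigned int mask, const float *src)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = ((mask >> i) & 1) ? scalar_sqrtf (src[i]) : passthru[i];
}

/* Model of _mm512_maskz_sqrt_ph (mask, src): masked-off lanes are
   zeroed instead.  */
static void
model_maskz_sqrt_ph (float *dst, unsigned int mask, const float *src)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = ((mask >> i) & 1) ? scalar_sqrtf (src[i]) : 0.0f;
}
```

The same merge/zero pattern applies to every `_mask_`/`_maskz_` pair in the hunk above; only the per-lane operation changes.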


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh.
  2021-07-01  6:16 ` [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh liuhongt
@ 2021-09-16  5:08   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-16  5:08 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

I'm going to check in the following 6 patches:

[PATCH 24/62] AVX512FP16: Add vmovw/vmovsh.
[PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw.
[PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq
[PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq.
[PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph
[PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph.

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Newly added runtime testcases passed on SPR.
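For context on the vmovsh semantics in the patch quoted below, here is my reading modeled in plain C (a hypothetical sketch, not GCC code: float replaces _Float16, an 8-element array replaces __m128h, and the helper names are mine). `_mm_move_sh` copies only element 0 from its second operand, keeping the upper elements of the first; the masked form picks element 0 from the second source when the low mask bit is set, otherwise from the pass-through operand, with the upper elements always taken from the first source:

```c
#include <assert.h>

/* Hypothetical scalar model of the vmovsh intrinsics; float stands in
   for _Float16.  */
#define HLANES 8

/* Model of _mm_move_sh (a, b): dst[0] from b, dst[1..7] from a.  */
static void
model_move_sh (float *dst, const float *a, const float *b)
{
  for (int i = 1; i < HLANES; i++)
    dst[i] = a[i];
  dst[0] = b[0];
}

/* Model of _mm_mask_move_sh (w, mask, c, d): upper elements always
   from c; element 0 from d if the low mask bit is set, else from the
   pass-through w.  */
static void
model_mask_move_sh (float *dst, const float *w, unsigned int mask,
                    const float *c, const float *d)
{
  for (int i = 1; i < HLANES; i++)
    dst[i] = c[i];
  dst[0] = (mask & 1) ? d[0] : w[0];
}
```

The zero-masking variant (`_mm_maskz_move_sh`) follows by using a zero pass-through, matching the `_mm_setzero_ph ()` argument in the patch.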

On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm_cvtsi16_si128):
>         New intrinsic.
>         (_mm_cvtsi128_si16): Likewise.
>         (_mm_mask_load_sh): Likewise.
>         (_mm_maskz_load_sh): Likewise.
>         (_mm_mask_store_sh): Likewise.
>         (_mm_move_sh): Likewise.
>         (_mm_mask_move_sh): Likewise.
>         (_mm_maskz_move_sh): Likewise.
>         * config/i386/i386-builtin-types.def: Add corresponding builtin types.
>         * config/i386/i386-builtin.def: Add corresponding new builtins.
>         * config/i386/i386-expand.c
>         (ix86_expand_special_args_builtin): Handle new builtin types.
>         (ix86_expand_vector_init_one_nonzero): Adjust for FP16 target.
>         * config/i386/sse.md (VI2F): New mode iterator.
>         (vec_set<mode>_0): Use new mode iterator.
>         (avx512f_mov<ssescalarmodelower>_mask): Adjust for HF vector mode.
>         (avx512f_store<mode>_mask): Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 59 ++++++++++++++++++++++++++
>  gcc/config/i386/i386-builtin-types.def |  3 ++
>  gcc/config/i386/i386-builtin.def       |  5 +++
>  gcc/config/i386/i386-expand.c          | 11 +++++
>  gcc/config/i386/sse.md                 | 33 +++++++-------
>  5 files changed, 95 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 2fbfc140c44..cdf6646c8c6 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -2453,6 +2453,65 @@ _mm512_maskz_getmant_round_ph (__mmask32 __U, __m512h __A,
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vmovw.  */
> +extern __inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsi16_si128 (short __A)
> +{
> +  return _mm_set_epi16 (0, 0, 0, 0, 0, 0, 0, __A);
> +}
> +
> +extern __inline short
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsi128_si16 (__m128i __A)
> +{
> +  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, 0);
> +}
> +
> +/* Intrinsics vmovsh.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_load_sh (__m128h __A, __mmask8 __B, _Float16 const* __C)
> +{
> +  return __builtin_ia32_loadsh_mask (__C, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_load_sh (__mmask8 __A, _Float16 const* __B)
> +{
> +  return __builtin_ia32_loadsh_mask (__B, _mm_setzero_ph (), __A);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C)
> +{
> +  __builtin_ia32_storesh_mask (__A,  __C, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_move_sh (__m128h __A, __m128h  __B)
> +{
> +  __A[0] = __B[0];
> +  return __A;
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_move_sh (__m128h __A, __mmask8 __B, __m128h  __C, __m128h __D)
> +{
> +  return __builtin_ia32_vmovsh_mask (__C, __D, __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_move_sh (__mmask8 __A, __m128h  __B, __m128h __C)
> +{
> +  return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A);
> +}
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 79e7edf13e5..6cf3e354c78 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -134,6 +134,7 @@ DEF_POINTER_TYPE (PCVOID, VOID, CONST)
>  DEF_POINTER_TYPE (PVOID, VOID)
>  DEF_POINTER_TYPE (PDOUBLE, DOUBLE)
>  DEF_POINTER_TYPE (PFLOAT, FLOAT)
> +DEF_POINTER_TYPE (PCFLOAT16, FLOAT16, CONST)
>  DEF_POINTER_TYPE (PSHORT, SHORT)
>  DEF_POINTER_TYPE (PUSHORT, USHORT)
>  DEF_POINTER_TYPE (PINT, INT)
> @@ -1308,6 +1309,8 @@ DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
>  DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
>  DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
> +DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI)
> +DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, UQI)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, INT)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, INT, V8HF, UQI)
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index ed1a4a38b1c..be617b8f18a 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -393,6 +393,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_us_truncatev32hiv32qi2_mas
>  BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_ss_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovswb512mem_mask", IX86_BUILTIN_PMOVSWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
>  BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_truncatev32hiv32qi2_mask_store, "__builtin_ia32_pmovwb512mem_mask", IX86_BUILTIN_PMOVWB512_MEM, UNKNOWN, (int) VOID_FTYPE_PV32QI_V32HI_USI)
>
> +/* AVX512FP16 */
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_loadhf_mask, "__builtin_ia32_loadsh_mask", IX86_BUILTIN_LOADSH_MASK, UNKNOWN, (int) V8HF_FTYPE_PCFLOAT16_V8HF_UQI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_storehf_mask, "__builtin_ia32_storesh_mask", IX86_BUILTIN_STORESH_MASK, UNKNOWN, (int) VOID_FTYPE_PCFLOAT16_V8HF_UQI)
> +
>  /* RDPKRU and WRPKRU.  */
>  BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_rdpkru,  "__builtin_ia32_rdpkru", IX86_BUILTIN_RDPKRU, UNKNOWN, (int) UNSIGNED_FTYPE_VOID)
>  BDESC (OPTION_MASK_ISA_PKU, 0, CODE_FOR_wrpkru,  "__builtin_ia32_wrpkru", IX86_BUILTIN_WRPKRU, UNKNOWN, (int) VOID_FTYPE_UNSIGNED)
> @@ -2826,6 +2830,7 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getexpv8hf_mask, "__builtin_ia32_getexpph128_mask", IX86_BUILTIN_GETEXPPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_getmantv16hf_mask, "__builtin_ia32_getmantph256_mask", IX86_BUILTIN_GETMANTPH256, UNKNOWN, (int) V16HF_FTYPE_V16HF_INT_V16HF_UHI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_getmantv8hf_mask, "__builtin_ia32_getmantph128_mask", IX86_BUILTIN_GETMANTPH128, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_V8HF_UQI)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_movhf_mask, "__builtin_ia32_vmovsh_mask", IX86_BUILTIN_VMOVSH_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
>
>  /* Builtins with rounding support.  */
>  BDESC_END (ARGS, ROUND_ARGS)
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 266aa411ddb..bfc7fc75b97 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -10907,6 +10907,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
>      case VOID_FTYPE_PFLOAT_V16SF_UHI:
>      case VOID_FTYPE_PFLOAT_V8SF_UQI:
>      case VOID_FTYPE_PFLOAT_V4SF_UQI:
> +    case VOID_FTYPE_PCFLOAT16_V8HF_UQI:
>      case VOID_FTYPE_PV32QI_V32HI_USI:
>      case VOID_FTYPE_PV16QI_V16HI_UHI:
>      case VOID_FTYPE_PUDI_V8HI_UQI:
> @@ -10979,6 +10980,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
>      case V16SF_FTYPE_PCFLOAT_V16SF_UHI:
>      case V8SF_FTYPE_PCFLOAT_V8SF_UQI:
>      case V4SF_FTYPE_PCFLOAT_V4SF_UQI:
> +    case V8HF_FTYPE_PCFLOAT16_V8HF_UQI:
>        nargs = 3;
>        klass = load;
>        memory = 0;
> @@ -13993,6 +13995,8 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
>        break;
>      case E_V8HImode:
>        use_vector_set = TARGET_SSE2;
> +      gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0
> +       ? gen_vec_setv8hi_0 : NULL;
>        break;
>      case E_V8QImode:
>        use_vector_set = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
> @@ -14004,8 +14008,12 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
>        use_vector_set = TARGET_SSE4_1;
>        break;
>      case E_V32QImode:
> +      use_vector_set = TARGET_AVX;
> +      break;
>      case E_V16HImode:
>        use_vector_set = TARGET_AVX;
> +      gen_vec_set_0 = TARGET_AVX512FP16 && one_var == 0
> +       ? gen_vec_setv16hi_0 : NULL;
>        break;
>      case E_V8SImode:
>        use_vector_set = TARGET_AVX;
> @@ -14053,6 +14061,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
>        use_vector_set = TARGET_AVX512FP16 && one_var == 0;
>        gen_vec_set_0 = gen_vec_setv32hf_0;
>        break;
> +    case E_V32HImode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv32hi_0;
>      default:
>        break;
>      }
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index c4db778e25d..97f7c698d5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -758,6 +758,7 @@ (define_mode_iterator VIHF_AVX512BW
>    (V32HF "TARGET_AVX512FP16")])
>
>  ;; Int-float size matches
> +(define_mode_iterator VI2F [V8HI V16HI V32HI V8HF V16HF V32HF])
>  (define_mode_iterator VI4F_128 [V4SI V4SF])
>  (define_mode_iterator VI8F_128 [V2DI V2DF])
>  (define_mode_iterator VI4F_256 [V8SI V8SF])
> @@ -1317,13 +1318,13 @@ (define_insn_and_split "*<avx512>_load<mode>"
>    [(set (match_dup 0) (match_dup 1))])
>
>  (define_insn "avx512f_mov<ssescalarmodelower>_mask"
> -  [(set (match_operand:VF_128 0 "register_operand" "=v")
> -       (vec_merge:VF_128
> -         (vec_merge:VF_128
> -           (match_operand:VF_128 2 "register_operand" "v")
> -           (match_operand:VF_128 3 "nonimm_or_0_operand" "0C")
> +  [(set (match_operand:VFH_128 0 "register_operand" "=v")
> +       (vec_merge:VFH_128
> +         (vec_merge:VFH_128
> +           (match_operand:VFH_128 2 "register_operand" "v")
> +           (match_operand:VFH_128 3 "nonimm_or_0_operand" "0C")
>             (match_operand:QI 4 "register_operand" "Yk"))
> -         (match_operand:VF_128 1 "register_operand" "v")
> +         (match_operand:VFH_128 1 "register_operand" "v")
>           (const_int 1)))]
>    "TARGET_AVX512F"
>    "vmov<ssescalarmodesuffix>\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}"
> @@ -1336,7 +1337,7 @@ (define_expand "avx512f_load<mode>_mask"
>         (vec_merge:<ssevecmode>
>           (vec_merge:<ssevecmode>
>             (vec_duplicate:<ssevecmode>
> -             (match_operand:MODEF 1 "memory_operand"))
> +             (match_operand:MODEFH 1 "memory_operand"))
>             (match_operand:<ssevecmode> 2 "nonimm_or_0_operand")
>             (match_operand:QI 3 "register_operand"))
>           (match_dup 4)
> @@ -1349,7 +1350,7 @@ (define_insn "*avx512f_load<mode>_mask"
>         (vec_merge:<ssevecmode>
>           (vec_merge:<ssevecmode>
>             (vec_duplicate:<ssevecmode>
> -             (match_operand:MODEF 1 "memory_operand" "m"))
> +             (match_operand:MODEFH 1 "memory_operand" "m"))
>             (match_operand:<ssevecmode> 2 "nonimm_or_0_operand" "0C")
>             (match_operand:QI 3 "register_operand" "Yk"))
>           (match_operand:<ssevecmode> 4 "const0_operand" "C")
> @@ -1362,11 +1363,11 @@ (define_insn "*avx512f_load<mode>_mask"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "avx512f_store<mode>_mask"
> -  [(set (match_operand:MODEF 0 "memory_operand" "=m")
> -       (if_then_else:MODEF
> +  [(set (match_operand:MODEFH 0 "memory_operand" "=m")
> +       (if_then_else:MODEFH
>           (and:QI (match_operand:QI 2 "register_operand" "Yk")
>                  (const_int 1))
> -         (vec_select:MODEF
> +         (vec_select:MODEFH
>             (match_operand:<ssevecmode> 1 "register_operand" "v")
>             (parallel [(const_int 0)]))
>           (match_dup 0)))]
> @@ -8513,11 +8514,11 @@ (define_insn "vec_set<mode>_0"
>
>  ;; vmovw also clears the higher bits
>  (define_insn "vec_set<mode>_0"
> -  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v")
> -       (vec_merge:VF_AVX512FP16
> -         (vec_duplicate:VF_AVX512FP16
> -           (match_operand:HF 2 "nonimmediate_operand" "rm"))
> -         (match_operand:VF_AVX512FP16 1 "const0_operand" "C")
> +  [(set (match_operand:VI2F 0 "register_operand" "=v")
> +       (vec_merge:VI2F
> +         (vec_duplicate:VI2F
> +           (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "rm"))
> +         (match_operand:VI2F 1 "const0_operand" "C")
>           (const_int 1)))]
>    "TARGET_AVX512FP16"
>    "vmovw\t{%2, %x0|%x0, %2}"
> --
> 2.18.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
  2021-07-01  6:16 ` [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
@ 2021-09-17  8:07   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-17  8:07 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

I'm going to check in 10 patches.

[PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
[PATCH 31/62] AVX512FP16: Add testcase for
vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
[PATCH 32/62] AVX512FP16: Add
vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq
[PATCH 33/62] AVX512FP16: Add testcase for
vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq
[PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi
[PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx
[PATCH 36/62] AVX512FP16: Add testcase for
vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx
[PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.
[PATCH 38/62] AVX512FP16: Add testcase for
vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh
[PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector
float16 and vector float32/float64/integer.

  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Newly added runtime testcases passed on sde{-m32,}.


On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic.
>         (_mm_cvtsh_u32): Likewise.
>         (_mm_cvt_roundsh_i32): Likewise.
>         (_mm_cvt_roundsh_u32): Likewise.
>         (_mm_cvtsh_i64): Likewise.
>         (_mm_cvtsh_u64): Likewise.
>         (_mm_cvt_roundsh_i64): Likewise.
>         (_mm_cvt_roundsh_u64): Likewise.
>         (_mm_cvti32_sh): Likewise.
>         (_mm_cvtu32_sh): Likewise.
>         (_mm_cvt_roundi32_sh): Likewise.
>         (_mm_cvt_roundu32_sh): Likewise.
>         (_mm_cvti64_sh): Likewise.
>         (_mm_cvtu64_sh): Likewise.
>         (_mm_cvt_roundi64_sh): Likewise.
>         (_mm_cvt_roundu64_sh): Likewise.
>         * config/i386/i386-builtin-types.def: Add corresponding builtin types.
>         * config/i386/i386-builtin.def: Add corresponding new builtins.
>         * config/i386/i386-expand.c (ix86_expand_round_builtin):
>         Handle new builtin types.
>         * config/i386/sse.md
>         (avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>):
>         New define_insn.
>         (avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2): Likewise.
>         (avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add test for new builtins.
>         * gcc.target/i386/sse-13.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add test for new intrinsics.
>         * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 158 +++++++++++++++++++++++++
>  gcc/config/i386/i386-builtin-types.def |   8 ++
>  gcc/config/i386/i386-builtin.def       |   8 ++
>  gcc/config/i386/i386-expand.c          |   8 ++
>  gcc/config/i386/sse.md                 |  46 +++++++
>  gcc/testsuite/gcc.target/i386/avx-1.c  |   8 ++
>  gcc/testsuite/gcc.target/i386/sse-13.c |   8 ++
>  gcc/testsuite/gcc.target/i386/sse-14.c |  10 ++
>  gcc/testsuite/gcc.target/i386/sse-22.c |  10 ++
>  gcc/testsuite/gcc.target/i386/sse-23.c |   8 ++
>  10 files changed, 272 insertions(+)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index bd801942365..7524a8d6a5b 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -3529,6 +3529,164 @@ _mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vcvtsh2si, vcvtsh2us.  */
> +extern __inline int
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_i32 (__m128h __A)
> +{
> +  return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline unsigned
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_u32 (__m128h __A)
> +{
> +  return (unsigned)
> +    __builtin_ia32_vcvtsh2usi32_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline int
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
> +{
> +  return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
> +}
> +
> +extern __inline unsigned
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
> +{
> +  return (unsigned) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
> +}
> +
> +#else
> +#define _mm_cvt_roundsh_i32(A, B)              \
> +  ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
> +#define _mm_cvt_roundsh_u32(A, B)              \
> +  ((unsigned)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +#ifdef __x86_64__
> +extern __inline long long
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_i64 (__m128h __A)
> +{
> +  return (long long)
> +    __builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline unsigned long long
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_u64 (__m128h __A)
> +{
> +  return (unsigned long long)
> +    __builtin_ia32_vcvtsh2usi64_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline long long
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_i64 (__m128h __A, const int __R)
> +{
> +  return (long long) __builtin_ia32_vcvtsh2si64_round (__A, __R);
> +}
> +
> +extern __inline unsigned long long
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_u64 (__m128h __A, const int __R)
> +{
> +  return (long long) __builtin_ia32_vcvtsh2usi64_round (__A, __R);
> +}
> +
> +#else
> +#define _mm_cvt_roundsh_i64(A, B)                      \
> +  ((long long)__builtin_ia32_vcvtsh2si64_round ((A), (B)))
> +#define _mm_cvt_roundsh_u64(A, B)                      \
> +  ((unsigned long long)__builtin_ia32_vcvtsh2usi64_round ((A), (B)))
> +
> +#endif /* __OPTIMIZE__ */
> +#endif /* __x86_64__ */
> +
> +/* Intrinsics vcvtsi2sh, vcvtusi2sh.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvti32_sh (__m128h __A, int __B)
> +{
> +  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtu32_sh (__m128h __A, unsigned int __B)
> +{
> +  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundi32_sh (__m128h __A, int __B, const int __R)
> +{
> +  return __builtin_ia32_vcvtsi2sh32_round (__A, __B, __R);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundu32_sh (__m128h __A, unsigned int __B, const int __R)
> +{
> +  return __builtin_ia32_vcvtusi2sh32_round (__A, __B, __R);
> +}
> +
> +#else
> +#define _mm_cvt_roundi32_sh(A, B, C)           \
> +  (__builtin_ia32_vcvtsi2sh32_round ((A), (B), (C)))
> +#define _mm_cvt_roundu32_sh(A, B, C)           \
> +  (__builtin_ia32_vcvtusi2sh32_round ((A), (B), (C)))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +#ifdef __x86_64__
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvti64_sh (__m128h __A, long long __B)
> +{
> +  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtu64_sh (__m128h __A, unsigned long long __B)
> +{
> +  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundi64_sh (__m128h __A, long long __B, const int __R)
> +{
> +  return __builtin_ia32_vcvtsi2sh64_round (__A, __B, __R);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundu64_sh (__m128h __A, unsigned long long __B, const int __R)
> +{
> +  return __builtin_ia32_vcvtusi2sh64_round (__A, __B, __R);
> +}
> +
> +#else
> +#define _mm_cvt_roundi64_sh(A, B, C)           \
> +  (__builtin_ia32_vcvtsi2sh64_round ((A), (B), (C)))
> +#define _mm_cvt_roundu64_sh(A, B, C)           \
> +  (__builtin_ia32_vcvtusi2sh64_round ((A), (B), (C)))
> +
> +#endif /* __OPTIMIZE__ */
> +#endif /* __x86_64__ */
> +
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 57b9ea786e1..74bda59a65e 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -1308,9 +1308,17 @@ DEF_FUNCTION_TYPE (V8HF, V8HI)
>  DEF_FUNCTION_TYPE (QI, V8HF, INT, UQI)
>  DEF_FUNCTION_TYPE (HI, V16HF, INT, UHI)
>  DEF_FUNCTION_TYPE (SI, V32HF, INT, USI)
> +DEF_FUNCTION_TYPE (INT, V8HF, INT)
> +DEF_FUNCTION_TYPE (INT64, V8HF, INT)
> +DEF_FUNCTION_TYPE (UINT, V8HF, INT)
> +DEF_FUNCTION_TYPE (UINT64, V8HF, INT)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF)
>  DEF_FUNCTION_TYPE (VOID, PCFLOAT16, V8HF, UQI)
>  DEF_FUNCTION_TYPE (V8HF, PCFLOAT16, V8HF, UQI)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, INT, INT)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, INT64, INT)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, UINT, INT)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, UINT64, INT)
>  DEF_FUNCTION_TYPE (V2DI, V8HF, V2DI, UQI)
>  DEF_FUNCTION_TYPE (V4DI, V8HF, V4DI, UQI)
>  DEF_FUNCTION_TYPE (V4SI, V8HF, V4SI, UQI)
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 44c55876e48..3602b40d6d5 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -3094,6 +3094,14 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtdq2ph_v16si_mask_
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtudq2ph_v16si_mask_round, "__builtin_ia32_vcvtudq2ph_v16si_mask_round", IX86_BUILTIN_VCVTUDQ2PH_V16SI_MASK_ROUND, UNKNOWN, (int) V16HF_FTYPE_V16SI_V16HF_UHI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtqq2ph_v8di_mask_round, "__builtin_ia32_vcvtqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtuqq2ph_v8di_mask_round, "__builtin_ia32_vcvtuqq2ph_v8di_mask_round", IX86_BUILTIN_VCVTUQQ2PH_V8DI_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8DI_V8HF_UQI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2si_round, "__builtin_ia32_vcvtsh2si32_round", IX86_BUILTIN_VCVTSH2SI32_ROUND, UNKNOWN, (int) INT_FTYPE_V8HF_INT)
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2siq_round, "__builtin_ia32_vcvtsh2si64_round", IX86_BUILTIN_VCVTSH2SI64_ROUND, UNKNOWN, (int) INT64_FTYPE_V8HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usi_round, "__builtin_ia32_vcvtsh2usi32_round", IX86_BUILTIN_VCVTSH2USI32_ROUND, UNKNOWN, (int) UINT_FTYPE_V8HF_INT)
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2usiq_round, "__builtin_ia32_vcvtsh2usi64_round", IX86_BUILTIN_VCVTSH2USI64_ROUND, UNKNOWN, (int) UINT64_FTYPE_V8HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2sh_round, "__builtin_ia32_vcvtsi2sh32_round", IX86_BUILTIN_VCVTSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT_INT)
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsi2shq_round, "__builtin_ia32_vcvtsi2sh64_round", IX86_BUILTIN_VCVTSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_INT64_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2sh_round, "__builtin_ia32_vcvtusi2sh32_round", IX86_BUILTIN_VCVTUSI2SH32_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT_INT)
> +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtusi2shq_round, "__builtin_ia32_vcvtusi2sh64_round", IX86_BUILTIN_VCVTUSI2SH64_ROUND, UNKNOWN, (int) V8HF_FTYPE_V8HF_UINT64_INT)
>
>  BDESC_END (ROUND_ARGS, MULTI_ARG)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 7d9e1bd6a2d..b83c6d9a92b 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -10489,16 +10489,24 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      {
>      case UINT64_FTYPE_V2DF_INT:
>      case UINT64_FTYPE_V4SF_INT:
> +    case UINT64_FTYPE_V8HF_INT:
>      case UINT_FTYPE_V2DF_INT:
>      case UINT_FTYPE_V4SF_INT:
> +    case UINT_FTYPE_V8HF_INT:
>      case INT64_FTYPE_V2DF_INT:
>      case INT64_FTYPE_V4SF_INT:
> +    case INT64_FTYPE_V8HF_INT:
>      case INT_FTYPE_V2DF_INT:
>      case INT_FTYPE_V4SF_INT:
> +    case INT_FTYPE_V8HF_INT:
>        nargs = 2;
>        break;
>      case V32HF_FTYPE_V32HF_V32HF_INT:
>      case V8HF_FTYPE_V8HF_V8HF_INT:
> +    case V8HF_FTYPE_V8HF_INT_INT:
> +    case V8HF_FTYPE_V8HF_UINT_INT:
> +    case V8HF_FTYPE_V8HF_INT64_INT:
> +    case V8HF_FTYPE_V8HF_UINT64_INT:
>      case V4SF_FTYPE_V4SF_UINT_INT:
>      case V4SF_FTYPE_V4SF_UINT64_INT:
>      case V2DF_FTYPE_V2DF_UINT64_INT:
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 8b23048a232..b312d26b806 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -5589,6 +5589,52 @@ (define_insn "*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1"
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "TI")])
>
> +(define_insn "avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>"
> +  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
> +       (unspec:SWI48
> +         [(vec_select:HF
> +            (match_operand:V8HF 1 "<round_nimm_scalar_predicate>" "v,<round_constraint2>")
> +            (parallel [(const_int 0)]))]
> +         UNSPEC_US_FIX_NOTRUNC))]
> +  "TARGET_AVX512FP16"
> +  "%vcvtsh2<sseintconvertsignprefix>si\t{<round_op2>%1, %0|%0, %k1<round_op2>}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "athlon_decode" "double,vector")
> +   (set_attr "bdver1_decode" "double,double")
> +   (set_attr "prefix_rep" "1")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2"
> +  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
> +       (unspec:SWI48 [(match_operand:HF 1 "nonimmediate_operand" "v,m")]
> +                     UNSPEC_US_FIX_NOTRUNC))]
> +  "TARGET_AVX512FP16"
> +  "%vcvtsh2<sseintconvertsignprefix>si\t{%1, %0|%0, %k1}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "athlon_decode" "double,vector")
> +   (set_attr "bdver1_decode" "double,double")
> +   (set_attr "prefix_rep" "1")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (vec_duplicate:V8HF
> +           (any_float:HF (match_operand:SWI48 2 "<round_nimm_scalar_predicate>" "<round_constraint3>")))
> +         (match_operand:V8HF 1 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<floatsuffix>si2sh\t{%2, <round_op3>%1, %0|%0, %1<round_op3>, %2}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "athlon_decode" "*")
> +   (set_attr "amdfam10_decode" "*")
> +   (set_attr "bdver1_decode" "*")
> +   (set_attr "btver2_decode" "double")
> +   (set_attr "znver1_decode" "double")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index b569cc0bdd9..0aae949097a 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -731,6 +731,14 @@
>  #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
> +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
> +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 07e59118438..997fb733132 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -748,6 +748,14 @@
>  #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
> +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
> +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 0530192d97e..89a589e0d80 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -690,6 +690,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
> +test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
> +test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
> +#ifdef __x86_64__
> +test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
> +test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
> +test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
> +test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
> +#endif
>  test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
>  test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
>  test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
> @@ -734,6 +742,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
> +test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
> +test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 04e6340516b..fed12744c6c 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -795,6 +795,14 @@ test_1 (_mm512_cvt_roundepi32_ph, __m256h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepu32_ph, __m256h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepi64_ph, __m128h, __m512i, 8)
>  test_1 (_mm512_cvt_roundepu64_ph, __m128h, __m512i, 8)
> +test_1 (_mm_cvt_roundsh_i32, int, __m128h, 8)
> +test_1 (_mm_cvt_roundsh_u32, unsigned, __m128h, 8)
> +#ifdef __x86_64__
> +test_1 (_mm_cvt_roundsh_i64, long long, __m128h, 8)
> +test_1 (_mm_cvt_roundsh_u64, unsigned long long, __m128h, 8)
> +test_2 (_mm_cvt_roundi64_sh, __m128h, __m128h, long long, 8)
> +test_2 (_mm_cvt_roundu64_sh, __m128h, __m128h, unsigned long long, 8)
> +#endif
>  test_1x (_mm512_reduce_round_ph, __m512h, __m512h, 123, 8)
>  test_1x (_mm512_roundscale_round_ph, __m512h, __m512h, 123, 8)
>  test_1x (_mm512_getmant_ph, __m512h, __m512h, 1, 1)
> @@ -838,6 +846,8 @@ test_2 (_mm512_maskz_cvt_roundepi32_ph, __m256h, __mmask16, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepu32_ph, __m256h, __mmask16, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepi64_ph, __m128h, __mmask8, __m512i, 8)
>  test_2 (_mm512_maskz_cvt_roundepu64_ph, __m128h, __mmask8, __m512i, 8)
> +test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
> +test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 684891cc98b..6e8d8a1833c 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -749,6 +749,14 @@
>  #define __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, D) __builtin_ia32_vcvtudq2ph_v16si_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtqq2ph_v8di_mask_round(A, B, C, 8)
>  #define __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, D) __builtin_ia32_vcvtuqq2ph_v8di_mask_round(A, B, C, 8)
> +#define __builtin_ia32_vcvtsh2si32_round(A, B) __builtin_ia32_vcvtsh2si32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2si64_round(A, B) __builtin_ia32_vcvtsh2si64_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi32_round(A, B) __builtin_ia32_vcvtsh2usi32_round(A, 8)
> +#define __builtin_ia32_vcvtsh2usi64_round(A, B) __builtin_ia32_vcvtsh2usi64_round(A, 8)
> +#define __builtin_ia32_vcvtsi2sh32_round(A, B, C) __builtin_ia32_vcvtsi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtsi2sh64_round(A, B, C) __builtin_ia32_vcvtsi2sh64_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh32_round(A, B, C) __builtin_ia32_vcvtusi2sh32_round(A, B, 8)
> +#define __builtin_ia32_vcvtusi2sh64_round(A, B, C) __builtin_ia32_vcvtusi2sh64_round(A, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> --
> 2.18.1
>


--
BR,
Hongtao


* Re: [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph.
  2021-07-01  6:16 ` [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph liuhongt
@ 2021-09-18  7:04   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-18  7:04 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

Hi:
  I'm going to check in the 9 patches below.
  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Newly added testcases passed on sde{-m32,}.

[PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213,
231]ph/vfmsubadd[132, 213, 231]ph.
[PATCH 41/62] AVX512FP16: Add testcase for vfmaddsub[132, 213,
231]ph/vfmsubadd[132, 213, 231]ph.
[PATCH 42/62] AVX512FP16: Add FP16 fma instructions.
[PATCH 43/62] AVX512FP16: Add testcase for fma instructions
[PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including
[PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations.
[PATCH 46/62] AVX512FP16: Enable FP16 mask load/store.
[PATCH 47/62] AVX512FP16: Add scalar fma instructions.
[PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions.

On Thu, Jul 1, 2021 at 2:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph):
>         New intrinsic.
>         (_mm512_mask_fmaddsub_ph): Likewise.
>         (_mm512_mask3_fmaddsub_ph): Likewise.
>         (_mm512_maskz_fmaddsub_ph): Likewise.
>         (_mm512_fmaddsub_round_ph): Likewise.
>         (_mm512_mask_fmaddsub_round_ph): Likewise.
>         (_mm512_mask3_fmaddsub_round_ph): Likewise.
>         (_mm512_maskz_fmaddsub_round_ph): Likewise.
>         (_mm512_mask_fmsubadd_ph): Likewise.
>         (_mm512_mask3_fmsubadd_ph): Likewise.
>         (_mm512_maskz_fmsubadd_ph): Likewise.
>         (_mm512_fmsubadd_round_ph): Likewise.
>         (_mm512_mask_fmsubadd_round_ph): Likewise.
>         (_mm512_mask3_fmsubadd_round_ph): Likewise.
>         (_mm512_maskz_fmsubadd_round_ph): Likewise.
>         * config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph):
>         New intrinsic.
>         (_mm256_mask_fmaddsub_ph): Likewise.
>         (_mm256_mask3_fmaddsub_ph): Likewise.
>         (_mm256_maskz_fmaddsub_ph): Likewise.
>         (_mm_fmaddsub_ph): Likewise.
>         (_mm_mask_fmaddsub_ph): Likewise.
>         (_mm_mask3_fmaddsub_ph): Likewise.
>         (_mm_maskz_fmaddsub_ph): Likewise.
>         (_mm256_fmsubadd_ph): Likewise.
>         (_mm256_mask_fmsubadd_ph): Likewise.
>         (_mm256_mask3_fmsubadd_ph): Likewise.
>         (_mm256_maskz_fmsubadd_ph): Likewise.
>         (_mm_fmsubadd_ph): Likewise.
>         (_mm_mask_fmsubadd_ph): Likewise.
>         (_mm_mask3_fmsubadd_ph): Likewise.
>         (_mm_maskz_fmsubadd_ph): Likewise.
>         * config/i386/i386-builtin.def: Add corresponding new builtins.
>         * config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator.
>         (<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander.
>         (<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use
>         VFH_AVX512VL.
>         (<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
>         Use VFH_SF_AVX512VL.
>         (<avx512>_fmaddsub_<mode>_mask<round_name>): Use VFH_AVX512VL.
>         (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
>         (<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
>         Use VFH_SF_AVX512VL.
>         (<avx512>_fmsubadd_<mode>_mask<round_name>): Use VFH_AVX512VL.
>         (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add test for new builtins.
>         * gcc.target/i386/sse-13.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add test for new intrinsics.
>         * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 228 +++++++++++++++++++++++++
>  gcc/config/i386/avx512fp16vlintrin.h   | 182 ++++++++++++++++++++
>  gcc/config/i386/i386-builtin.def       |  18 ++
>  gcc/config/i386/sse.md                 | 103 ++++++-----
>  gcc/testsuite/gcc.target/i386/avx-1.c  |   6 +
>  gcc/testsuite/gcc.target/i386/sse-13.c |   6 +
>  gcc/testsuite/gcc.target/i386/sse-14.c |   8 +
>  gcc/testsuite/gcc.target/i386/sse-22.c |   8 +
>  gcc/testsuite/gcc.target/i386/sse-23.c |   6 +
>  9 files changed, 524 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index ddb227529fa..4092663b504 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -5037,6 +5037,234 @@ _mm_maskz_cvt_roundsd_sh (__mmask8 __A, __m128h __B, __m128d __C,
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vfmaddsub[132,213,231]ph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) -1,
> +                                       _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmaddsub_ph (__m512h __A, __mmask32 __U, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) __U,
> +                                       _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmaddsub_ph (__m512h __A, __m512h __B, __m512h __C, __mmask32 __U)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmaddsub_ph (__mmask32 __U, __m512h __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) -1, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmaddsub_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
> +                              __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) __U, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmaddsub_round_ph (__m512h __A, __m512h __B, __m512h __C,
> +                               __mmask32 __U, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_mask3 ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmaddsub_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
> +                               __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddsubph512_maskz ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U, __R);
> +}
> +
> +#else
> +#define _mm512_fmaddsub_round_ph(A, B, C, R)                           \
> +  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), -1, (R)))
> +
> +#define _mm512_mask_fmaddsub_round_ph(A, U, B, C, R)                   \
> +  ((__m512h)__builtin_ia32_vfmaddsubph512_mask ((A), (B), (C), (U), (R)))
> +
> +#define _mm512_mask3_fmaddsub_round_ph(A, B, C, U, R)                  \
> +  ((__m512h)__builtin_ia32_vfmaddsubph512_mask3 ((A), (B), (C), (U), (R)))
> +
> +#define _mm512_maskz_fmaddsub_round_ph(U, A, B, C, R)                  \
> +  ((__m512h)__builtin_ia32_vfmaddsubph512_maskz ((A), (B), (C), (U), (R)))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +/* Intrinsics vfmsubadd[132,213,231]ph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmsubadd_ph (__m512h __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) -1,
> +                                       _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmsubadd_ph (__m512h __A, __mmask32 __U,
> +                        __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) __U,
> +                                       _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmsubadd_ph (__m512h __A, __m512h __B,
> +                         __m512h __C, __mmask32 __U)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmsubadd_ph (__mmask32 __U, __m512h __A,
> +                         __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmsubadd_round_ph (__m512h __A, __m512h __B,
> +                         __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) -1, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmsubadd_round_ph (__m512h __A, __mmask32 __U, __m512h __B,
> +                              __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask ((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       (__v32hf) __C,
> +                                       (__mmask32) __U, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmsubadd_round_ph (__m512h __A, __m512h __B, __m512h __C,
> +                               __mmask32 __U, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_mask3 ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U, __R);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmsubadd_round_ph (__mmask32 __U, __m512h __A, __m512h __B,
> +                               __m512h __C, const int __R)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmsubaddph512_maskz ((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        (__v32hf) __C,
> +                                        (__mmask32) __U, __R);
> +}
> +
> +#else
> +#define _mm512_fmsubadd_round_ph(A, B, C, R)                           \
> +  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), -1, (R)))
> +
> +#define _mm512_mask_fmsubadd_round_ph(A, U, B, C, R)                   \
> +  ((__m512h)__builtin_ia32_vfmsubaddph512_mask ((A), (B), (C), (U), (R)))
> +
> +#define _mm512_mask3_fmsubadd_round_ph(A, B, C, U, R)                  \
> +  ((__m512h)__builtin_ia32_vfmsubaddph512_mask3 ((A), (B), (C), (U), (R)))
> +
> +#define _mm512_maskz_fmsubadd_round_ph(U, A, B, C, R)                  \
> +  ((__m512h)__builtin_ia32_vfmsubaddph512_maskz ((A), (B), (C), (U), (R)))
> +
> +#endif /* __OPTIMIZE__ */
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
> index bcbe4523357..8825fae52aa 100644
> --- a/gcc/config/i386/avx512fp16vlintrin.h
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -2269,6 +2269,188 @@ _mm256_maskz_cvtpd_ph (__mmask8 __A, __m256d __B)
>                                              __A);
>  }
>
> +/* Intrinsics vfmaddsub[132,213,231]ph.  */
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h)__builtin_ia32_vfmaddsubph256_mask ((__v16hf)__A,
> +                                                     (__v16hf)__B,
> +                                                     (__v16hf)__C,
> +                                                     (__mmask16)-1);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fmaddsub_ph (__m256h __A, __mmask16 __U, __m256h __B,
> +                        __m256h __C)
> +{
> +  return (__m256h) __builtin_ia32_vfmaddsubph256_mask ((__v16hf) __A,
> +                                                      (__v16hf) __B,
> +                                                      (__v16hf) __C,
> +                                                      (__mmask16) __U);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask3_fmaddsub_ph (__m256h __A, __m256h __B, __m256h __C,
> +                         __mmask16 __U)
> +{
> +  return (__m256h) __builtin_ia32_vfmaddsubph256_mask3 ((__v16hf) __A,
> +                                                       (__v16hf) __B,
> +                                                       (__v16hf) __C,
> +                                                       (__mmask16)
> +                                                       __U);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fmaddsub_ph (__mmask16 __U, __m256h __A, __m256h __B,
> +                         __m256h __C)
> +{
> +  return (__m256h) __builtin_ia32_vfmaddsubph256_maskz ((__v16hf) __A,
> +                                                       (__v16hf) __B,
> +                                                       (__v16hf) __C,
> +                                                       (__mmask16)
> +                                                       __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h)__builtin_ia32_vfmaddsubph128_mask ((__v8hf)__A,
> +                                                     (__v8hf)__B,
> +                                                     (__v8hf)__C,
> +                                                     (__mmask8)-1);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fmaddsub_ph (__m128h __A, __mmask8 __U, __m128h __B,
> +                     __m128h __C)
> +{
> +  return (__m128h) __builtin_ia32_vfmaddsubph128_mask ((__v8hf) __A,
> +                                                      (__v8hf) __B,
> +                                                      (__v8hf) __C,
> +                                                      (__mmask8) __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask3_fmaddsub_ph (__m128h __A, __m128h __B, __m128h __C,
> +                      __mmask8 __U)
> +{
> +  return (__m128h) __builtin_ia32_vfmaddsubph128_mask3 ((__v8hf) __A,
> +                                                       (__v8hf) __B,
> +                                                       (__v8hf) __C,
> +                                                       (__mmask8)
> +                                                       __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fmaddsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
> +                      __m128h __C)
> +{
> +  return (__m128h) __builtin_ia32_vfmaddsubph128_maskz ((__v8hf) __A,
> +                                                       (__v8hf) __B,
> +                                                       (__v8hf) __C,
> +                                                       (__mmask8)
> +                                                       __U);
> +}
> +
> +/* Intrinsics vfmsubadd[132,213,231]ph.  */
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A,
> +                                                      (__v16hf) __B,
> +                                                      (__v16hf) __C,
> +                                                      (__mmask16) -1);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fmsubadd_ph (__m256h __A, __mmask16 __U, __m256h __B,
> +                        __m256h __C)
> +{
> +  return (__m256h) __builtin_ia32_vfmsubaddph256_mask ((__v16hf) __A,
> +                                                      (__v16hf) __B,
> +                                                      (__v16hf) __C,
> +                                                      (__mmask16) __U);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask3_fmsubadd_ph (__m256h __A, __m256h __B, __m256h __C,
> +                         __mmask16 __U)
> +{
> +  return (__m256h) __builtin_ia32_vfmsubaddph256_mask3 ((__v16hf) __A,
> +                                                       (__v16hf) __B,
> +                                                       (__v16hf) __C,
> +                                                       (__mmask16)
> +                                                       __U);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fmsubadd_ph (__mmask16 __U, __m256h __A, __m256h __B,
> +                         __m256h __C)
> +{
> +  return (__m256h) __builtin_ia32_vfmsubaddph256_maskz ((__v16hf) __A,
> +                                                       (__v16hf) __B,
> +                                                       (__v16hf) __C,
> +                                                       (__mmask16)
> +                                                       __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A,
> +                                                      (__v8hf) __B,
> +                                                      (__v8hf) __C,
> +                                                      (__mmask8) -1);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fmsubadd_ph (__m128h __A, __mmask8 __U, __m128h __B,
> +                     __m128h __C)
> +{
> +  return (__m128h) __builtin_ia32_vfmsubaddph128_mask ((__v8hf) __A,
> +                                                      (__v8hf) __B,
> +                                                      (__v8hf) __C,
> +                                                      (__mmask8) __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask3_fmsubadd_ph (__m128h __A, __m128h __B, __m128h __C,
> +                      __mmask8 __U)
> +{
> +  return (__m128h) __builtin_ia32_vfmsubaddph128_mask3 ((__v8hf) __A,
> +                                                       (__v8hf) __B,
> +                                                       (__v8hf) __C,
> +                                                       (__mmask8)
> +                                                       __U);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fmsubadd_ph (__mmask8 __U, __m128h __A, __m128h __B,
> +                      __m128h __C)
> +{
> +  return (__m128h) __builtin_ia32_vfmsubaddph128_maskz ((__v8hf) __A,
> +                                                       (__v8hf) __B,
> +                                                       (__v8hf) __C,
> +                                                       (__mmask8)
> +                                                       __U);
> +}
> +
>  #ifdef __DISABLE_AVX512FP16VL__
>  #undef __DISABLE_AVX512FP16VL__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 4bb48bc21dc..42bba719ec3 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -2875,6 +2875,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp1
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtps2ph_v8sf_mask, "__builtin_ia32_vcvtps2ph_v8sf_mask", IX86_BUILTIN_VCVTPS2PH_V8SF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8SF_V8HF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v2df_mask, "__builtin_ia32_vcvtpd2ph_v2df_mask", IX86_BUILTIN_VCVTPD2PH_V2DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtpd2ph_v4df_mask, "__builtin_ia32_vcvtpd2ph_v4df_mask", IX86_BUILTIN_VCVTPD2PH_V4DF_MASK, UNKNOWN, (int) V8HF_FTYPE_V4DF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_mask, "__builtin_ia32_vfmaddsubph256_mask", IX86_BUILTIN_VFMADDSUBPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_mask3, "__builtin_ia32_vfmaddsubph256_mask3", IX86_BUILTIN_VFMADDSUBPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddsub_v16hf_maskz, "__builtin_ia32_vfmaddsubph256_maskz", IX86_BUILTIN_VFMADDSUBPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask, "__builtin_ia32_vfmaddsubph128_mask", IX86_BUILTIN_VFMADDSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_mask3, "__builtin_ia32_vfmaddsubph128_mask3", IX86_BUILTIN_VFMADDSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddsub_v8hf_maskz, "__builtin_ia32_vfmaddsubph128_maskz", IX86_BUILTIN_VFMADDSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask, "__builtin_ia32_vfmsubaddph256_mask", IX86_BUILTIN_VFMSUBADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_mask3, "__builtin_ia32_vfmsubaddph256_mask3", IX86_BUILTIN_VFMSUBADDPH256_MASK3, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmsubadd_v16hf_maskz, "__builtin_ia32_vfmsubaddph256_maskz", IX86_BUILTIN_VFMSUBADDPH256_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask, "__builtin_ia32_vfmsubaddph128_mask", IX86_BUILTIN_VFMSUBADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_mask3, "__builtin_ia32_vfmsubaddph128_mask3", IX86_BUILTIN_VFMSUBADDPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmsubadd_v8hf_maskz, "__builtin_ia32_vfmsubaddph128_maskz", IX86_BUILTIN_VFMSUBADDPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
>
>  /* Builtins with rounding support.  */
>  BDESC_END (ARGS, ROUND_ARGS)
> @@ -3140,6 +3152,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2ss_mask_round,
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsh2sd_mask_round, "__builtin_ia32_vcvtsh2sd_mask_round", IX86_BUILTIN_VCVTSH2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtss2sh_mask_round, "__builtin_ia32_vcvtss2sh_mask_round", IX86_BUILTIN_VCVTSS2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_vcvtsd2sh_mask_round, "__builtin_ia32_vcvtsd2sh_mask_round", IX86_BUILTIN_VCVTSD2SH_MASK_ROUND, UNKNOWN, (int) V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask_round, "__builtin_ia32_vfmaddsubph512_mask", IX86_BUILTIN_VFMADDSUBPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_mask3_round, "__builtin_ia32_vfmaddsubph512_mask3", IX86_BUILTIN_VFMADDSUBPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddsub_v32hf_maskz_round, "__builtin_ia32_vfmaddsubph512_maskz", IX86_BUILTIN_VFMADDSUBPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask_round, "__builtin_ia32_vfmsubaddph512_mask", IX86_BUILTIN_VFMSUBADDPH512_MASK, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_mask3_round, "__builtin_ia32_vfmsubaddph512_mask3", IX86_BUILTIN_VFMSUBADDPH512_MASK3, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmsubadd_v32hf_maskz_round, "__builtin_ia32_vfmsubaddph512_maskz", IX86_BUILTIN_VFMSUBADDPH512_MASKZ, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT)
>
>  BDESC_END (ROUND_ARGS, MULTI_ARG)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 95f4a82c9cd..847684e232e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4542,6 +4542,13 @@ (define_mode_iterator VF_SF_AVX512VL
>    [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
>     DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
>
> +(define_mode_iterator VFH_SF_AVX512VL
> +  [(V32HF "TARGET_AVX512FP16")
> +   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
> +   DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
> +
>  (define_insn "<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>"
>    [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
>         (fma:VF_SF_AVX512VL
> @@ -4848,10 +4855,10 @@ (define_expand "fmaddsub_<mode>"
>    "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
>
>  (define_expand "<avx512>_fmaddsub_<mode>_maskz<round_expand_name>"
> -  [(match_operand:VF_AVX512VL 0 "register_operand")
> -   (match_operand:VF_AVX512VL 1 "<round_expand_nimm_predicate>")
> -   (match_operand:VF_AVX512VL 2 "<round_expand_nimm_predicate>")
> -   (match_operand:VF_AVX512VL 3 "<round_expand_nimm_predicate>")
> +  [(match_operand:VFH_AVX512VL 0 "register_operand")
> +   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
> +   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
> +   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
>     (match_operand:<avx512fmaskmode> 4 "register_operand")]
>    "TARGET_AVX512F"
>  {
> @@ -4861,6 +4868,20 @@ (define_expand "<avx512>_fmaddsub_<mode>_maskz<round_expand_name>"
>    DONE;
>  })
>
> +(define_expand "<avx512>_fmsubadd_<mode>_maskz<round_expand_name>"
> +  [(match_operand:VFH_AVX512VL 0 "register_operand")
> +   (match_operand:VFH_AVX512VL 1 "<round_expand_nimm_predicate>")
> +   (match_operand:VFH_AVX512VL 2 "<round_expand_nimm_predicate>")
> +   (match_operand:VFH_AVX512VL 3 "<round_expand_nimm_predicate>")
> +   (match_operand:<avx512fmaskmode> 4 "register_operand")]
> +  "TARGET_AVX512F"
> +{
> +  emit_insn (gen_fma_fmsubadd_<mode>_maskz_1<round_expand_name> (
> +    operands[0], operands[1], operands[2], operands[3],
> +    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
> +  DONE;
> +})
> +
>  (define_insn "*fma_fmaddsub_<mode>"
>    [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
>         (unspec:VF_128_256
> @@ -4880,11 +4901,11 @@ (define_insn "*fma_fmaddsub_<mode>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
> -  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
> -       (unspec:VF_SF_AVX512VL
> -         [(match_operand:VF_SF_AVX512VL 1 "<round_nimm_predicate>" "%0,0,v")
> -          (match_operand:VF_SF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
> -          (match_operand:VF_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0")]
> +  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
> +       (unspec:VFH_SF_AVX512VL
> +         [(match_operand:VFH_SF_AVX512VL 1 "<round_nimm_predicate>" "%0,0,v")
> +          (match_operand:VFH_SF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
> +          (match_operand:VFH_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0")]
>           UNSPEC_FMADDSUB))]
>    "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
> @@ -4895,12 +4916,12 @@ (define_insn "<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<avx512>_fmaddsub_<mode>_mask<round_name>"
> -  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
> -       (vec_merge:VF_AVX512VL
> -         (unspec:VF_AVX512VL
> -           [(match_operand:VF_AVX512VL 1 "register_operand" "0,0")
> -            (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
> -            (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")]
> +  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
> +       (vec_merge:VFH_AVX512VL
> +         (unspec:VFH_AVX512VL
> +           [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
> +            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
> +            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")]
>             UNSPEC_FMADDSUB)
>           (match_dup 1)
>           (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
> @@ -4912,12 +4933,12 @@ (define_insn "<avx512>_fmaddsub_<mode>_mask<round_name>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<avx512>_fmaddsub_<mode>_mask3<round_name>"
> -  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
> -       (vec_merge:VF_AVX512VL
> -         (unspec:VF_AVX512VL
> -           [(match_operand:VF_AVX512VL 1 "register_operand" "v")
> -            (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
> -            (match_operand:VF_AVX512VL 3 "register_operand" "0")]
> +  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
> +       (vec_merge:VFH_AVX512VL
> +         (unspec:VFH_AVX512VL
> +           [(match_operand:VFH_AVX512VL 1 "register_operand" "v")
> +            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
> +            (match_operand:VFH_AVX512VL 3 "register_operand" "0")]
>             UNSPEC_FMADDSUB)
>           (match_dup 3)
>           (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
> @@ -4946,12 +4967,12 @@ (define_insn "*fma_fmsubadd_<mode>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
> -  [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v")
> -       (unspec:VF_SF_AVX512VL
> -         [(match_operand:VF_SF_AVX512VL   1 "<round_nimm_predicate>" "%0,0,v")
> -          (match_operand:VF_SF_AVX512VL   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
> -          (neg:VF_SF_AVX512VL
> -            (match_operand:VF_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0"))]
> +  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
> +       (unspec:VFH_SF_AVX512VL
> +         [(match_operand:VFH_SF_AVX512VL   1 "<round_nimm_predicate>" "%0,0,v")
> +          (match_operand:VFH_SF_AVX512VL   2 "<round_nimm_predicate>" "<round_constraint>,v,<round_constraint>")
> +          (neg:VFH_SF_AVX512VL
> +            (match_operand:VFH_SF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>,0"))]
>           UNSPEC_FMADDSUB))]
>    "TARGET_AVX512F && <sd_mask_mode512bit_condition> && <round_mode512bit_condition>"
>    "@
> @@ -4962,13 +4983,13 @@ (define_insn "<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<avx512>_fmsubadd_<mode>_mask<round_name>"
> -  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
> -       (vec_merge:VF_AVX512VL
> -         (unspec:VF_AVX512VL
> -           [(match_operand:VF_AVX512VL 1 "register_operand" "0,0")
> -            (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
> -            (neg:VF_AVX512VL
> -              (match_operand:VF_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))]
> +  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
> +       (vec_merge:VFH_AVX512VL
> +         (unspec:VFH_AVX512VL
> +           [(match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
> +            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
> +            (neg:VFH_AVX512VL
> +              (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))]
>             UNSPEC_FMADDSUB)
>           (match_dup 1)
>           (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
> @@ -4980,13 +5001,13 @@ (define_insn "<avx512>_fmsubadd_<mode>_mask<round_name>"
>     (set_attr "mode" "<MODE>")])
>
>  (define_insn "<avx512>_fmsubadd_<mode>_mask3<round_name>"
> -  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
> -       (vec_merge:VF_AVX512VL
> -         (unspec:VF_AVX512VL
> -           [(match_operand:VF_AVX512VL 1 "register_operand" "v")
> -            (match_operand:VF_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
> -            (neg:VF_AVX512VL
> -              (match_operand:VF_AVX512VL 3 "register_operand" "0"))]
> +  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
> +       (vec_merge:VFH_AVX512VL
> +         (unspec:VFH_AVX512VL
> +           [(match_operand:VFH_AVX512VL 1 "register_operand" "v")
> +            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>")
> +            (neg:VFH_AVX512VL
> +              (match_operand:VFH_AVX512VL 3 "register_operand" "0"))]
>             UNSPEC_FMADDSUB)
>           (match_dup 3)
>           (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk")))]
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index deb25098f25..51a0cf2fe87 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -757,6 +757,12 @@
>  #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index dbe206bd1bb..a53f4653908 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -774,6 +774,12 @@
>  #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index e64321d8afa..48895e0dd0d 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -836,6 +836,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
>  test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
>  test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
>  test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
> +test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
> +test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
> @@ -868,6 +870,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
>  test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
>  test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
>  test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
> +test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
> +test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
> +test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
> +test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
> +test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
> +test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
>  test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index d92898fdd11..bc530da388b 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -939,6 +939,8 @@ test_3 (_mm_maskz_cvt_roundsh_ss, __m128, __mmask8, __m128, __m128h, 8)
>  test_3 (_mm_maskz_cvt_roundsh_sd, __m128d, __mmask8, __m128d, __m128h, 8)
>  test_3 (_mm_maskz_cvt_roundss_sh, __m128h, __mmask8, __m128h, __m128, 8)
>  test_3 (_mm_maskz_cvt_roundsd_sh, __m128h, __mmask8, __m128h, __m128d, 8)
> +test_3 (_mm512_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
> +test_3 (_mm512_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, 9)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
> @@ -970,6 +972,12 @@ test_4 (_mm_mask_cvt_roundsh_ss, __m128, __m128, __mmask8, __m128, __m128h, 8)
>  test_4 (_mm_mask_cvt_roundsh_sd, __m128d, __m128d, __mmask8, __m128d, __m128h, 8)
>  test_4 (_mm_mask_cvt_roundss_sh, __m128h, __m128h, __mmask8, __m128h, __m128, 8)
>  test_4 (_mm_mask_cvt_roundsd_sh, __m128h, __m128h, __mmask8, __m128h, __m128d, 8)
> +test_4 (_mm512_mask_fmaddsub_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
> +test_4 (_mm512_mask3_fmaddsub_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
> +test_4 (_mm512_maskz_fmaddsub_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
> +test_4 (_mm512_mask3_fmsubadd_round_ph, __m512h, __m512h, __m512h, __m512h, __mmask32, 9)
> +test_4 (_mm512_mask_fmsubadd_round_ph, __m512h, __m512h, __mmask32, __m512h, __m512h, 9)
> +test_4 (_mm512_maskz_fmsubadd_round_ph, __m512h, __mmask32, __m512h, __m512h, __m512h, 9)
>  test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 2f5027ba36f..df43931ca97 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -775,6 +775,12 @@
>  #define __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsh2sd_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtss2sh_mask_round(A, B, C, D, 8)
>  #define __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, E) __builtin_ia32_vcvtsd2sh_mask_round(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, E) __builtin_ia32_vfmaddsubph512_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_mask3(A, B, C, D, 8)
> +#define __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, E) __builtin_ia32_vfmsubaddph512_maskz(A, B, C, D, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> --
> 2.18.1
>


-- 
BR,
Hongtao


* Re: [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
  2021-07-01  6:16 ` [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
@ 2021-09-22  4:38   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-22  4:38 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek

I'm going to check in the following 7 patches:

[PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph
[PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph.
[PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
[PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh.
[PATCH 53/62] AVX512FP16: Add expander for sqrthf2.
[PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven.
[PATCH 55/62] AVX512FP16: Add expander for cstorehf4.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Newly added runtime tests passed on sde{-m32,}.

On Thu, Jul 1, 2021 at 2:18 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm512_fcmadd_pch):
>         New intrinsic.
>         (_mm512_mask_fcmadd_pch): Likewise.
>         (_mm512_mask3_fcmadd_pch): Likewise.
>         (_mm512_maskz_fcmadd_pch): Likewise.
>         (_mm512_fmadd_pch): Likewise.
>         (_mm512_mask_fmadd_pch): Likewise.
>         (_mm512_mask3_fmadd_pch): Likewise.
>         (_mm512_maskz_fmadd_pch): Likewise.
>         (_mm512_fcmadd_round_pch): Likewise.
>         (_mm512_mask_fcmadd_round_pch): Likewise.
>         (_mm512_mask3_fcmadd_round_pch): Likewise.
>         (_mm512_maskz_fcmadd_round_pch): Likewise.
>         (_mm512_fmadd_round_pch): Likewise.
>         (_mm512_mask_fmadd_round_pch): Likewise.
>         (_mm512_mask3_fmadd_round_pch): Likewise.
>         (_mm512_maskz_fmadd_round_pch): Likewise.
>         (_mm512_fcmul_pch): Likewise.
>         (_mm512_mask_fcmul_pch): Likewise.
>         (_mm512_maskz_fcmul_pch): Likewise.
>         (_mm512_fmul_pch): Likewise.
>         (_mm512_mask_fmul_pch): Likewise.
>         (_mm512_maskz_fmul_pch): Likewise.
>         (_mm512_fcmul_round_pch): Likewise.
>         (_mm512_mask_fcmul_round_pch): Likewise.
>         (_mm512_maskz_fcmul_round_pch): Likewise.
>         (_mm512_fmul_round_pch): Likewise.
>         (_mm512_mask_fmul_round_pch): Likewise.
>         (_mm512_maskz_fmul_round_pch): Likewise.
>         * config/i386/avx512fp16vlintrin.h (_mm_fmadd_pch):
>         New intrinsic.
>         (_mm_mask_fmadd_pch): Likewise.
>         (_mm_mask3_fmadd_pch): Likewise.
>         (_mm_maskz_fmadd_pch): Likewise.
>         (_mm256_fmadd_pch): Likewise.
>         (_mm256_mask_fmadd_pch): Likewise.
>         (_mm256_mask3_fmadd_pch): Likewise.
>         (_mm256_maskz_fmadd_pch): Likewise.
>         (_mm_fcmadd_pch): Likewise.
>         (_mm_mask_fcmadd_pch): Likewise.
>         (_mm_mask3_fcmadd_pch): Likewise.
>         (_mm_maskz_fcmadd_pch): Likewise.
>         (_mm256_fcmadd_pch): Likewise.
>         (_mm256_mask_fcmadd_pch): Likewise.
>         (_mm256_mask3_fcmadd_pch): Likewise.
>         (_mm256_maskz_fcmadd_pch): Likewise.
>         (_mm_fmul_pch): Likewise.
>         (_mm_mask_fmul_pch): Likewise.
>         (_mm_maskz_fmul_pch): Likewise.
>         (_mm256_fmul_pch): Likewise.
>         (_mm256_mask_fmul_pch): Likewise.
>         (_mm256_maskz_fmul_pch): Likewise.
>         (_mm_fcmul_pch): Likewise.
>         (_mm_mask_fcmul_pch): Likewise.
>         (_mm_maskz_fcmul_pch): Likewise.
>         (_mm256_fcmul_pch): Likewise.
>         (_mm256_mask_fcmul_pch): Likewise.
>         (_mm256_maskz_fcmul_pch): Likewise.
>         * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF,
>         V8HF_FTYPE_V16HF_V16HF_V16HF, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI,
>         V32HF_FTYPE_V32HF_V32HF_V32HF_INT,
>         V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT): Add new builtin types.
>         * config/i386/i386-builtin.def: Add new builtins.
>         * config/i386/i386-expand.c: Handle new builtin types.
>         * config/i386/subst.md (SUBST_CV): New.
>         (maskc_name): Ditto.
>         (maskc_operand3): Ditto.
>         (maskc): Ditto.
>         (sdc_maskz_name): Ditto.
>         (sdc_mask_op4): Ditto.
>         (sdc_mask_op5): Ditto.
>         (sdc_mask_mode512bit_condition): Ditto.
>         (sdc): Ditto.
>         (round_maskc_operand3): Ditto.
>         (round_sdc_mask_operand4): Ditto.
>         (round_maskc_op3): Ditto.
>         (round_sdc_mask_op4): Ditto.
>         (round_saeonly_sdc_mask_operand5): Ditto.
>         * config/i386/sse.md (unspec): Add complex fma unspecs.
>         (avx512fmaskcmode): New.
>         (UNSPEC_COMPLEX_F_C_MA): Ditto.
>         (UNSPEC_COMPLEX_F_C_MUL): Ditto.
>         (complexopname): Ditto.
>         (<avx512>_fmaddc_<mode>_maskz<round_expand_name>): New expander.
>         (<avx512>_fcmaddc_<mode>_maskz<round_expand_name>): Ditto.
>         (fma_<complexopname>_<mode><sdc_maskz_name><round_name>): New
>         define insn.
>         (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
>         (<avx512>_<complexopname>_<mode><maskc_name><round_name>): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add test for new builtins.
>         * gcc.target/i386/sse-13.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * gcc.target/i386/sse-14.c: Add test for new intrinsics.
>         * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 386 +++++++++++++++++++++++++
>  gcc/config/i386/avx512fp16vlintrin.h   | 257 ++++++++++++++++
>  gcc/config/i386/i386-builtin-types.def |   5 +
>  gcc/config/i386/i386-builtin.def       |  30 ++
>  gcc/config/i386/i386-expand.c          |   5 +
>  gcc/config/i386/sse.md                 |  98 +++++++
>  gcc/config/i386/subst.md               |  40 +++
>  gcc/testsuite/gcc.target/i386/avx-1.c  |  10 +
>  gcc/testsuite/gcc.target/i386/sse-13.c |  10 +
>  gcc/testsuite/gcc.target/i386/sse-14.c |  14 +
>  gcc/testsuite/gcc.target/i386/sse-22.c |  14 +
>  gcc/testsuite/gcc.target/i386/sse-23.c |  10 +
>  12 files changed, 879 insertions(+)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 5c85ec15b22..9dd71019972 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -6109,6 +6109,392 @@ _mm_maskz_fnmsub_round_sh (__mmask8 __U, __m128h __W, __m128h __A,
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vf[,c]maddcph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmaddcph_v32hf_round ((__v32hf) __C,
> +                                          (__v32hf) __A,
> +                                          (__v32hf) __B,
> +                                          _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h) __builtin_ia32_movaps512_mask
> +    ((__v16sf)
> +     __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D,
> +                                                (__v32hf) __A,
> +                                                (__v32hf) __C, __B,
> +                                                _MM_FROUND_CUR_DIRECTION),
> +     (__v16sf) __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fcmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C,
> +                                               (__v32hf) __A,
> +                                               (__v32hf) __B,
> +                                               __D, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fcmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D,
> +                                               (__v32hf) __B,
> +                                               (__v32hf) __C,
> +                                               __A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmadd_pch (__m512h __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddcph_v32hf_round((__v32hf) __C,
> +                                        (__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h) __builtin_ia32_movaps512_mask
> +    ((__v16sf)
> +     __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D,
> +                                               (__v32hf) __A,
> +                                               (__v32hf) __C, __B,
> +                                               _MM_FROUND_CUR_DIRECTION),
> +     (__v16sf) __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmadd_pch (__m512h __A, __m512h __B, __m512h __C, __mmask16 __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddcph_v32hf_mask_round((__v32hf) __C,
> +                                             (__v32hf) __A,
> +                                             (__v32hf) __B,
> +                                             __D, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmadd_pch (__mmask16 __A, __m512h __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D,
> +                                              (__v32hf) __B,
> +                                              (__v32hf) __C,
> +                                              __A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
> +{
> +  return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_round((__v32hf) __C,
> +                                                       (__v32hf) __A,
> +                                                       (__v32hf) __B,
> +                                                       __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fcmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
> +                             __m512h __D, const int __E)
> +{
> +  return (__m512h) __builtin_ia32_movaps512_mask
> +    ((__v16sf)
> +     __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __D,
> +                                                (__v32hf) __A,
> +                                                (__v32hf) __C, __B,
> +                                                __E),
> +     (__v16sf) __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fcmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
> +                              __mmask16 __D, const int __E)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) __C,
> +                                               (__v32hf) __A,
> +                                               (__v32hf) __B,
> +                                               __D, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fcmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
> +                              __m512h __D, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfcmaddcph_v32hf_maskz_round((__v32hf) __D,
> +                                                             (__v32hf) __B,
> +                                                             (__v32hf) __C,
> +                                                             __A,
> +                                                             __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C, const int __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddcph_v32hf_round ((__v32hf) __C,
> +                                         (__v32hf) __A,
> +                                         (__v32hf) __B,
> +                                         __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmadd_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
> +                            __m512h __D, const int __E)
> +{
> +  return (__m512h) __builtin_ia32_movaps512_mask
> +    ((__v16sf)
> +     __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __D,
> +                                               (__v32hf) __A,
> +                                               (__v32hf) __C, __B,
> +                                               __E),
> +     (__v16sf) __A, __B);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask3_fmadd_round_pch (__m512h __A, __m512h __B, __m512h __C,
> +                             __mmask16 __D, const int __E)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) __C,
> +                                              (__v32hf) __A,
> +                                              (__v32hf) __B,
> +                                              __D, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmadd_round_pch (__mmask16 __A, __m512h __B, __m512h __C,
> +                             __m512h __D, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfmaddcph_v32hf_maskz_round((__v32hf) __D,
> +                                                            (__v32hf) __B,
> +                                                            (__v32hf) __C,
> +                                                            __A, __E);
> +}
> +
> +#else
> +#define _mm512_fcmadd_round_pch(A, B, C, D)                    \
> +  (__m512h) __builtin_ia32_vfcmaddcph_v32hf_round ((C), (A), (B), (D))
> +
> +#define _mm512_mask_fcmadd_round_pch(A, B, C, D, E)                    \
> +  ((__m512h) __builtin_ia32_movaps512_mask (                           \
> +   (__v16sf)                                                           \
> +    __builtin_ia32_vfcmaddcph_v32hf_mask_round ((__v32hf) (D),         \
> +                                               (__v32hf) (A),          \
> +                                               (__v32hf) (C),          \
> +                                               (B), (E)),              \
> +                                               (__v16sf) (A), (B)));
> +
> +
> +#define _mm512_mask3_fcmadd_round_pch(A, B, C, D, E)                   \
> +  ((__m512h)                                                           \
> +   __builtin_ia32_vfcmaddcph_v32hf_mask_round ((C), (A), (B), (D), (E)))
> +
> +#define _mm512_maskz_fcmadd_round_pch(A, B, C, D, E)                   \
> +  (__m512h)                                                            \
> +   __builtin_ia32_vfcmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E))
> +
> +#define _mm512_fmadd_round_pch(A, B, C, D)                     \
> +  (__m512h) __builtin_ia32_vfmaddcph_v32hf_round((C), (A), (B), (D))
> +
> +#define _mm512_mask_fmadd_round_pch(A, B, C, D, E)                     \
> +  ((__m512h) __builtin_ia32_movaps512_mask (                           \
> +   (__v16sf)                                                           \
> +    __builtin_ia32_vfmaddcph_v32hf_mask_round ((__v32hf) (D),          \
> +                                              (__v32hf) (A),           \
> +                                              (__v32hf) (C),           \
> +                                              (B), (E)),               \
> +                                              (__v16sf) (A), (B)));
> +
> +#define _mm512_mask3_fmadd_round_pch(A, B, C, D, E)                    \
> +  (__m512h)                                                            \
> +   __builtin_ia32_vfmaddcph_v32hf_mask_round((C), (A), (B), (D), (E))
> +
> +#define _mm512_maskz_fmadd_round_pch(A, B, C, D, E)                    \
> +  (__m512h)                                                            \
> +   __builtin_ia32_vfmaddcph_v32hf_maskz_round((D), (B), (C), (A), (E))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +/* Intrinsics vf[,c]mulcph.  */
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fcmul_pch (__m512h __A, __m512h __B)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A,
> +                                        (__v32hf) __B,
> +                                        _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fcmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C,
> +                                             (__v32hf) __D,
> +                                             (__v32hf) __A,
> +                                             __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fcmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B,
> +                                             (__v32hf) __C,
> +                                             _mm512_setzero_ph (),
> +                                             __A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmul_pch (__m512h __A, __m512h __B)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A,
> +                                       (__v32hf) __B,
> +                                       _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmul_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C,
> +                                            (__v32hf) __D,
> +                                            (__v32hf) __A,
> +                                            __B, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmul_pch (__mmask16 __A, __m512h __B, __m512h __C)
> +{
> +  return (__m512h)
> +    __builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B,
> +                                            (__v32hf) __C,
> +                                            _mm512_setzero_ph (),
> +                                            __A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fcmul_round_pch (__m512h __A, __m512h __B, const int __D)
> +{
> +  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_round((__v32hf) __A,
> +                                                      (__v32hf) __B, __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fcmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
> +                            __m512h __D, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __C,
> +                                                           (__v32hf) __D,
> +                                                           (__v32hf) __A,
> +                                                           __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fcmul_round_pch (__mmask16 __A, __m512h __B,
> +                             __m512h __C, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round((__v32hf) __B,
> +                                                           (__v32hf) __C,
> +                                                           _mm512_setzero_ph (),
> +                                                           __A, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_fmul_round_pch (__m512h __A, __m512h __B, const int __D)
> +{
> +  return (__m512h)__builtin_ia32_vfmulcph_v32hf_round((__v32hf) __A,
> +                                                     (__v32hf) __B,
> +                                                     __D);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_mask_fmul_round_pch (__m512h __A, __mmask16 __B, __m512h __C,
> +                           __m512h __D, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __C,
> +                                                          (__v32hf) __D,
> +                                                          (__v32hf) __A,
> +                                                          __B, __E);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_maskz_fmul_round_pch (__mmask16 __A, __m512h __B,
> +                            __m512h __C, const int __E)
> +{
> +  return (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round((__v32hf) __B,
> +                                                          (__v32hf) __C,
> +                                                          _mm512_setzero_ph (),
> +                                                          __A, __E);
> +}
> +
> +#else
> +#define _mm512_fcmul_round_pch(A, B, D)                                \
> +  (__m512h)__builtin_ia32_vfcmulcph_v32hf_round(A, B, D)
> +
> +#define _mm512_mask_fcmul_round_pch(A, B, C, D, E)                     \
> +  (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(C, D, A, B, E)
> +
> +#define _mm512_maskz_fcmul_round_pch(A, B, C, E)                       \
> +  (__m512h)__builtin_ia32_vfcmulcph_v32hf_mask_round(B, C,             \
> +                                                    _mm512_setzero_ph(), \
> +                                                    A, E)
> +
> +#define _mm512_fmul_round_pch(A, B, D)                 \
> +  (__m512h)__builtin_ia32_vfmulcph_v32hf_round(A, B, D)
> +
> +#define _mm512_mask_fmul_round_pch(A, B, C, D, E)                      \
> +  (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(C, D, A, B, E)
> +
> +#define _mm512_maskz_fmul_round_pch(A, B, C, E)                                \
> +  (__m512h)__builtin_ia32_vfmulcph_v32hf_mask_round(B, C,              \
> +                                                   _mm512_setzero_ph (), \
> +                                                   A, E)
> +
> +#endif /* __OPTIMIZE__ */
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
> index bba98f105ac..c7bdfbc0517 100644
> --- a/gcc/config/i386/avx512fp16vlintrin.h
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -2815,6 +2815,263 @@ _mm_maskz_fnmsub_ph (__mmask8 __U, __m128h __A, __m128h __B,
>                                                         __U);
>  }
>
> +/* Intrinsics vf[,c]maddcph.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fmadd_pch (__m128h __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h)__builtin_ia32_vfmaddcph_v8hf((__v8hf) __C, (__v8hf) __A,
> +                                               (__v8hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h) __builtin_ia32_movaps128_mask
> +    ((__v4sf)
> +     __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __D,
> +                                        (__v8hf) __A,
> +                                        (__v8hf) __C, __B),
> +     (__v4sf) __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask3_fmadd_pch (__m128h __A, __m128h __B, __m128h __C,  __mmask8 __D)
> +{
> +  return (__m128h) __builtin_ia32_vfmaddcph_v8hf_mask ((__v8hf) __C,
> +                                                      (__v8hf) __A,
> +                                                      (__v8hf) __B, __D);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h)__builtin_ia32_vfmaddcph_v8hf_maskz((__v8hf) __D,
> +                                                     (__v8hf) __B,
> +                                                     (__v8hf) __C, __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fmadd_pch (__m256h __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h)__builtin_ia32_vfmaddcph_v16hf((__v16hf) __C, (__v16hf) __A,
> +                                                (__v16hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h) __builtin_ia32_movaps256_mask
> +    ((__v8sf)
> +     __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __D,
> +                                         (__v16hf) __A,
> +                                         (__v16hf) __C, __B),
> +     (__v8sf) __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask3_fmadd_pch (__m256h __A, __m256h __B, __m256h __C,  __mmask8 __D)
> +{
> +  return (__m256h) __builtin_ia32_vfmaddcph_v16hf_mask ((__v16hf) __C,
> +                                                       (__v16hf) __A,
> +                                                       (__v16hf) __B, __D);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h)__builtin_ia32_vfmaddcph_v16hf_maskz((__v16hf) __D,
> +                                                      (__v16hf) __B,
> +                                                      (__v16hf) __C, __A);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h)__builtin_ia32_vfcmaddcph_v8hf ((__v8hf) __C,
> +                                                 (__v8hf) __A, (__v8hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fcmadd_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h)__builtin_ia32_movaps128_mask
> +    ((__v4sf)
> +     __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __D,
> +                                         (__v8hf) __A,
> +                                         (__v8hf) __C, __B),
> +     (__v4sf) __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask3_fcmadd_pch (__m128h __A, __m128h __B, __m128h __C,  __mmask8 __D)
> +{
> +  return (__m128h) __builtin_ia32_vfcmaddcph_v8hf_mask ((__v8hf) __C,
> +                                                       (__v8hf) __A,
> +                                                       (__v8hf) __B, __D);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fcmadd_pch (__mmask8 __A, __m128h __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h)__builtin_ia32_vfcmaddcph_v8hf_maskz ((__v8hf) __D,
> +                                                       (__v8hf) __B,
> +                                                       (__v8hf) __C, __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h)__builtin_ia32_vfcmaddcph_v16hf((__v16hf) __C,
> +                                                 (__v16hf) __A, (__v16hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fcmadd_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h) __builtin_ia32_movaps256_mask
> +    ((__v8sf)
> +     __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __D,
> +                                          (__v16hf) __A,
> +                                          (__v16hf) __C, __B),
> +     (__v8sf) __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask3_fcmadd_pch (__m256h __A, __m256h __B, __m256h __C,  __mmask8 __D)
> +{
> +  return (__m256h) __builtin_ia32_vfcmaddcph_v16hf_mask ((__v16hf) __C,
> +                                                        (__v16hf) __A,
> +                                                        (__v16hf) __B, __D);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fcmadd_pch (__mmask8 __A, __m256h __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h)__builtin_ia32_vfcmaddcph_v16hf_maskz((__v16hf) __D,
> +                                                       (__v16hf) __B,
> +                                                       (__v16hf) __C, __A);
> +}
> +
> +/* Intrinsics vf[,c]mulcph.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fmul_pch (__m128h __A, __m128h __B)
> +{
> +  return (__m128h)__builtin_ia32_vfmulcph_v8hf((__v8hf) __A, (__v8hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __C,
> +                                                   (__v8hf) __D,
> +                                                   (__v8hf) __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fmul_pch (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h)__builtin_ia32_vfmulcph_v8hf_mask((__v8hf) __B,
> +                                                   (__v8hf) __C,
> +                                                   _mm_setzero_ph (),
> +                                                   __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fmul_pch (__m256h __A, __m256h __B)
> +{
> +  return (__m256h)__builtin_ia32_vfmulcph_v16hf((__v16hf) __A, (__v16hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __C,
> +                                                    (__v16hf) __D,
> +                                                    (__v16hf) __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fmul_pch (__mmask8 __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h)__builtin_ia32_vfmulcph_v16hf_mask((__v16hf) __B,
> +                                                    (__v16hf) __C,
> +                                                    _mm256_setzero_ph (),
> +                                                    __A);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_fcmul_pch (__m128h __A, __m128h __B)
> +{
> +  return (__m128h)__builtin_ia32_vfcmulcph_v8hf((__v8hf) __A, (__v8hf) __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mask_fcmul_pch (__m128h __A, __mmask8 __B, __m128h __C, __m128h __D)
> +{
> +  return (__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __C, (__v8hf) __D,
> +                                                    (__v8hf) __A, __B);
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_maskz_fcmul_pch (__mmask8 __A, __m128h __B, __m128h __C)
> +{
> +  return (__m128h)__builtin_ia32_vfcmulcph_v8hf_mask((__v8hf) __B,
> +                                                    (__v8hf) __C,
> +                                                    _mm_setzero_ph (),
> +                                                    __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_fcmul_pch (__m256h __A, __m256h __B)
> +{
> +  return (__m256h)__builtin_ia32_vfcmulcph_v16hf((__v16hf) __A, (__v16hf) __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_mask_fcmul_pch (__m256h __A, __mmask8 __B, __m256h __C, __m256h __D)
> +{
> +  return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __C,
> +                                                     (__v16hf) __D,
> +                                                     (__v16hf) __A, __B);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_maskz_fcmul_pch (__mmask8 __A, __m256h __B, __m256h __C)
> +{
> +  return (__m256h)__builtin_ia32_vfcmulcph_v16hf_mask((__v16hf) __B,
> +                                                     (__v16hf) __C,
> +                                                     _mm256_setzero_ph (),
> +                                                     __A);
> +}
> +
>  #ifdef __DISABLE_AVX512FP16VL__
>  #undef __DISABLE_AVX512FP16VL__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 22b924bf98d..35bcafd14e3 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -1348,6 +1348,7 @@ DEF_FUNCTION_TYPE (V8DI, V8HF, V8DI, UQI, INT)
>  DEF_FUNCTION_TYPE (V8DF, V8HF, V8DF, UQI, INT)
>  DEF_FUNCTION_TYPE (V8HF, V8DI, V8HF, UQI, INT)
>  DEF_FUNCTION_TYPE (V8HF, V8DF, V8HF, UQI, INT)
> +DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF)
>  DEF_FUNCTION_TYPE (V8HF, V8HF, V8HF, V8HF, UQI, INT)
>  DEF_FUNCTION_TYPE (V8HF, V2DF, V8HF, V8HF, UQI, INT)
>  DEF_FUNCTION_TYPE (V8HF, V4SF, V8HF, V8HF, UQI, INT)
> @@ -1358,12 +1359,14 @@ DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF)
>  DEF_FUNCTION_TYPE (V16HI, V16HF, V16HI, UHI)
>  DEF_FUNCTION_TYPE (V16HF, V16HI, V16HF, UHI)
>  DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, UHI)
> +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF)
>  DEF_FUNCTION_TYPE (V16SI, V16HF, V16SI, UHI, INT)
>  DEF_FUNCTION_TYPE (V16SF, V16HF, V16SF, UHI, INT)
>  DEF_FUNCTION_TYPE (V16HF, V16HF, INT, V16HF, UHI)
>  DEF_FUNCTION_TYPE (UHI, V16HF, V16HF, INT, UHI)
>  DEF_FUNCTION_TYPE (V16HF, V16SI, V16HF, UHI, INT)
>  DEF_FUNCTION_TYPE (V16HF, V16SF, V16HF, UHI, INT)
> +DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UQI)
>  DEF_FUNCTION_TYPE (V16HF, V16HF, V16HF, V16HF, UHI)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, INT)
> @@ -1371,7 +1374,9 @@ DEF_FUNCTION_TYPE (V32HI, V32HF, V32HI, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HI, V32HF, USI, INT)
>  DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, USI, INT)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI)
>  DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
> +DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index f446a6ce5d3..448f9f75fa4 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -2911,6 +2911,26 @@ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask, "__builtin_ia32_vfnmsubph128_mask", IX86_BUILTIN_VFNMSUBPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_mask3, "__builtin_ia32_vfnmsubph128_mask3", IX86_BUILTIN_VFNMSUBPH128_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
>  BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fnmsub_v8hf_maskz, "__builtin_ia32_vfnmsubph128_maskz", IX86_BUILTIN_VFNMSUBPH128_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v8hf, "__builtin_ia32_vfmaddcph_v8hf", IX86_BUILTIN_VFMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_mask, "__builtin_ia32_vfmaddcph_v8hf_mask", IX86_BUILTIN_VFMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmaddc_v8hf_maskz, "__builtin_ia32_vfmaddcph_v8hf_maskz", IX86_BUILTIN_VFMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v16hf, "__builtin_ia32_vfmaddcph_v16hf", IX86_BUILTIN_VFMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_mask, "__builtin_ia32_vfmaddcph_v16hf_mask", IX86_BUILTIN_VFMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmaddc_v16hf_maskz, "__builtin_ia32_vfmaddcph_v16hf_maskz", IX86_BUILTIN_VFMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v8hf, "__builtin_ia32_vfcmaddcph_v8hf", IX86_BUILTIN_VFCMADDCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_mask, "__builtin_ia32_vfcmaddcph_v8hf_mask", IX86_BUILTIN_VFCMADDCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmaddc_v8hf_maskz, "__builtin_ia32_vfcmaddcph_v8hf_maskz", IX86_BUILTIN_VFCMADDCPH_V8HF_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v16hf, "__builtin_ia32_vfcmaddcph_v16hf", IX86_BUILTIN_VFCMADDCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_mask, "__builtin_ia32_vfcmaddcph_v16hf_mask", IX86_BUILTIN_VFCMADDCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmaddc_v16hf_maskz, "__builtin_ia32_vfcmaddcph_v16hf_maskz", IX86_BUILTIN_VFCMADDCPH_V16HF_MASKZ, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf, "__builtin_ia32_vfcmulcph_v8hf", IX86_BUILTIN_VFCMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fcmulc_v8hf_mask, "__builtin_ia32_vfcmulcph_v8hf_mask", IX86_BUILTIN_VFCMULCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf, "__builtin_ia32_vfcmulcph_v16hf", IX86_BUILTIN_VFCMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fcmulc_v16hf_mask, "__builtin_ia32_vfcmulcph_v16hf_mask", IX86_BUILTIN_VFCMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf, "__builtin_ia32_vfmulcph_v8hf", IX86_BUILTIN_VFMULCPH_V8HF, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512fp16_fmulc_v8hf_mask, "__builtin_ia32_vfmulcph_v8hf_mask", IX86_BUILTIN_VFMULCPH_V8HF_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf, "__builtin_ia32_vfmulcph_v16hf", IX86_BUILTIN_VFMULCPH_V16HF, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512vl_fmulc_v16hf_mask, "__builtin_ia32_vfmulcph_v16hf_mask", IX86_BUILTIN_VFMULCPH_V16HF_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UQI)
>
>  /* Builtins with rounding support.  */
>  BDESC_END (ARGS, ROUND_ARGS)
> @@ -3201,6 +3221,16 @@ BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask_round
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_mask3_round, "__builtin_ia32_vfnmaddsh3_mask3", IX86_BUILTIN_VFNMADDSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfnmadd_v8hf_maskz_round, "__builtin_ia32_vfnmaddsh3_maskz", IX86_BUILTIN_VFNMADDSH3_MASKZ, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
>  BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512f_vmfmsub_v8hf_mask3_round, "__builtin_ia32_vfmsubsh3_mask3", IX86_BUILTIN_VFMSUBSH3_MASK3, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fmaddc_v32hf_round, "__builtin_ia32_vfmaddcph_v32hf_round", IX86_BUILTIN_VFMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_mask_round, "__builtin_ia32_vfmaddcph_v32hf_mask_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmaddc_v32hf_maskz_round, "__builtin_ia32_vfmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_fma_fcmaddc_v32hf_round, "__builtin_ia32_vfcmaddcph_v32hf_round", IX86_BUILTIN_VFCMADDCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_mask_round, "__builtin_ia32_vfcmaddcph_v32hf_mask_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmaddc_v32hf_maskz_round, "__builtin_ia32_vfcmaddcph_v32hf_maskz_round", IX86_BUILTIN_VFCMADDCPH_V32HF_MASKZ_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_round, "__builtin_ia32_vfcmulcph_v32hf_round", IX86_BUILTIN_VFCMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fcmulc_v32hf_mask_round, "__builtin_ia32_vfcmulcph_v32hf_mask_round", IX86_BUILTIN_VFCMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_round, "__builtin_ia32_vfmulcph_v32hf_round", IX86_BUILTIN_VFMULCPH_V32HF_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_INT)
> +BDESC (0, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_avx512bw_fmulc_v32hf_mask_round, "__builtin_ia32_vfmulcph_v32hf_mask_round", IX86_BUILTIN_VFMULCPH_V32HF_MASK_ROUND, UNKNOWN, (int) V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT)
>
>  BDESC_END (ROUND_ARGS, MULTI_ARG)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index f6de05c769a..f6d74549dc2 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -9582,6 +9582,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V2DI_FTYPE_V8HF_V2DI_UQI:
>      case V2DI_FTYPE_V4SF_V2DI_UQI:
>      case V8HF_FTYPE_V8HF_V8HF_UQI:
> +    case V8HF_FTYPE_V8HF_V8HF_V8HF:
>      case V8HF_FTYPE_V8HI_V8HF_UQI:
>      case V8HF_FTYPE_V8SI_V8HF_UQI:
>      case V8HF_FTYPE_V8SF_V8HF_UQI:
> @@ -9660,6 +9661,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V16SF_FTYPE_V8SF_V16SF_UHI:
>      case V16SI_FTYPE_V8SI_V16SI_UHI:
>      case V16HF_FTYPE_V16HI_V16HF_UHI:
> +    case V16HF_FTYPE_V16HF_V16HF_V16HF:
>      case V16HI_FTYPE_V16HF_V16HI_UHI:
>      case V16HI_FTYPE_V16HI_V16HI_UHI:
>      case V8HI_FTYPE_V16QI_V8HI_UQI:
> @@ -9816,6 +9818,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
>      case V8HI_FTYPE_V8HI_V8HI_V8HI_UQI:
>      case V8SI_FTYPE_V8SI_V8SI_V8SI_UQI:
>      case V4SI_FTYPE_V4SI_V4SI_V4SI_UQI:
> +    case V16HF_FTYPE_V16HF_V16HF_V16HF_UQI:
>      case V16HF_FTYPE_V16HF_V16HF_V16HF_UHI:
>      case V8SF_FTYPE_V8SF_V8SF_V8SF_UQI:
>      case V16QI_FTYPE_V16QI_V16QI_V16QI_UHI:
> @@ -10545,6 +10548,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      case V16SF_FTYPE_V16HF_V16SF_UHI_INT:
>      case V32HF_FTYPE_V32HI_V32HF_USI_INT:
>      case V32HF_FTYPE_V32HF_V32HF_USI_INT:
> +    case V32HF_FTYPE_V32HF_V32HF_V32HF_INT:
>      case V16SF_FTYPE_V16SF_V16SF_HI_INT:
>      case V8DI_FTYPE_V8SF_V8DI_QI_INT:
>      case V16SF_FTYPE_V16SI_V16SF_HI_INT:
> @@ -10574,6 +10578,7 @@ ix86_expand_round_builtin (const struct builtin_description *d,
>      case V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT:
>      case V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT:
>      case V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT:
> +    case V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT:
>      case V32HF_FTYPE_V32HF_V32HF_V32HF_USI_INT:
>      case V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT:
>      case V2DF_FTYPE_V2DF_V2DF_V2DF_QI_INT:
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 31f8fc68c65..ddd93f739e3 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -194,6 +194,14 @@ (define_c_enum "unspec" [
>    UNSPEC_VCVTNE2PS2BF16
>    UNSPEC_VCVTNEPS2BF16
>    UNSPEC_VDPBF16PS
> +
> +  ;; For AVX512FP16 support
> +  UNSPEC_COMPLEX_FMA
> +  UNSPEC_COMPLEX_FCMA
> +  UNSPEC_COMPLEX_FMUL
> +  UNSPEC_COMPLEX_FCMUL
> +  UNSPEC_COMPLEX_MASK
> +
>  ])
>
>  (define_c_enum "unspecv" [
> @@ -909,6 +917,10 @@ (define_mode_attr avx512fmaskmode
>     (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
>     (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
>
> +;; Mapping of vector modes to corresponding complex mask size
> +(define_mode_attr avx512fmaskcmode
> +  [(V32HF "HI") (V16HF "QI") (V8HF  "QI")])
> +
>  ;; Mapping of vector modes to corresponding mask size
>  (define_mode_attr avx512fmaskmodelower
>    [(V64QI "di") (V32QI "si") (V16QI "hi")
> @@ -5499,6 +5511,92 @@ (define_insn "*fma4i_vmfnmsub_<mode>"
>    [(set_attr "type" "ssemuladd")
>     (set_attr "mode" "<MODE>")])
>
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +;;
> +;; Complex type operations
> +;;
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +(define_int_iterator UNSPEC_COMPLEX_F_C_MA
> +       [UNSPEC_COMPLEX_FMA UNSPEC_COMPLEX_FCMA])
> +
> +(define_int_iterator UNSPEC_COMPLEX_F_C_MUL
> +       [UNSPEC_COMPLEX_FMUL UNSPEC_COMPLEX_FCMUL])
> +
> +(define_int_attr complexopname
> +       [(UNSPEC_COMPLEX_FMA "fmaddc")
> +        (UNSPEC_COMPLEX_FCMA "fcmaddc")
> +        (UNSPEC_COMPLEX_FMUL "fmulc")
> +        (UNSPEC_COMPLEX_FCMUL "fcmulc")])
> +
> +(define_expand "<avx512>_fmaddc_<mode>_maskz<round_expand_name>"
> +  [(match_operand:VF_AVX512FP16VL 0 "register_operand")
> +   (match_operand:VF_AVX512FP16VL 1 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512FP16VL 2 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512FP16VL 3 "<round_expand_nimm_predicate>")
> +   (match_operand:<avx512fmaskcmode> 4 "register_operand")]
> +  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
> +{
> +  emit_insn (gen_fma_fmaddc_<mode>_maskz_1<round_expand_name> (
> +    operands[0], operands[1], operands[2], operands[3],
> +    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
> +  DONE;
> +})
> +
> +(define_expand "<avx512>_fcmaddc_<mode>_maskz<round_expand_name>"
> +  [(match_operand:VF_AVX512FP16VL 0 "register_operand")
> +   (match_operand:VF_AVX512FP16VL 1 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512FP16VL 2 "<round_expand_nimm_predicate>")
> +   (match_operand:VF_AVX512FP16VL 3 "<round_expand_nimm_predicate>")
> +   (match_operand:<avx512fmaskcmode> 4 "register_operand")]
> +  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
> +{
> +  emit_insn (gen_fma_fcmaddc_<mode>_maskz_1<round_expand_name> (
> +    operands[0], operands[1], operands[2], operands[3],
> +    CONST0_RTX (<MODE>mode), operands[4]<round_expand_operand>));
> +  DONE;
> +})
> +
> +(define_insn "fma_<complexopname>_<mode><sdc_maskz_name><round_name>"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
> +       (unspec:VF_AVX512FP16VL
> +         [(match_operand:VF_AVX512FP16VL 1 "<round_nimm_predicate>" "0")
> +          (match_operand:VF_AVX512FP16VL 2 "<round_nimm_predicate>" "%v")
> +          (match_operand:VF_AVX512FP16VL 3 "<round_nimm_predicate>" "<round_constraint>")]
> +          UNSPEC_COMPLEX_F_C_MA))]
> +  "TARGET_AVX512FP16 && <sdc_mask_mode512bit_condition> && <round_mode512bit_condition>"
> +  "v<complexopname><ssemodesuffix>\t{<round_sdc_mask_op4>%3, %2, %0<sdc_mask_op4>|%0<sdc_mask_op4>, %2, %3<round_sdc_mask_op4>}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "<avx512>_<complexopname>_<mode>_mask<round_name>"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
> +       (vec_merge:VF_AVX512FP16VL
> +         (unspec:VF_AVX512FP16VL
> +           [(match_operand:VF_AVX512FP16VL 1 "register_operand" "0")
> +            (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "%v")
> +            (match_operand:VF_AVX512FP16VL 3 "nonimmediate_operand" "<round_constraint>")]
> +            UNSPEC_COMPLEX_F_C_MA)
> +         (match_dup 1)
> +         (unspec:<avx512fmaskmode>
> +           [(match_operand:<avx512fmaskcmode> 4 "register_operand" "Yk")]
> +           UNSPEC_COMPLEX_MASK)))]
> +  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
> +  "v<complexopname><ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
> +  [(set_attr "type" "ssemuladd")
> +   (set_attr "mode" "<MODE>")])
> +
> +(define_insn "<avx512>_<complexopname>_<mode><maskc_name><round_name>"
> +  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=v")
> +         (unspec:VF_AVX512FP16VL
> +           [(match_operand:VF_AVX512FP16VL 1 "nonimmediate_operand" "%v")
> +            (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" "<round_constraint>")]
> +            UNSPEC_COMPLEX_F_C_MUL))]
> +  "TARGET_AVX512FP16 && <round_mode512bit_condition>"
> +  "v<complexopname><ssemodesuffix>\t{<round_maskc_op3>%2, %1, %0<maskc_operand3>|%0<maskc_operand3>, %1, %2<round_maskc_op3>}"
> +  [(set_attr "type" "ssemul")
> +   (set_attr "mode" "<MODE>")])
> +
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel half-precision floating point conversion operations
> diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
> index 2e9c2b38e25..3a1f554e9b9 100644
> --- a/gcc/config/i386/subst.md
> +++ b/gcc/config/i386/subst.md
> @@ -28,6 +28,9 @@ (define_mode_iterator SUBST_V
>     V16SF V8SF  V4SF
>     V8DF  V4DF  V2DF])
>
> +(define_mode_iterator SUBST_CV
> +  [V32HF V16HF V8HF])
> +
>  (define_mode_iterator SUBST_S
>    [QI HI SI DI])
>
> @@ -42,9 +45,11 @@ (define_mode_iterator SUBST_A
>     QI HI SI DI SF DF])
>
>  (define_subst_attr "mask_name" "mask" "" "_mask")
> +(define_subst_attr "maskc_name" "maskc" "" "_mask")
>  (define_subst_attr "mask_applied" "mask" "false" "true")
>  (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2")
>  (define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3")
> +(define_subst_attr "maskc_operand3" "maskc" "" "%{%4%}%N3")
>  (define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf
>  (define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4")
>  (define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6")
> @@ -89,6 +94,18 @@ (define_subst "merge_mask"
>           (match_dup 0)
>           (match_operand:<avx512fmaskmode> 2 "register_operand" "Yk")))])
>
> +(define_subst "maskc"
> +  [(set (match_operand:SUBST_CV 0)
> +        (match_operand:SUBST_CV 1))]
> +  "TARGET_AVX512F"
> +  [(set (match_dup 0)
> +        (vec_merge:SUBST_CV
> +         (match_dup 1)
> +         (match_operand:SUBST_CV 2 "nonimm_or_0_operand" "0C")
> +         (unspec:<avx512fmaskmode>
> +           [(match_operand:<avx512fmaskcmode> 3 "register_operand" "Yk")]
> +           UNSPEC_COMPLEX_MASK)))])
> +
>  (define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask")
>  (define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}")
>  (define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}")
> @@ -119,11 +136,31 @@ (define_subst "sd"
>          (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))
>  ])
>
> +(define_subst_attr "sdc_maskz_name" "sdc" "" "_maskz_1")
> +(define_subst_attr "sdc_mask_op4" "sdc" "" "%{%5%}%N4")
> +(define_subst_attr "sdc_mask_op5" "sdc" "" "%{%6%}%N5")
> +(define_subst_attr "sdc_mask_mode512bit_condition" "sdc" "1" "(<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
> +(define_subst "sdc"
> + [(set (match_operand:SUBST_CV 0)
> +       (match_operand:SUBST_CV 1))]
> + ""
> + [(set (match_dup 0)
> +       (vec_merge:SUBST_CV
> +        (match_dup 1)
> +        (match_operand:SUBST_CV 2 "const0_operand" "C")
> +        (unspec:<avx512fmaskmode>
> +          [(match_operand:<avx512fmaskcmode> 3 "register_operand" "Yk")]
> +          UNSPEC_COMPLEX_MASK)))
> +])
> +
>  (define_subst_attr "round_name" "round" "" "_round")
>  (define_subst_attr "round_mask_operand2" "mask" "%R2" "%R4")
>  (define_subst_attr "round_mask_operand3" "mask" "%R3" "%R5")
> +(define_subst_attr "round_maskc_operand3" "maskc" "%R3" "%R5")
>  (define_subst_attr "round_mask_operand4" "mask" "%R4" "%R6")
>  (define_subst_attr "round_sd_mask_operand4" "sd" "%R4" "%R6")
> +(define_subst_attr "round_sdc_mask_operand4" "sdc" "%R4" "%R6")
>  (define_subst_attr "round_op2" "round" "" "%R2")
>  (define_subst_attr "round_op3" "round" "" "%R3")
>  (define_subst_attr "round_op4" "round" "" "%R4")
> @@ -131,8 +168,10 @@ (define_subst_attr "round_op5" "round" "" "%R5")
>  (define_subst_attr "round_op6" "round" "" "%R6")
>  (define_subst_attr "round_mask_op2" "round" "" "<round_mask_operand2>")
>  (define_subst_attr "round_mask_op3" "round" "" "<round_mask_operand3>")
> +(define_subst_attr "round_maskc_op3" "round" "" "<round_maskc_operand3>")
>  (define_subst_attr "round_mask_op4" "round" "" "<round_mask_operand4>")
>  (define_subst_attr "round_sd_mask_op4" "round" "" "<round_sd_mask_operand4>")
> +(define_subst_attr "round_sdc_mask_op4" "round" "" "<round_sdc_mask_operand4>")
>  (define_subst_attr "round_constraint" "round" "vm" "v")
>  (define_subst_attr "round_qq2phsuff" "round" "<qq2phsuff>" "")
>  (define_subst_attr "bcst_round_constraint" "round" "vmBr" "v")
> @@ -169,6 +208,7 @@ (define_subst_attr "round_saeonly_mask_operand3" "mask" "%r3" "%r5")
>  (define_subst_attr "round_saeonly_mask_operand4" "mask" "%r4" "%r6")
>  (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%r4" "%r5")
>  (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%r5" "%r7")
> +(define_subst_attr "round_saeonly_sdc_mask_operand5" "sdc" "%r5" "%r7")
>  (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%r2")
>  (define_subst_attr "round_saeonly_op3" "round_saeonly" "" "%r3")
>  (define_subst_attr "round_saeonly_op4" "round_saeonly" "" "%r4")
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index 6c2d1dc3df4..56e90d9f9a5 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -787,6 +787,16 @@
>  #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index f16be008909..ef9f8aad853 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -804,6 +804,16 @@
>  #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 01ac4e04173..f27c73fd4cc 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -772,6 +772,8 @@ test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
>  test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
>  test_2 (_mm_cvt_roundi32_sh, __m128h, __m128h, int, 8)
>  test_2 (_mm_cvt_roundu32_sh, __m128h, __m128h, unsigned, 8)
> +test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> @@ -846,6 +848,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
> +test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
> @@ -908,6 +914,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h,
>  test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
>  test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
>  test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
> +test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
> +test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
> +test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
> +test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
>  test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 79e3f35ab86..ccf8c3a6c03 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -876,6 +876,8 @@ test_2 (_mm_cvt_roundsh_ss, __m128, __m128, __m128h, 8)
>  test_2 (_mm_cvt_roundsh_sd, __m128d, __m128d, __m128h, 8)
>  test_2 (_mm_cvt_roundss_sh, __m128h, __m128h, __m128, 8)
>  test_2 (_mm_cvt_roundsd_sh, __m128h, __m128h, __m128d, 8)
> +test_2 (_mm512_fmul_round_pch, __m512h, __m512h, __m512h, 8)
> +test_2 (_mm512_fcmul_round_pch, __m512h, __m512h, __m512h, 8)
>  test_2x (_mm512_cmp_round_ph_mask, __mmask32, __m512h, __m512h, 1, 8)
>  test_2x (_mm_cmp_round_sh_mask, __mmask8, __m128h, __m128h, 1, 8)
>  test_2x (_mm_comi_round_sh, int, __m128h, __m128h, 1, 8)
> @@ -949,6 +951,10 @@ test_3 (_mm_fmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fnmadd_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
>  test_3 (_mm_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, 9)
> +test_3 (_mm512_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_fmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_3 (_mm512_maskz_fcmul_round_pch, __m512h, __mmask16, __m512h, __m512h, 8)
>  test_3x (_mm512_mask_cmp_round_ph_mask, __mmask32, __mmask32, __m512h, __m512h, 1, 8)
>  test_3x (_mm_mask_cmp_round_sh_mask, __mmask8, __mmask8, __m128h, __m128h, 1, 8)
>  test_3x (_mm512_mask_reduce_round_ph, __m512h, __m512h, __mmask32, __m512h, 123, 8)
> @@ -1010,6 +1016,14 @@ test_4 (_mm_maskz_fmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h,
>  test_4 (_mm_mask_fnmsub_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 9)
>  test_4 (_mm_mask3_fnmsub_round_sh, __m128h, __m128h, __m128h, __m128h, __mmask8, 9)
>  test_4 (_mm_maskz_fnmsub_round_sh, __m128h, __mmask8, __m128h, __m128h, __m128h, 9)
> +test_4 (_mm512_mask_fmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fcmadd_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask3_fmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
> +test_4 (_mm512_mask3_fcmadd_round_pch, __m512h, __m512h, __m512h, __m512h, __mmask16, 8)
> +test_4 (_mm512_maskz_fmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
> +test_4 (_mm512_maskz_fcmadd_round_pch, __m512h, __mmask16, __m512h, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
> +test_4 (_mm512_mask_fcmul_round_pch, __m512h, __m512h, __mmask16, __m512h, __m512h, 8)
>  test_4x (_mm_mask_reduce_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_roundscale_round_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 123, 8)
>  test_4x (_mm_mask_getmant_sh, __m128h, __m128h, __mmask8, __m128h, __m128h, 1, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index caf14408b91..dc39d7e2012 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -805,6 +805,16 @@
>  #define __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_mask3(A, B, C, D, 8)
>  #define __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, E) __builtin_ia32_vfnmsubsh3_maskz(A, B, C, D, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfcmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfcmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, D) __builtin_ia32_vfmaddcph_v32hf_round(A, B, C, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmaddcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, E) __builtin_ia32_vfmaddcph_v32hf_maskz_round(B, C, D, A, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_round(A, B, C) __builtin_ia32_vfmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfmulcph_v32hf_mask_round(A, C, D, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_round(A, B, C) __builtin_ia32_vfcmulcph_v32hf_round(A, B, 8)
> +#define __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, E) __builtin_ia32_vfcmulcph_v32hf_mask_round(A, C, D, B, 8)
>
>  /* avx512fp16vlintrin.h */
>  #define __builtin_ia32_vcmpph_v8hf_mask(A, B, C, D) __builtin_ia32_vcmpph_v8hf_mask(A, B, 1, D)
> --
> 2.18.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics.
  2021-07-01  6:16 ` [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics liuhongt
@ 2021-09-22 10:30   ` Hongtao Liu
  0 siblings, 0 replies; 85+ messages in thread
From: Hongtao Liu @ 2021-09-22 10:30 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, H. J. Lu, Uros Bizjak, Jakub Jelinek, dianhong xu

I'm going to check in 4 patches.

[PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics.
[PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max).
[PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions.
[PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics.

  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Newly added runtime tests passed on sde{-m32,}.

On Thu, Jul 1, 2021 at 2:18 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: dianhong xu <dianhong.xu@intel.com>
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (__m512h_u, __m256h_u,
>         __m128h_u): New typedef.
>         (_mm512_load_ph): New intrinsic.
>         (_mm256_load_ph): Ditto.
>         (_mm_load_ph): Ditto.
>         (_mm512_loadu_ph): Ditto.
>         (_mm256_loadu_ph): Ditto.
>         (_mm_loadu_ph): Ditto.
>         (_mm512_store_ph): Ditto.
>         (_mm256_store_ph): Ditto.
>         (_mm_store_ph): Ditto.
>         (_mm512_storeu_ph): Ditto.
>         (_mm256_storeu_ph): Ditto.
>         (_mm_storeu_ph): Ditto.
>         (_mm512_abs_ph): Ditto.
>         * config/i386/avx512fp16vlintrin.h
>         (_mm_abs_ph): Ditto.
>         (_mm256_abs_ph): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fp16-13.c: New test.
> ---
>  gcc/config/i386/avx512fp16intrin.h            |  97 ++++++++++++
>  gcc/config/i386/avx512fp16vlintrin.h          |  16 ++
>  gcc/testsuite/gcc.target/i386/avx512fp16-13.c | 143 ++++++++++++++++++
>  3 files changed, 256 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-13.c
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 39c10beb1de..b8ca9201828 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -45,6 +45,11 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
>  typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
>  typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
>
> +/* Unaligned versions of the same types.  */
> +typedef _Float16 __m128h_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
> +typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), __may_alias__, __aligned__ (1)));
> +typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), __may_alias__, __aligned__ (1)));
> +
>  extern __inline __m128h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
> @@ -362,6 +367,48 @@ _mm_load_sh (void const *__P)
>                      *(_Float16 const *) __P);
>  }
>
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_load_ph (void const *__P)
> +{
> +  return *(const __m512h *) __P;
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_load_ph (void const *__P)
> +{
> +  return *(const __m256h *) __P;
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_load_ph (void const *__P)
> +{
> +  return *(const __m128h *) __P;
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_loadu_ph (void const *__P)
> +{
> +  return *(const __m512h_u *) __P;
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_loadu_ph (void const *__P)
> +{
> +  return *(const __m256h_u *) __P;
> +}
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_loadu_ph (void const *__P)
> +{
> +  return *(const __m128h_u *) __P;
> +}
> +
>  /* Stores the lower _Float16 value.  */
>  extern __inline void
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> @@ -370,6 +417,56 @@ _mm_store_sh (void *__P, __m128h __A)
>    *(_Float16 *) __P = ((__v8hf)__A)[0];
>  }
>
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_store_ph (void *__P, __m512h __A)
> +{
> +   *(__m512h *) __P = __A;
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_store_ph (void *__P, __m256h __A)
> +{
> +   *(__m256h *) __P = __A;
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_store_ph (void *__P, __m128h __A)
> +{
> +   *(__m128h *) __P = __A;
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_storeu_ph (void *__P, __m512h __A)
> +{
> +   *(__m512h_u *) __P = __A;
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_storeu_ph (void *__P, __m256h __A)
> +{
> +   *(__m256h_u *) __P = __A;
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_storeu_ph (void *__P, __m128h __A)
> +{
> +   *(__m128h_u *) __P = __A;
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_abs_ph(__m512h __A)
> +{
> +  return (__m512h) _mm512_and_epi32 ( _mm512_set1_epi32(0x7FFF7FFF),
> +                                    (__m512i) __A);
> +}
> +
>  /* Intrinsics v[add,sub,mul,div]ph.  */
>  extern __inline __m512h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
> index c7bdfbc0517..d4aa9928406 100644
> --- a/gcc/config/i386/avx512fp16vlintrin.h
> +++ b/gcc/config/i386/avx512fp16vlintrin.h
> @@ -425,6 +425,22 @@ _mm256_maskz_min_ph (__mmask16 __A, __m256h __B, __m256h __C)
>                                            _mm256_setzero_ph (), __A);
>  }
>
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_abs_ph (__m128h __A)
> +{
> +  return (__m128h) _mm_and_si128 ( _mm_set1_epi32(0x7FFF7FFF),
> +                                 (__m128i) __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_abs_ph (__m256h __A)
> +{
> +  return (__m256h) _mm256_and_si256 ( _mm256_set1_epi32(0x7FFF7FFF),
> +                                    (__m256i) __A);
> +}
> +
>  /* vcmpph */
>  #ifdef __OPTIMIZE
>  extern __inline __mmask8
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
> new file mode 100644
> index 00000000000..3b6219e493f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
> @@ -0,0 +1,143 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> +
> +#include <immintrin.h>
> +void
> +__attribute__ ((noinline, noclone))
> +store512_ph (void *p, __m512h a)
> +{
> +  _mm512_store_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +void
> +__attribute__ ((noinline, noclone))
> +store256_ph (void *p, __m256h a)
> +{
> +  _mm256_store_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +void
> +__attribute__ ((noinline, noclone))
> +store_ph (void *p, __m128h a)
> +{
> +  _mm_store_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +load512_ph (void const *p)
> +{
> +  return _mm512_load_ph (p);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +load256_ph (void const *p)
> +{
> +  return _mm256_load_ph (p);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +load_ph (void const *p)
> +{
> +  return _mm_load_ph (p);
> +}
> +/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +load512u_ph (void const *p)
> +{
> +  return _mm512_loadu_ph (p);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%zmm\[0-9\]" 1 } } */
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +load256u_ph (void const *p)
> +{
> +  return _mm256_loadu_ph (p);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%ymm\[0-9\]" 1 } } */
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +load128u_ph (void const *p)
> +{
> +  return _mm_loadu_ph (p);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^,\]*,\[^\{\n\]*%xmm\[0-9\]" 1 } } */
> +
> +void
> +__attribute__ ((noinline, noclone))
> +store512u_ph (void *p, __m512h a)
> +{
> +  return _mm512_storeu_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%zmm\[0-9\], *\[^,\]*" 1 } } */
> +
> +void
> +__attribute__ ((noinline, noclone))
> +store256u_ph (void *p, __m256h a)
> +{
> +  return _mm256_storeu_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%ymm\[0-9\], *\[^,\]*" 1 } } */
> +
> +void
> +__attribute__ ((noinline, noclone))
> +storeu_ph (void *p, __m128h a)
> +{
> +  return _mm_storeu_ph (p, a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vmovdqu16\[ \\t\]*\[^\{\n\]*%xmm\[0-9\], *\[^,\]*" 1 } } */
> +
> +__m512h
> +__attribute__ ((noinline, noclone))
> +abs512_ph (__m512h a)
> +{
> +  return _mm512_abs_ph (a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vpandd\[ \\t\]+\[^\n\]*\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpandd\[^\n\]*%zmm\[0-9\]+" 1 { target ia32 } } } */
> +
> +__m256h
> +__attribute__ ((noinline, noclone))
> +abs256_ph (__m256h a)
> +{
> +  return _mm256_abs_ph (a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-4\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 { target ia32 } } } */
> +
> +__m128h
> +__attribute__ ((noinline, noclone))
> +abs_ph (__m128h a)
> +{
> +  return _mm_abs_ph (a);
> +}
> +
> +/* { dg-final { scan-assembler-times "vpandq\[ \\t\]+\[^\n\]*\\\{1to\[1-2\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 { target {! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpbroadcastq\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 { target ia32 } } } */
> --
> 2.18.1
>


-- 
BR,
Hongtao



Thread overview: 85+ messages
2021-07-01  6:15 [PATCH 00/62] Support all AVX512FP16 intrinsics liuhongt
2021-07-01  6:15 ` [PATCH 01/62] AVX512FP16: Support vector init/broadcast for FP16 liuhongt
2021-07-01  6:15 ` [PATCH 02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
2021-07-01  6:15 ` [PATCH 03/62] AVX512FP16: Fix HF vector passing in variable arguments liuhongt
2021-07-01  6:15 ` [PATCH 04/62] AVX512FP16: Add ABI tests for xmm liuhongt
2021-07-01  6:15 ` [PATCH 05/62] AVX512FP16: Add ABI test for ymm liuhongt
2021-07-01  6:15 ` [PATCH 06/62] AVX512FP16: Add abi test for zmm liuhongt
2021-07-01  6:15 ` [PATCH 07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph liuhongt
2021-09-09  7:48   ` Hongtao Liu
2021-07-01  6:15 ` [PATCH 08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph liuhongt
2021-07-01  6:15 ` [PATCH 09/62] AVX512FP16: Enable _Float16 autovectorization liuhongt
2021-09-10  7:03   ` Hongtao Liu
2021-07-01  6:15 ` [PATCH 10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh liuhongt
2021-07-01  6:15 ` [PATCH 11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh liuhongt
2021-07-01  6:15 ` [PATCH 12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh liuhongt
2021-07-01  6:15 ` [PATCH 13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh liuhongt
2021-07-01  6:16 ` [PATCH 14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish liuhongt
2021-07-01  6:16 ` [PATCH 15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish liuhongt
2021-07-01  6:16 ` [PATCH 16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh liuhongt
2021-09-14  3:50   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh liuhongt
2021-07-01  6:16 ` [PATCH 18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
2021-07-01  6:16 ` [PATCH 19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh liuhongt
2021-07-01  6:16 ` [PATCH 20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
2021-07-01  6:16 ` [PATCH 21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh liuhongt
2021-07-01  6:16 ` [PATCH 22/62] AVX512FP16: Add fpclass/getexp/getmant instructions liuhongt
2021-07-01  6:16 ` [PATCH 23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions liuhongt
2021-07-01  6:16 ` [PATCH 24/62] AVX512FP16: Add vmovw/vmovsh liuhongt
2021-09-16  5:08   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 25/62] AVX512FP16: Add testcase for vmovsh/vmovw liuhongt
2021-07-01  6:16 ` [PATCH 26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq liuhongt
2021-07-01  6:16 ` [PATCH 27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq liuhongt
2021-07-01  6:16 ` [PATCH 28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
2021-07-01  6:16 ` [PATCH 29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph liuhongt
2021-07-01  6:16 ` [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
2021-09-17  8:07   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh liuhongt
2021-07-01  6:16 ` [PATCH 32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq liuhongt
2021-07-01  6:16 ` [PATCH 33/62] AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq liuhongt
2021-07-01  6:16 ` [PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi liuhongt
2021-07-01  6:16 ` [PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx liuhongt
2021-07-01  6:16 ` [PATCH 36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx liuhongt
2021-07-01  6:16 ` [PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh liuhongt
2021-07-01  6:16 ` [PATCH 38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh liuhongt
2021-07-01  6:16 ` [PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer liuhongt
2021-07-01  6:16 ` [PATCH 40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph liuhongt
2021-09-18  7:04   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 41/62] AVX512FP16: Add testcase for " liuhongt
2021-07-01  6:16 ` [PATCH 42/62] AVX512FP16: Add FP16 fma instructions liuhongt
2021-07-01  6:16 ` [PATCH 43/62] AVX512FP16: Add testcase for " liuhongt
2021-07-01  6:16 ` [PATCH 44/62] AVX512FP16: Add scalar/vector bitwise operations, including liuhongt
2021-07-23  5:13   ` Hongtao Liu
2021-07-26  2:25     ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 45/62] AVX512FP16: Add testcase for fp16 bitwise operations liuhongt
2021-07-01  6:16 ` [PATCH 46/62] AVX512FP16: Enable FP16 mask load/store liuhongt
2021-07-01  6:16 ` [PATCH 47/62] AVX512FP16: Add scalar fma instructions liuhongt
2021-07-01  6:16 ` [PATCH 48/62] AVX512FP16: Add testcase for scalar FMA instructions liuhongt
2021-07-01  6:16 ` [PATCH 49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
2021-09-22  4:38   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph liuhongt
2021-07-01  6:16 ` [PATCH 51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
2021-07-01  6:16 ` [PATCH 52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh liuhongt
2021-07-01  6:16 ` [PATCH 53/62] AVX512FP16: Add expander for sqrthf2 liuhongt
2021-07-23  5:12   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven liuhongt
2021-07-01  6:16 ` [PATCH 55/62] AVX512FP16: Add expander for cstorehf4 liuhongt
2021-07-01  6:16 ` [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16) liuhongt
2021-07-01  9:50   ` Richard Biener
2021-07-01 10:23     ` Hongtao Liu
2021-07-01 12:43       ` Richard Biener
2021-07-01 21:48         ` Joseph Myers
2021-07-02  7:38           ` Richard Biener
2021-07-01 21:17   ` Joseph Myers
2021-07-01  6:16 ` [PATCH 57/62] AVX512FP16: Add expander for fmahf4 liuhongt
2021-07-01  6:16 ` [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16) liuhongt
2021-07-01  9:52   ` Richard Biener
2021-07-01 21:26   ` Joseph Myers
2021-07-02  7:36     ` Richard Biener
2021-07-02 11:46       ` Bernhard Reutner-Fischer
2021-07-04  5:17         ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 59/62] AVX512FP16: Support load/store/abs intrinsics liuhongt
2021-09-22 10:30   ` Hongtao Liu
2021-07-01  6:16 ` [PATCH 60/62] AVX512FP16: Add reduce operators(add/mul/min/max) liuhongt
2021-07-01  6:16 ` [PATCH 61/62] AVX512FP16: Add complex conjugation intrinsic instructions liuhongt
2021-07-01  6:16 ` [PATCH 62/62] AVX512FP16: Add permutation and mask blend intrinsics liuhongt
