public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH 0/2] Initial support for AVX512FP16
       [not found] <20210701054808.39000-1-hongtao.liu@intel.com>
@ 2021-07-01  5:55 ` Hongtao Liu
  2021-07-01 20:46   ` Joseph Myers
       [not found] ` <20210701054808.39000-3-hongtao.liu@intel.com>
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-01  5:55 UTC
  To: GCC Patches; +Cc: H. J. Lu, Uros Bizjak, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 1:48 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
>   The AVX512FP16 specification has been published; see [1].
>   AVX512FP16 adds 100+ instructions, which this series supports with 67 GCC patches. For ease of review, we split the 67 patches into 2 major parts.
>   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all AVX512FP16 instructions (including intrinsic support and some optimizations).
>   There is a problem with the first part: _Float16 is not part of the C++ standard, so the C++ front end supports neither the type nor its mangling. We therefore "make up" a _Float16 type in the back end and use _DF16 as its mangling. The purpose is to align with LLVM, whose C++ FE already supports _Float16 [2].
>
> [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> [2] https://reviews.llvm.org/D33719
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> Guo, Xuepeng (1):
>   AVX512FP16: Initial support for _Float16 type and AVX512FP16 feature.
>
> liuhongt (1):
>   AVX512FP16: Add HFmode support in libgcc.
>
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 +-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  53 ++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   7 +-
>  gcc/config/i386/i386-builtins.c               |   6 +
>  gcc/config/i386/i386-c.c                      |  20 ++
>  gcc/config/i386/i386-expand.c                 |   8 +
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-modes.def                |   1 +
>  gcc/config/i386/i386-options.c                |  10 +-
>  gcc/config/i386/i386.c                        | 158 ++++++++++--
>  gcc/config/i386/i386.h                        |  18 +-
>  gcc/config/i386/i386.md                       | 242 +++++++++++++++---
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   2 +
>  gcc/config/i386/sse.md                        |  42 +--
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/optabs-query.c                            |   9 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 +
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 ++
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 ++
>  gcc/testsuite/gcc.target/i386/float16-1.c     |   8 +
>  gcc/testsuite/gcc.target/i386/float16-2.c     |  14 +
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 +
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  libgcc/Makefile.in                            |   4 +-
>  libgcc/config.host                            |   6 +-
>  libgcc/config/i386/32/sfp-machine.h           |   1 +
>  libgcc/config/i386/64/sfp-machine.h           |   1 +
>  libgcc/config/i386/64/t-softfp                |   9 +
>  libgcc/config/i386/_divhc3.c                  |   4 +
>  libgcc/config/i386/_mulhc3.c                  |   4 +
>  libgcc/config/i386/sfp-machine.h              |   1 +
>  libgcc/config/i386/t-softfp                   |  20 ++
>  libgcc/configure                              |  33 +++
>  libgcc/configure.ac                           |  13 +
>  libgcc/soft-fp/extendhfxf2.c                  |  53 ++++
>  libgcc/soft-fp/truncxfhf2.c                   |  52 ++++
>  56 files changed, 907 insertions(+), 106 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>  create mode 100644 libgcc/config/i386/64/t-softfp
>  create mode 100644 libgcc/config/i386/_divhc3.c
>  create mode 100644 libgcc/config/i386/_mulhc3.c
>  create mode 100644 libgcc/soft-fp/extendhfxf2.c
>  create mode 100644 libgcc/soft-fp/truncxfhf2.c
>
> --
> 2.18.1
>
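A minimal sketch of how user code might guard on the new feature, assuming the __AVX512FP16__ macro that patch 1/2 defines under -mavx512fp16. The names half_t, HALF_BACKEND, half_backend and scaled_sum are hypothetical and not part of this series; without the macro the sketch falls back to float so that it still builds with older compilers:

```c
#include <assert.h>

/* Pick the storage type at compile time.  __AVX512FP16__ is the macro
   the series defines when -mavx512fp16 is in effect; everything else
   here is a hypothetical illustration.  */
#ifdef __AVX512FP16__
typedef _Float16 half_t;
#define HALF_BACKEND "native _Float16"
#else
typedef float half_t;
#define HALF_BACKEND "float fallback"
#endif

/* Report which path was selected at compile time.  */
const char *
half_backend (void)
{
  return HALF_BACKEND;
}

/* Scale each element, narrow it to half_t, and accumulate in half_t.
   The result is widened back to float for the caller.  */
float
scaled_sum (const float *v, int n, float scale)
{
  half_t acc = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    acc += (half_t) (v[i] * scale);
  return (float) acc;
}
```

With the inputs 1, 2, 3 and a scale of 2, both paths should return 12, since those values are exactly representable in either format. Larger or fractional inputs may differ between the two paths because of the narrower half-precision significand.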


-- 
BR,
Hongtao


* Re: [PATCH 2/2] AVX512FP16: Add HFmode support in libgcc.
       [not found] ` <20210701054808.39000-3-hongtao.liu@intel.com>
@ 2021-07-01  5:55   ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-01  5:55 UTC
  To: GCC Patches; +Cc: H. J. Lu, Uros Bizjak, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 1:48 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> 1. Add extendhftf2, extendhfxf2, truncxfhf2, trunctfhf2, fixhfti,
> fixunshfti, floattihf and floatuntihf.
> 2. Always add _divhc3.c and _mulhc3.c.  If assembler doesn't support
> AVX512FP16, they are empty.
>
> 2019-01-01  H.J. Lu <hongjiu.lu@intel.com>
> gcc/ChangeLog:
>
>         * optabs-query.c (get_best_extraction_insn): Adjust smallest_int_mode
>         for HFmode.
>
> libgcc/ChangeLog:
>
>         * Makefile.in: Adjust to support avx512fp16.
>         * config.host: Add i386/${host_address}/t-softfp to tmake_file unconditionally.
>         * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): Add for _Float16.
>         * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Likewise.
>         * config/i386/sfp-machine.h (_FP_NANSIGN_H): Define.
>         * config/i386/t-softfp: Add divhc3, mulhc3, extendhftf2, extendhfxf2,
>         truncxfhf2 and trunctfhf2.
>         * configure: Regenerate.
>         * configure.ac: Add check for AVX512FP16.
>         * config/i386/64/t-softfp: New file to add fixhfti, fixunshfti,
>         floattihf and floatuntihf.
>         * config/i386/_divhc3.c: New file to add divhc3.
>         * config/i386/_mulhc3.c: New file to add mulhc3.
>         * soft-fp/extendhfxf2.c: New file to add extendhfxf2.
>         * soft-fp/truncxfhf2.c: New file to add truncxfhf2.
> ---
>  gcc/optabs-query.c                  |  9 ++++-
>  libgcc/Makefile.in                  |  4 ++-
>  libgcc/config.host                  |  6 +---
>  libgcc/config/i386/32/sfp-machine.h |  1 +
>  libgcc/config/i386/64/sfp-machine.h |  1 +
>  libgcc/config/i386/64/t-softfp      |  9 +++++
>  libgcc/config/i386/_divhc3.c        |  4 +++
>  libgcc/config/i386/_mulhc3.c        |  4 +++
>  libgcc/config/i386/sfp-machine.h    |  1 +
>  libgcc/config/i386/t-softfp         | 20 +++++++++++
>  libgcc/configure                    | 33 ++++++++++++++++++
>  libgcc/configure.ac                 | 13 +++++++
>  libgcc/soft-fp/extendhfxf2.c        | 53 +++++++++++++++++++++++++++++
>  libgcc/soft-fp/truncxfhf2.c         | 52 ++++++++++++++++++++++++++++
>  14 files changed, 203 insertions(+), 7 deletions(-)
>  create mode 100644 libgcc/config/i386/64/t-softfp
>  create mode 100644 libgcc/config/i386/_divhc3.c
>  create mode 100644 libgcc/config/i386/_mulhc3.c
>  create mode 100644 libgcc/soft-fp/extendhfxf2.c
>  create mode 100644 libgcc/soft-fp/truncxfhf2.c
>
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 3248ce2c06e..a59cb5607d1 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -205,7 +205,14 @@ get_best_extraction_insn (extraction_insn *insn,
>                           machine_mode field_mode)
>  {
>    opt_scalar_int_mode mode_iter;
> -  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
> +  scalar_int_mode smallest_int_mode;
> +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
> +  if (FLOAT_MODE_P (field_mode)
> +      && known_eq (GET_MODE_SIZE (field_mode), 2))
> +    smallest_int_mode = word_mode;
> +  else
> +    smallest_int_mode = smallest_int_mode_for_size (struct_bits);
> +  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
>      {
>        scalar_int_mode mode = mode_iter.require ();
>        if (get_extraction_insn (insn, pattern, type, mode))
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 2c8be561eb5..4452b821a5e 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -51,6 +51,8 @@ md_unwind_header = @md_unwind_header@
>  sfp_machine_header = @sfp_machine_header@
>  thread_header = @thread_header@
>
> +have_as_avx512fp16 = @have_as_avx512fp16@
> +
>  host_noncanonical = @host_noncanonical@
>  real_host_noncanonical = @real_host_noncanonical@
>  target_noncanonical = @target_noncanonical@
> @@ -314,7 +316,7 @@ MULTIOSSUBDIR := $(shell if test $(MULTIOSDIR) != .; then echo /$(MULTIOSDIR); f
>  inst_libdir = $(libsubdir)$(MULTISUBDIR)
>  inst_slibdir = $(slibdir)$(MULTIOSSUBDIR)
>
> -gcc_compile_bare = $(CC) $(INTERNAL_CFLAGS)
> +gcc_compile_bare = $(CC) $(INTERNAL_CFLAGS) $(CFLAGS-$(<F))
>  compile_deps = -MT $@ -MD -MP -MF $(basename $@).dep
>  gcc_compile = $(gcc_compile_bare) -o $@ $(compile_deps)
>  gcc_s_compile = $(gcc_compile) -DSHARED
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 50f00062232..3f16b547810 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1539,11 +1539,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
>         # These use soft-fp for SFmode and DFmode, not just TFmode.
>         ;;
>  i[34567]86-*-* | x86_64-*-*)
> -       tmake_file="${tmake_file} t-softfp-tf"
> -       if test "${host_address}" = 32; then
> -               tmake_file="${tmake_file} i386/${host_address}/t-softfp"
> -       fi
> -       tmake_file="${tmake_file} i386/t-softfp t-softfp"
> +       tmake_file="${tmake_file} t-softfp-tf i386/${host_address}/t-softfp i386/t-softfp t-softfp"
>         ;;
>  esac
>
> diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
> index 1fa282d7afe..e24cbc8d180 100644
> --- a/libgcc/config/i386/32/sfp-machine.h
> +++ b/libgcc/config/i386/32/sfp-machine.h
> @@ -86,6 +86,7 @@
>  #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D, 0
>  /* Even if XFmode is 12byte,  we have to pad it to
> diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
> index 1ff94c23ea4..e1c616699bb 100644
> --- a/libgcc/config/i386/64/sfp-machine.h
> +++ b/libgcc/config/i386/64/sfp-machine.h
> @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
>
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D
>  #define _FP_NANFRAC_E          _FP_QNANBIT_E, 0
> diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
> new file mode 100644
> index 00000000000..44db2e5aebe
> --- /dev/null
> +++ b/libgcc/config/i386/64/t-softfp
> @@ -0,0 +1,9 @@
> +ifeq ($(have_as_avx512fp16),yes)
> +# Add the following HFmode functions to static libgcc2.
> +hf-extras := fixhfti.c fixunshfti.c floattihf.c floatuntihf.c
> +
> +CFLAGS-fixhfti.c += -mavx512fp16
> +CFLAGS-fixunshfti.c += -mavx512fp16
> +CFLAGS-floattihf.c += -mavx512fp16
> +CFLAGS-floatuntihf.c += -mavx512fp16
> +endif
> diff --git a/libgcc/config/i386/_divhc3.c b/libgcc/config/i386/_divhc3.c
> new file mode 100644
> index 00000000000..b2e5b0cfc7d
> --- /dev/null
> +++ b/libgcc/config/i386/_divhc3.c
> @@ -0,0 +1,4 @@
> +#ifdef __AVX512FP16__
> +#define L_divhc3
> +#include "libgcc2.c"
> +#endif
> diff --git a/libgcc/config/i386/_mulhc3.c b/libgcc/config/i386/_mulhc3.c
> new file mode 100644
> index 00000000000..90af0ead882
> --- /dev/null
> +++ b/libgcc/config/i386/_mulhc3.c
> @@ -0,0 +1,4 @@
> +#ifdef __AVX512FP16__
> +#define L_mulhc3
> +#include "libgcc2.c"
> +#endif
> diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> index 8319f0550bc..f15d29d3755 100644
> --- a/libgcc/config/i386/sfp-machine.h
> +++ b/libgcc/config/i386/sfp-machine.h
> @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
>  #define _FP_KEEPNANFRACP       1
>  #define _FP_QNANNEGATEDP 0
>
> +#define _FP_NANSIGN_H          1
>  #define _FP_NANSIGN_S          1
>  #define _FP_NANSIGN_D          1
>  #define _FP_NANSIGN_E          1
> diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
> index 685d9cf8502..d9cfa36ca90 100644
> --- a/libgcc/config/i386/t-softfp
> +++ b/libgcc/config/i386/t-softfp
> @@ -1 +1,21 @@
>  LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
> +
> +# Replace _divhc3 and _mulhc3.
> +libgcc2-hf-functions = _divhc3 _mulhc3
> +LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functions)
> +libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
> +LIB2ADD_ST += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> +
> +ifeq ($(have_as_avx512fp16),yes)
> +# Add the following HFmode functions to static libgcc2.
> +hf-extras += extendhfxf2.c extendhftf2.c truncxfhf2.c trunctfhf2.c
> +LIB2ADD_ST += $(addprefix $(srcdir)/soft-fp/, $(hf-extras))
> +
> +CFLAGS-extendhfxf2.c += -mavx512fp16
> +CFLAGS-extendhftf2.c += -mavx512fp16
> +CFLAGS-truncxfhf2.c += -mavx512fp16
> +CFLAGS-trunctfhf2.c += -mavx512fp16
> +
> +CFLAGS-_divhc3.c += -mavx512fp16
> +CFLAGS-_mulhc3.c += -mavx512fp16
> +endif
> diff --git a/libgcc/configure b/libgcc/configure
> index 4919a56f518..503019f020c 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -605,6 +605,7 @@ solaris_ld_v2_maps
>  real_host_noncanonical
>  accel_dir_suffix
>  use_tm_clone_registry
> +have_as_avx512fp16
>  force_explicit_eh_registry
>  CET_FLAGS
>  fixed_point
> @@ -5302,6 +5303,38 @@ $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
>    CFLAGS="$saved_CFLAGS"
>  esac
>
> +case "${target}" in
> +i[34567]86-*-* | x86_64-*-*)
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking if the assembler supports AVX512FP16" >&5
> +$as_echo_n "checking if the assembler supports AVX512FP16... " >&6; }
> +if ${libgcc_cv_as_avx512fp16+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +int
> +main ()
> +{
> +asm("vmovsh %xmm0, %xmm0, %xmm1");
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgcc_cv_as_avx512fp16=yes
> +else
> +  libgcc_cv_as_avx512fp16=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_as_avx512fp16" >&5
> +$as_echo "$libgcc_cv_as_avx512fp16" >&6; }
> +  ;;
> +esac
> +have_as_avx512fp16=$libgcc_cv_as_avx512fp16
> +
> +
>  # Collect host-machine-specific information.
>  . ${srcdir}/config.host
>
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 13a80b2551b..a45374891df 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -485,6 +485,19 @@ powerpc*-*-linux*)
>    CFLAGS="$saved_CFLAGS"
>  esac
>
> +case "${target}" in
> +dnl Check if as supports AVX512FP16 instructions.
> +i[[34567]]86-*-* | x86_64-*-*)
> +  AC_CACHE_CHECK([if the assembler supports AVX512FP16],
> +                [libgcc_cv_as_avx512fp16],
> +                [AC_TRY_COMPILE([], [asm("vmovsh %xmm0, %xmm0, %xmm1");],
> +                [libgcc_cv_as_avx512fp16=yes],
> +                [libgcc_cv_as_avx512fp16=no])])
> +  ;;
> +esac
> +have_as_avx512fp16=$libgcc_cv_as_avx512fp16
> +AC_SUBST(have_as_avx512fp16)
> +
>  # Collect host-machine-specific information.
>  . ${srcdir}/config.host
>
> diff --git a/libgcc/soft-fp/extendhfxf2.c b/libgcc/soft-fp/extendhfxf2.c
> new file mode 100644
> index 00000000000..2a11e109dc5
> --- /dev/null
> +++ b/libgcc/soft-fp/extendhfxf2.c
> @@ -0,0 +1,53 @@
> +/* Software floating-point emulation.
> +   Return an IEEE half converted to IEEE extended.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define FP_NO_EXACT_UNDERFLOW
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "extended.h"
> +
> +XFtype
> +__extendhfxf2 (HFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_H (A);
> +  FP_DECL_E (R);
> +  XFtype r;
> +
> +  FP_INIT_EXCEPTIONS;
> +  FP_UNPACK_RAW_H (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_EXTEND (E, H, 4, 1, R, A);
> +#else
> +  FP_EXTEND (E, H, 2, 1, R, A);
> +#endif
> +  FP_PACK_RAW_E (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> diff --git a/libgcc/soft-fp/truncxfhf2.c b/libgcc/soft-fp/truncxfhf2.c
> new file mode 100644
> index 00000000000..8d80a1f5129
> --- /dev/null
> +++ b/libgcc/soft-fp/truncxfhf2.c
> @@ -0,0 +1,52 @@
> +/* Software floating-point emulation.
> +   Truncate IEEE extended into IEEE half.
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   In addition to the permissions in the GNU Lesser General Public
> +   License, the Free Software Foundation gives you unlimited
> +   permission to link the compiled version of this file into
> +   combinations with other programs, and to distribute those
> +   combinations without any restriction coming from the use of this
> +   file.  (The Lesser General Public License restrictions do apply in
> +   other respects; for example, they cover modification of the file,
> +   and distribution when not linked into a combine executable.)
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "soft-fp.h"
> +#include "half.h"
> +#include "extended.h"
> +
> +HFtype
> +__truncxfhf2 (XFtype a)
> +{
> +  FP_DECL_EX;
> +  FP_DECL_E (A);
> +  FP_DECL_H (R);
> +  HFtype r;
> +
> +  FP_INIT_ROUNDMODE;
> +  FP_UNPACK_SEMIRAW_E (A, a);
> +#if _FP_W_TYPE_SIZE < 64
> +  FP_TRUNC (H, E, 1, 4, R, A);
> +#else
> +  FP_TRUNC (H, E, 1, 2, R, A);
> +#endif
> +  FP_PACK_SEMIRAW_H (r, R);
> +  FP_HANDLE_EXCEPTIONS;
> +
> +  return r;
> +}
> --
> 2.18.1
>
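For readers unfamiliar with the half format, here is a rough sketch of the widening that a routine like __extendhfxf2 performs. It is not the soft-fp implementation: it widens to float rather than XFmode so that it stays portable, it works on raw bit patterns, and it neither raises exceptions nor quiets signaling NaNs. The name half_bits_to_float is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern (1 sign, 5 exponent, 10 fraction
   bits, bias 15) to an IEEE binary32 value (8 exponent bits, bias 127).
   Every binary16 value is exactly representable in binary32, so no
   rounding is needed.  */
float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0x1fu)
    /* Infinity or NaN: all-ones exponent, payload moved to the top.  */
    bits = sign | 0x7f800000u | (frac << 13);
  else if (exp != 0)
    /* Normal number: rebias the exponent by 127 - 15 = 112.  */
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  else if (frac == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Subnormal: shift until the implicit leading bit appears, then
         drop it and adjust the exponent accordingly.  */
      uint32_t shift = 0;
      while ((frac & 0x400u) == 0)
        {
          frac <<= 1;
          shift++;
        }
      frac &= 0x3ffu;
      bits = sign | ((113u - shift) << 23) | (frac << 13);
    }

  assert (sizeof f == sizeof bits);
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, the bit pattern 0x3c00 is 1.0 and 0x7bff is 65504, the largest finite half value. The soft-fp version in the patch expresses the same steps through the FP_UNPACK_RAW_H, FP_EXTEND and FP_PACK_RAW_E macros and also handles exception flags.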


-- 
BR,
Hongtao


* Re: [PATCH 1/2] AVX512FP16: Initial support for _Float16 type and AVX512FP16 feature.
       [not found] ` <20210701054808.39000-2-hongtao.liu@intel.com>
@ 2021-07-01  5:55   ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-01  5:55 UTC
  To: GCC Patches; +Cc: H. J. Lu, Uros Bizjak, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 1:48 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (get_available_features):
>         Detect FEATURE_AVX512FP16.
>         * common/config/i386/i386-common.c
>         (OPTION_MASK_ISA_AVX512FP16_SET,
>         OPTION_MASK_ISA_AVX512FP16_UNSET,
>         OPTION_MASK_ISA2_AVX512FP16_SET,
>         OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
>         (OPTION_MASK_ISA2_AVX512BW_UNSET,
>         OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
>         (ix86_handle_option): Handle -mavx512fp16.
>         * common/config/i386/i386-cpuinfo.h (enum processor_features):
>         Add FEATURE_AVX512FP16.
>         * common/config/i386/i386-isas.h: Add entry for AVX512FP16.
>         * config.gcc: Add avx512fp16intrin.h.
>         * config/i386/avx512fp16intrin.h: New intrinsic header.
>         * config/i386/cpuid.h: Add bit_AVX512FP16.
>         * config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
>         (UINT8): Ditto.
>         (V8HF): New vector type.
>         * config/i386/i386-builtins.c: Support _Float16 type for i386 backend.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Define
>         __AVX512FP16__.
>         (ix86_target_macros): Undefine all _Float16 macros when AVX512FP16 is
>         disabled.
>         * config/i386/i386-expand.c (ix86_expand_move): Issue error when
>         using HFmode without AVX512FP16 enabled.
>         (ix86_expand_branch): Support HFmode.
>         * config/i386/i386-isa.def: Add PTA define for AVX512FP16.
>         * config/i386/i386-modes.def: Add HFmode.
>         * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
>         (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
>         (ix86_option_override_internal): Enable SSE math for AVX512FP16.
>         * config/i386/i386.c (classify_argument): Add HFmode and
>         HCmode.
>         (construct_container): Avoid HCmode.
>         (function_value_32): Set return register to xmm0 for HF/HCmode.
>         (function_value_64): Add HFmode and HCmode.
>         (ix86_get_ssemov): Use vmovdqu16/vmovw/vmovsh for HFmode/HImode
>         scalar or vector.
>         (ix86_print_operand): Update output for HFmode constant.
>         (output_387_binary_op): Update instruction suffix for HFmode.
>         (sse_store_index): Use SFmode cost for HFmode cost.
>         (inline_memory_move_cost): Add HFmode, and prefer SSE cost over
>         GPR cost for HFmode.
>         (ix86_hard_regno_mode_ok): Allow HFmode.
>         (ix86_set_reg_reg_cost): Support cost for FP16 modes.
>         (ix86_scalar_mode_supported_p): Add HFmode.
>         (ix86_libgcc_floating_mode_supported_p): New function for
>         TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P hook.
>         (ix86_mangle_type): Add mangling for _Float16 type.
>         (ix86_get_excess_precision): Set FLT_EVAL_METHOD for AVX512FP16.
>         (ix86_can_inline_p): Skip fpmath check when AVX512FP16 enabled.
>         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Define.
>         * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
>         (VALID_SSE_REG_MODE): Add HFmode.
>         (VALID_FP_MODE_P): Add HFmode and HCmode.
>         (SSE_FLOAT_MODE_P): Add HFmode.
>         (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
>         * config/i386/i386.md (mode): Add HFmode.
>         (MODE_SIZE): Add HFmode.
>         (MODESH): New mode iterator.
>         (MODEFH): Likewise.
>         (X87MODEFH): Likewise.
>         (ssemodesuffix): Add sh suffix for HFmode.
>         (cbranch<mode>4): Use MODEFH.
>         (<insn><mode>3): Likewise.
>         (mul<mode>3): Likewise.
>         (div<mode>3): Likewise.
>         (*ieee_s<ieee_maxmin><mode>3): Likewise.
>         (*cmpi<unord>hf): New define_insn for HFmode.
>         (*pushhf_rex64): Likewise.
>         (*pushhf): Likewise.
>         (*movhf_internal): Likewise.
>         (extendhf<mode>2): Likewise.
>         (trunc<mode>hf2): Likewise.
>         (*fop_hf_comm): Likewise.
>         (*fop_hf_1): Likewise.
>         (float<floatunssuffix><mode>hf2): Likewise.
>         (define_split): Use MODESH.
>         (mov<mode>): Use X87MODEFH.
>         (mov<mode>cc): Likewise.
>         * config/i386/i386.opt: Add mavx512fp16.
>         * config/i386/immintrin.h: Include avx512fp16intrin.h.
>         * config/i386/sse.md (VFH_128): New mode iterator.
>         (sse): Add scalar and vector HFmodes.
>         (ssescalarmode): Add vector HFmode mapping.
>         (ssescalarmodesuffix): Add sh suffix for HFmode.
>         (*<sse>_vm<insn><mode>3): Use VFH_128.
>         (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
>         (*ieee_<ieee_maxmin><mode>3): Likewise.
>         * doc/invoke.texi: Add mavx512fp16.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
>         * gcc.target/i386/avx-2.c: Ditto.
>         * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
>         * gcc.target/i386/funcspec-56.inc: Add new target attribute check.
>         * gcc.target/i386/sse-13.c: Add -mavx512fp16.
>         * gcc.target/i386/sse-14.c: Ditto.
>         * gcc.target/i386/sse-22.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * lib/target-supports.exp (check_effective_target_avx512fp16): New.
>         * g++.target/i386/float16-1.C: New test.
>         * g++.target/i386/float16-2.C: Ditto.
>         * g++.target/i386/float16-3.C: Ditto.
>         * gcc.target/i386/avx512fp16-12a.c: Ditto.
>         * gcc.target/i386/avx512fp16-12b.c: Ditto.
>         * gcc.target/i386/float16-1.c: Ditto.
>         * gcc.target/i386/float16-2.c: Ditto.
>         * gcc.target/i386/float16-3a.c: Ditto.
>         * gcc.target/i386/float16-3b.c: Ditto.
>         * gcc.target/i386/float16-4a.c: Ditto.
>         * gcc.target/i386/float16-4b.c: Ditto.
>         * gcc.target/i386/pr54855-12.c: Ditto.
>
> Co-Authored-By: Guo, Xuepeng <xuepeng.guo@intel.com>
> Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
> Co-Authored-By: Liu, Hongtao <hongtao.liu@intel.com>
> Co-Authored-By: Wang, Hongyu <hongyu.wang@intel.com>
> Co-Authored-By: Xu, Dianhong <dianhong.xu@intel.com>
> ---
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 +-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  53 ++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   7 +-
>  gcc/config/i386/i386-builtins.c               |   6 +
>  gcc/config/i386/i386-c.c                      |  20 ++
>  gcc/config/i386/i386-expand.c                 |   8 +
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-modes.def                |   1 +
>  gcc/config/i386/i386-options.c                |  10 +-
>  gcc/config/i386/i386.c                        | 158 ++++++++++--
>  gcc/config/i386/i386.h                        |  18 +-
>  gcc/config/i386/i386.md                       | 242 +++++++++++++++---
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   2 +
>  gcc/config/i386/sse.md                        |  42 +--
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 +
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 ++
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 ++
>  gcc/testsuite/gcc.target/i386/float16-1.c     |   8 +
>  gcc/testsuite/gcc.target/i386/float16-2.c     |  14 +
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 +
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  42 files changed, 704 insertions(+), 99 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 458f41de776..1835ac64e67 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
>             set_feature (FEATURE_AVX5124FMAPS);
>           if (edx & bit_AVX512VP2INTERSECT)
>             set_feature (FEATURE_AVX512VP2INTERSECT);
> +         if (edx & bit_AVX512FP16)
> +           set_feature (FEATURE_AVX512FP16);
>         }
>
>        __cpuid_count (7, 1, eax, ebx, ecx, edx);
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index e156cc34584..197e9cd86b4 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_SET \
>    (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
> +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
> +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_SET \
>    (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
>  #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
> @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
> +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
> +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
>  #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
>  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
> @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_AVX512BF16_UNSET \
>     | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
>     | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
> +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> +   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>    (OPTION_MASK_ISA2_AVX512F_UNSET)
>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
>  #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
>
> -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
> +#define OPTION_MASK_ISA2_AVX512BW_UNSET \
> +  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> +    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>
>  /* Set 1 << value as value of -malign-FLAG option.  */
>
> @@ -830,6 +837,21 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mavx512fp16:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +       }
> +      return true;
> +
>      case OPT_mavx512vnni:
>        if (value)
>         {
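
The OPT_mavx512fp16 case above can be read as plain bitmask bookkeeping. Below is a minimal, standalone sketch of that logic, not the GCC code itself. The EX_* bit positions, struct ex_opts and ex_handle_mavx512fp16 are hypothetical names made up for this illustration, and the prerequisites that AVX512F pulls in are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration.  */
#define EX_ISA_AVX512F     (UINT64_C (1) << 0)
#define EX_ISA_AVX512BW    (UINT64_C (1) << 1)
#define EX_ISA2_AVX512FP16 (UINT64_C (1) << 0)

/* Enabling FP16 pulls in AVX512BW, which in turn pulls in AVX512F.  */
#define EX_ISA_AVX512BW_SET    (EX_ISA_AVX512BW | EX_ISA_AVX512F)
#define EX_ISA_AVX512FP16_SET  EX_ISA_AVX512BW_SET
#define EX_ISA2_AVX512FP16_SET EX_ISA2_AVX512FP16

struct ex_opts
{
  uint64_t isa_flags, isa_flags_explicit;
  uint64_t isa_flags2, isa_flags2_explicit;
};

/* Same shape as the handler in the hunk: setting touches both flag
   words, clearing only touches the FP16 bit in the second word and
   records that the user mentioned it explicitly.  */
static void
ex_handle_mavx512fp16 (struct ex_opts *opts, bool value)
{
  if (value)
    {
      opts->isa_flags2 |= EX_ISA2_AVX512FP16_SET;
      opts->isa_flags2_explicit |= EX_ISA2_AVX512FP16_SET;
      opts->isa_flags |= EX_ISA_AVX512FP16_SET;
      opts->isa_flags_explicit |= EX_ISA_AVX512FP16_SET;
    }
  else
    {
      opts->isa_flags2 &= ~EX_ISA2_AVX512FP16;
      opts->isa_flags2_explicit |= EX_ISA2_AVX512FP16;
    }
}
```

Note that in this sketch, as in the hunk, turning the option off leaves the AVX512BW bits alone.
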
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index e68dd656046..4e0659fc7b2 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -228,6 +228,7 @@ enum processor_features
>    FEATURE_AESKLE,
>    FEATURE_WIDEKL,
>    FEATURE_AVXVNNI,
> +  FEATURE_AVX512FP16,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 898c18f3dda..a6783660278 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
>    ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
>    ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
> +  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 0230bb88861..5b4f894185a 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
>                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
>                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
>                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> -                      mwaitintrin.h"
> +                      mwaitintrin.h avx512fp16intrin.h"
>         ;;
>  ia64-*-*)
>         extra_headers=ia64intrin.h
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> new file mode 100644
> index 00000000000..38d63161ba6
> --- /dev/null
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -0,0 +1,53 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _IMMINTRIN_H_INCLUDED
> +#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef __AVX512FP16INTRIN_H_INCLUDED
> +#define __AVX512FP16INTRIN_H_INCLUDED
> +
> +#ifndef __AVX512FP16__
> +#pragma GCC push_options
> +#pragma GCC target("avx512fp16")
> +#define __DISABLE_AVX512FP16__
> +#endif /* __AVX512FP16__ */
> +
> +/* Internal data types for implementing the intrinsics.  */
> +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
> +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
> +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
> +
> +/* The Intel API is flexible enough that we must allow aliasing with other
> +   vector types, and their scalar components.  */
> +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
> +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
> +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +#ifdef __DISABLE_AVX512FP16__
> +#undef __DISABLE_AVX512FP16__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX512FP16__ */
> +
> +#endif /* __AVX512FP16INTRIN_H_INCLUDED */
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..82b8050028b 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -126,6 +126,7 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT (1 << 8)
> +#define bit_AVX512FP16   (1 << 23)
>  #define bit_IBT        (1 << 20)
>  #define bit_UINTR (1 << 5)
>  #define bit_PCONFIG    (1 << 18)
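
For reference, a rough user-level sketch of what the new bit means: AVX512FP16 is reported in bit 23 of EDX for CPUID leaf 7, sub-leaf 0. The helper names below are hypothetical; __get_cpuid_count comes from <cpuid.h>. The sketch deliberately does not check OS support for the AVX-512 register state, which a real detection routine must also do.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined (__x86_64__) || defined (__i386__)
#include <cpuid.h>
#endif

/* Same value as bit_AVX512FP16 in the hunk above; renamed so the
   example does not clash with the real header.  */
#define EX_BIT_AVX512FP16 (UINT32_C (1) << 23)

/* Pure helper: test the feature bit in an EDX value.  */
static bool
ex_edx_has_avx512fp16 (uint32_t edx)
{
  return (edx & EX_BIT_AVX512FP16) != 0;
}

/* Query CPUID leaf 7, sub-leaf 0; returns false on non-x86 hosts or
   when the leaf is not available.  */
static bool
ex_cpu_reports_avx512fp16 (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return ex_edx_has_avx512fp16 (edx);
#else
  return false;
#endif
}
```
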
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 3ca313c19ec..eb5153002ae 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
>  DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
>  DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>  DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> +DEF_PRIMITIVE_TYPE (FLOAT16, float16_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> @@ -84,6 +85,7 @@ DEF_VECTOR_TYPE (V8QI, QI)
>  # SSE vectors
>  DEF_VECTOR_TYPE (V2DF, DOUBLE)
>  DEF_VECTOR_TYPE (V4SF, FLOAT)
> +DEF_VECTOR_TYPE (V8HF, FLOAT16)
>  DEF_VECTOR_TYPE (V2DI, DI)
>  DEF_VECTOR_TYPE (V4SI, SI)
>  DEF_VECTOR_TYPE (V8HI, HI)
> @@ -1296,4 +1298,7 @@ DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID)
>  DEF_FUNCTION_TYPE (UINT, UINT, V2DI, PVOID)
>  DEF_FUNCTION_TYPE (VOID, V2DI, V2DI, V2DI, UINT)
>  DEF_FUNCTION_TYPE (UINT8, PV2DI, V2DI, PCVOID)
> -DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
> \ No newline at end of file
> +DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
> +
> +# FP16 builtins
> +DEF_FUNCTION_TYPE (V8HF, V8HI)
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 204e2903126..826fa650f21 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -1371,6 +1371,12 @@ ix86_init_builtin_types (void)
>       it.  */
>    lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
>
> +  /* Provide the _Float16 type if needed so that it can be used in
> +     AVX512FP16 intrinsics.   */
> +  if (!maybe_get_identifier ("_Float16"))
> +    lang_hooks.types.register_builtin_type (float16_type_node,
> +                                           "_Float16");
> +
>    const_string_type_node
>      = build_pointer_type (build_qualified_type
>                           (char_type_node, TYPE_QUAL_CONST));
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index 5ed0de006fb..d3704717b2a 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>      def_or_undef (parse_in, "__PTWRITE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
>      def_or_undef (parse_in, "__AVX512BF16__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
> +    def_or_undef (parse_in, "__AVX512FP16__");
>    if (TARGET_MMX_WITH_SSE)
>      def_or_undef (parse_in, "__MMX_WITH_SSE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
> @@ -771,6 +773,24 @@ ix86_target_macros (void)
>
>    cpp_define (parse_in, "__SIZEOF_FLOAT128__=16");
>
> +  if (!TARGET_AVX512FP16)
> +    {
> +      /* NB: _Float16 is always provided in the C front-end for
> +        AVX512FP16 intrinsics.  If AVX512FP16 isn't enabled, undef
> +        all _Float16 macros.  */
> +      cpp_undef (parse_in, "__FLT16_MANT_DIG__");
> +      cpp_undef (parse_in, "__FLT16_DIG__");
> +      cpp_undef (parse_in, "__FLT16_MIN_EXP__");
> +      cpp_undef (parse_in, "__FLT16_MIN_10_EXP__");
> +      cpp_undef (parse_in, "__FLT16_MAX_EXP__");
> +      cpp_undef (parse_in, "__FLT16_MAX_10_EXP__");
> +      cpp_undef (parse_in, "__FLT16_MAX__");
> +      cpp_undef (parse_in, "__FLT16_EPSILON__");
> +      cpp_undef (parse_in, "__FLT16_MIN__");
> +      cpp_undef (parse_in, "__FLT16_DECIMAL_DIG__");
> +      cpp_undef (parse_in, "__FLT16_DENORM_MIN__");
> +    }
> +
>    cpp_define_formatted (parse_in, "__ATOMIC_HLE_ACQUIRE=%d", IX86_HLE_ACQUIRE);
>    cpp_define_formatted (parse_in, "__ATOMIC_HLE_RELEASE=%d", IX86_HLE_RELEASE);
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index e9763eb5b3e..ab5f5b284c8 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -197,6 +197,13 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>    rtx tmp, addend = NULL_RTX;
>    enum tls_model model;
>
> +  /* NB: HFmode is always enabled so that the _Float16 type can be
> +     used for AVX512FP16 intrinsics.  We will issue an error here
> +     if AVX512FP16 isn't available.  */
> +  if (mode == HFmode && !TARGET_AVX512FP16)
> +    fatal_error (input_location,
> +                "%<_Float16%> is not supported on this target");
> +
>    op0 = operands[0];
>    op1 = operands[1];
>
> @@ -2132,6 +2139,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
>
>    switch (mode)
>      {
> +    case E_HFmode:
>      case E_SFmode:
>      case E_DFmode:
>      case E_XFmode:
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index a0d46cbc892..83d9302ea3d 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -108,3 +108,4 @@ DEF_PTA(HRESET)
>  DEF_PTA(KL)
>  DEF_PTA(WIDEKL)
>  DEF_PTA(AVXVNNI)
> +DEF_PTA(AVX512FP16)
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 4e7014be034..9232f59a925 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> +FLOAT_MODE (HF, 2, ieee_half_format);
>
>  /* In ILP32 mode, XFmode has size 12 and alignment 4.
>     In LP64 mode, XFmode has size and alignment 16.  */
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 0eccb549c22..b7b6f68af56 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
>    { "-mhreset",                OPTION_MASK_ISA2_HRESET },
>    { "-mkl",            OPTION_MASK_ISA2_KL },
>    { "-mwidekl",        OPTION_MASK_ISA2_WIDEKL },
> -  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI }
> +  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI },
> +  { "-mavx512fp16",    OPTION_MASK_ISA2_AVX512FP16 }
>  };
>  static struct ix86_target_opts isa_opts[] =
>  {
> @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
>      IX86_ATTR_ISA ("hreset", OPT_mhreset),
>      IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
> +    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> @@ -2495,6 +2497,12 @@ ix86_option_override_internal (bool main_args_p,
>    else
>      opts->x_ix86_fpmath = TARGET_FPMATH_DEFAULT_P (opts->x_ix86_isa_flags);
>
> +  if (TARGET_AVX512FP16 && (opts->x_ix86_fpmath & FPMATH_SSE) == 0)
> +    {
> +      opts->x_ix86_fpmath = (fpmath_unit) (opts->x_ix86_fpmath
> +                                          | FPMATH_SSE);
> +    }
> +
>    /* Use external vectorized library in vectorizing intrinsics.  */
>    if (opts_set->x_ix86_veclibabi_type)
>      switch (opts->x_ix86_veclibabi_type)
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a93128fa0a4..9ca31e934ab 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -604,6 +604,10 @@ ix86_can_inline_p (tree caller, tree callee)
>      ret = false;
>
>    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> +          /* AVX512FP16 will always enable ssemath since there's
> +             no x87 instructions for HFmode.
> +             This is for -m32 -mavx512fp16 when fpmath=x87 is the default.  */
> +          && ! TARGET_AVX512FP16
>            /* If the calle doesn't use FP expressions differences in
>               ix86_fpmath can be ignored.  We are called from FEs
>               for multi-versioning call optimization, so beware of
> @@ -2350,6 +2354,7 @@ classify_argument (machine_mode mode, const_tree type,
>        gcc_unreachable ();
>      case E_CTImode:
>        return 0;
> +    case E_HFmode:
>      case E_SFmode:
>        if (!(bit_offset % 64))
>         classes[0] = X86_64_SSESF_CLASS;
> @@ -2367,6 +2372,7 @@ classify_argument (machine_mode mode, const_tree type,
>        classes[0] = X86_64_SSE_CLASS;
>        classes[1] = X86_64_SSEUP_CLASS;
>        return 2;
> +    case E_HCmode:
>      case E_SCmode:
>        classes[0] = X86_64_SSE_CLASS;
>        if (!(bit_offset % 64))
> @@ -2578,9 +2584,9 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>           return NULL;
>         }
>
> -  /* First construct simple cases.  Avoid SCmode, since we want to use
> -     single register to pass this type.  */
> -  if (n == 1 && mode != SCmode)
> +  /* First construct simple cases.  Avoid HCmode and SCmode, since we
> +     want to use single register to pass these types.  */
> +  if (n == 1 && mode != HCmode && mode != SCmode)
>      switch (regclass[0])
>        {
>        case X86_64_INTEGER_CLASS:
> @@ -3896,6 +3902,10 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
>    else if (VECTOR_MODE_P (mode) && GET_MODE_SIZE (mode) == 64)
>      regno = FIRST_SSE_REG;
>
> +  /* _Float16 return values in %xmm0.  */
> +  else if (mode == HFmode || mode == HCmode)
> +    regno = FIRST_SSE_REG;
> +
>    /* Floating point return values in %st(0) (unless -mno-fp-ret-in-387).  */
>    else if (X87_FLOAT_MODE_P (mode) && TARGET_FLOAT_RETURNS_IN_80387)
>      regno = FIRST_FLOAT_REG;
> @@ -3939,6 +3949,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
>
>        switch (mode)
>         {
> +       case E_HFmode:
> +       case E_HCmode:
>         case E_SFmode:
>         case E_SCmode:
>         case E_DFmode:
> @@ -5303,7 +5315,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>        switch (type)
>         {
>         case opcode_int:
> -         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +         if (scalar_mode == E_HFmode)
> +           opcode = (misaligned_p
> +                     ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> +                     : "vmovdqa64");
> +         else
> +           opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>           break;
>         case opcode_float:
>           opcode = misaligned_p ? "vmovups" : "vmovaps";
> @@ -5317,6 +5334,11 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>      {
>        switch (scalar_mode)
>         {
> +       case E_HFmode:
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> +                   : "vmovdqa64");
> +         break;
>         case E_SFmode:
>           opcode = misaligned_p ? "%vmovups" : "%vmovaps";
>           break;
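
Both HFmode branches in ix86_get_ssemov above use the same three-way choice. A small sketch of just that decision, with a hypothetical function name and plain bool parameters standing in for misaligned_p and TARGET_AVX512BW:

```c
#include <stdbool.h>

/* Pick the full-vector move mnemonic used for HFmode elements:
   aligned moves use vmovdqa64; unaligned moves use vmovdqu16 when
   AVX512BW is available and vmovdqu64 otherwise.  */
static const char *
ex_hf_vector_move_opcode (bool misaligned_p, bool have_avx512bw)
{
  if (!misaligned_p)
    return "vmovdqa64";
  return have_avx512bw ? "vmovdqu16" : "vmovdqu64";
}
```
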
> @@ -5452,6 +5474,9 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>      case MODE_SI:
>        return "%vmovd\t{%1, %0|%0, %1}";
>
> +    case MODE_HI:
> +      return "vmovw\t{%1, %0|%0, %1}";
> +
>      case MODE_DF:
>        if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>         return "vmovsd\t{%d1, %0|%0, %d1}";
> @@ -5464,6 +5489,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>        else
>         return "%vmovss\t{%1, %0|%0, %1}";
>
> +    case MODE_HF:
> +      if (REG_P (operands[0]) && REG_P (operands[1]))
> +       return "vmovsh\t{%d1, %0|%0, %d1}";
> +      else
> +       return "vmovsh\t{%1, %0|%0, %1}";
> +
>      case MODE_V1DF:
>        gcc_assert (!TARGET_AVX);
>        return "movlpd\t{%1, %0|%0, %1}";
> @@ -13411,6 +13442,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
>           (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
>      }
>
> +  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
> +    {
> +      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
> +                              REAL_MODE_FORMAT (HFmode));
> +      if (ASSEMBLER_DIALECT == ASM_ATT)
> +       putc ('$', file);
> +      fprintf (file, "0x%04x", (unsigned int) l);
> +    }
> +
>    else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
>      {
>        long l;
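
To make the HFmode branch of ix86_print_operand concrete: the constant is emitted as its 16-bit IEEE binary16 pattern in hex, with a '$' prefix in AT&T syntax. The sketch below is an assumption-laden illustration, not GCC code. ex_pack_binary16 and ex_format_hf_immediate are hypothetical helpers, and the packer only handles normal numbers (unbiased exponent in -14..15).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Pack a sign bit, an unbiased exponent and a 10-bit mantissa into
   binary16 bits.  Normal numbers only; the exponent bias is 15.  */
static uint16_t
ex_pack_binary16 (unsigned int sign, int exponent, unsigned int mantissa)
{
  return (uint16_t) (((sign & 1u) << 15)
                     | ((unsigned int) (exponent + 15) << 10)
                     | (mantissa & 0x3ffu));
}

/* Format the bit pattern the way the hunk prints it: "0x%04x", with
   a leading '$' for the AT&T dialect only.  */
static int
ex_format_hf_immediate (char *buf, size_t len, uint16_t bits,
                        bool att_dialect)
{
  return snprintf (buf, len, "%s0x%04x", att_dialect ? "$" : "",
                   (unsigned int) bits);
}
```

For instance, 1.0 has sign 0, exponent 0 and mantissa 0, which packs to 0x3c00.
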
> @@ -13901,7 +13941,9 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
>
>    if (is_sse)
>     {
> -     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
> +     p = (GET_MODE (operands[0]) == HFmode
> +         ? "sh"
> +         : (GET_MODE (operands[0]) == SFmode ? "ss" : "sd"));
>       strcat (buf, p);
>
>       if (TARGET_AVX)
> @@ -19157,21 +19199,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>  static inline int
>  sse_store_index (machine_mode mode)
>  {
> -      switch (GET_MODE_SIZE (mode))
> -       {
> -         case 4:
> -           return 0;
> -         case 8:
> -           return 1;
> -         case 16:
> -           return 2;
> -         case 32:
> -           return 3;
> -         case 64:
> -           return 4;
> -         default:
> -           return -1;
> -       }
> +  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
> +     costs to processor_costs, which requires changes to all entries in
> +     processor cost table.  */
> +  if (mode == E_HFmode)
> +    mode = E_SFmode;
> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 4:
> +      return 0;
> +    case 8:
> +      return 1;
> +    case 16:
> +      return 2;
> +    case 32:
> +      return 3;
> +    case 64:
> +      return 4;
> +    default:
> +      return -1;
> +    }
>  }
>
>  /* Return the cost of moving data of mode M between a
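
The sse_store_index change boils down to treating HFmode as if it were 4 bytes wide, so that it reuses the SFmode slot of the cost tables. A standalone sketch, with a hypothetical signature that takes the size in bytes and an is-HFmode flag instead of a machine_mode:

```c
#include <stdbool.h>

/* Map an SSE access size to an index into the load/store cost arrays.
   HFmode (2 bytes) borrows the SFmode entry; any other size without a
   table entry yields -1.  */
static int
ex_sse_store_index (unsigned int size_in_bytes, bool is_hfmode)
{
  if (is_hfmode)
    size_in_bytes = 4;  /* Reuse the SFmode cost slot.  */

  switch (size_in_bytes)
    {
    case 4:
      return 0;
    case 8:
      return 1;
    case 16:
      return 2;
    case 32:
      return 3;
    case 64:
      return 4;
    default:
      return -1;
    }
}
```
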
> @@ -19198,6 +19245,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>        int index;
>        switch (mode)
>         {
> +         case E_HFmode:
>           case E_SFmode:
>             index = 0;
>             break;
> @@ -19298,11 +19346,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>           }
>         break;
>        case 2:
> -       if (in == 2)
> -         return MAX (ix86_cost->hard_register.int_load[1],
> -                     ix86_cost->hard_register.int_store[1]);
> -       return in ? ix86_cost->hard_register.int_load[1]
> -                 : ix86_cost->hard_register.int_store[1];
> +       {
> +         int cost;
> +         if (in == 2)
> +           cost = MAX (ix86_cost->hard_register.int_load[1],
> +                       ix86_cost->hard_register.int_store[1]);
> +         else
> +           cost = in ? ix86_cost->hard_register.int_load[1]
> +                     : ix86_cost->hard_register.int_store[1];
> +         if (mode == E_HFmode)
> +           {
> +             /* Prefer SSE over GPR for HFmode.  */
> +             int sse_cost;
> +             int index = sse_store_index (mode);
> +             if (in == 2)
> +               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> +                               ix86_cost->hard_register.sse_store[index]);
> +             else
> +               sse_cost = (in
> +                           ? ix86_cost->hard_register.sse_load [index]
> +                           : ix86_cost->hard_register.sse_store [index]);
> +             if (sse_cost >= cost)
> +               cost = sse_cost + 1;
> +           }
> +         return cost;
> +       }
>        default:
>         if (in == 2)
>           cost = MAX (ix86_cost->hard_register.int_load[2],
> @@ -19476,6 +19544,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>           - XI mode
>           - any of 512-bit wide vector mode
>           - any scalar mode.  */
> +      /* For AVX512FP16, vmovw supports movement of HImode
> +        between GPR and SSE registers.  */
>        if (TARGET_AVX512F
>           && (mode == XImode
>               || VALID_AVX512F_REG_MODE (mode)
> @@ -19539,6 +19609,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>      return true;
>    else if (VALID_FP_MODE_P (mode))
>      return true;
> +  else if ((mode == HFmode || mode == HCmode) && TARGET_AVX512FP16)
> +    return true;
>    else if (VALID_DFP_MODE_P (mode))
>      return true;
>    /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
> @@ -19720,7 +19792,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
>
>      case MODE_VECTOR_INT:
>      case MODE_VECTOR_FLOAT:
> -      if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
> +      if ((TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
> +         || (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
>           || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
>           || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
>           || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
> @@ -21550,10 +21623,31 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
>      return default_decimal_float_supported_p ();
>    else if (mode == TFmode)
>      return true;
> +  else if (mode == HFmode)
> +    /* NB: Always return TRUE for HFmode so that the _Float16 type will
> +       be defined by the C front-end for AVX512FP16 intrinsics.  We will
> +       issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
> +       enabled.  */
> +    return true;
>    else
>      return default_scalar_mode_supported_p (mode);
>  }
>
> +/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
> +   if MODE is HFmode, and punt to the generic implementation otherwise.  */
> +
> +static bool
> +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
> +{
> +  /* NB: Always return TRUE for HFmode so that the _Float16 type will
> +     be defined by the C front-end for AVX512FP16 intrinsics.  We will
> +     issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
> +     enabled.  */
> +  return (mode == HFmode
> +         ? true
> +         : default_libgcc_floating_mode_supported_p (mode));
> +}
> +
>  /* Implements target hook vector_mode_supported_p.  */
>  static bool
>  ix86_vector_mode_supported_p (machine_mode mode)
> @@ -21842,6 +21936,10 @@ ix86_mangle_type (const_tree type)
>
>    switch (TYPE_MODE (type))
>      {
> +    case E_HFmode:
> +      /* _Float16 is "DF16_".
> +        Align with clang's decision in https://reviews.llvm.org/D33719. */
> +      return "DF16_";
>      case E_TFmode:
>        /* __float128 is "g".  */
>        return "g";
> @@ -23218,6 +23316,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
>    switch (type)
>      {
>        case EXCESS_PRECISION_TYPE_FAST:
> +       if (TARGET_AVX512FP16)
> +         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>         /* The fastest type to promote to will always be the native type,
>            whether that occurs with implicit excess precision or
>            otherwise.  */
> @@ -23230,6 +23330,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
>            cases.  */
>         if (!TARGET_80387)
>           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +       else if (TARGET_AVX512FP16)
> +         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>         else if (!TARGET_MIX_SSE_I387)
>           {
>             if (!(TARGET_SSE && TARGET_SSE_MATH))
> @@ -23795,6 +23897,10 @@ ix86_run_selftests (void)
>  #undef TARGET_SCALAR_MODE_SUPPORTED_P
>  #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
>
> +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
> +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
> +  ix86_libgcc_floating_mode_supported_p
> +
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 6e0340a4b60..1e4733420a1 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -990,7 +990,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
>    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> -   || (MODE) == SFmode)
> +   || (MODE) == SFmode                                                 \
> +   || (((MODE) == HImode || (MODE) == HFmode) && TARGET_AVX512FP16))
>
>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> @@ -1005,6 +1006,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode   \
>     || (MODE) == TFmode || (MODE) == V1TImode)
>
> +#define VALID_AVX512FP16_REG_MODE(MODE)                                        \
> +  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
> +
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V4QImode || (MODE) == V2HImode                         \
> @@ -1032,7 +1036,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_FP_MODE_P(MODE)                                          \
>    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
> -   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)                \
> +   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
>
>  #define VALID_INT_MODE_P(MODE)                                         \
>    ((MODE) == QImode || (MODE) == HImode                                        \
> @@ -1055,13 +1059,17 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode   \
>     || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode  \
>     || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
> -   || (MODE) == V16SFmode)
> +   || (MODE) == V16SFmode                                              \
> +   || (((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode) \
> +       && TARGET_AVX512FP16))
>
>  #define X87_FLOAT_MODE_P(MODE) \
>    (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
>
>  #define SSE_FLOAT_MODE_P(MODE) \
> -  ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
> +  ((TARGET_AVX512FP16 && (MODE) == HFmode) \
> +   || (TARGET_SSE && (MODE) == SFmode) \
> +   || (TARGET_SSE2 && (MODE) == DFmode))
>
>  #define FMA4_VEC_FLOAT_MODE_P(MODE) \
>    (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
> @@ -2256,7 +2264,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
>  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
>    | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
>    | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
> -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
> +  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
>  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
>    | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
>  constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 9b619e2f78f..ee5660e8161 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -496,7 +496,7 @@ (define_attr "type"
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
> -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
>    V2DF,V2SF,V1DF,V8DF"
>    (const_string "unknown"))
>
> @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
>                     avx512bw,noavx512bw,avx512dq,noavx512dq,
> -                   avx512vl,noavx512vl,
> -                   avxvnni,avx512vnnivl"
> +                   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
>    (const_string "base"))
>
>  ;; Define instruction set of MMX instructions
> @@ -885,7 +884,8 @@ (define_attr "enabled" ""
>          (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
>          (eq_attr "isa" "avx512vnnivl")
>            (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
> -
> +        (eq_attr "isa" "avx512fp16")
> +          (symbol_ref "TARGET_AVX512FP16")
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
>          (eq_attr "mmx_isa" "sse")
> @@ -1089,8 +1089,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
>  ;; compile time constant, it is faster to use <MODE_SIZE> than
>  ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
>  ;; command line options just use GET_MODE_SIZE macro.
> -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
> -                            (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
> +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
> +                            (TI "16") (HF "2") (SF "4") (DF "8")
> +                            (XF "GET_MODE_SIZE (XFmode)")
>                              (V16QI "16") (V32QI "32") (V64QI "64")
>                              (V8HI "16") (V16HI "32") (V32HI "64")
>                              (V4SI "16") (V8SI "32") (V16SI "64")
> @@ -1222,13 +1223,22 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> +;; SSE and x87 SFmode floating point mode plus HFmode
> +(define_mode_iterator MODESH [(HF "TARGET_AVX512FP16") SF])
> +
> +;; SSE and x87 SFmode and DFmode floating point modes plus HFmode
> +(define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
> +
> +;; All x87 floating point modes plus HFmode
> +(define_mode_iterator X87MODEFH [HF SF DF XF])
> +
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
>  (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
>
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
> -  [(SF "ss") (DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
>     (V16SF "ps") (V8DF "pd")
>     (V8SF "ps") (V4DF "pd")
>     (V4SF "ps") (V2DF "pd")
> @@ -1495,8 +1505,8 @@ (define_expand "cstorexf4"
>
>  (define_expand "cbranch<mode>4"
>    [(set (reg:CC FLAGS_REG)
> -       (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
> -                   (match_operand:MODEF 2 "cmp_fp_expander_operand")))
> +       (compare:CC (match_operand:MODEFH 1 "cmp_fp_expander_operand")
> +                   (match_operand:MODEFH 2 "cmp_fp_expander_operand")))
>     (set (pc) (if_then_else
>                (match_operator 0 "ix86_fp_comparison_operator"
>                 [(reg:CC FLAGS_REG)
> @@ -1702,6 +1712,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
>          (eq_attr "alternative" "0")
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
> +
> +(define_insn "*cmpi<unord>hf"
> +  [(set (reg:CCFP FLAGS_REG)
> +       (compare:CCFP
> +         (match_operand:HF 0 "register_operand" "v")
> +         (match_operand:HF 1 "register_ssemem_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "v<unord>comish\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecomi")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Push/pop instructions.
>
> @@ -2433,8 +2454,8 @@ (define_insn "*movsi_internal"
>            (symbol_ref "true")))])
>
>  (define_insn "*movhi_internal"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
> -       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
> +       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
>     && ix86_hardreg_mov_ok (operands[0], operands[1])"
>
> @@ -2460,6 +2481,9 @@ (define_insn "*movhi_internal"
>           gcc_unreachable ();
>         }
>
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
>      case TYPE_MSKLOG:
>        if (operands[1] == const0_rtx)
>         return "kxorw\t%0, %0, %0";
> @@ -2475,7 +2499,9 @@ (define_insn "*movhi_internal"
>      }
>  }
>    [(set (attr "type")
> -     (cond [(eq_attr "alternative" "4,5,6,7")
> +     (cond [(eq_attr "alternative" "9,10,11,12,13")
> +             (const_string "ssemov")
> +           (eq_attr "alternative" "4,5,6,7")
>               (const_string "mskmov")
>             (eq_attr "alternative" "8")
>               (const_string "msklog")
> @@ -2500,6 +2526,8 @@ (define_insn "*movhi_internal"
>      (set (attr "mode")
>        (cond [(eq_attr "type" "imovx")
>                (const_string "SI")
> +            (eq_attr "alternative" "11")
> +              (const_string "HF")
>              (and (eq_attr "alternative" "1,2")
>                   (match_operand:HI 1 "aligned_operand"))
>                (const_string "SI")
> @@ -2508,7 +2536,12 @@ (define_insn "*movhi_internal"
>                        (not (match_test "TARGET_HIMODE_MATH"))))
>                (const_string "SI")
>             ]
> -           (const_string "HI")))])
> +           (const_string "HI")))
> +    (set (attr "isa")
> +        (cond [(eq_attr "alternative" "9,10,11,12,13")
> +               (const_string "avx512fp16")
> +              ]
> +              (const_string "*")))])
>
>  ;; Situation is quite tricky about when to choose full sized (SImode) move
>  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
> @@ -3158,10 +3191,34 @@ (define_insn "*pushsf"
>     (set_attr "unit" "i387,*,*")
>     (set_attr "mode" "SF,SI,SF")])
>
> +(define_insn "*pushhf_rex64"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
> +  "TARGET_64BIT && TARGET_AVX512FP16"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{q}\t%q1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "DI,HF")])
> +
> +(define_insn "*pushhf"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
> +  "!TARGET_64BIT && TARGET_AVX512FP16"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{l}\t%k1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "SI,HF")])
> +
>  ;; %%% Kill this when call knows how to work this out.
>  (define_split
> -  [(set (match_operand:SF 0 "push_operand")
> -       (match_operand:SF 1 "any_fp_register_operand"))]
> +  [(set (match_operand:MODESH 0 "push_operand")
> +       (match_operand:MODESH 1 "any_fp_register_operand"))]
>    "reload_completed"
>    [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
>     (set (match_dup 0) (match_dup 1))]
> @@ -3209,8 +3266,8 @@ (define_expand "movtf"
>    "ix86_expand_move (TFmode, operands); DONE;")
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
> -       (match_operand:X87MODEF 1 "general_operand"))]
> +  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
> +       (match_operand:X87MODEFH 1 "general_operand"))]
>    ""
>    "ix86_expand_move (<MODE>mode, operands); DONE;")
>
> @@ -3646,6 +3703,56 @@ (define_insn "*movsf_internal"
>            ]
>            (const_string "*")))])
>
> +(define_insn "*movhf_internal"
> + [(set (match_operand:HF 0 "nonimmediate_operand"
> +        "=?r,?m,v,v,v,m,?r,?v,r  ,m")
> +       (match_operand:HF 1 "general_operand"
> +        "rmF,rF,C,v,m,v,v ,r ,rmF,rF"))]
> + "TARGET_AVX512FP16
> +  && !(MEM_P (operands[0]) && MEM_P (operands[1]))
> +  && (lra_in_progress
> +      || reload_completed
> +      || !CONST_DOUBLE_P (operands[1])
> +      || standard_sse_constant_p (operands[1], HFmode) == 1
> +      || memory_operand (operands[0], HFmode))"
> +{
> +  switch (get_attr_type (insn))
> +    {
> +    case TYPE_IMOV:
> +      return "mov{w}\t{%1, %0|%0, %1}";
> +
> +    case TYPE_SSELOG1:
> +      return standard_sse_constant_opcode (insn, operands);
> +
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set (attr "type")
> +       (cond [(eq_attr "alternative" "0,1,8,9")
> +                (const_string "imov")
> +              (eq_attr "alternative" "2")
> +                (const_string "sselog1")
> +             ]
> +             (const_string "ssemov")))
> +   (set (attr "prefix")
> +       (cond [(eq_attr "alternative" "0,1,8,9")
> +                (const_string "orig")
> +              (eq_attr "alternative" "2")
> +                (const_string "maybe_evex")
> +             ]
> +             (const_string "evex")))
> +   (set (attr "mode")
> +       (cond [(eq_attr "alternative" "0,1,6,7,8,9")
> +                (const_string "HI")
> +              (eq_attr "alternative" "2")
> +                (const_string "V4SF")
> +             ]
> +             (const_string "HF")))])
> +
>  (define_split
>    [(set (match_operand 0 "any_fp_register_operand")
>         (match_operand 1 "memory_operand"))]
> @@ -4383,6 +4490,17 @@ (define_split
>    emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> +(define_insn "extendhf<mode>2"
> +  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +        (float_extend:MODEF
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +
>  (define_expand "extend<mode>xf2"
>    [(set (match_operand:XF 0 "nonimmediate_operand")
>          (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
> @@ -4560,6 +4678,18 @@ (define_insn "truncxf<mode>2"
>               (symbol_ref "flag_unsafe_math_optimizations")
>            ]
>            (symbol_ref "true")))])
> +
> +;; Conversion from {SF,DF}mode to HFmode.
> +
> +(define_insn "trunc<mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (float_truncate:HF
> +         (match_operand:MODEF 1 "register_ssemem_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Signed conversion to DImode.
>
> @@ -4936,6 +5066,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
>            (symbol_ref "true")))])
>
> +(define_insn "float<floatunssuffix><mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (any_float:HF
> +         (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*floatdi<MODEF:mode>2_i387"
>    [(set (match_operand:MODEF 0 "register_operand" "=f")
>         (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
> @@ -7517,10 +7657,10 @@ (define_expand "<insn>xf3"
>    "TARGET_80387")
>
>  (define_expand "<insn><mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (plusminus:MODEF
> -         (match_operand:MODEF 1 "register_operand")
> -         (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (plusminus:MODEFH
> +         (match_operand:MODEFH 1 "register_operand")
> +         (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
>      || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
>
> @@ -8094,9 +8234,9 @@ (define_expand "mulxf3"
>    "TARGET_80387")
>
>  (define_expand "mul<mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (mult:MODEF (match_operand:MODEF 1 "register_operand")
> -                   (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (mult:MODEFH (match_operand:MODEFH 1 "register_operand")
> +                   (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
>      || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
>
> @@ -8111,9 +8251,9 @@ (define_expand "divxf3"
>    "TARGET_80387")
>
>  (define_expand "div<mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (div:MODEF (match_operand:MODEF 1 "register_operand")
> -                  (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (div:MODEFH (match_operand:MODEFH 1 "register_operand")
> +                  (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
>      || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
>  {
> @@ -16105,6 +16245,22 @@ (define_insn "*fop_<mode>_comm"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*fop_hf_comm"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (match_operator:HF 3 "binary_fp_operator"
> +         [(match_operand:HF 1 "nonimmediate_operand" "%v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> +  "TARGET_AVX512FP16
> +   && COMMUTATIVE_ARITH_P (operands[3])
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "* return output_387_binary_op (insn, operands);"
> +  [(set (attr "type")
> +       (if_then_else (match_operand:HF 3 "mult_operator")
> +         (const_string "ssemul")
> +         (const_string "sseadd")))
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*rcpsf2_sse"
>    [(set (match_operand:SF 0 "register_operand" "=x,x,x")
>         (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
> @@ -16178,6 +16334,22 @@ (define_insn "*fop_<mode>_1"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*fop_hf_1"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (match_operator:HF 3 "binary_fp_operator"
> +         [(match_operand:HF 1 "nonimmediate_operand" "v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> +  "TARGET_AVX512FP16
> +   && !COMMUTATIVE_ARITH_P (operands[3])
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "* return output_387_binary_op (insn, operands);"
> +  [(set (attr "type")
> +       (if_then_else (match_operand:HF 3 "div_operator")
> +         (const_string "ssediv")
> +         (const_string "sseadd")))
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*fop_<X87MODEF:mode>_2_i387"
>    [(set (match_operand:X87MODEF 0 "register_operand" "=f")
>         (match_operator:X87MODEF 3 "binary_fp_operator"
> @@ -18972,11 +19144,11 @@ (define_peephole2
>  })
>
>  (define_expand "mov<mode>cc"
> -  [(set (match_operand:X87MODEF 0 "register_operand")
> -       (if_then_else:X87MODEF
> +  [(set (match_operand:X87MODEFH 0 "register_operand")
> +       (if_then_else:X87MODEFH
>           (match_operand 1 "comparison_operator")
> -         (match_operand:X87MODEF 2 "register_operand")
> -         (match_operand:X87MODEF 3 "register_operand")))]
> +         (match_operand:X87MODEFH 2 "register_operand")
> +         (match_operand:X87MODEFH 3 "register_operand")))]
>    "(TARGET_80387 && TARGET_CMOVE)
>     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
>    "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
> @@ -19140,10 +19312,10 @@ (define_insn "<code><mode>3"
>  ;; presence of -0.0 and NaN.
>
>  (define_insn "*ieee_s<ieee_maxmin><mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand" "=x,v")
> -       (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "0,v")
> -          (match_operand:MODEF 2 "nonimmediate_operand" "xm,vm")]
> +  [(set (match_operand:MODEFH 0 "register_operand" "=x,v")
> +       (unspec:MODEFH
> +         [(match_operand:MODEFH 1 "register_operand" "0,v")
> +          (match_operand:MODEFH 2 "nonimmediate_operand" "xm,vm")]
>           IEEE_MAXMIN))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>    "@
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 7b8547bb1c3..ad366974b5b 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
>  mmwait
>  Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
>  Support MWAIT and MONITOR built-in functions and code generation.
> +
> +mavx512fp16
> +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
> diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> index f129de4bbe5..5344e22c9c8 100644
> --- a/gcc/config/i386/immintrin.h
> +++ b/gcc/config/i386/immintrin.h
> @@ -94,6 +94,8 @@
>
>  #include <avx512vp2intersectvlintrin.h>
>
> +#include <avx512fp16intrin.h>
> +
>  #include <shaintrin.h>
>
>  #include <fmaintrin.h>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index ffcc0c81964..446f9ba552f 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -321,6 +321,11 @@ (define_mode_iterator VF2_512_256VL
>  (define_mode_iterator VF_128
>    [V4SF (V2DF "TARGET_SSE2")])
>
> +;; All 128bit vector HF/SF/DF modes
> +(define_mode_iterator VFH_128
> +  [(V8HF "TARGET_AVX512FP16")
> +   V4SF (V2DF "TARGET_SSE2")])
> +
>  ;; All 256bit vector float modes
>  (define_mode_iterator VF_256
>    [V8SF V4DF])
> @@ -730,8 +735,10 @@ (define_mode_attr avx512bcst
>
>  ;; Mapping from float mode to required SSE level
>  (define_mode_attr sse
> -  [(SF "sse") (DF "sse2")
> +  [(SF "sse") (DF "sse2") (HF "avx512fp16")
>     (V4SF "sse") (V2DF "sse2")
> +   (V32HF "avx512fp16") (V16HF "avx512fp16")
> +   (V8HF "avx512fp16")
>     (V16SF "avx512f") (V8SF "avx")
>     (V8DF "avx512f") (V4DF "avx")])
>
> @@ -869,6 +876,7 @@ (define_mode_attr ssescalarmode
>     (V32HI "HI") (V16HI "HI") (V8HI "HI")
>     (V16SI "SI") (V8SI "SI")  (V4SI "SI")
>     (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
> +   (V32HF "HF") (V16HF "HF") (V8HF "HF")
>     (V16SF "SF") (V8SF "SF")  (V4SF "SF")
>     (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
>     (V4TI "TI")  (V2TI "TI")])
> @@ -948,10 +956,10 @@ (define_mode_attr sseintprefix
>
>  ;; SSE scalar suffix for vector modes
>  (define_mode_attr ssescalarmodesuffix
> -  [(SF "ss") (DF "sd")
> -   (V16SF "ss") (V8DF "sd")
> -   (V8SF "ss") (V4DF "sd")
> -   (V4SF "ss") (V2DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
> +   (V32HF "sh") (V16SF "ss") (V8DF "sd")
> +   (V16HF "sh") (V8SF "ss") (V4DF "sd")
> +   (V8HF "sh") (V4SF "ss") (V2DF "sd")
>     (V16SI "d") (V8DI "q")
>     (V8SI "d") (V4DI "q")
>     (V4SI "d") (V2DI "q")])
> @@ -1903,12 +1911,12 @@ (define_insn "*<insn><mode>3<mask_name><round_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*<sse>_vm<insn><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (plusminus:<ssescalarmode>
>               (vec_select:<ssescalarmode>
> -               (match_operand:VF_128 1 "register_operand" "0,v")
> +               (match_operand:VFH_128 1 "register_operand" "0,v")
>                 (parallel [(const_int 0)]))
>               (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
>           (match_dup 1)
> @@ -1966,12 +1974,12 @@ (define_insn "*mul<mode>3<mask_name><round_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (multdiv:<ssescalarmode>
>               (vec_select:<ssescalarmode>
> -               (match_operand:VF_128 1 "register_operand" "0,v")
> +               (match_operand:VFH_128 1 "register_operand" "0,v")
>                 (parallel [(const_int 0)]))
>               (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
>           (match_dup 1)
> @@ -2368,12 +2376,12 @@ (define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*ieee_<ieee_maxmin><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (unspec:<ssescalarmode>
>               [(vec_select:<ssescalarmode>
> -                (match_operand:VF_128 1 "register_operand" "0,v")
> +                (match_operand:VFH_128 1 "register_operand" "0,v")
>                  (parallel [(const_int 0)]))
>                (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")]
>                IEEE_MAXMIN))
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2dc6a2106d9..3e1b1dbd606 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1392,6 +1392,7 @@ See RS/6000 and PowerPC Options.
>  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
> +-mavx512fp16 @gol
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
>  -mkl -mwidekl @gol
> @@ -31059,6 +31060,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @itemx -mavx512bf16
>  @opindex mavx512bf16
>  @need 200
> +@itemx -mavx512fp16
> +@opindex mavx512fp16
> +@need 200
>  @itemx -mgfni
>  @opindex mgfni
>  @need 200
> @@ -31137,9 +31141,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
>  XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
>  GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
> -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
> -extended instruction sets. Each has a corresponding @option{-mno-} option to
> -disable use of these instructions.
> +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
> +or CLDEMOTE extended instruction sets. Each has a corresponding
> +@option{-mno-} option to disable use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
> new file mode 100644
> index 00000000000..8f07e85d184
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-1.C
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx512fp16" } */
> +
> +_Float16
> +foo (_Float16 x) /* { dg-error "is not supported on this target\[\n\r]*compilation terminated" } */
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
> new file mode 100644
> index 00000000000..99eb797eff1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-2.C
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
> new file mode 100644
> index 00000000000..940878503f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-3.C
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O0 -mavx512fp16" } */
> +
> +template <typename> void a(char *) {}
> +char b, d;
> +void c()
> +{
> +  a<unsigned char>(&d);
> +  a<_Float16>(&b);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index 6178e38ce02..f3676077743 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> index 986fbd819e4..1751c52565c 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..0ad9064f637 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -87,6 +87,9 @@ main ()
>  #ifdef AVX512VNNI
>        && (ecx & bit_AVX512VNNI)
>  #endif
> +#ifdef AVX512FP16
> +      && (edx & bit_AVX512FP16)
> +#endif
>  #ifdef VAES
>        && (ecx & bit_VAES)
>  #endif
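
For readers unfamiliar with the runtime check this hunk extends, here is a minimal sketch of probing the feature bit by hand. It assumes AVX512FP16 is reported in CPUID leaf 7, subleaf 0, EDX bit 23 (as in the Intel specification), and it uses `__get_cpuid_count` from the GCC/Clang `<cpuid.h>`. The names `SKETCH_BIT_AVX512FP16` and `sketch_cpu_has_avx512fp16` are hypothetical and are not part of the patch. The sketch looks only at the CPUID bit and ignores whether the OS saves the AVX-512 register state.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper: __get_cpuid_count */
#endif

/* Hypothetical local name; the patch tests bit_AVX512FP16 instead.  */
#define SKETCH_BIT_AVX512FP16 (1u << 23)

/* Returns 1 if CPUID leaf 7, subleaf 0 sets the AVX512FP16 bit in EDX,
   0 otherwise (including on non-x86 hosts).  OS support for the
   AVX-512 state is deliberately not checked in this sketch.  */
static int
sketch_cpu_has_avx512fp16 (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid_count returns 0 when the requested leaf is unsupported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (edx & SKETCH_BIT_AVX512FP16) != 0;
#else
  return 0;
#endif
}
```

A caller would skip the FP16 run tests when this returns 0, which is the role the `#ifdef AVX512FP16` clause plays inside the larger condition above.
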
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> new file mode 100644
> index 00000000000..88887556d68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_max (_Float16 __A, _Float16 __B)
> +{
> +  return __A > __B ? __A : __B;
> +}
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_min (_Float16 __A, _Float16 __B)
> +{
> +  return __A < __B ? __A : __B;
> +}
> +
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> new file mode 100644
> index 00000000000..c9e23bf95c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512fp16-12a.c"
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x = 0.1f;
> +  _Float16 y = -3.2f;
> +  _Float16 z;
> +
> +  z = do_max (x, y);
> +  if (z != x)
> +    abort ();
> +
> +  z = do_min (x, y);
> +  if (z != y)
> +    abort ();
> +}
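
As an aside on why the 12a test can expect a bare vmaxsh/vminsh with no vcomish: the conditional form `a > b ? a : b` yields the second operand whenever the comparison is false, including when either input is a NaN. To my understanding, that matches how the x86 scalar max/min instructions behave when one input is a NaN. Below is a minimal sketch of that source-level behaviour. It assumes plain `float` as a stand-in for `_Float16` so that it builds without AVX512FP16 support, and the `sketch_*` names are hypothetical.

```c
#include <math.h>   /* NAN */

/* Same source shape as do_max/do_min in avx512fp16-12a.c, with float
   standing in for _Float16 (an assumption, for portability).  */
static float
sketch_max (float a, float b)
{
  return a > b ? a : b;   /* comparison false (e.g. NaN) -> returns b */
}

static float
sketch_min (float a, float b)
{
  return a < b ? a : b;   /* comparison false (e.g. NaN) -> returns b */
}

/* Returns 1 when v is a NaN; NaN is the only value unequal to itself.  */
static int
sketch_is_nan (float v)
{
  return v != v;
}
```

With ordinary inputs these behave like max and min. With a NaN in the first position the second operand comes back unchanged, and with a NaN in the second position the NaN comes back. This asymmetry is why operand order matters when mapping the conditional onto a single instruction.
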
> diff --git a/gcc/testsuite/gcc.target/i386/float16-1.c b/gcc/testsuite/gcc.target/i386/float16-1.c
> new file mode 100644
> index 00000000000..8f07e85d184
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx512fp16" } */
> +
> +_Float16
> +foo (_Float16 x) /* { dg-error "is not supported on this target\[\n\r]*compilation terminated" } */
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/float16-2.c b/gcc/testsuite/gcc.target/i386/float16-2.c
> new file mode 100644
> index 00000000000..99eb797eff1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
> new file mode 100644
> index 00000000000..3846c8e9b6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
> new file mode 100644
> index 00000000000..247dd6e7e33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
> new file mode 100644
> index 00000000000..631082581f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
> new file mode 100644
> index 00000000000..828d8530769
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 79265c7c94f..8499fdf2db9 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -79,6 +79,7 @@ extern void test_hreset (void)                        __attribute__((__target__("hreset")));
>  extern void test_keylocker (void)              __attribute__((__target__("kl")));
>  extern void test_widekl (void)                 __attribute__((__target__("widekl")));
>  extern void test_avxvnni (void)                        __attribute__((__target__("avxvnni")));
> +extern void test_avx512fp16 (void)             __attribute__((__target__("avx512fp16")));
>
>  extern void test_no_sgx (void)                 __attribute__((__target__("no-sgx")));
>  extern void test_no_avx5124fmaps(void)         __attribute__((__target__("no-avx5124fmaps")));
> @@ -159,6 +160,7 @@ extern void test_no_hreset (void)           __attribute__((__target__("no-hreset")));
>  extern void test_no_keylocker (void)           __attribute__((__target__("no-kl")));
>  extern void test_no_widekl (void)              __attribute__((__target__("no-widekl")));
>  extern void test_no_avxvnni (void)             __attribute__((__target__("no-avxvnni")));
> +extern void test_no_avx512fp16 (void)          __attribute__((__target__("no-avx512fp16")));
>
>  extern void test_arch_nocona (void)            __attribute__((__target__("arch=nocona")));
>  extern void test_arch_core2 (void)             __attribute__((__target__("arch=core2")));
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> new file mode 100644
> index 00000000000..87b4f459a5a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +
> +#include <immintrin.h>
> +
> +__m128h
> +foo (__m128h x, __m128h y)
> +{
> +  x[0] = x[0] > y[0] ? x[0] : y[0];
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 7029771334b..f5f5c113612 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 4ce0ffffaf3..747d504cedb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 6e8b6f3fa1b..33411969901 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -103,7 +103,7 @@
>
>
>  #ifndef DIFFERENT_PRAGMAS
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>
>  /* Following intrinsics require immediate arguments.  They
> @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
>
>  /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
>  #ifdef DIFFERENT_PRAGMAS
> -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>  #include <immintrin.h>
>  test_1 (_cvtss_sh, unsigned short, float, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 7faa053ace8..86590ca5ffb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -708,6 +708,6 @@
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
>
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>
>  #include <x86intrin.h>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 7f78c5593ac..3a7f19ca8a7 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
>
>  proc check_effective_target_float16 {} {
>      return [check_no_compiler_messages_nocache float16 object {
> -        _Float16 x;
> +        _Float16 foo (_Float16 x) { return x; }
>      } [add_options_for_float16 ""]]
>  }
>
> @@ -8654,6 +8654,17 @@ proc check_prefer_avx128 { } {
>  }
>
>
> +# Return 1 if avx512fp16 instructions can be compiled.
> +
> +proc check_effective_target_avx512fp16 { } {
> +    return [check_no_compiler_messages avx512fp16 object {
> +       void foo (void)
> +       {
> +         asm volatile ("vmovw %di, %xmm0");
> +       }
> +    } "-O2 -mavx512fp16" ]
> +}
> +
>  # Return 1 if avx512f instructions can be compiled.
>
>  proc check_effective_target_avx512f { } {
> --
> 2.18.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
       [not found] <20210701054808.39000-1-hongtao.liu@intel.com>
                   ` (2 preceding siblings ...)
       [not found] ` <20210701054808.39000-2-hongtao.liu@intel.com>
@ 2021-07-01 11:10 ` Uros Bizjak
  2021-07-01 12:39   ` H.J. Lu
  2021-07-02  6:30   ` Hongtao Liu
  3 siblings, 2 replies; 138+ messages in thread
From: Uros Bizjak @ 2021-07-01 11:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Hongtao Liu, H. J. Lu, Jakub Jelinek, liuhongt

[Sorry for double post, gcc-patches address was wrong in original post]

On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
>   AVX512FP16 is disclosed, refer to [1].
>   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
>   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
>   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
>
> [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> [2] https://reviews.llvm.org/D33719

Looking through the implementation of _Float16 support, I think there is
no need for _Float16 support to depend on AVX512FP16.

The compiler is smart enough to either use a named pattern that
describes the instruction when available or divert to a library call
to a soft-fp implementation. So, I think that general _Float16 support
should be implemented first (similar to __float128) and then upgraded
with AVX512FP16 specific instructions.

MOVW loads/stores to an XMM reg can be emulated with MOVD and a SImode
secondary_reload register.

The soft-fp library already includes all the infrastructure to implement
_Float16 (see half.h), so HFmode basic operations should be trivial to
implement (I have gone through this exercise personally years ago when
implementing __float128 soft-fp support).

Looking through patch 1/2, it looks like a new ABI is introduced,
where FP16 values are passed through XMM registers, but I don't think
there is updated psABI documentation available (for x86_64 as well as
i386, where FP16 values will probably be passed through memory).

So, the net effect of the above proposal(s) is that x86 will support
_Float16 out of the box, emulate it via soft-fp without AVX512FP16 and
use AVX512FP16 instructions with -mavx512fp16.

Uros.

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 11:10 ` [PATCH 0/2] Initial support for AVX512FP16 Uros Bizjak
@ 2021-07-01 12:39   ` H.J. Lu
  2021-07-01 12:58     ` Richard Biener
                       ` (2 more replies)
  2021-07-02  6:30   ` Hongtao Liu
  1 sibling, 3 replies; 138+ messages in thread
From: H.J. Lu @ 2021-07-01 12:39 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Hongtao Liu, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> [Sorry for double post, gcc-patches address was wrong in original post]
>
> On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Hi:
> >   AVX512FP16 is disclosed, refer to [1].
> >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> >
> > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > [2] https://reviews.llvm.org/D33719
>
> Looking through implementation of _Float16 support, I think, there is
> no need for _Float16 support to depend on AVX512FP16.
>
> The compiler is smart enough to use either a named pattern that
> describes the instruction when available or diverts to a library call
> to a soft-fp implementation. So, I think that general _Float16 support
> should be implemented first (similar to _float128) and then upgraded
> with AVX512FP16 specific instructions.
>
> MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> secondary_reload register.
>
> soft-fp library already includes all the infrastructure to implement
> _Float16 (see half.h), so HFmode basic operations should be trivial to
> implement (I have gone through this exercise personally years ago when
> implementing __float128 soft-fp support).
>
> Looking through the patch 1/2, it looks that a new ABI is introduced,
> where FP16 values are passed through XMM registers, but I don't think
> there is updated psABI documentation available (for x86_64 as well as

_Float16 support was added to x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a

2 years ago.

> i386, where FP16 values will probably be passed through memory).

That is correct.

> So, the net effect of the above proposal(s) is that x86 will support
> _Float16 out-of the box, emulate it via soft-fp without AVX512FP16 and
> use AVX512FP16 instructions with -mavx512fp16.
>

The main issue is the complex _Float16 functions in libgcc.  If _Float16
doesn't require -mavx512fp16, we need to compile the complex _Float16
functions in libgcc without -mavx512fp16.  Complex _Float16 performance
is very important for our _Float16 usage; _Float16 has to be very fast.
There should be no emulation anywhere when -mavx512fp16 is used.  That
is why _Float16 is available only with -mavx512fp16.

-- 
H.J.

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 12:39   ` H.J. Lu
@ 2021-07-01 12:58     ` Richard Biener
  2021-07-01 13:03       ` Jakub Jelinek
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
  2021-07-01 12:58     ` [PATCH 0/2] " Uros Bizjak
  2021-07-01 21:40     ` Joseph Myers
  2 siblings, 2 replies; 138+ messages in thread
From: Richard Biener @ 2021-07-01 12:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Jakub Jelinek, liuhongt, gcc-patches

On Thu, Jul 1, 2021 at 2:41 PM H.J. Lu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > [Sorry for double post, gcc-patches address was wrong in original post]
> >
> > On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Hi:
> > >   AVX512FP16 is disclosed, refer to [1].
> > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > >
> > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > [2] https://reviews.llvm.org/D33719
> >
> > Looking through implementation of _Float16 support, I think, there is
> > no need for _Float16 support to depend on AVX512FP16.
> >
> > The compiler is smart enough to use either a named pattern that
> > describes the instruction when available or diverts to a library call
> > to a soft-fp implementation. So, I think that general _Float16 support
> > should be implemented first (similar to _float128) and then upgraded
> > with AVX512FP16 specific instructions.
> >
> > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > secondary_reload register.
> >
> > soft-fp library already includes all the infrastructure to implement
> > _Float16 (see half.h), so HFmode basic operations should be trivial to
> > implement (I have gone through this exercise personally years ago when
> > implementing __float128 soft-fp support).
> >
> > Looking through the patch 1/2, it looks that a new ABI is introduced,
> > where FP16 values are passed through XMM registers, but I don't think
> > there is updated psABI documentation available (for x86_64 as well as
>
> _Float16 support was added to x86-64 psABI:
>
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a
>
> 2 years ago.
>
> > i386, where FP16 values will probably be passed through memory).
>
> That is correct.
>
> > So, the net effect of the above proposal(s) is that x86 will support
> > _Float16 out-of the box, emulate it via soft-fp without AVX512FP16 and
> > use AVX512FP16 instructions with -mavx512fp16.
> >
>
> The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> require -mavx512fp16, we need to compile complex _Float16 functions in
> libgcc without -mavx512fp16.  Complex _Float16 performance is very
> important for our _Float16 usage.   _Float16 performance has to be
> very fast.  There should be no emulation anywhere when -mavx512fp16
> is used.   That is why _Float16 is available only with -mavx512fp16.

It should be possible to emulate scalar _Float16 using _Float32 with a
reasonable performance trade-off.  I think users who care about _Float16
performance will use vector intrinsics anyway, since for scalar code
_Float32 code will likely perform the same (at double the storage cost).

Richard.

> --
> H.J.

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 12:39   ` H.J. Lu
  2021-07-01 12:58     ` Richard Biener
@ 2021-07-01 12:58     ` Uros Bizjak
  2021-07-01 21:40     ` Joseph Myers
  2 siblings, 0 replies; 138+ messages in thread
From: Uros Bizjak @ 2021-07-01 12:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Hongtao Liu, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 2:40 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > [Sorry for double post, gcc-patches address was wrong in original post]
> >
> > On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Hi:
> > >   AVX512FP16 is disclosed, refer to [1].
> > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > >
> > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > [2] https://reviews.llvm.org/D33719
> >
> > Looking through implementation of _Float16 support, I think, there is
> > no need for _Float16 support to depend on AVX512FP16.
> >
> > The compiler is smart enough to use either a named pattern that
> > describes the instruction when available or diverts to a library call
> > to a soft-fp implementation. So, I think that general _Float16 support
> > should be implemented first (similar to _float128) and then upgraded
> > with AVX512FP16 specific instructions.
> >
> > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > secondary_reload register.
> >
> > soft-fp library already includes all the infrastructure to implement
> > _Float16 (see half.h), so HFmode basic operations should be trivial to
> > implement (I have gone through this exercise personally years ago when
> > implementing __float128 soft-fp support).
> >
> > Looking through the patch 1/2, it looks that a new ABI is introduced,
> > where FP16 values are passed through XMM registers, but I don't think
> > there is updated psABI documentation available (for x86_64 as well as
>
> _Float16 support was added to x86-64 psABI:
>
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a
>
> 2 years ago.

Uh, sorry, my psABI link [1] is way out of date, but this is what
Google gives for "x86_64 psABI pdf" ...

[1] https://uclibc.org/docs/psABI-x86_64.pdf

>
> > i386, where FP16 values will probably be passed through memory).
>
> That is correct.
>
> > So, the net effect of the above proposal(s) is that x86 will support
> > _Float16 out-of the box, emulate it via soft-fp without AVX512FP16 and
> > use AVX512FP16 instructions with -mavx512fp16.
> >
>
> The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> require -mavx512fp16, we need to compile complex _Float16 functions in
> libgcc without -mavx512fp16.  Complex _Float16 performance is very
> important for our _Float16 usage.   _Float16 performance has to be
> very fast.  There should be no emulation anywhere when -mavx512fp16
> is used.   That is why _Float16 is available only with -mavx512fp16.

If this performance is important, then the best way is to recompile
these functions for the AVX512FP16 target in addition to the generic
versions, or even implement them in assembly. The compiler can then call
these specific functions when -mavx512fp16 is used. Please see how
alpha implements calls to its X_floating library.

Uros.

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 12:58     ` Richard Biener
@ 2021-07-01 13:03       ` Jakub Jelinek
  2021-07-06  8:51         ` Hongtao Liu
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
  1 sibling, 1 reply; 138+ messages in thread
From: Jakub Jelinek @ 2021-07-01 13:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: H.J. Lu, Uros Bizjak, liuhongt, gcc-patches

On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> > require -mavx512fp16, we need to compile complex _Float16 functions in
> > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > important for our _Float16 usage.   _Float16 performance has to be
> > very fast.  There should be no emulation anywhere when -mavx512fp16
> > is used.   That is why _Float16 is available only with -mavx512fp16.
> 
> It should be possible to emulate scalar _Float16 using _Float32 with a
> reasonable
> performance trade-off.  I think users caring for _Float16 performance will
> use vector intrinsics anyway since for scalar code _Float32 code will likely
> perform the same (at double storage cost)

Only if it is allowed to have excess precision for _Float16.  If not, then
one would need to (expensively?) round after every operation at least.

	Jakub


* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01  5:55 ` [PATCH 0/2] Initial support for AVX512FP16 Hongtao Liu
@ 2021-07-01 20:46   ` Joseph Myers
  2021-07-06  8:53     ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-07-01 20:46 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, Jakub Jelinek, liuhongt

Some general comments, following what I said on libc-alpha:


1. Can you confirm that the ABI being used for 64-bit, for _Float16 and 
_Complex _Float16 argument passing and return, follows the current x86_64 
ABI document?


2. Can you confirm that if you build with this instruction set extension 
enabled by default, and run GCC tests for a corresponding (emulated?) 
processor, all the existing float16 tests in the testsuite are enabled and 
PASS (both compilation and execution) (both 64-bit and 32-bit testing)?


3. There's an active 32-bit ABI mailing list (ia32-abi@googlegroups.com).  
If you want to support _Float16 in the 32-bit case, please work with it to 
get the corresponding ABI documented (using only memory and 
general-purpose registers seems like a good idea, so that the ABI can be 
supported for the base architecture without depending on SSE registers 
being present).  In the absence of 32-bit ABI support it might be better 
to disable the HFmode support for 32-bit.


4. Support for _Float16 really ought not to depend on whether a particular 
instruction set extension is present, just like with other floating-point 
types; it makes sense, as an API, for all x86 processors (and like many 
APIs, it will be faster on some processors than on others).  More specific 
points here are:

(a) Basic arithmetic (+-*/) can be done by converting to SFmode, doing 
arithmetic there and converting back to HFmode; the results of doing so 
will be correctly rounded.  Indeed, I think optabs.c handles that 
automatically when operations are available on a wider mode but not on the 
desired mode (but you'd need to check carefully that all the expected 
conversions do occur).

(b) Conversions to/from all other floating-point modes will always be 
needed, whether in hardware or in software.

(c) In the F16C (Ivy Bridge and later) case, where you have hardware 
conversions to/from float (only), it's fine to convert to double (or long 
double) via float.  (On efficiency grounds, widening from HFmode to TFmode 
should be a pure software operation, which should be faster than having an 
intermediate conversion to SFmode when the SFmode-to-TFmode conversion is 
a software operation.)

(d) In the F16C case (where there are hardware conversions only from 
SFmode, not from wider modes), conversion *from* DFmode (or XFmode or 
TFmode) to HFmode should be a software operation, to avoid double 
rounding; an intermediate conversion to SFmode would be incorrect.

(e) It's OK for conversions to/from integer modes to go via SFmode 
(although I don't know if that's efficient or not).  Any case where a 
conversion from integer to SFmode is inexact would overflow HFmode, so 
there are no double rounding issues.

(f) In the F16C case, it seems the hardware instructions only work on 
vectors, not scalars, so care would need to be taken to use them for 
scalar conversions only if the other elements of the vector register are 
known to be safe to convert without raising any exceptions (e.g. all zero 
bits, or -fno-trapping-math in effect).

(g) If concerned about efficiency of intermediate truncations on 
processors without hardware _Float16 arithmetic, look at 
aarch64_excess_precision; you have the option of using excess precision 
for _Float16 by default, though that only really helps for C given the 
lack of excess precision support in the C++ front end.  (Enabling this can 
cause trouble for code that only expects C99/C11 values of 
FLT_EVAL_METHOD, however; see the -fpermitted-flt-eval-methods option for 
more details.)


5. Suppose that in some cases you do disable _Float16 support (whether 
that's just for 32-bit until the ABI has been defined, or also in the 
absence of instruction set support despite my comments above).  Then the 
way you do that in this patch series, enabling the type in 
ix86_scalar_mode_supported_p and ix86_libgcc_floating_mode_supported_p and 
giving an error later in ix86_expand_move, is a bad idea.

Errors in expanders are generally problematic (they don't have good 
location information available).  But apart from that, ordinary user code 
should be able to tell whether _Float16 is supported by testing whether 
e.g. __FLT16_MANT_DIG__ is defined (like float.h does), or by including 
float.h (with __STDC_WANT_IEC_60559_TYPES_EXT__ defined) and then testing 
whether one of the FLT16_* macros is defined, or in a configure test by 
just declaring something using the _Float16 type.  Patch 1 changes 
check_effective_target_float16 to work around your technique for disabling 
_Float16 in ix86_expand_move, but it should be considered a stable user 
API that any of the above methods can be used in user code to check for 
_Float16 support - user code shouldn't need to know the implementation 
detail that only something that goes through ix86_expand_move reveals 
whether _Float16 is supported or not (and user code shouldn't need to use 
a configure test at all for this, testing FLT16_* after including float.h 
should work as a fully portable way of testing it - that's using only ISO 
C facilities).

So enable HFmode in ix86_scalar_mode_supported_p and 
ix86_libgcc_floating_mode_supported_p exactly when all operations are 
supported in the rest of the compiler - don't enable it there and then 
disable it elsewhere, because that will break user code testing for 
whether _Float16 is available using FLT16_* macros.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 12:39   ` H.J. Lu
  2021-07-01 12:58     ` Richard Biener
  2021-07-01 12:58     ` [PATCH 0/2] " Uros Bizjak
@ 2021-07-01 21:40     ` Joseph Myers
  2 siblings, 0 replies; 138+ messages in thread
From: Joseph Myers @ 2021-07-01 21:40 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Jakub Jelinek, liuhongt, gcc-patches

On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:

> The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> require -mavx512fp16, we need to compile complex _Float16 functions in
> libgcc without -mavx512fp16.  Complex _Float16 performance is very
> important for our _Float16 usage.   _Float16 performance has to be
> very fast.  There should be no emulation anywhere when -mavx512fp16
> is used.   That is why _Float16 is available only with -mavx512fp16.

You could build IFUNC versions of the libgcc functions (like float128 on 
powerpc64le), to be fast (modulo any IFUNC overhead) when run on 
AVX512FP16 hardware.  Or arrange for different libcall names to be used 
depending on the instruction set features available, and build those 
functions under multiple names, to be fast when the application is built 
with -mavx512fp16.

Since the HCmode libgcc functions just convert to/from SFmode and do all 
their computations on SFmode (to avoid intermediate overflows / 
cancellation resulting in inaccuracy), an F16C version may make sense as 
well (assuming use of the F16C conversion instructions is still efficient 
once you allow for zeroing the unused parts of the vector register, if 
necessary to avoid spurious exceptions from converting junk data in those 
parts of the register).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 11:10 ` [PATCH 0/2] Initial support for AVX512FP16 Uros Bizjak
  2021-07-01 12:39   ` H.J. Lu
@ 2021-07-02  6:30   ` Hongtao Liu
  2021-07-02  8:03     ` Uros Bizjak
  1 sibling, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-02  6:30 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, H. J. Lu, Jakub Jelinek, liuhongt

On Thu, Jul 1, 2021 at 7:10 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> [Sorry for double post, gcc-patches address was wrong in original post]
>
> On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Hi:
> >   AVX512FP16 is disclosed, refer to [1].
> >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> >
> > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > [2] https://reviews.llvm.org/D33719
>
> Looking through implementation of _Float16 support, I think, there is
> no need for _Float16 support to depend on AVX512FP16.
>
> The compiler is smart enough to use either a named pattern that
> describes the instruction when available or diverts to a library call
> to a soft-fp implementation. So, I think that general _Float16 support
> should be implemented first (similar to _float128) and then upgraded
> with AVX512FP16 specific instructions.
>
> MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> secondary_reload register.
>
MOVD requires SSE2, and so does PINSRW, so SSE2 is the minimum
requirement if we want XMM loads/stores for HFmode.
Also, PEXTRW reg/m16, xmm, imm8 is available under SSE4.1, which gives
us a direct 16-bit load/store for HFmode with no need for a secondary
reload.
So for simplicity, can we just restrict _Float16 to SSE4.1 and above?
> soft-fp library already includes all the infrastructure to implement
> _Float16 (see half.h), so HFmode basic operations should be trivial to
> implement (I have gone through this exercise personally years ago when
> implementing __float128 soft-fp support).
>
> Looking through the patch 1/2, it looks that a new ABI is introduced,
> where FP16 values are passed through XMM registers, but I don't think
> there is updated psABI documentation available (for x86_64 as well as
> i386, where FP16 values will probably be passed through memory).
>
> So, the net effect of the above proposal(s) is that x86 will support
> _Float16 out-of the box, emulate it via soft-fp without AVX512FP16 and
> use AVX512FP16 instructions with -mavx512fp16.
>
> Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-02  6:30   ` Hongtao Liu
@ 2021-07-02  8:03     ` Uros Bizjak
  2021-07-02  8:19       ` Richard Biener
  2021-07-05  1:25       ` Hongtao Liu
  0 siblings, 2 replies; 138+ messages in thread
From: Uros Bizjak @ 2021-07-02  8:03 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches, H. J. Lu, Jakub Jelinek, liuhongt

On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazylht@gmail.com> wrote:

> > >   AVX512FP16 is disclosed, refer to [1].
> > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > >
> > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > [2] https://reviews.llvm.org/D33719
> >
> > Looking through implementation of _Float16 support, I think, there is
> > no need for _Float16 support to depend on AVX512FP16.
> >
> > The compiler is smart enough to use either a named pattern that
> > describes the instruction when available or diverts to a library call
> > to a soft-fp implementation. So, I think that general _Float16 support
> > should be implemented first (similar to _float128) and then upgraded
> > with AVX512FP16 specific instructions.
> >
> > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > secondary_reload register.
> >
> MOVD is under sse2, so is pinsrw, which means if we want xmm
> load/stores for HF, sse2 is the least requirement.
> Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> have 16bit direct load/store for HFmode and no need for a secondary
> reload.
> So for simplicity, can we just restrict _Float16 under sse4_1?

When baseline is not met, the equivalent integer calling convention is
used, for example:

--cut here--
typedef int __v2si __attribute__ ((vector_size (8)));

__v2si foo (__v2si a, __v2si b)
{
  return a + b;
}
--cut here--

will still compile with -m32 -mno-mmx with warnings:

mmx1.c: In function ‘foo’:
mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
ABI [-Wpsabi]
mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
the ABI [-Wpsabi]

So, by setting the baseline to SSE4.1, a big pool of targets will be
forced to use the alternative ABI. This is quite inconvenient; we
revert to the alternative ABI only if we *really* can't satisfy the ABI
requirements (e.g. the register type is not available, or a basic move
insn can't be implemented). Based on your analysis, I think that SSE2
should be the baseline.

Also, looking at the insn tables, it looks like movzwl from memory + movd
is faster than pinsrw (and similarly for pextrw to memory), but I have
no hard data here.

Regarding secondary_reload, a scratch register is needed in case of
HImode moves between memory and XMM reg, since the scratch register
needs a different mode than the source and destination. Please see the
TARGET_SECONDARY_RELOAD documentation and several examples in the
source.
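
For illustration only, the two 16-bit move strategies compared above can
be sketched at the source level with SSE2 intrinsics. This is not the
proposed back-end implementation; the function names are hypothetical,
the compiler is free to pick other instructions (or fold the moves
away), and the non-SSE2 branch exists only so the sketch builds
elsewhere:

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
# include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: memory -> GPR -> XMM -> GPR -> result, i.e. the
   movzwl + movd route that needs a 32-bit integer temporary. */
uint16_t hf_bits_via_gpr(const uint16_t *src)
{
  assert(src != 0);
#if defined(__SSE2__)
  uint32_t tmp = *src;                         /* zero-extending 16-bit load */
  __m128i xmm = _mm_cvtsi32_si128((int) tmp);  /* GPR -> XMM (movd) */
  tmp = (uint32_t) _mm_cvtsi128_si32(xmm);     /* XMM -> GPR (movd) */
  return (uint16_t) tmp;                       /* low 16 bits are the value */
#else
  return *src;                                 /* no SSE2: nothing to show */
#endif
}

/* Hypothetical helper: the pinsrw/pextrw route, also available in SSE2
   when the extract target is a general-purpose register. */
uint16_t hf_bits_via_pinsrw(const uint16_t *src)
{
  assert(src != 0);
#if defined(__SSE2__)
  __m128i xmm = _mm_insert_epi16(_mm_setzero_si128(), *src, 0);  /* pinsrw */
  return (uint16_t) _mm_extract_epi16(xmm, 0);                   /* pextrw */
#else
  return *src;
#endif
}
```

Both routes round-trip the same 16 bits; which one is faster is exactly
the open question above.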

Uros.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-02  8:03     ` Uros Bizjak
@ 2021-07-02  8:19       ` Richard Biener
  2021-07-03 14:44         ` Hongtao Liu
  2021-07-05  1:25       ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-07-02  8:19 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongtao Liu, Jakub Jelinek, liuhongt, gcc-patches

On Fri, Jul 2, 2021 at 10:07 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > > >   AVX512FP16 is disclosed, refer to [1].
> > > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > > >
> > > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > [2] https://reviews.llvm.org/D33719
> > >
> > > Looking through implementation of _Float16 support, I think, there is
> > > no need for _Float16 support to depend on AVX512FP16.
> > >
> > > The compiler is smart enough to use either a named pattern that
> > > describes the instruction when available or diverts to a library call
> > > to a soft-fp implementation. So, I think that general _Float16 support
> > > should be implemented first (similar to _float128) and then upgraded
> > > with AVX512FP16 specific instructions.
> > >
> > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > secondary_reload register.
> > >
> > MOVD is under sse2, so is pinsrw, which means if we want xmm
> > load/stores for HF, sse2 is the least requirement.
> > Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> > have 16bit direct load/store for HFmode and no need for a secondary
> > reload.
> > So for simplicity, can we just restrict _Float16 under sse4_1?
>
> When baseline is not met, the equivalent integer calling convention is
> used, for example:
>
> --cut here--
> typedef int __v2si __attribute__ ((vector_size (8)));
>
> __v2si foo (__v2si a, __v2si b)
> {
>   return a + b;
> }
> --cut here--
>
> will still compile with -m32 -mno-mmx with warnings:
>
> mmx1.c: In function ‘foo’:
> mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> ABI [-Wpsabi]
> mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> the ABI [-Wpsabi]
>
> So, by setting the baseline to SSE4.1, a big pool of targets will be
> forced to use alternative ABI. This is quite inconvenient, and we
> revert to the alternative ABI if we *really*  can't satisfy ABI
> requirements (e.g. register type is not available, basic move insn
> can't be implemented). Based on your analysis, I think that SSE2
> should be the baseline.
>
> Also, looking at insn tables, it looks that movzwl from memory + movd
> is faster than pinsrw (and similar for pextrw to memory), but I have
> no hard data here.
>
> Regarding secondary_reload, a scratch register is needed in case of
> HImode moves between memory and XMM reg, since scratch register needs
> a different mode than source and destination. Please see
> TARGET_SECONDARY_RELOAD documentation and several examples in the
> source.

I would suggest, to simplify the initial patch series, not supporting
_Float16 on 32-bit targets and leaving that (and its ABI) for a
future enhancement.  Then the baseline should be SSE2 (x86-64 base),
which I think should be OK despite needing some awkwardness for
HFmode stores (scratch reg needed).

Richard.

> Uros.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-02  8:19       ` Richard Biener
@ 2021-07-03 14:44         ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-03 14:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, Jakub Jelinek, liuhongt, gcc-patches

On Fri, Jul 2, 2021 at 4:19 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jul 2, 2021 at 10:07 AM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > > >   AVX512FP16 is disclosed, refer to [1].
> > > > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > > > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > > > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > > > >
> > > > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > > [2] https://reviews.llvm.org/D33719
> > > >
> > > > Looking through implementation of _Float16 support, I think, there is
> > > > no need for _Float16 support to depend on AVX512FP16.
> > > >
> > > > The compiler is smart enough to use either a named pattern that
> > > > describes the instruction when available or diverts to a library call
> > > > to a soft-fp implementation. So, I think that general _Float16 support
> > > > should be implemented first (similar to _float128) and then upgraded
> > > > with AVX512FP16 specific instructions.
> > > >
> > > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > > secondary_reload register.
> > > >
> > > MOVD is under sse2, so is pinsrw, which means if we want xmm
> > > load/stores for HF, sse2 is the least requirement.
> > > Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> > > have 16bit direct load/store for HFmode and no need for a secondary
> > > reload.
> > > So for simplicity, can we just restrict _Float16 under sse4_1?
> >
> > When baseline is not met, the equivalent integer calling convention is
> > used, for example:
> >
> > --cut here--
> > typedef int __v2si __attribute__ ((vector_size (8)));
> >
> > __v2si foo (__v2si a, __v2si b)
> > {
> >   return a + b;
> > }
> > --cut here--
> >
> > will still compile with -m32 -mno-mmx with warnings:
> >
> > mmx1.c: In function ‘foo’:
> > mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> > ABI [-Wpsabi]
> > mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> > the ABI [-Wpsabi]
> >
> > So, by setting the baseline to SSE4.1, a big pool of targets will be
> > forced to use alternative ABI. This is quite inconvenient, and we
> > revert to the alternative ABI if we *really*  can't satisfy ABI
> > requirements (e.g. register type is not available, basic move insn
> > can't be implemented). Based on your analysis, I think that SSE2
> > should be the baseline.
> >
> > Also, looking at insn tables, it looks that movzwl from memory + movd
> > is faster than pinsrw (and similar for pextrw to memory), but I have
> > no hard data here.
> >
> > Regarding secondary_reload, a scratch register is needed in case of
> > HImode moves between memory and XMM reg, since scratch register needs
> > a different mode than source and destination. Please see
> > TARGET_SECONDARY_RELOAD documentation and several examples in the
> > source.
>
> I would suggest for the purpose of simplifying the initial patch series to
> not make _Float16 supported on 32bits and leave that (and its ABI) for
Without AVX512FP16, that's OK.
The problem is that AVX512FP16 instructions are also available with -m32,
and the corresponding intrinsics need the "_Float16" type (or some other
builtin type name), which users will also use. That means we
still need a 32-bit _Float16 ABI for them.

> future enhancement.  Then the baseline should be SSE2 (x86-64 base)
> which I think should be OK despite needing some awkwardness for
> HFmode stores (scratch reg needed).
>
> Richard.
>
> > Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-02  8:03     ` Uros Bizjak
  2021-07-02  8:19       ` Richard Biener
@ 2021-07-05  1:25       ` Hongtao Liu
  2021-07-05 11:02         ` Richard Biener
  1 sibling, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-05  1:25 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, H. J. Lu, Jakub Jelinek, liuhongt

On Fri, Jul 2, 2021 at 4:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > > >   AVX512FP16 is disclosed, refer to [1].
> > > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > > >
> > > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > [2] https://reviews.llvm.org/D33719
> > >
> > > Looking through implementation of _Float16 support, I think, there is
> > > no need for _Float16 support to depend on AVX512FP16.
> > >
> > > The compiler is smart enough to use either a named pattern that
> > > describes the instruction when available or diverts to a library call
> > > to a soft-fp implementation. So, I think that general _Float16 support
> > > should be implemented first (similar to _float128) and then upgraded
> > > with AVX512FP16 specific instructions.
> > >
> > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > secondary_reload register.
> > >
> > MOVD is under sse2, so is pinsrw, which means if we want xmm
> > load/stores for HF, sse2 is the least requirement.
> > Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> > have 16bit direct load/store for HFmode and no need for a secondary
> > reload.
> > So for simplicity, can we just restrict _Float16 under sse4_1?
>
> When baseline is not met, the equivalent integer calling convention is
> used, for example:
The problem is that under TARGET_SSE with -mno-sse2, the float calling
convention uses SSE registers. That's OK for float, since SSE has movss,
but there is no 16-bit load/store for SSE registers, nor any move
between GPRs and SSE registers.
>
> --cut here--
> typedef int __v2si __attribute__ ((vector_size (8)));
>
> __v2si foo (__v2si a, __v2si b)
> {
>   return a + b;
> }
> --cut here--
>
> will still compile with -m32 -mno-mmx with warnings:
>
> mmx1.c: In function ‘foo’:
> mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> ABI [-Wpsabi]
> mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> the ABI [-Wpsabi]
>
> So, by setting the baseline to SSE4.1, a big pool of targets will be
> forced to use alternative ABI. This is quite inconvenient, and we
> revert to the alternative ABI if we *really*  can't satisfy ABI
> requirements (e.g. register type is not available, basic move insn
> can't be implemented). Based on your analysis, I think that SSE2
> should be the baseline.
Agreed.
>
> Also, looking at insn tables, it looks that movzwl from memory + movd
> is faster than pinsrw (and similar for pextrw to memory), but I have
> no hard data here.
>
> Regarding secondary_reload, a scratch register is needed in case of
> HImode moves between memory and XMM reg, since scratch register needs
> a different mode than source and destination. Please see
> TARGET_SECONDARY_RELOAD documentation and several examples in the
> source.
>
> Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-05  1:25       ` Hongtao Liu
@ 2021-07-05 11:02         ` Richard Biener
  0 siblings, 0 replies; 138+ messages in thread
From: Richard Biener @ 2021-07-05 11:02 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, Jakub Jelinek, liuhongt, gcc-patches

On Mon, Jul 5, 2021 at 3:21 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Jul 2, 2021 at 4:03 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > > >   AVX512FP16 is disclosed, refer to [1].
> > > > >   There're 100+ instructions for AVX512FP16, 67 gcc patches, for the convenience of review, we divide the 67 patches into 2 major parts.
> > > > >   The first part is 2 patches containing basic support for AVX512FP16 (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65 patches covering all instructions of AVX512FP16(including intrinsic support and some optimizations).
> > > > >   There is a problem with the first part, _Float16 is not a C++ standard, so the front-end does not support this type and its mangling, so we "make up" a _Float16 type on the back-end and use _DF16 as its mangling. The purpose of this is to align with llvm side, because llvm C++ FE already supports _Float16[2].
> > > > >
> > > > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > > [2] https://reviews.llvm.org/D33719
> > > >
> > > > Looking through implementation of _Float16 support, I think, there is
> > > > no need for _Float16 support to depend on AVX512FP16.
> > > >
> > > > The compiler is smart enough to use either a named pattern that
> > > > describes the instruction when available or diverts to a library call
> > > > to a soft-fp implementation. So, I think that general _Float16 support
> > > > should be implemented first (similar to _float128) and then upgraded
> > > > with AVX512FP16 specific instructions.
> > > >
> > > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > > secondary_reload register.
> > > >
> > > MOVD is under sse2, so is pinsrw, which means if we want xmm
> > > load/stores for HF, sse2 is the least requirement.
> > > Also we support PEXTRW reg/m16, xmm, imm8 under SSE4_1 under which we
> > > have 16bit direct load/store for HFmode and no need for a secondary
> > > reload.
> > > So for simplicity, can we just restrict _Float16 under sse4_1?
> >
> > When baseline is not met, the equivalent integer calling convention is
> > used, for example:
> Problem is under TARGET_SSE and w/ -mno-sse2, float calling convention
>  is available for sse register, it's ok for float since there's movss
> under sse, but there's no 16bit load/store for sse registers, nor
> movement between gpr and sse register.

You can always spill though; that's preferred on some archs
over xmm <-> gpr moves anyway.

Richard.

> >
> > --cut here--
> > typedef int __v2si __attribute__ ((vector_size (8)));
> >
> > __v2si foo (__v2si a, __v2si b)
> > {
> >   return a + b;
> > }
> > --cut here--
> >
> > will still compile with -m32 -mno-mmx with warnings:
> >
> > mmx1.c: In function ‘foo’:
> > mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> > ABI [-Wpsabi]
> > mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> > the ABI [-Wpsabi]
> >
> > So, by setting the baseline to SSE4.1, a big pool of targets will be
> > forced to use alternative ABI. This is quite inconvenient, and we
> > revert to the alternative ABI if we *really*  can't satisfy ABI
> > requirements (e.g. register type is not available, basic move insn
> > can't be implemented). Based on your analysis, I think that SSE2
> > should be the baseline.
> Agreed.
> >
> > Also, looking at insn tables, it looks that movzwl from memory + movd
> > is faster than pinsrw (and similar for pextrw to memory), but I have
> > no hard data here.
> >
> > Regarding secondary_reload, a scratch register is needed in case of
> > HImode moves between memory and XMM reg, since scratch register needs
> > a different mode than source and destination. Please see
> > TARGET_SECONDARY_RELOAD documentation and several examples in the
> > source.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 13:03       ` Jakub Jelinek
@ 2021-07-06  8:51         ` Hongtao Liu
  2021-07-06 10:14           ` Richard Biener
  2021-07-06 18:11           ` Joseph Myers
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-06  8:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, liuhongt, gcc-patches

On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> > > require -mavx512fp16, we need to compile complex _Float16 functions in
> > > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > > important for our _Float16 usage.   _Float16 performance has to be
> > > very fast.  There should be no emulation anywhere when -mavx512fp16
> > > is used.   That is why _Float16 is available only with -mavx512fp16.
> >
> > It should be possible to emulate scalar _Float16 using _Float32 with a
> > reasonable
> > performance trade-off.  I think users caring for _Float16 performance will
> > use vector intrinsics anyway since for scalar code _Float32 code will likely
> > perform the same (at double storage cost)
>
> Only if it is allowed to have excess precision for _Float16.  If not, then
> one would need to (expensively?) round after every operation at least.
There may be inconsistent behavior between soft-fp and AVX512FP16
instructions if we emulate _Float16 with float, e.g.:
  1) For a + b - c, where b and c are variables with the same large value
and a + b is not representable (overflows) in _Float16 but is finite in
float, the AVX512FP16 instructions will raise an exception but soft-fp
won't (unless the result is rounded after every operation).
  2) For a / b, where b is a denormal value, AVX512FP16 won't flush it to
zero even with -Ofast, but when it is extended to float and divss is
used, it will be flushed to zero and raise an exception when compiling
with -Ofast.
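
A minimal sketch of case 1, assuming IEEE binary32 float and using a
hypothetical round_to_half helper in place of real _Float16 hardware.
It models only the values, not the exception flags, and it does not
handle binary16 subnormals:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE float");

/* Hypothetical helper: round a float to the nearest binary16 value
   (ties to even) and return it as a float.  NaN, infinity, zero and
   magnitudes below 2^-14 are returned unchanged. */
float round_to_half(float x)
{
  uint32_t bits, sign, mag;
  memcpy(&bits, &x, sizeof bits);
  sign = bits & 0x80000000u;
  mag = bits & 0x7fffffffu;
  if (mag >= 0x7f800000u || mag < 0x38800000u)
    return x;
  mag += 0x0fffu + ((mag >> 13) & 1u);  /* round off the low 13 bits */
  mag &= ~0x1fffu;
  if (mag >= 0x47800000u)               /* 2^16 and above: overflow */
    mag = 0x7f800000u;                  /* infinity */
  bits = sign | mag;
  memcpy(&x, &bits, sizeof x);
  return x;
}

/* Emulation with excess precision: one rounding at the very end. */
float sum_diff_excess(float a, float b, float c)
{
  return round_to_half(a + b - c);
}

/* What per-operation _Float16 arithmetic computes: round after each step. */
float sum_diff_strict(float a, float b, float c)
{
  return round_to_half(round_to_half(a + b) - c);
}
```

With a = 100 and b = c = 65504 (the largest finite binary16 value), the
excess-precision version yields 100 while the per-operation version
overflows to infinity at a + b.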

To solve the issue above, I tried to add full emulation for _Float16 (for
everything under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, etc.). The
problem is that pass_expand always tries a wider mode first instead of
using soft-fp:

  /* Look for a wider mode of the same class for which we think we
     can open-code the operation.  Check for a widening multiply at the
     wider mode as well.  */

  if (CLASS_HAS_WIDER_MODES_P (mclass)
      && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
    FOR_EACH_WIDER_MODE (wider_mode, mode)

I think pass_expand does this for a reason, so I'm a little afraid
to touch this part of the code.

So the key point is that soft-fp and the AVX512FP16 instructions may
not behave the same with respect to exceptions. Is this acceptable?

BTW, I've finished an initial patch to enable _Float16 on SSE2 and
emulate _Float16 operations with float. It passes all 312 new tests
related to _Float16, but those unit tests don't cover the
scenario I'm talking about.
>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-01 20:46   ` Joseph Myers
@ 2021-07-06  8:53     ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-06  8:53 UTC (permalink / raw)
  To: Joseph Myers; +Cc: GCC Patches, Jakub Jelinek, liuhongt

On Fri, Jul 2, 2021 at 4:46 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> Some general comments, following what I said on libc-alpha:
>
>
> 1. Can you confirm that the ABI being used for 64-bit, for _Float16 and
> _Complex _Float16 argument passing and return, follows the current x86_64
> ABI document?
>
>
> 2. Can you confirm that if you build with this instruction set extension
> enabled by default, and run GCC tests for a corresponding (emulated?)
> processor, all the existing float16 tests in the testsuite are enabled and
> PASS (both compilation and execution) (both 64-bit and 32-bit testing)?
>
>
> 3. There's an active 32-bit ABI mailing list (ia32-abi@googlegroups.com).
> If you want to support _Float16 in the 32-bit case, please work with it to
> get the corresponding ABI documented (using only memory and
> general-purpose registers seems like a good idea, so that the ABI can be
> supported for the base architecture without depending on SSE registers
> being present).  In the absence of 32-bit ABI support it might be better
> to disable the HFmode support for 32-bit.
>
>
> 4. Support for _Float16 really ought not to depend on whether a particular
> instruction set extension is present, just like with other floating-point
> types; it makes sense, as an API, for all x86 processors (and like many
> APIs, it will be faster on some processors than on others).  More specific
> points here are:
>
> (a) Basic arithmetic (+-*/) can be done by converting to SFmode, doing
> arithmetic there and converting back to HFmode; the results of doing so
> will be correctly rounded.  Indeed, I think optabs.c handles that
> automatically when operations are available on a wider mode but not on the
> desired mode (but you'd need to check carefully that all the expected
> conversions do occur).
So would different exception behavior between soft-fp and
AVX512FP16 be acceptable?
>
> (b) Conversions to/from all other floating-point modes will always be
> needed, whether in hardware or in software.
>
> (c) In the F16C (Ivy Bridge and later) case, where you have hardware
> conversions to/from float (only), it's fine to convert to double (or long
> double) via float.  (On efficiency grounds, widening from HFmode to TFmode
> should be a pure software operations, that should be faster than having an
> intermediate conversion to SFmode when the SFmode-to-TFmode conversion is
> a software operation.)
>
> (d) In the F16C case (where there are hardware conversions only from
> SFmode, not from wider modes), conversion *from* DFmode (or XFmode or
> TFmode) to HFmode should be a software operation, to avoid double
> rounding; an intermediate conversion to SFmode would be incorrect.
>
> (e) It's OK for conversions to/from integer modes to go via SFmode
> (although I don't know if that's efficient or not).  Any case where a
> conversion from integer to SFmode is inexact would overflow HFmode, so
> there are no double rounding issues.
>
> (f) In the F16C case, it seems the hardware instructions only work on
> vectors, not scalars, so care would need to be taken to use them for
> scalar conversions only if the other elements of the vector register are
> known to be safe to convert without raising any exceptions (e.g. all zero
> bits, or -fno-trapping-math in effect).
>
> (g) If concerned about efficiency of intermediate truncations on
> processors without hardware _Float16 arithmetic, look at
> aarch64_excess_precision; you have the option of using excess precision
> for _Float16 by default, though that only really helps for C given the
> lack of excess precision support in the C++ front end.  (Enabling this can
> cause trouble for code that only expects C99/C11 values of
> FLT_EVAL_METHOD, however; see the -fpermitted-flt-eval-methods option for
> more details.)
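To make the double-rounding concern in (d) concrete, here is a minimal
sketch in C.  It does not use _Float16 at all: round_to_binary16 is a
hypothetical helper (not from the patch series) that emulates
round-to-nearest-even narrowing to IEEE binary16 in software.  It assumes
the default rounding mode and double arithmetic without excess precision
(e.g. x86-64 with SSE2).

```c
#include <math.h>   /* INFINITY only; no libm functions are called */

/* Round x to the nearest binary16 value, ties to even, and return it as a
   double.  Hypothetical helper for illustration only.  */
static double round_to_binary16(double x)
{
    double ax = x < 0 ? -x : x;
    if (!(ax > 0.0) || ax == INFINITY)
        return x;                          /* zero, NaN, infinity pass through */
    double ulp = 0x1p-24;                  /* spacing of binary16 subnormals */
    while (ax >= ulp * 2048.0)             /* stop once ax / ulp < 2^11 */
        ulp *= 2.0;
    volatile double n = ax / ulp + 0x1p52; /* adding 2^52 rounds to an integer */
    double r = (n - 0x1p52) * ulp;
    if (r > 65504.0)                       /* above the largest finite binary16 */
        r = INFINITY;
    return x < 0 ? -r : r;
}

/* Narrow a double straight to binary16: one rounding. */
static double narrow_direct(double x)
{
    return round_to_binary16(x);
}

/* Narrow via float first: two roundings, which is what (d) warns about. */
static double narrow_via_float(double x)
{
    return round_to_binary16((double)(float)x);
}
```

For x = 1 + 2^-11 + 2^-30, narrow_direct gives 1 + 2^-10, while
narrow_via_float gives 1.0: the float rounding drops the 2^-30 term, which
leaves an exact tie that then rounds to even.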
>
>
> 5. Suppose that in some cases you do disable _Float16 support (whether
> that's just for 32-bit until the ABI has been defined, or also in the
> absence of instruction set support despite my comments above).  Then the
> way you do that in this patch series, enabling the type in
> ix86_scalar_mode_supported_p and ix86_libgcc_floating_mode_supported_p and
> giving an error later in ix86_expand_move, is a bad idea.
>
> Errors in expanders are generally problematic (they don't have good
> location information available).  But apart from that, ordinary user code
> should be able to tell whether _Float16 is supported by testing whether
> e.g. __FLT16_MANT_DIG__ is defined (like float.h does), or by including
> float.h (with __STDC_WANT_IEC_60559_TYPES_EXT__ defined) and then testing
> whether one of the FLT16_* macros is defined, or in a configure test by
> just declaring something using the _Float16 type.  Patch 1 changes
> check_effective_target_float16 to work around your technique for disabling
> _Float16 in ix86_expand_move, but it should be considered a stable user
> API that any of the above methods can be used in user code to check for
> _Float16 support - user code shouldn't need to know implementation details
> that you need to do something that will go through ix86_expand_move to see
> whether _Float16 is supported or not (and user code shouldn't need to use
> a configure test at all for this, testing FLT16_* after including float.h
> should work as a fully portable way of testing it - that's using only ISO
> C facilities).
>
> So enable HFmode in ix86_scalar_mode_supported_p and
> ix86_libgcc_floating_mode_supported_p exactly when all operations are
> supported in the rest of the compiler - don't enable it there and then
> disable it elsewhere, because that will break user code testing for
> whether _Float16 is available using FLT16_* macros.
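As a rough sketch of the user-side check described in 5, assuming a
compiler that predefines __FLT16_MANT_DIG__ only when _Float16 is really
usable (which is the contract being asked for here).  The names
half_storage_t, HAVE_FLOAT16 and half_storage_size are made up for this
example.

```c
#include <stddef.h>

/* Pick a storage type depending on whether the compiler advertises
   _Float16.  All names below are hypothetical.  */
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_storage_t;   /* native 16-bit type is advertised */
#define HAVE_FLOAT16 1
#else
typedef float half_storage_t;      /* fall back to float when it is not */
#define HAVE_FLOAT16 0
#endif

/* Size in bytes of the type that was selected above. */
static size_t half_storage_size(void)
{
    return sizeof(half_storage_t);
}
```

If the macro were defined while the type is later rejected in the
expander, a check like this would choose _Float16 and then fail to
compile, which is the breakage point 5 is about.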
>
> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06  8:51         ` Hongtao Liu
@ 2021-07-06 10:14           ` Richard Biener
  2021-07-06 12:11             ` H.J. Lu
  2021-07-06 18:18             ` Joseph Myers
  2021-07-06 18:11           ` Joseph Myers
  1 sibling, 2 replies; 138+ messages in thread
From: Richard Biener @ 2021-07-06 10:14 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, liuhongt, gcc-patches

On Tue, Jul 6, 2021 at 10:46 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > > The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> > > > require -mavx512fp16, we need to compile complex _Float16 functions in
> > > > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > > > important for our _Float16 usage.   _Float16 performance has to be
> > > > very fast.  There should be no emulation anywhere when -mavx512fp16
> > > > is used.   That is why _Float16 is available only with -mavx512fp16.
> > >
> > > It should be possible to emulate scalar _Float16 using _Float32 with a
> > > reasonable
> > > performance trade-off.  I think users caring for _Float16 performance will
> > > use vector intrinsics anyway since for scalar code _Float32 code will likely
> > > perform the same (at double storage cost)
> >
> > Only if it is allowed to have excess precision for _Float16.  If not, then
> > one would need to (expensively?) round after every operation at least.
> There may be inconsistent behavior between soft-fp and avx512fp16
> instructions if we emulate _Float16 w/ float .
>  i.e
>   1) for a + b - c where b and c are variables with the same big value
> and a + b is NAN at _Float16 and real value at float, avx512fp16
> instruction will raise an exception but soft-fp won't(unless it's
> rounded after every operation.)
>   2) a / b where b is denormal value and AVX512FP16 won't flush it to
> zero even w/ -Ofast, but when it's extended to float and using divss,
> it will be flushed to zero and raise an exception when compiling w/
> Ofast
>
> To solve the upper issue, i try to add full emulation for _Float16(for
> all those under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, .etc),
> problem is in pass_expand, it always try wider mode first instead of
> using soft-fp
>
>   /* Look for a wider mode of the same class for which we think we
>      can open-code the operation.  Check for a widening multiply at the
>      wider mode as well.  */
>
>   if (CLASS_HAS_WIDER_MODES_P (mclass)
>       && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
>     FOR_EACH_WIDER_MODE (wider_mode, mode)
>
> I think pass_expand did this for some reason, so I'm a little afraid
> to touch this part of the code.

It might be the first time we hit this ;)  I don't think it's safe for
non-integer modes or even anything but a small set of operations.
Just consider ssadd besides rounding issues or FP.

> So the key point is that the soft-fp and avx512fp16 instructions may
> do not behave the same on the exception, is this acceptable?

I think that's quite often the case for soft-fp.

> BTW, i've finished a initial patch to enable _Float16 on sse2, and
> emulate _Float16 operation w/ float, and it passes all  312 new tests
> which are related to _Float16, but those units tests doesn't cover the
> scenario I'm talking about.
> >
> >         Jakub
> >
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06 10:14           ` Richard Biener
@ 2021-07-06 12:11             ` H.J. Lu
  2021-07-06 18:20               ` Joseph Myers
  2021-07-06 18:18             ` Joseph Myers
  1 sibling, 1 reply; 138+ messages in thread
From: H.J. Lu @ 2021-07-06 12:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongtao Liu, Jakub Jelinek, liuhongt, gcc-patches

On Tue, Jul 6, 2021 at 3:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jul 6, 2021 at 10:46 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > > > The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> > > > > require -mavx512fp16, we need to compile complex _Float16 functions in
> > > > > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > > > > important for our _Float16 usage.   _Float16 performance has to be
> > > > > very fast.  There should be no emulation anywhere when -mavx512fp16
> > > > > is used.   That is why _Float16 is available only with -mavx512fp16.
> > > >
> > > > It should be possible to emulate scalar _Float16 using _Float32 with a
> > > > reasonable
> > > > performance trade-off.  I think users caring for _Float16 performance will
> > > > use vector intrinsics anyway since for scalar code _Float32 code will likely
> > > > perform the same (at double storage cost)
> > >
> > > Only if it is allowed to have excess precision for _Float16.  If not, then
> > > one would need to (expensively?) round after every operation at least.
> > There may be inconsistent behavior between soft-fp and avx512fp16
> > instructions if we emulate _Float16 w/ float .
> >  i.e
> >   1) for a + b - c where b and c are variables with the same big value
> > and a + b is NAN at _Float16 and real value at float, avx512fp16
> > instruction will raise an exception but soft-fp won't(unless it's
> > rounded after every operation.)
> >   2) a / b where b is denormal value and AVX512FP16 won't flush it to
> > zero even w/ -Ofast, but when it's extended to float and using divss,
> > it will be flushed to zero and raise an exception when compiling w/
> > Ofast
> >
> > To solve the upper issue, i try to add full emulation for _Float16(for
> > all those under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, .etc),
> > problem is in pass_expand, it always try wider mode first instead of
> > using soft-fp
> >
> >   /* Look for a wider mode of the same class for which we think we
> >      can open-code the operation.  Check for a widening multiply at the
> >      wider mode as well.  */
> >
> >   if (CLASS_HAS_WIDER_MODES_P (mclass)
> >       && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
> >     FOR_EACH_WIDER_MODE (wider_mode, mode)
> >
> > I think pass_expand did this for some reason, so I'm a little afraid
> > to touch this part of the code.
>
> It might be the first time we hit this ;)  I don't think it's safe for
> non-integer modes or even anything but a small set of operations.
> Just consider ssadd besides rounding issues or FP.
>
> > So the key point is that the soft-fp and avx512fp16 instructions may
> > do not behave the same on the exception, is this acceptable?
>
> I think that's quite often the case for soft-fp.

So this is a GCC limitation.  Please document the different behaviors
of _Float16 with and without AVX512FP16, similar to

---
 The '__fp16' type may only be used as an argument to intrinsics defined
in '<arm_fp16.h>', or as a storage format.  For purposes of arithmetic
and other operations, '__fp16' values in C or C++ expressions are
automatically promoted to 'float'.

 The ARM target provides hardware support for conversions between
'__fp16' and 'float' values as an extension to VFP and NEON (Advanced
SIMD), and from ARMv8-A provides hardware support for conversions
between '__fp16' and 'double' values.  GCC generates code using these
hardware instructions if you compile with options to select an FPU that
provides them; for example, '-mfpu=neon-fp16 -mfloat-abi=softfp', in
addition to the '-mfp16-format' option to select a half-precision
format.

 Language-level support for the '__fp16' data type is independent of
whether GCC generates code using hardware floating-point instructions.
In cases where hardware support is not specified, GCC implements
conversions between '__fp16' and other types as library calls.

 It is recommended that portable code use the '_Float16' type defined by
ISO/IEC TS 18661-3:2015.  *Note Floating Types::.
---

We recommend that portable code use _Float16 with AVX512FP16.

> > BTW, i've finished a initial patch to enable _Float16 on sse2, and
> > emulate _Float16 operation w/ float, and it passes all  312 new tests
> > which are related to _Float16, but those units tests doesn't cover the
> > scenario I'm talking about.
> > >
> > >         Jakub
> > >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
H.J.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06  8:51         ` Hongtao Liu
  2021-07-06 10:14           ` Richard Biener
@ 2021-07-06 18:11           ` Joseph Myers
  2021-07-07  1:24             ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-07-06 18:11 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, liuhongt, gcc-patches

On Tue, 6 Jul 2021, Hongtao Liu via Gcc-patches wrote:

> There may be inconsistent behavior between soft-fp and avx512fp16
> instructions if we emulate _Float16 w/ float .
>  i.e
>   1) for a + b - c where b and c are variables with the same big value
> and a + b is NAN at _Float16 and real value at float, avx512fp16
> instruction will raise an exception but soft-fp won't(unless it's
> rounded after every operation.)

There are at least two variants of emulation using float:

(a) Using the excess precision support, as on AArch64, which means the C 
front end converts the _Float16 operations to float ones, with explicit 
narrowing on assignment (and conversion as if by assignment - argument 
passing and return, casts, etc.).  Excess precision indeed involves 
different semantics compared to doing each operation directly in the range 
and precision of _Float16.

(b) Letting the expand/optabs code generate operations in a wider mode.  
My understanding is that the result should get converted back to the 
narrower mode after each operation (by the expand/optabs code / 
convert_move called by it generating such a conversion), meaning (for 
basic arithmetic operations) that the semantics end up the same as if the 
operation had been done directly on _Float16 (but with more truncation 
operations occurring than would be the case with excess precision support 
used).

>   2) a / b where b is denormal value and AVX512FP16 won't flush it to
> zero even w/ -Ofast, but when it's extended to float and using divss,
> it will be flushed to zero and raise an exception when compiling w/
> Ofast

I don't think that's a concern, flush to zero is well outside the scope of 
standards defining _Float16 semantics.

> So the key point is that the soft-fp and avx512fp16 instructions may
> do not behave the same on the exception, is this acceptable?

As far as I understand it, all cases within the standards will behave as 
expected for exceptions, whether pure software floating-point is used, 
pure hardware _Float16 arithmetic or one of the forms of emulation listed 
above.  (Where "as expected" itself depends on the value of 
FLT_EVAL_METHOD, i.e. whether excess precision is used for _Float16.)  
Flush to zero and trapping exceptions are outside the scope of the 
standards.  Since trapping exceptions is outside the scope of the 
standards, so is anything that distinguishes whether an arithmetic 
operation raises the same exception more than once or the order in which 
it raises different exceptions (e.g. the possibility of "inexact" being 
raised more than once, both by arithmetic on float and by narrowing from 
float to _Float16).
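
To illustrate how the two variants can differ for the "a + b - c" case,
here is a small analogy in C.  It is only a sketch: float stands in for
_Float16 and double stands in for float, so that it runs on any compiler
with IEEE arithmetic and FLT_EVAL_METHOD == 0.  The function names are
made up for the example.

```c
/* Variant (b): narrow back to the narrow type after every operation. */
static float per_operation(float a, float b, float c)
{
    volatile float t = a + b;   /* a + b is rounded, and may overflow, here */
    return t - c;
}

/* Variant (a): excess precision; evaluate wide, narrow once at the end. */
static float excess_precision(float a, float b, float c)
{
    return (float)(((double)a + (double)b) - (double)c);
}
```

With a = 1e38f and b = c = 3e38f, per_operation overflows in a + b and
returns infinity, while excess_precision returns 1e38f.  Both outcomes are
valid for their respective FLT_EVAL_METHOD values.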

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06 10:14           ` Richard Biener
  2021-07-06 12:11             ` H.J. Lu
@ 2021-07-06 18:18             ` Joseph Myers
  1 sibling, 0 replies; 138+ messages in thread
From: Joseph Myers @ 2021-07-06 18:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongtao Liu, Jakub Jelinek, liuhongt, gcc-patches

On Tue, 6 Jul 2021, Richard Biener via Gcc-patches wrote:

> >   /* Look for a wider mode of the same class for which we think we
> >      can open-code the operation.  Check for a widening multiply at the
> >      wider mode as well.  */
> >
> >   if (CLASS_HAS_WIDER_MODES_P (mclass)
> >       && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
> >     FOR_EACH_WIDER_MODE (wider_mode, mode)
> >
> > I think pass_expand did this for some reason, so I'm a little afraid
> > to touch this part of the code.
> 
> It might be the first time we hit this ;)  I don't think it's safe for
> non-integer modes or even anything but a small set of operations.
> Just consider ssadd besides rounding issues or FP.

I think it's safe for basic arithmetic (+-*/), for IEEE floating-point 
arithmetic when the wider mode has significand more than twice as wide as 
the narrower one (given that the result is immediately converted back to 
the narrower mode, double rounding isn't an issue given such a constraint 
on the widths of the modes - and given that the wider mode has sufficient 
exponent range to avoid intermediate overflow / underflow as an issue as 
well).

(The precise requirements on the width of the modes may depend on the 
operation in question.  It's *not* safe for fused multiply-add, regardless 
of the widths in question; a software implementation of fmaf16 using float 
arithmetic could be quite simple, using round-to-odd like e.g. glibc's 
implementation of fmaf using double arithmetic, but "call fmaf then 
convert the result to _Float16" would be an incorrect implementation.)
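
A quick way to see the basic-arithmetic claim is to use float and double
as stand-ins, since double's 53-bit significand is more than twice float's
24 bits.  This is only a sketch under that analogy (it assumes
FLT_EVAL_METHOD == 0 and round-to-nearest); widened_ops_agree is a
hypothetical name.

```c
/* Return 1 if +, -, * and / done in double and narrowed immediately give
   the same float as the native float operation.  NaN results compare
   unequal, so inputs producing NaN report 0.  */
static int widened_ops_agree(float a, float b)
{
    /* volatile forces each native result to be rounded to float */
    volatile float s = a + b, d = a - b, p = a * b, q = a / b;

    return s == (float)((double)a + (double)b)
        && d == (float)((double)a - (double)b)
        && p == (float)((double)a * (double)b)
        && q == (float)((double)a / (double)b);
}
```

For example, widened_ops_agree(1.0f, 3.0f) and
widened_ops_agree(16777216.0f, 1.0f) both return 1.  The same check would
not be valid for a fused multiply-add, for the reason given above.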

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06 12:11             ` H.J. Lu
@ 2021-07-06 18:20               ` Joseph Myers
  0 siblings, 0 replies; 138+ messages in thread
From: Joseph Myers @ 2021-07-06 18:20 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Biener, Jakub Jelinek, gcc-patches, liuhongt

On Tue, 6 Jul 2021, H.J. Lu via Gcc-patches wrote:

> > > So the key point is that the soft-fp and avx512fp16 instructions may
> > > do not behave the same on the exception, is this acceptable?
> >
> > I think that's quite often the case for soft-fp.
> 
> So this is a GCC limitation.  Please document difference behaviors
> of _Float16 with and without AVX512FP16, similar to

I don't think it's yet clear there will be any such limitation, just 
semantics that depend on whether the excess precision support is used or 
not (which is covered by FLT_EVAL_METHOD, like on AArch64).
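
As a sketch of how user code could tell the two cases apart, assuming the
TS 18661-3 meaning of FLT_EVAL_METHOD (0, 1 and 2 promote _Float16 to a
wider format, 16 evaluates it in its own range and precision).  The
function name is made up for this example.

```c
#include <float.h>

/* Returns 1 if _Float16 expressions are evaluated with excess precision
   under the given FLT_EVAL_METHOD value, 0 if they are evaluated in the
   range and precision of _Float16, and -1 if this cannot be determined.
   The value-to-meaning mapping is an assumption based on TS 18661-3.  */
static int float16_uses_excess_precision(int method)
{
    switch (method) {
    case 0:   /* promoted to float */
    case 1:   /* promoted to double */
    case 2:   /* promoted to long double */
        return 1;
    case 16:  /* evaluated as _Float16 */
        return 0;
    default:  /* -1 or any other implementation-defined value */
        return -1;
    }
}
```

Typical use would be float16_uses_excess_precision(FLT_EVAL_METHOD), with
__STDC_WANT_IEC_60559_TYPES_EXT__ defined before including float.h so that
the value 16 can be reported.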

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-06 18:11           ` Joseph Myers
@ 2021-07-07  1:24             ` Hongtao Liu
  2021-07-14  7:50               ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-07  1:24 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Jakub Jelinek, liuhongt, gcc-patches

On Wed, Jul 7, 2021 at 2:11 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Tue, 6 Jul 2021, Hongtao Liu via Gcc-patches wrote:
>
> > There may be inconsistent behavior between soft-fp and avx512fp16
> > instructions if we emulate _Float16 w/ float .
> >  i.e
> >   1) for a + b - c where b and c are variables with the same big value
> > and a + b is NAN at _Float16 and real value at float, avx512fp16
> > instruction will raise an exception but soft-fp won't(unless it's
> > rounded after every operation.)
>
> There are at least two variants of emulation using float:
>
> (a) Using the excess precision support, as on AArch64, which means the C
> front end converts the _Float16 operations to float ones, with explicit
> narrowing on assignment (and conversion as if by assignment - argument
> passing and return, casts, etc.).  Excess precision indeed involves
> different semantics compared to doing each operation directly in the range
> and precision of _Float16.
>
Yes, setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
would round after each operation.
> (b) Letting the expand/optabs code generate operations in a wider mode.
> My understanding is that the result should get converted back to the
> narrower mode after each operation (by the expand/optabs code /
> convert_move called by it generating such a conversion), meaning (for
> basic arithmetic operations) that the semantics end up the same as if the
> operation had been done directly on _Float16 (but with more truncation
> operations occurring than would be the case with excess precision support
> used).
Yes, just with different behavior related to exceptions.
>
> >   2) a / b where b is denormal value and AVX512FP16 won't flush it to
> > zero even w/ -Ofast, but when it's extended to float and using divss,
> > it will be flushed to zero and raise an exception when compiling w/
> > Ofast
>
> I don't think that's a concern, flush to zero is well outside the scope of
> standards defining _Float16 semantics.
Ok.
>
> > So the key point is that the soft-fp and avx512fp16 instructions may
> > do not behave the same on the exception, is this acceptable?
>
> As far as I understand it, all cases within the standards will behave as
> expected for exceptions, whether pure software floating-point is used,
> pure hardware _Float16 arithmetic or one of the forms of emulation listed
> above.  (Where "as expected" itself depends on the value of
> FLT_EVAL_METHOD, i.e. whether excess precision is used for _Float16.)
> Flush to zero and trapping exceptions are outside the scope of the
> standards.  Since trapping exceptions is outside the scope of the
> standards, so is anything that distinguishes whether an arithmetic
> operation raises the same exception more than once or the order in which
> it raises different exceptions (e.g. the possibility of "inexact" being
> raised more than once, both by arithmetic on float and by narrowing from
> float to _Float16).
>
Setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16, so as
to round after each operation, could keep the semantics right.
And I'll document the difference in exception behavior between soft-fp and
AVX512FP16 instructions.
> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 0/2] Initial support for AVX512FP16
  2021-07-07  1:24             ` Hongtao Liu
@ 2021-07-14  7:50               ` Hongtao Liu
  2021-07-14 15:32                 ` [llvm-dev] " Craig Topper
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-14  7:50 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Jakub Jelinek, liuhongt, gcc-patches, llvm-dev, pengfei.wang

> >
> Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> round after each operation could keep semantics right.
> And I'll document the behavior difference between soft-fp and
> AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that the code can be more efficient.
e.g.
_Float16 a, b, c, d;
d = a + b + c;

would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp1;

so there's only 1 truncation in the end.

If users want to round back after every operation, the code should be
explicitly written as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;

That's what Clang does, quote from [1]
 _Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.

and so does arm gcc
quote from arm.c

/* We can calculate either in 16-bit range and precision or
   32-bit range and precision.  Make that decision based on whether
   we have native support for the ARMv8.2-A 16-bit floating-point
   instructions or not.  */
return (TARGET_VFP_FP16INST
	? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
	: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);


[1]https://clang.llvm.org/docs/LanguageExtensions.html
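
To show that the two ways of writing the sum can give different answers,
here is a minimal sketch that emulates binary16 in software rather than
using _Float16.  round_to_binary16 is a hypothetical helper, not anything
from GCC or Clang; it assumes the default rounding mode and no excess
precision for float and double (e.g. x86-64 with SSE2).

```c
#include <math.h>   /* INFINITY only; no libm functions are called */

/* Round x to the nearest binary16 value, ties to even, returned as float.
   Hypothetical helper for illustration only.  */
static float round_to_binary16(float x)
{
    double ax = x < 0 ? -(double)x : (double)x;
    if (!(ax > 0.0) || ax == INFINITY)
        return x;                          /* zero, NaN, infinity pass through */
    double ulp = 0x1p-24;                  /* spacing of binary16 subnormals */
    while (ax >= ulp * 2048.0)             /* stop once ax / ulp < 2^11 */
        ulp *= 2.0;
    volatile double n = ax / ulp + 0x1p52; /* adding 2^52 rounds to an integer */
    double r = (n - 0x1p52) * ulp;
    if (r > 65504.0)                       /* above the largest finite binary16 */
        r = INFINITY;
    return (float)(x < 0 ? -r : r);
}

/* d = a + b + c evaluated in float, with one truncation at the end. */
static float sum_truncate_once(float a, float b, float c)
{
    float tmp = a + b;
    float tmp1 = tmp + c;
    return round_to_binary16(tmp1);
}

/* e = a + b; d = e + c; with a truncation after every operation. */
static float sum_truncate_each(float a, float b, float c)
{
    float e = round_to_binary16(a + b);
    return round_to_binary16(e + c);
}
```

With a = 2048, b = 1, c = 1 (all exactly representable in binary16),
sum_truncate_once returns 2050 while sum_truncate_each returns 2048,
because 2049 is a tie that rounds to even at each step.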
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
  2021-07-14  7:50               ` Hongtao Liu
@ 2021-07-14 15:32                 ` Craig Topper
  2021-07-15  2:07                   ` Wang, Pengfei
  0 siblings, 1 reply; 138+ messages in thread
From: Craig Topper @ 2021-07-14 15:32 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Joseph Myers, Jakub Jelinek, llvm-dev, liuhongt, gcc-patches

On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <
llvm-dev@lists.llvm.org> wrote:

> > >
> > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > round after each operation could keep semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp so that codes can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> if users want to round back after every operation. codes should be
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision
> support when available on the target (e.g. on ARMv8.2a); otherwise it
> will be performed at a higher precision (currently always float) and
> then truncated down to _Float16. Note that C and C++ allow
> intermediate floating-point operands of an expression to be computed
> with greater precision than is expressible in their type, so Clang may
> avoid intermediate truncations in certain cases; this may lead to
> results that are inconsistent with native arithmetic.
>

Clang for AArch64 promotes each individual operation and rounds immediately
afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two
fadd operations. It's implemented in the LLVM backend where we can't see
what was originally a single expression.


>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
> ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
  2021-07-14 15:32                 ` [llvm-dev] " Craig Topper
@ 2021-07-15  2:07                   ` Wang, Pengfei
  2021-07-15  6:34                     ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Wang, Pengfei @ 2021-07-15  2:07 UTC (permalink / raw)
  To: Craig Topper, Hongtao Liu
  Cc: Jakub Jelinek, Liu, Hongtao, gcc-patches, Joseph Myers

> Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.

Yes, but this is not consistent with the Clang documentation. I think we should ask the Clang FE to do the promotion and truncation.

Thanks
Pengfei

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Craig Topper via llvm-dev
Sent: Wednesday, July 14, 2021 11:32 PM
To: Hongtao Liu <crazylht@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>; llvm-dev <llvm-dev@lists.llvm.org>; Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <llvm-dev@lists.llvm.org> wrote:
> >
> Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> round after each operation could keep semantics right.
> And I'll document the behavior difference between soft-fp and
> AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that codes can be more efficient.
i.e.
_Float16 a, b, c, d;
d = a + b + c;

would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp1;

so there's only 1 truncation in the end.

if users want to round back after every operation. codes should be
explicitly written as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;

That's what Clang does, quote from [1]
 _Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.

Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.


and so does arm gcc
quote from arm.c

/* We can calculate either in 16-bit range and precision or
   32-bit range and precision.  Make that decision based on whether
   we have native support for the ARMv8.2-A 16-bit floating-point
   instructions or not.  */
return (TARGET_VFP_FP16INST
? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);


[1]https://clang.llvm.org/docs/LanguageExtensions.html
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
  2021-07-15  2:07                   ` Wang, Pengfei
@ 2021-07-15  6:34                     ` Hongtao Liu
  2021-07-15  6:57                       ` Wang, Pengfei
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-15  6:34 UTC (permalink / raw)
  To: Wang, Pengfei
  Cc: Craig Topper, Jakub Jelinek, Liu, Hongtao, gcc-patches, Joseph Myers

On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei <pengfei.wang@intel.com> wrote:
>
> Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
>
>
>
> Yes, but this is not consistent with Clang document. I think we should ask Clang FE to do the promotion and truncation.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Craig Topper via llvm-dev
> Sent: Wednesday, July 14, 2021 11:32 PM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; llvm-dev <llvm-dev@lists.llvm.org>; Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
>
>
> On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <llvm-dev@lists.llvm.org> wrote:
>
> > >
> > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > round after each operation could keep semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp so that codes can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> if users want to round back after every operation. codes should be
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision
> support when available on the target (e.g. on ARMv8.2a); otherwise it
> will be performed at a higher precision (currently always float) and
> then truncated down to _Float16. Note that C and C++ allow
> intermediate floating-point operands of an expression to be computed
> with greater precision than is expressible in their type, so Clang may
> avoid intermediate truncations in certain cases; this may lead to
> results that are inconsistent with native arithmetic.
>
>
>
> Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
>
>
I was reading the documentation for the excess-precision option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-fexcess-precision=style

This option allows further control over excess precision on machines
where floating-point operations occur in a format with more precision
or range than the IEEE standard and interchange floating-point types.
By default, -fexcess-precision=fast is in effect; this means that
operations may be carried out in a wider precision than the types
specified in the source if that would result in faster code, and it is
unpredictable when rounding to the types specified in the source code
takes place. When compiling C, if -fexcess-precision=standard is
specified then excess precision follows the rules specified in ISO
C99; in particular, both casts and assignments cause values to be
rounded to their semantic types (whereas -ffloat-store only affects
assignments). This option is enabled by default for C if a strict
conformance option such as -std=c99 is used. -ffast-math enables
-fexcess-precision=fast by default regardless of whether a strict
conformance option is used.

For -fexcess-precision=fast, we should set flt_eval_method to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16.

For -fexcess-precision=standard, should we set
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2 is enabled, so
that soft-fp rounds back after every operation?
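
To make the difference between the two settings concrete, here is a
minimal sketch in plain C that models both strategies without needing
_Float16 support in the compiler. All names in it
(round_to_half_precision, sum_promote_to_float, sum_promote_to_float16)
are hypothetical and made up for illustration. The helper only models
binary16 precision (11-bit significand, round to nearest even) for
normal finite floats; it ignores binary16 range, subnormals, NaN and
infinity, so it is not a substitute for the real soft-fp routines.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest value that has an
   11-bit significand (binary16 precision), ties to even.  Assumes a
   32-bit IEEE binary32 float.  Only meaningful for normal finite
   inputs; binary16 overflow and subnormals are not modelled.  */
static float round_to_half_precision(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* binary32 stores 23 fraction bits, binary16 stores 10, so the low
       13 bits are dropped.  Adding 0xFFF plus the lowest kept bit
       implements round-to-nearest-even before masking.  */
    uint32_t lowest_kept_bit = (bits >> 13) & 1u;
    bits += 0x0FFFu + lowest_kept_bit;
    bits &= ~(uint32_t)0x1FFFu;

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Models FLT_EVAL_METHOD_PROMOTE_TO_FLOAT: compute in float and
   truncate once at the end.  */
static float sum_promote_to_float(float a, float b, float c)
{
    float tmp = a + b;
    float tmp1 = tmp + c;
    return round_to_half_precision(tmp1);
}

/* Models FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16: round back to half
   precision after every operation.  */
static float sum_promote_to_float16(float a, float b, float c)
{
    float e = round_to_half_precision(a + b);
    return round_to_half_precision(e + c);
}
```

With a = 2048, b = 1, c = 1 (all exactly representable in binary16),
the single-truncation version gives 2050, while the version that rounds
after every operation gives 2048, because 2049 is a tie that rounds to
the even neighbour 2048 both times.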
>
>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
> ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
  2021-07-15  6:34                     ` Hongtao Liu
@ 2021-07-15  6:57                       ` Wang, Pengfei
  2021-07-15  7:49                         ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Wang, Pengfei @ 2021-07-15  6:57 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Craig Topper, Jakub Jelinek, Liu, Hongtao, gcc-patches, Joseph Myers

It seems Clang doesn't support -fexcess-precision=xxx:
https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403

Thanks
Pengfei

-----Original Message-----
From: Hongtao Liu <crazylht@gmail.com> 
Sent: Thursday, July 15, 2021 2:35 PM
To: Wang, Pengfei <pengfei.wang@intel.com>
Cc: Craig Topper <craig.topper@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei <pengfei.wang@intel.com> wrote:
>
> Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
>
>
>
> Yes, but this is not consistent with Clang document. I think we should ask Clang FE to do the promotion and truncation.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Craig 
> Topper via llvm-dev
> Sent: Wednesday, July 14, 2021 11:32 PM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; llvm-dev 
> <llvm-dev@lists.llvm.org>; Liu, Hongtao <hongtao.liu@intel.com>; 
> gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
>
>
> On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <llvm-dev@lists.llvm.org> wrote:
>
> > >
> > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to 
> > round after each operation could keep semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for 
> soft-fp so that codes can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> if users want to round back after every operation. codes should be 
> explicitly written as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision 
> support when available on the target (e.g. on ARMv8.2a); otherwise it 
> will be performed at a higher precision (currently always float) and 
> then truncated down to _Float16. Note that C and C++ allow 
> intermediate floating-point operands of an expression to be computed 
> with greater precision than is expressible in their type, so Clang may 
> avoid intermediate truncations in certain cases; this may lead to 
> results that are inconsistent with native arithmetic.
>
>
>
> Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
>
>
I was reading the documentation for the excess-precision option at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-fexcess-precision=style

This option allows further control over excess precision on machines where floating-point operations occur in a format with more precision or range than the IEEE standard and interchange floating-point types.
By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place. When compiling C, if -fexcess-precision=standard is specified then excess precision follows the rules specified in ISO C99; in particular, both casts and assignments cause values to be rounded to their semantic types (whereas -ffloat-store only affects assignments). This option is enabled by default for C if a strict conformance option such as -std=c99 is used. -ffast-math enables -fexcess-precision=fast by default regardless of whether a strict conformance option is used.

For -fexcess-precision=fast,
 we should set flt_eval_method to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16

For  -fexcess-precision=standard
set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2? so for soft-fp it will round back after every operation?
>
>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
> ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
  2021-07-15  6:57                       ` Wang, Pengfei
@ 2021-07-15  7:49                         ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-15  7:49 UTC (permalink / raw)
  To: Wang, Pengfei
  Cc: Craig Topper, Jakub Jelinek, Liu, Hongtao, gcc-patches, Joseph Myers

On Thu, Jul 15, 2021 at 2:58 PM Wang, Pengfei <pengfei.wang@intel.com> wrote:
>
> It seems Clang doesn't support -fexcess-precision=xxx:
> https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403
>
> Thanks
> Pengfei
>
> -----Original Message-----
> From: Hongtao Liu <crazylht@gmail.com>
> Sent: Thursday, July 15, 2021 2:35 PM
> To: Wang, Pengfei <pengfei.wang@intel.com>
> Cc: Craig Topper <craig.topper@gmail.com>; Jakub Jelinek <jakub@redhat.com>; Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
> On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei <pengfei.wang@intel.com> wrote:
> >
> > Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
> >
> >
> >
> > Yes, but this is not consistent with Clang document. I think we should ask Clang FE to do the promotion and truncation.
> >
> >
> >
> > Thanks
> >
> > Pengfei
> >
> >
> >
> > From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Craig
> > Topper via llvm-dev
> > Sent: Wednesday, July 14, 2021 11:32 PM
> > To: Hongtao Liu <crazylht@gmail.com>
> > Cc: Jakub Jelinek <jakub@redhat.com>; llvm-dev
> > <llvm-dev@lists.llvm.org>; Liu, Hongtao <hongtao.liu@intel.com>;
> > gcc-patches@gcc.gnu.org; Joseph Myers <joseph@codesourcery.com>
> > Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
> >
> >
> >
> > On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev <llvm-dev@lists.llvm.org> wrote:
> >
> > > >
> > > Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > > round after each operation could keep semantics right.
> > > And I'll document the behavior difference between soft-fp and
> > > AVX512FP16 instruction for exceptions.
> > I got some feedback from my colleague who's working on supporting
> > _Float16 for llvm.
> > The LLVM side wants to set  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> > soft-fp so that codes can be more efficient.
> > i.e.
> > _Float16 a, b, c, d;
> > d = a + b + c;
> >
> > would be transformed to
> > float tmp, tmp1, a1, b1, c1;
> > a1 = (float) a;
> > b1 = (float) b;
> > c1 = (float) c;
> > tmp = a1 + b1;
> > tmp1 = tmp + c1;
> > d = (_Float16) tmp1;
> >
> > so there's only 1 truncation in the end.
> >
> > if users want to round back after every operation. codes should be
> > explicitly written as
> > _Float16 a, b, c, d, e;
> > e = a + b;
> > d = e + c;
> >
> > That's what Clang does, quote from [1]
> >  _Float16 arithmetic will be performed using native half-precision
> > support when available on the target (e.g. on ARMv8.2a); otherwise it
> > will be performed at a higher precision (currently always float) and
> > then truncated down to _Float16. Note that C and C++ allow
> > intermediate floating-point operands of an expression to be computed
> > with greater precision than is expressible in their type, so Clang may
> > avoid intermediate truncations in certain cases; this may lead to
> > results that are inconsistent with native arithmetic.
> >
> >
> >
> > Clang for AArch64 promotes each individual operation and rounds immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd operations. It's implemented in the LLVM backend where we can't see what was originally a single expression.
> >
> >
> I was reading the documentation for the excess-precision option at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> -fexcess-precision=style
With this option, users can choose whether or not to round back after
each operation, which should be more convenient.

>
> This option allows further control over excess precision on machines where floating-point operations occur in a format with more precision or range than the IEEE standard and interchange floating-point types.
> By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place. When compiling C, if -fexcess-precision=standard is specified then excess precision follows the rules specified in ISO C99; in particular, both casts and assignments cause values to be rounded to their semantic types (whereas -ffloat-store only affects assignments). This option is enabled by default for C if a strict conformance option such as -std=c99 is used. -ffast-math enables -fexcess-precision=fast by default regardless of whether a strict conformance option is used.
>
> For -fexcess-precision=fast,
>  we should set flt_eval_method to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16
>
> For  -fexcess-precision=standard
> set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2? so for soft-fp it will round back after every operation?
> >
> >
> > and so does arm gcc
> > quote from arm.c
> >
> > /* We can calculate either in 16-bit range and precision or
> >    32-bit range and precision.  Make that decision based on whether
> >    we have native support for the ARMv8.2-A 16-bit floating-point
> >    instructions or not.  */
> > return (TARGET_VFP_FP16INST
> > ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
> >
> >
> > [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > > --
> > > > Joseph S. Myers
> > > > joseph@codesourcery.com
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH V2 00/10] Initial support for AVX512FP16
  2021-07-01 12:58     ` Richard Biener
  2021-07-01 13:03       ` Jakub Jelinek
@ 2021-07-21  7:43       ` liuhongt
  2021-07-21  7:43         ` [PATCH 01/10] Update hf soft-fp from glibc liuhongt
                           ` (10 more replies)
  1 sibling, 11 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

Hi:
  As discussed in [1], this patch series supports _Float16 for targets with
SSE2 and above. Without AVX512FP16, the _Float16 type is storage-only: all
operations are emulated by soft-fp and float instructions. Soft-fp keeps the
intermediate result of an operation at 32-bit precision by default, which may
lead to inconsistent behavior between soft-fp and AVX512FP16 instructions.
Using the option -fexcess-precision=standard forces rounding back after every
operation.
 
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html

There are 10 patches in this series:

1)  Update hf soft-fp from glibc.
2)  [i386] Enable _Float16 type for TARGET_SSE2 and above.
3)  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
    truncations.
4) AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
instructions.
5) AVX512FP16: Support vector init/broadcast/set/extract for FP16.
6) AVX512FP16: Add testcase for vector init and broadcast intrinsics.
7) AVX512FP16: Add tests for vector passing in variable arguments.
8) AVX512FP16: Add ABI tests for xmm.
9) AVX512FP16: Add ABI test for ymm.
10) AVX512FP16: Add abi test for zmm

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
  Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
  Passes 300+ new tests under gcc.dg/torture/*float16*.

  On SPR, there are regressions related to FLT_EVAL_METHOD for pr69225-[1234567].c,
since TARGET_AVX512FP16 sets FLT_EVAL_METHOD to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
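
  For readers less familiar with FLT_EVAL_METHOD, the following is a small
sketch of how a test could inspect the value. The function names are
hypothetical and only for illustration. The values -1, 0, 1 and 2 come from
ISO C99; the value 16 is assumed here to be what GCC reports for
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16, following ISO/IEC TS 18661-3, and may only
be visible through <float.h> when __STDC_WANT_IEC_60559_TYPES_EXT__ is defined.

```c
#include <float.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value.
   The case for 16 is an assumption about how GCC exposes
   FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.  */
static const char *describe_flt_eval_method(int method)
{
    switch (method) {
    case -1: return "indeterminable";
    case 0:  return "promote to float";
    case 1:  return "promote to double";
    case 2:  return "promote to long double";
    case 16: return "promote to _Float16";
    default: return "implementation-defined";
    }
}

/* Hypothetical helper: report the method this translation unit was
   compiled with.  FLT_EVAL_METHOD is defined by <float.h> since C99.  */
static const char *current_flt_eval_method(void)
{
    return describe_flt_eval_method(FLT_EVAL_METHOD);
}
```

  A test that assumes promotion to float would be expected to see a different
description once the target evaluates _Float16 natively.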

 gcc/common/config/i386/cpuinfo.h              |    2 +
 gcc/common/config/i386/i386-common.c          |   26 +-
 gcc/common/config/i386/i386-cpuinfo.h         |    1 +
 gcc/common/config/i386/i386-isas.h            |    1 +
 gcc/config.gcc                                |    2 +-
 gcc/config/i386/avx512fp16intrin.h            |  225 ++++
 gcc/config/i386/cpuid.h                       |    1 +
 gcc/config/i386/i386-builtin-types.def        |    7 +-
 gcc/config/i386/i386-builtins.c               |   23 +
 gcc/config/i386/i386-c.c                      |    2 +
 gcc/config/i386/i386-expand.c                 |  129 +-
 gcc/config/i386/i386-isa.def                  |    1 +
 gcc/config/i386/i386-modes.def                |   13 +-
 gcc/config/i386/i386-options.c                |    4 +-
 gcc/config/i386/i386.c                        |  238 +++-
 gcc/config/i386/i386.h                        |   28 +-
 gcc/config/i386/i386.md                       |  304 ++++-
 gcc/config/i386/i386.opt                      |    4 +
 gcc/config/i386/immintrin.h                   |    4 +
 gcc/config/i386/sse.md                        |  395 ++++--
 gcc/doc/extend.texi                           |   16 +
 gcc/doc/invoke.texi                           |   10 +-
 gcc/lto/lto-lang.c                            |    3 +
 gcc/optabs-query.c                            |   10 +-
 gcc/testsuite/g++.dg/other/i386-2.C           |    2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |    2 +-
 gcc/testsuite/g++.target/i386/float16-1.C     |    8 +
 gcc/testsuite/g++.target/i386/float16-2.C     |   14 +
 gcc/testsuite/g++.target/i386/float16-3.C     |   10 +
 gcc/testsuite/gcc.target/i386/avx-1.c         |    2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |    2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |    3 +
 .../gcc.target/i386/avx512fp16-10a.c          |   14 +
 .../gcc.target/i386/avx512fp16-10b.c          |   25 +
 .../gcc.target/i386/avx512fp16-12a.c          |   21 +
 .../gcc.target/i386/avx512fp16-12b.c          |   27 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
 .../gcc.target/i386/avx512fp16-vararg-1.c     |  122 ++
 .../gcc.target/i386/avx512fp16-vararg-2.c     |  107 ++
 .../gcc.target/i386/avx512fp16-vararg-3.c     |  114 ++
 .../gcc.target/i386/avx512fp16-vararg-4.c     |  115 ++
 .../gcc.target/i386/avx512fp16-vec_set_var.c  |   30 +
 gcc/testsuite/gcc.target/i386/float16-3a.c    |   10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c    |   10 +
 gcc/testsuite/gcc.target/i386/float16-4a.c    |   10 +
 gcc/testsuite/gcc.target/i386/float16-4b.c    |   10 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |    2 +
 gcc/testsuite/gcc.target/i386/m512-check.h    |   38 +-
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |   14 +
 gcc/testsuite/gcc.target/i386/pr54855-13.c    |   14 +
 gcc/testsuite/gcc.target/i386/sse-13.c        |    2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |    2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |    4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |    2 +-
 .../gcc.target/i386/sse2-float16-1.c          |    8 +
 .../gcc.target/i386/sse2-float16-2.c          |   16 +
 .../gcc.target/i386/sse2-float16-3.c          |   12 +
 .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
 .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +++
 .../x86_64/abi/avx512fp16/asm-support.S       |   81 ++
 .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 ++
 .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
 .../x86_64/abi/avx512fp16/defines.h           |  150 +++
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |   45 +
 .../x86_64/abi/avx512fp16/m256h/args.h        |  182 +++
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |   81 ++
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |    3 +
 .../avx512fp16/m256h/test_m256_returning.c    |   54 +
 .../abi/avx512fp16/m256h/test_passing_m256.c  |  370 ++++++
 .../avx512fp16/m256h/test_passing_structs.c   |  113 ++
 .../avx512fp16/m256h/test_passing_unions.c    |  337 ++++++
 .../abi/avx512fp16/m256h/test_varargs-m256.c  |  160 +++
 .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |   48 +
 .../x86_64/abi/avx512fp16/m512h/args.h        |  186 +++
 .../x86_64/abi/avx512fp16/m512h/asm-support.S |   97 ++
 .../avx512fp16/m512h/avx512fp16-zmm-check.h   |    4 +
 .../avx512fp16/m512h/test_m512_returning.c    |   62 +
 .../abi/avx512fp16/m512h/test_passing_m512.c  |  380 ++++++
 .../avx512fp16/m512h/test_passing_structs.c   |  123 ++
 .../avx512fp16/m512h/test_passing_unions.c    |  415 +++++++
 .../abi/avx512fp16/m512h/test_varargs-m512.c  |  164 +++
 .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
 .../test_3_element_struct_and_unions.c        |  692 +++++++++++
 .../abi/avx512fp16/test_basic_alignment.c     |   45 +
 .../test_basic_array_size_and_align.c         |   43 +
 .../abi/avx512fp16/test_basic_returning.c     |   87 ++
 .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
 .../test_basic_struct_size_and_align.c        |   42 +
 .../test_basic_union_size_and_align.c         |   40 +
 .../abi/avx512fp16/test_complex_returning.c   |  104 ++
 .../abi/avx512fp16/test_m64m128_returning.c   |   73 ++
 .../abi/avx512fp16/test_passing_floats.c      | 1066 +++++++++++++++++
 .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++++++++
 .../abi/avx512fp16/test_passing_structs.c     |  332 +++++
 .../abi/avx512fp16/test_passing_unions.c      |  335 ++++++
 .../abi/avx512fp16/test_struct_returning.c    |  274 +++++
 .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +++
 gcc/testsuite/lib/target-supports.exp         |   13 +-
 libgcc/config.host                            |    5 +-
 libgcc/config/i386/32/sfp-machine.h           |    1 +
 libgcc/config/i386/64/sfp-machine.h           |    1 +
 libgcc/config/i386/64/t-softfp                |    1 +
 libgcc/config/i386/sfp-machine.h              |    1 +
 libgcc/config/i386/t-softfp                   |    5 +
 libgcc/soft-fp/eqhf2.c                        |   49 +
 libgcc/soft-fp/extendhfdf2.c                  |   53 +
 libgcc/soft-fp/extendhfsf2.c                  |   49 +
 libgcc/soft-fp/half.h                         |    1 +
 libgcc/soft-fp/truncdfhf2.c                   |   52 +
 libgcc/soft-fp/truncsfhf2.c                   |   48 +
 127 files changed, 10324 insertions(+), 238 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16intrin.h
 create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
 create mode 100644 libgcc/config/i386/64/t-softfp
 create mode 100644 libgcc/soft-fp/eqhf2.c
 create mode 100644 libgcc/soft-fp/extendhfdf2.c
 create mode 100644 libgcc/soft-fp/extendhfsf2.c
 create mode 100644 libgcc/soft-fp/truncdfhf2.c
 create mode 100644 libgcc/soft-fp/truncsfhf2.c

-- 
2.18.1



* [PATCH 01/10] Update hf soft-fp from glibc.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
                           ` (9 subsequent siblings)
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

libgcc/ChangeLog:

	* soft-fp/eqhf2.c: New file.
	* soft-fp/extendhfdf2.c: New file.
	* soft-fp/extendhfsf2.c: New file.
	* soft-fp/extendhfxf2.c: New file.
	* soft-fp/half.h (FP_CMP_EQ_H): New macro.
	* soft-fp/truncdfhf2.c: New file.
	* soft-fp/truncsfhf2.c: New file.
	* soft-fp/truncxfhf2.c: New file.
---
 libgcc/soft-fp/eqhf2.c       | 49 +++++++++++++++++++++++++++++++++
 libgcc/soft-fp/extendhfdf2.c | 53 ++++++++++++++++++++++++++++++++++++
 libgcc/soft-fp/extendhfsf2.c | 49 +++++++++++++++++++++++++++++++++
 libgcc/soft-fp/half.h        |  1 +
 libgcc/soft-fp/truncdfhf2.c  | 52 +++++++++++++++++++++++++++++++++++
 libgcc/soft-fp/truncsfhf2.c  | 48 ++++++++++++++++++++++++++++++++
 6 files changed, 252 insertions(+)
 create mode 100644 libgcc/soft-fp/eqhf2.c
 create mode 100644 libgcc/soft-fp/extendhfdf2.c
 create mode 100644 libgcc/soft-fp/extendhfsf2.c
 create mode 100644 libgcc/soft-fp/truncdfhf2.c
 create mode 100644 libgcc/soft-fp/truncsfhf2.c

diff --git a/libgcc/soft-fp/eqhf2.c b/libgcc/soft-fp/eqhf2.c
new file mode 100644
index 00000000000..6d6634e5c54
--- /dev/null
+++ b/libgcc/soft-fp/eqhf2.c
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return 0 iff a == b, 1 otherwise
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+
+CMPtype
+__eqhf2 (HFtype a, HFtype b)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_H (B);
+  CMPtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+  FP_UNPACK_RAW_H (B, b);
+  FP_CMP_EQ_H (r, A, B, 1);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
+
+strong_alias (__eqhf2, __nehf2);
diff --git a/libgcc/soft-fp/extendhfdf2.c b/libgcc/soft-fp/extendhfdf2.c
new file mode 100644
index 00000000000..337ba791d48
--- /dev/null
+++ b/libgcc/soft-fp/extendhfdf2.c
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE double
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "double.h"
+
+DFtype
+__extendhfdf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_D (R);
+  DFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_EXTEND (D, H, 2, 1, R, A);
+#else
+  FP_EXTEND (D, H, 1, 1, R, A);
+#endif
+  FP_PACK_RAW_D (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/extendhfsf2.c b/libgcc/soft-fp/extendhfsf2.c
new file mode 100644
index 00000000000..a02f46d9a99
--- /dev/null
+++ b/libgcc/soft-fp/extendhfsf2.c
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE single
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "single.h"
+
+SFtype
+__extendhfsf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/half.h b/libgcc/soft-fp/half.h
index c7823ac61d3..4108f5cb3c2 100644
--- a/libgcc/soft-fp/half.h
+++ b/libgcc/soft-fp/half.h
@@ -167,4 +167,5 @@ union _FP_UNION_H
 #define _FP_FRAC_HIGH_RAW_H(X)	_FP_FRAC_HIGH_1 (X)
 #define _FP_FRAC_HIGH_DW_H(X)	_FP_FRAC_HIGH_1 (X)
 
+#define FP_CMP_EQ_H(r, X, Y, ex)	_FP_CMP_EQ (H, 1, (r), X, Y, (ex))
 #endif /* !SOFT_FP_HALF_H */
diff --git a/libgcc/soft-fp/truncdfhf2.c b/libgcc/soft-fp/truncdfhf2.c
new file mode 100644
index 00000000000..8bcb2787692
--- /dev/null
+++ b/libgcc/soft-fp/truncdfhf2.c
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "double.h"
+
+HFtype
+__truncdfhf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (H, D, 1, 2, R, A);
+#else
+  FP_TRUNC (H, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/truncsfhf2.c b/libgcc/soft-fp/truncsfhf2.c
new file mode 100644
index 00000000000..25bee29f7f5
--- /dev/null
+++ b/libgcc/soft-fp/truncsfhf2.c
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "single.h"
+
+HFtype
+__truncsfhf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (H, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
-- 
2.18.1
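
For readers unfamiliar with the soft-fp macros, here is a minimal sketch of
what the half-precision helpers above compute, written as plain bit
manipulation.  It is not the libgcc code: the names half_bits_to_float and
half_bits_ne are hypothetical, the half value is passed as raw uint16_t bits
instead of HFtype, and the sketch neither raises floating-point exceptions nor
quiets signaling NaNs, which the real FP_EXTEND/_FP_CMP_EQ paths may do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of libgcc): widen IEEE binary16 bits to
   float.  Every binary16 value is exactly representable as a float, so
   no rounding is needed.  */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent */
    uint32_t frac = h & 0x03FFu;         /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, payload moved up.  */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127.  */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac == 0) {
        /* Signed zero.  */
        bits = sign;
    } else {
        /* Subnormal: shift until the implicit bit appears, then drop it.  */
        int shift = 0;
        while ((frac & 0x0400u) == 0) {
            frac <<= 1;
            shift++;
        }
        frac &= 0x03FFu;
        bits = sign | ((uint32_t)(113 - shift) << 23) | (frac << 13);
    }

    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper mirroring the __eqhf2 contract stated in its header
   comment: return 0 iff a == b, 1 otherwise.  NaN compares unequal to
   everything, and +0 compares equal to -0.  */
static int half_bits_ne(uint16_t a, uint16_t b)
{
    int a_nan = (a & 0x7C00u) == 0x7C00u && (a & 0x03FFu) != 0;
    int b_nan = (b & 0x7C00u) == 0x7C00u && (b & 0x03FFu) != 0;

    if (a_nan || b_nan)
        return 1;
    if (((a | b) & 0x7FFFu) == 0)
        return 0;
    return a != b;
}
```

For example, half_bits_to_float (0x3C00) yields 1.0f, and
half_bits_ne (0x0000, 0x8000) yields 0 because the two zeros compare equal.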



* [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
  2021-07-21  7:43         ` [PATCH 01/10] Update hf soft-fp from glibc liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21 10:35           ` Uros Bizjak
                             ` (2 more replies)
  2021-07-21  7:43         ` [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
                           ` (8 subsequent siblings)
  10 siblings, 3 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/ChangeLog:

	* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
	* config/i386/i386.c (enum x86_64_reg_class): Add
	X86_64_SSEHF_CLASS.
	(merge_classes): Handle X86_64_SSEHF_CLASS.
	(examine_argument): Ditto.
	(construct_container): Ditto.
	(classify_argument): Ditto, and set HFmode/HCmode to
	X86_64_SSEHF_CLASS.
	(function_value_32): Return _Float16/_Complex _Float16 by
	%xmm0/%xmm1.
	(function_value_64): Return _Float16/_Complex _Float16 by SSE
	register.
	(ix86_print_operand): Handle CONST_DOUBLE HFmode.
	(ix86_secondary_reload): Require gpr as intermediate register
	to store _Float16 from sse register when sse4 is not
	available.
	(ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
	(ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
	sse2.
	(ix86_scalar_mode_supported_p): Ditto.
	(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Define.
	(ix86_get_excess_precision): Return
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
	* config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
	* config/i386/i386.md (*pushhf_rex64): New define_insn.
	(*pushhf): Ditto.
	(*movhf_internal): Ditto.
	* doc/extend.texi (Half-Precision Floating Point): Document
	_Float16 for x86.

gcc/lto/ChangeLog:

	* lto-lang.c (lto_type_for_mode): Return float16_type_node
	when mode == TYPE_MODE (float16_type_node).

gcc/testsuite/ChangeLog:

	* gcc.target/i386/sse2-float16-1.c: New test.
	* gcc.target/i386/sse2-float16-2.c: Ditto.
	* gcc.target/i386/sse2-float16-3.c: Ditto.
---
 gcc/config/i386/i386-modes.def                |   1 +
 gcc/config/i386/i386.c                        |  99 ++++++++++++++-
 gcc/config/i386/i386.h                        |   2 +-
 gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
 gcc/doc/extend.texi                           |  16 +++
 gcc/lto/lto-lang.c                            |   3 +
 .../gcc.target/i386/sse2-float16-1.c          |   8 ++
 .../gcc.target/i386/sse2-float16-2.c          |  16 +++
 .../gcc.target/i386/sse2-float16-3.c          |  12 ++
 9 files changed, 265 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 4e7014be034..9232f59a925 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
+FLOAT_MODE (HF, 2, ieee_half_format);
 
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff96134fb37..02628d838fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -387,6 +387,7 @@ enum x86_64_reg_class
     X86_64_INTEGER_CLASS,
     X86_64_INTEGERSI_CLASS,
     X86_64_SSE_CLASS,
+    X86_64_SSEHF_CLASS,
     X86_64_SSESF_CLASS,
     X86_64_SSEDF_CLASS,
     X86_64_SSEUP_CLASS,
@@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
     return X86_64_MEMORY_CLASS;
 
   /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
-  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
-      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
+  if ((class1 == X86_64_INTEGERSI_CLASS
+       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
+      || (class2 == X86_64_INTEGERSI_CLASS
+	  && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
     return X86_64_INTEGERSI_CLASS;
   if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
       || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
@@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
 	    /* The partial classes are now full classes.  */
 	    if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
 	      subclasses[0] = X86_64_SSE_CLASS;
+	    if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
+	      subclasses[0] = X86_64_SSE_CLASS;
 	    if (subclasses[0] == X86_64_INTEGERSI_CLASS
 		&& !((bit_offset % 64) == 0 && bytes == 4))
 	      subclasses[0] = X86_64_INTEGER_CLASS;
@@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
       gcc_unreachable ();
     case E_CTImode:
       return 0;
+    case E_HFmode:
+      if (!(bit_offset % 64))
+	classes[0] = X86_64_SSEHF_CLASS;
+      else
+	classes[0] = X86_64_SSE_CLASS;
+      return 1;
     case E_SFmode:
       if (!(bit_offset % 64))
 	classes[0] = X86_64_SSESF_CLASS;
@@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
       classes[0] = X86_64_SSE_CLASS;
       classes[1] = X86_64_SSEUP_CLASS;
       return 2;
+    case E_HCmode:
+      classes[0] = X86_64_SSE_CLASS;
+      if (!(bit_offset % 64))
+	return 1;
+      else
+	{
+	  classes[1] = X86_64_SSEHF_CLASS;
+	  return 2;
+	}
     case E_SCmode:
       classes[0] = X86_64_SSE_CLASS;
       if (!(bit_offset % 64))
@@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
 	(*int_nregs)++;
 	break;
       case X86_64_SSE_CLASS:
+      case X86_64_SSEHF_CLASS:
       case X86_64_SSESF_CLASS:
       case X86_64_SSEDF_CLASS:
 	(*sse_nregs)++;
@@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
 
   /* First construct simple cases.  Avoid SCmode, since we want to use
      single register to pass this type.  */
-  if (n == 1 && mode != SCmode)
+  if (n == 1 && mode != SCmode && mode != HCmode)
     switch (regclass[0])
       {
       case X86_64_INTEGER_CLASS:
       case X86_64_INTEGERSI_CLASS:
 	return gen_rtx_REG (mode, intreg[0]);
       case X86_64_SSE_CLASS:
+      case X86_64_SSEHF_CLASS:
       case X86_64_SSESF_CLASS:
       case X86_64_SSEDF_CLASS:
 	if (mode != BLKmode)
@@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
 				   GEN_INT (i*8));
 	    intreg++;
 	    break;
+	  case X86_64_SSEHF_CLASS:
+	    exp [nexps++]
+	      = gen_rtx_EXPR_LIST (VOIDmode,
+				   gen_rtx_REG (HFmode,
+						GET_SSE_REGNO (sse_regno)),
+				   GEN_INT (i*8));
+	    sse_regno++;
+	    break;
 	  case X86_64_SSESF_CLASS:
 	    exp [nexps++]
 	      = gen_rtx_EXPR_LIST (VOIDmode,
@@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
     /* Most things go in %eax.  */
     regno = AX_REG;
 
+  /* Return _Float16/_Complex _Float16 by sse register.  */
+  if (mode == HFmode)
+    regno = FIRST_SSE_REG;
+  if (mode == HCmode)
+    {
+      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
+      XVECEXP (ret, 0, 0)
+	= gen_rtx_EXPR_LIST (VOIDmode,
+			     gen_rtx_REG (SImode, FIRST_SSE_REG),
+			     GEN_INT (0));
+      return ret;
+    }
+
   /* Override FP return register with %xmm0 for local functions when
      SSE math is enabled or for functions with sseregparm attribute.  */
   if ((fn || fntype) && (mode == SFmode || mode == DFmode))
@@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
 
       switch (mode)
 	{
+	case E_HFmode:
+	case E_HCmode:
 	case E_SFmode:
 	case E_SCmode:
 	case E_DFmode:
@@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	  (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
     }
 
+  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
+    {
+      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
+			       REAL_MODE_FORMAT (HFmode));
+      if (ASSEMBLER_DIALECT == ASM_ATT)
+	putc ('$', file);
+      fprintf (file, "0x%04x", (unsigned int) l);
+    }
+
   else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
     {
       long l;
@@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
       return NO_REGS;
     }
 
+  /* Require movement to gpr, and then store to memory.  */
+  if (mode == HFmode
+      && !TARGET_SSE4_1
+      && SSE_CLASS_P (rclass)
+      && !in_p && MEM_P (x))
+    {
+      sri->extra_cost = 1;
+      return GENERAL_REGS;
+    }
+
   /* This condition handles corner case where an expression involving
      pointers gets vectorized.  We're trying to use the address of a
      stack slot as a vector initializer.
@@ -19546,6 +19610,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   else if (VALID_INT_MODE_P (mode)
 	   || VALID_FP_MODE_P (mode))
     return true;
+  else if (mode == HFmode || mode == HCmode)
+    return true;
   /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
      on to use that value in smaller contexts, this can easily force a
      pseudo to be allocated to GENERAL_REGS.  Since this is no worse than
@@ -21555,10 +21621,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
     return default_decimal_float_supported_p ();
   else if (mode == TFmode)
     return true;
+  else if (mode == HFmode && TARGET_SSE2)
+    return true;
   else
     return default_scalar_mode_supported_p (mode);
 }
 
+/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
+   if MODE is HFmode and SSE2 is enabled, and punt to the generic
+   implementation otherwise.  */
+
+static bool
+ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
+{
+  /* NB: Return TRUE for HFmode under SSE2 so that the _Float16 type
+     will be defined by the C front-end, matching
+     ix86_scalar_mode_supported_p.  */
+  return ((mode == HFmode && TARGET_SSE2)
+	  ? true
+	  : default_libgcc_floating_mode_supported_p (mode));
+}
+
 /* Implements target hook vector_mode_supported_p.  */
 static bool
 ix86_vector_mode_supported_p (machine_mode mode)
@@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	   provide would be identical were it not for the unpredictable
 	   cases.  */
 	if (!TARGET_80387)
-	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	  return TARGET_SSE2
+		 ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+		 : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 	else if (!TARGET_MIX_SSE_I387)
 	  {
 	    if (!(TARGET_SSE && TARGET_SSE_MATH))
 	      return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
 	    else if (TARGET_SSE2)
-	      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
 	  }
 
 	/* If we are in standards compliant mode, but we know we will
@@ -23820,6 +23905,10 @@ ix86_run_selftests (void)
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
 
+#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
+#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P	\
+ix86_libgcc_floating_mode_supported_p
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 0c2c93daf32..e21922e8782 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
-   || (MODE) == V2DImode || (MODE) == DFmode)
+   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..dd991c3ffdf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
 ;; All x87 floating point modes
 (define_mode_iterator X87MODEF [SF DF XF])
 
+;; All x87 floating point modes plus HF
+(define_mode_iterator X87MODEFH [SF DF XF HF])
+
 ;; All SSE floating point modes
 (define_mode_iterator SSEMODEF [SF DF TF])
 (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
@@ -3130,6 +3133,32 @@ (define_split
   operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
 })
 
+(define_insn "*pushhf_rex64"
+  [(set (match_operand:HF 0 "push_operand" "=X,X")
+	(match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
+  "TARGET_64BIT"
+{
+  /* Anything else should be already split before reg-stack.  */
+  gcc_assert (which_alternative == 0);
+  return "push{q}\t%q1";
+}
+  [(set_attr "type" "push,multi")
+   (set_attr "mode" "DI,TI")
+   (set_attr "isa"  "*,sse4")])
+
+(define_insn "*pushhf"
+  [(set (match_operand:HF 0 "push_operand" "=X,X")
+	(match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
+  "!TARGET_64BIT"
+{
+  /* Anything else should be already split before reg-stack.  */
+  gcc_assert (which_alternative == 0);
+  return "push{l}\t%k1";
+}
+  [(set_attr "type" "push,multi")
+   (set_attr "mode" "SI,TI")
+   (set_attr "isa"  "*,sse4")])
+
 (define_insn "*pushsf_rex64"
   [(set (match_operand:SF 0 "push_operand" "=X,X,X")
 	(match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
@@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
    (set_attr "unit" "i387,*,*")
    (set_attr "mode" "SF,SI,SF")])
 
+(define_mode_iterator MODESH [SF HF])
 ;; %%% Kill this when call knows how to work this out.
 (define_split
-  [(set (match_operand:SF 0 "push_operand")
-	(match_operand:SF 1 "any_fp_register_operand"))]
+  [(set (match_operand:MODESH 0 "push_operand")
+	(match_operand:MODESH 1 "any_fp_register_operand"))]
   "reload_completed"
   [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
    (set (match_dup 0) (match_dup 1))]
@@ -3209,8 +3239,8 @@ (define_expand "movtf"
   "ix86_expand_move (TFmode, operands); DONE;")
 
 (define_expand "mov<mode>"
-  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
-	(match_operand:X87MODEF 1 "general_operand"))]
+  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
+	(match_operand:X87MODEFH 1 "general_operand"))]
   ""
   "ix86_expand_move (<MODE>mode, operands); DONE;")
 
@@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
 	   ]
 	   (const_string "*")))])
 
+(define_insn "*movhf_internal"
+ [(set (match_operand:HF 0 "nonimmediate_operand"
+	 "=?r,?m,v,v,?r,m,?v,v")
+       (match_operand:HF 1 "general_operand"
+	 "rmF,rF,C,v, v,v, r,m"))]
+ "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+  && (lra_in_progress
+      || reload_completed
+      || !CONST_DOUBLE_P (operands[1])
+      || (TARGET_SSE && TARGET_SSE_MATH
+	  && standard_sse_constant_p (operands[1], HFmode) == 1)
+      || memory_operand (operands[0], HFmode))"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_IMOV:
+      return "mov{w}\t{%1, %0|%0, %1}";
+
+    case TYPE_SSELOG1:
+      return standard_sse_constant_opcode (insn, operands);
+
+    case TYPE_SSEMOV:
+      return ix86_output_ssemov (insn, operands);
+
+    case TYPE_SSELOG:
+      if (SSE_REG_P (operands[0]))
+	return MEM_P (operands[1])
+	       ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+	       : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+      else
+	return MEM_P (operands[1])
+	       ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+	       : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set (attr "isa")
+	(cond [(eq_attr "alternative" "2,3,4,6,7")
+		 (const_string "sse2")
+	       (eq_attr "alternative" "5")
+		 (const_string "sse4")
+	      ]
+	      (const_string "*")))
+   (set (attr "type")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "imov")
+	       (eq_attr "alternative" "2")
+		 (const_string "sselog1")
+	       (eq_attr "alternative" "4,5,6,7")
+		 (const_string "sselog")
+	      ]
+	      (const_string "ssemov")))
+   (set (attr "memory")
+	(cond [(eq_attr "alternative" "4,6")
+		 (const_string "none")
+	       (eq_attr "alternative" "5")
+		 (const_string "store")
+	       (eq_attr "alternative" "7")
+		 (const_string "load")
+	      ]
+	      (const_string "*")))
+   (set (attr "prefix")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "orig")
+	      ]
+	      (const_string "maybe_vex")))
+   (set (attr "mode")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "HI")
+	       (eq_attr "alternative" "2")
+		 (const_string "V4SF")
+	       (eq_attr "alternative" "4,5,6,7")
+		 (const_string "TI")
+	       (eq_attr "alternative" "3")
+		 (const_string "SF")
+	      ]
+	      (const_string "*")))])
+
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
 	(match_operand 1 "memory_operand"))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..2cd0b38fe5b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
 @section Half-Precision Floating Point
 @cindex half-precision floating point
 @cindex @code{__fp16} data type
+@cindex @code{_Float16} data type
 
 On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
 point via the @code{__fp16} type defined in the ARM C Language Extensions.
@@ -1150,6 +1151,21 @@ calls.
 It is recommended that portable code use the @code{_Float16} type defined
 by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
 
+On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
+floating point via the @code{_Float16} type defined by ISO/IEC TS
+18661-3:2015.  For C++, x86 provides a builtin type named @code{_Float16}
+which has the same data format as in C.
+
+Without @option{-mavx512fp16}, the @code{_Float16} type is storage only, and
+all operations are emulated by soft-fp and @code{float} instructions.
+
+By default, soft-fp keeps intermediate results at 32-bit precision, which may
+give results that differ from AVX512FP16 instructions;
+@option{-fexcess-precision=standard} forces rounding after every operation.
+
+With @option{-mavx512fp16}, GCC generates hardware instructions instead of
+calling soft-fp.
+
 @node Decimal Float
 @section Decimal Floating Types
 @cindex decimal floating types
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index c13c7e45ac1..92f499643b5 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
     return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
 #endif
 
+  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
+    return float16_type_node;
+
   if (mode == TYPE_MODE (float_type_node))
     return float_type_node;
 
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
new file mode 100644
index 00000000000..1b645eb499d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "is not supported on this target" } */
+foo (_Float16 x) /* { dg-error "is not supported on this target" } */
+{
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
new file mode 100644
index 00000000000..3da7683fc31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+union flt
+{
+  _Float16 flt;
+  short s;
+};
+
+_Float16
+foo (union flt x)
+{
+  return x.flt;
+}
+
+/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
new file mode 100644
index 00000000000..60ff9d4ab80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+#include<complex.h>
+
+_Complex _Float16
+foo (_Complex _Float16 x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 138+ messages in thread
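
The extend.texi text in the patch above says that keeping intermediates at
32-bit precision can give different results from rounding after every
operation. A minimal sketch of that effect follows. It does not use
_Float16 at all: round_to_half, sum_float_intermediate and
sum_round_each_step are hypothetical helpers that emulate binary16 rounding
on float values, and they only handle inputs in the normal binary16 range.
This is an illustration under those assumptions, not what GCC or soft-fp
actually does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest binary16-representable value (ties to
   even) and return it as a float.  Assumption: the input lies in the
   normal binary16 range; overflow, subnormals, infinities and NaNs are
   not handled in this sketch.  */
static float
round_to_half (float x)
{
  uint32_t bits;
  assert (sizeof bits == sizeof x);
  memcpy (&bits, &x, sizeof bits);
  /* binary32 has 23 fraction bits, binary16 has 10: drop the low 13,
     rounding to nearest with ties going to the even result.  */
  uint32_t lsb = (bits >> 13) & 1u;
  bits += 0xfffu + lsb;
  bits &= ~UINT32_C (0x1fff);
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* (a + b) + c with the intermediate sum kept at float precision and a
   single rounding to binary16 at the end.  */
static float
sum_float_intermediate (float a, float b, float c)
{
  return round_to_half ((a + b) + c);
}

/* (a + b) + c with a rounding to binary16 after every operation.  */
static float
sum_round_each_step (float a, float b, float c)
{
  float t = round_to_half (a + b);
  return round_to_half (t + c);
}
```

With a = 2048 and b = c = 1, sum_float_intermediate gives 2050 while
sum_round_each_step gives 2048: binary16 values in [2048, 4096) are spaced
2 apart, so 2049 is a tie that rounds to the even neighbour 2048 each time.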

* [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
  2021-07-21  7:43         ` [PATCH 01/10] Update hf soft-fp from glibc liuhongt
  2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21 10:51           ` Uros Bizjak
  2021-07-22 12:14           ` Richard Biener
  2021-07-21  7:43         ` [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
                           ` (7 subsequent siblings)
  10 siblings, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/ChangeLog:

	* optabs-query.c (get_best_extraction_insn): Use word_mode for
	HF field.

libgcc/ChangeLog:

	* config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
	* config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
	* config/i386/t-softfp: Add hf soft-fp.
	* config.host: Add i386/64/t-softfp.
	* config/i386/64/t-softfp: New file.
---
 gcc/optabs-query.c                  | 10 +++++++++-
 libgcc/config.host                  |  5 +----
 libgcc/config/i386/32/sfp-machine.h |  1 +
 libgcc/config/i386/64/sfp-machine.h |  1 +
 libgcc/config/i386/64/t-softfp      |  1 +
 libgcc/config/i386/sfp-machine.h    |  1 +
 libgcc/config/i386/t-softfp         |  5 +++++
 7 files changed, 19 insertions(+), 5 deletions(-)
 create mode 100644 libgcc/config/i386/64/t-softfp
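
For readers unfamiliar with the t-softfp variables touched below, here is a
rough sketch of how a mode pair such as "hfsf" relates to a libgcc routine
name. It assumes the usual libgcc convention of __extend<modes>2 and
__trunc<modes>2; softfp_symbol is a hypothetical helper written for this
note and is not part of libgcc or its build system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the conventional libgcc symbol name for a soft-fp conversion.
   KIND is "extend" or "trunc"; MODES is a source/destination pair such
   as "hfsf" (HFmode to SFmode).  Returns the value of snprintf.
   Hypothetical helper, for illustration only.  */
static int
softfp_symbol (char *out, size_t n, const char *kind, const char *modes)
{
  assert (out != NULL && kind != NULL && modes != NULL);
  return snprintf (out, n, "__%s%s2", kind, modes);
}
```

Under that assumption, the "hfsf" entry in softfp_extensions corresponds to
__extendhfsf2 and the "sfhf" entry in softfp_truncations to __truncsfhf2.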

diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 05ee5f517da..0438e451474 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
 			  machine_mode field_mode)
 {
   opt_scalar_int_mode mode_iter;
-  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
+  scalar_int_mode smallest_int_mode;
+  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
+  if (FLOAT_MODE_P (field_mode)
+      && known_eq (GET_MODE_SIZE (field_mode), 2))
+    smallest_int_mode = word_mode;
+  else
+    smallest_int_mode = smallest_int_mode_for_size (struct_bits);
+
+  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
     {
       scalar_int_mode mode = mode_iter.require ();
       if (get_extraction_insn (insn, pattern, type, mode))
diff --git a/libgcc/config.host b/libgcc/config.host
index 50f00062232..96da9ef1cce 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
 	;;
 i[34567]86-*-* | x86_64-*-*)
   	tmake_file="${tmake_file} t-softfp-tf"
-	if test "${host_address}" = 32; then
-		tmake_file="${tmake_file} i386/${host_address}/t-softfp"
-	fi
-	tmake_file="${tmake_file} i386/t-softfp t-softfp"
+	tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
 	;;
 esac
 
diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
index 1fa282d7afe..e24cbc8d180 100644
--- a/libgcc/config/i386/32/sfp-machine.h
+++ b/libgcc/config/i386/32/sfp-machine.h
@@ -86,6 +86,7 @@
 #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
index 1ff94c23ea4..e1c616699bb 100644
--- a/libgcc/config/i386/64/sfp-machine.h
+++ b/libgcc/config/i386/64/sfp-machine.h
@@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
new file mode 100644
index 00000000000..d812bb120bd
--- /dev/null
+++ b/libgcc/config/i386/64/t-softfp
@@ -0,0 +1 @@
+softfp_extras := fixhfti fixunshfti floattihf floatuntihf
\ No newline at end of file
diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index 8319f0550bc..f15d29d3755 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 #define _FP_KEEPNANFRACP	1
 #define _FP_QNANNEGATEDP 0
 
+#define _FP_NANSIGN_H		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
index 685d9cf8502..4ac214eb0ce 100644
--- a/libgcc/config/i386/t-softfp
+++ b/libgcc/config/i386/t-softfp
@@ -1 +1,6 @@
 LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
+
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+
+softfp_extras += eqhf2
\ No newline at end of file
-- 
2.18.1


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (2 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-22  8:49           ` Uros Bizjak
  2021-07-21  7:43         ` [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
                           ` (6 subsequent siblings)
  10 siblings, 1 reply; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak
  Cc: joseph, hjl.tools, richard.guenther, crazylht, Guo, Xuepeng

From: "Guo, Xuepeng" <xuepeng.guo@intel.com>

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect FEATURE_AVX512FP16.
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_AVX512FP16_SET,
	OPTION_MASK_ISA_AVX512FP16_UNSET,
	OPTION_MASK_ISA2_AVX512FP16_SET,
	OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
	(OPTION_MASK_ISA2_AVX512BW_UNSET,
	OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
	(ix86_handle_option): Handle -mavx512fp16.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVX512FP16.
	* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
	* config.gcc: Add avx512fp16intrin.h.
	* config/i386/avx512fp16intrin.h: New intrinsic header.
	* config/i386/cpuid.h: Add bit_AVX512FP16.
	* config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
	* config/i386/i386-builtins.c: Support _Float16 type for i386
	backend.
	(ix86_init_float16_builtins): New function.
	(ix86_float16_type_node): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AVX512FP16__.
	* config/i386/i386-expand.c (ix86_expand_branch): Support
	HFmode.
	(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_expand_fp_movcc): Ditto.
	* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
	* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
	(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
	* config/i386/i386.c (ix86_get_ssemov): Use
	vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
	(ix86_get_excess_precision): Use
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
	is enabled.
	(output_387_binary_op): Update instruction suffix for HFmode.
	(sse_store_index): Use SFmode cost for HFmode cost.
	(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
	GPR cost for HFmode.
	(ix86_hard_regno_mode_ok): Allow HImode in sse register.
	(ix86_mangle_type): Add mangling for _Float16 type.
	(inline_secondary_memory_needed): No memory is needed for
	16bit movement between gpr and sse reg under
	TARGET_AVX512FP16.
	(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_division_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_add_stmt_cost): Ditto.
	(ix86_optab_supported_p): Ditto.
	* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
	(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
	(SSE_FLOAT_MODE_P): Add HFmode.
	(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
	* config/i386/i386.md (mode): Add HFmode.
	(MODE_SIZE): Add HFmode.
	(MODEFH): Likewise.
	(ssemodesuffix): Add sh suffix for HFmode.
	(cbranch<mode>4): Use MODEFH.
	(<insn><mode>3): Likewise.
	(mul<mode>3): Likewise.
	(div<mode>3): Likewise.
	(*ieee_s<ieee_maxmin><mode>3): Likewise.
	(*cmpi<unord>hf): New define_insn for HFmode.
	(*movhf_internal): Adjust for avx512fp16 instruction.
	(extendhf<mode>2): Likewise.
	(trunc<mode>hf2): Likewise.
	(*fop_hf_comm): Likewise.
	(*fop_hf_1): Likewise.
	(float<floatunssuffix><mode>hf2): Likewise.
	(mov<mode>cc): Likewise.
	* config/i386/i386.opt: Add mavx512fp16.
	* config/i386/immintrin.h: Include avx512fp16intrin.h.
	* doc/invoke.texi: Add mavx512fp16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
	* gcc.target/i386/avx-2.c: Ditto.
	* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
	* gcc.target/i386/sse-13.c: Add -mavx512fp16.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* lib/target-supports.exp (check_effective_target_avx512fp16): New.
	* g++.target/i386/float16-1.C: New test.
	* g++.target/i386/float16-2.C: Ditto.
	* g++.target/i386/float16-3.C: Ditto.
	* gcc.target/i386/avx512fp16-12a.c: Ditto.
	* gcc.target/i386/avx512fp16-12b.c: Ditto.
	* gcc.target/i386/float16-3a.c: Ditto.
	* gcc.target/i386/float16-3b.c: Ditto.
	* gcc.target/i386/float16-4a.c: Ditto.
	* gcc.target/i386/float16-4b.c: Ditto.
	* gcc.target/i386/pr54855-12.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: Guo, Xuepeng <xuepeng.guo@intel.com>
Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu, Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang, Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu, Dianhong <dianhong.xu@intel.com>
---
 gcc/common/config/i386/cpuinfo.h              |   2 +
 gcc/common/config/i386/i386-common.c          |  26 ++-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx512fp16intrin.h            |  53 +++++
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-builtin-types.def        |   1 +
 gcc/config/i386/i386-builtins.c               |  23 +++
 gcc/config/i386/i386-c.c                      |   2 +
 gcc/config/i386/i386-expand.c                 |   5 +-
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-options.c                |   4 +-
 gcc/config/i386/i386.c                        | 128 ++++++++----
 gcc/config/i386/i386.h                        |  11 +-
 gcc/config/i386/i386.md                       | 185 ++++++++++++++----
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/doc/invoke.texi                           |  10 +-
 gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
 gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
 gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
 gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
 gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
 .../gcc.target/i386/avx512fp16-12a.c          |  21 ++
 .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
 gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
 gcc/testsuite/lib/target-supports.exp         |  13 +-
 40 files changed, 531 insertions(+), 103 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16intrin.h
 create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
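
As a quick illustration of the cpuinfo.h and cpuid.h hunks below, here is a
minimal sketch of the feature test they add: bit 23 of EDX from CPUID leaf
7, subleaf 0. SKETCH_BIT_AVX512FP16, edx_reports_avx512fp16 and
cpu_reports_avx512fp16 are hypothetical names used only here;
__get_cpuid_count is the helper from <cpuid.h>. The sketch deliberately
leaves out the OS check for AVX-512 register state that real detection
also needs, so treat it as a starting point rather than a complete test.

```c
#include <assert.h>
#if defined (__x86_64__) || defined (__i386__)
#include <cpuid.h>
#endif

/* Same bit position as bit_AVX512FP16 in the patch; renamed so it
   cannot clash with a <cpuid.h> that already defines that macro.  */
#define SKETCH_BIT_AVX512FP16 (1u << 23)

/* Pure helper: does this EDX value from CPUID.(EAX=7,ECX=0) have the
   AVX512FP16 bit set?  */
static int
edx_reports_avx512fp16 (unsigned int edx)
{
  return (edx & SKETCH_BIT_AVX512FP16) != 0;
}

/* Query the running CPU.  Returns 0 on non-x86 hosts or when leaf 7 is
   not available.  Assumption: OS support for AVX-512 state is checked
   elsewhere.  */
static int
cpu_reports_avx512fp16 (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return edx_reports_avx512fp16 (edx);
#else
  return 0;
#endif
}
```

Keeping the bit test separate from the CPUID query makes the mask easy to
check on any host, including ones without the feature.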

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 458f41de776..1835ac64e67 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
 	    set_feature (FEATURE_AVX5124FMAPS);
 	  if (edx & bit_AVX512VP2INTERSECT)
 	    set_feature (FEATURE_AVX512VP2INTERSECT);
+	  if (edx & bit_AVX512FP16)
+	    set_feature (FEATURE_AVX512FP16);
 	}
 
       __cpuid_count (7, 1, eax, ebx, ecx, edx);
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 76ab1a14e54..00c65ba15ab 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_SET \
   (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
+#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
+#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_SET \
   (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
@@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
 #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
+#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
+#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
 #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
 #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
@@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AVX512BF16_UNSET \
    | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
    | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
-   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
+   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
+   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA2_AVX512F_UNSET)
 #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
@@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
 #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
 
-#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
+#define OPTION_MASK_ISA2_AVX512BW_UNSET \
+  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
+    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 
 /* Set 1 << value as value of -malign-FLAG option.  */
 
@@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx512fp16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	}
+      return true;
+
     case OPT_mavx512vnni:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index e68dd656046..4e0659fc7b2 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -228,6 +228,7 @@ enum processor_features
   FEATURE_AESKLE,
   FEATURE_WIDEKL,
   FEATURE_AVXVNNI,
+  FEATURE_AVX512FP16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 898c18f3dda..a6783660278 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
   ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
   ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
+  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3df9b52cf25..a354351408c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
 		       amxbf16intrin.h x86gprintrin.h uintrintrin.h
 		       hresetintrin.h keylockerintrin.h avxvnniintrin.h
-		       mwaitintrin.h"
+		       mwaitintrin.h avx512fp16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
new file mode 100644
index 00000000000..38d63161ba6
--- /dev/null
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -0,0 +1,53 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef __AVX512FP16INTRIN_H_INCLUDED
+#define __AVX512FP16INTRIN_H_INCLUDED
+
+#ifndef __AVX512FP16__
+#pragma GCC push_options
+#pragma GCC target("avx512fp16")
+#define __DISABLE_AVX512FP16__
+#endif /* __AVX512FP16__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#endif /* __AVX512FP16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index aebc17c6827..82b8050028b 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -126,6 +126,7 @@
 #define bit_AVX5124VNNIW (1 << 2)
 #define bit_AVX5124FMAPS (1 << 3)
 #define bit_AVX512VP2INTERSECT	(1 << 8)
+#define bit_AVX512FP16   (1 << 23)
 #define bit_IBT	(1 << 20)
 #define bit_UINTR (1 << 5)
 #define bit_PCONFIG	(1 << 18)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 3ca313c19ec..1768b88d748 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
 DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
+DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 204e2903126..668f09f12a0 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 /* Table for the ix86 builtin non-function types.  */
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
+tree ix86_float16_type_node = NULL_TREE;
 /* Retrieve an element from the above table, building some of
    the types lazily.  */
 
@@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
 			BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
 }
 
+static void
+ix86_init_float16_builtins (void)
+{
+  /* Provide the _Float16 type and float16_type_node if needed so that
+     it can be used in AVX512FP16 intrinsics and builtins.  */
+  if (!float16_type_node)
+    {
+      ix86_float16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (ix86_float16_type_node) = 16;
+      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
+      layout_type (ix86_float16_type_node);
+    }
+  else
+    ix86_float16_type_node = float16_type_node;
+
+  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
+    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
+					    "_Float16");
+}
+
 static void
 ix86_init_builtin_types (void)
 {
@@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
      it.  */
   lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
 
+  ix86_init_float16_builtins ();
+
   const_string_type_node
     = build_pointer_type (build_qualified_type
 			  (char_type_node, TYPE_QUAL_CONST));
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5ed0de006fb..cc64f855ecc 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__PTWRITE__");
   if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
     def_or_undef (parse_in, "__AVX512BF16__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
+    def_or_undef (parse_in, "__AVX512FP16__");
   if (TARGET_MMX_WITH_SSE)
     def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 69ea79e6123..b7d050a1e42 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
   switch (mode)
     {
+    case E_HFmode:
     case E_SFmode:
     case E_DFmode:
     case E_XFmode:
@@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
   bool unordered_compare = ix86_unordered_fp_compare (code);
   rtx op0 = *pop0, op1 = *pop1;
   machine_mode op_mode = GET_MODE (op0);
-  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
+  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
 
   /* All of the unordered compare instructions only work on registers.
      The same is true of the fcomi compare instructions.  The XFmode
@@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
 
-  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     {
       machine_mode cmode;
 
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index a0d46cbc892..83d9302ea3d 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -108,3 +108,4 @@ DEF_PTA(HRESET)
 DEF_PTA(KL)
 DEF_PTA(WIDEKL)
 DEF_PTA(AVXVNNI)
+DEF_PTA(AVX512FP16)
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 3416a4f1752..df191763e4b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mhreset",		OPTION_MASK_ISA2_HRESET },
   { "-mkl",		OPTION_MASK_ISA2_KL },
   { "-mwidekl", 	OPTION_MASK_ISA2_WIDEKL },
-  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI }
+  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI },
+  { "-mavx512fp16",	OPTION_MASK_ISA2_AVX512FP16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
     IX86_ATTR_ISA ("hreset", OPT_mhreset),
     IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
+    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02628d838fc..e826484a4f4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
     case MODE_SI:
       return "%vmovd\t{%1, %0|%0, %1}";
 
+    case MODE_HI:
+      if (GENERAL_REG_P (operands[0]))
+	return "vmovw\t{%1, %k0|%k0, %1}";
+      else if (GENERAL_REG_P (operands[1]))
+	return "vmovw\t{%k1, %0|%0, %k1}";
+      else
+	return "vmovw\t{%1, %0|%0, %1}";
+
     case MODE_DF:
       if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
 	return "vmovsd\t{%d1, %0|%0, %d1}";
@@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
       else
 	return "%vmovss\t{%1, %0|%0, %1}";
 
+    case MODE_HF:
+      if (REG_P (operands[0]) && REG_P (operands[1]))
+	return "vmovsh\t{%d1, %0|%0, %d1}";
+      else
+	return "vmovsh\t{%1, %0|%0, %1}";
+
     case MODE_V1DF:
       gcc_assert (!TARGET_AVX);
       return "movlpd\t{%1, %0|%0, %1}";
@@ -13955,7 +13969,9 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
 
   if (is_sse)
    {
-     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
+     p = (GET_MODE (operands[0]) == HFmode
+	  ? "sh"
+	  : (GET_MODE (operands[0]) == SFmode ? "ss" : "sd"));
      strcat (buf, p);
 
      if (TARGET_AVX)
@@ -19132,9 +19148,11 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
       if (!TARGET_SSE2)
 	return true;
 
-      /* Between SSE and general, we have moves no larger than word size.  */
+      /* Between SSE and general, we have moves no larger than word size,
+	 except with AVX512FP16, where VMOVW enables 16-bit moves.  */
       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
-	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
+	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (TARGET_AVX512FP16
+						   ? HImode : SImode)
 	  || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
 	return true;
 
@@ -19229,21 +19247,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 static inline int
 sse_store_index (machine_mode mode)
 {
-      switch (GET_MODE_SIZE (mode))
-	{
-	  case 4:
-	    return 0;
-	  case 8:
-	    return 1;
-	  case 16:
-	    return 2;
-	  case 32:
-	    return 3;
-	  case 64:
-	    return 4;
-	  default:
-	    return -1;
-	}
+  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
+     costs to processor_costs, which requires changes to all entries in
+     processor cost table.  */
+  if (mode == E_HFmode)
+    mode = E_SFmode;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 4:
+      return 0;
+    case 8:
+      return 1;
+    case 16:
+      return 2;
+    case 32:
+      return 3;
+    case 64:
+      return 4;
+    default:
+      return -1;
+    }
 }
 
 /* Return the cost of moving data of mode M between a
@@ -19270,6 +19293,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
       int index;
       switch (mode)
 	{
+	  case E_HFmode:
 	  case E_SFmode:
 	    index = 0;
 	    break;
@@ -19370,11 +19394,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
 	  }
 	break;
       case 2:
-	if (in == 2)
-	  return MAX (ix86_cost->hard_register.int_load[1],
-		      ix86_cost->hard_register.int_store[1]);
-	return in ? ix86_cost->hard_register.int_load[1]
-		  : ix86_cost->hard_register.int_store[1];
+	{
+	  int cost;
+	  if (in == 2)
+	    cost = MAX (ix86_cost->hard_register.int_load[1],
+			ix86_cost->hard_register.int_store[1]);
+	  else
+	    cost = in ? ix86_cost->hard_register.int_load[1]
+		      : ix86_cost->hard_register.int_store[1];
+	  if (mode == E_HFmode)
+	    {
+	      /* Prefer SSE over GPR for HFmode.  */
+	      int sse_cost;
+	      int index = sse_store_index (mode);
+	      if (in == 2)
+		sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
+				ix86_cost->hard_register.sse_store[index]);
+	      else
+		sse_cost = (in
+			    ? ix86_cost->hard_register.sse_load [index]
+			    : ix86_cost->hard_register.sse_store [index]);
+	      if (sse_cost >= cost)
+		cost = sse_cost + 1;
+	    }
+	  return cost;
+	}
       default:
 	if (in == 2)
 	  cost = MAX (ix86_cost->hard_register.int_load[2],
@@ -19548,6 +19592,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	  - XI mode
 	  - any of 512-bit wide vector mode
 	  - any scalar mode.  */
+      /* For AVX512FP16, vmovw supports movement of HImode
+	 between gpr and sse register.  */
       if (TARGET_AVX512F
 	  && (mode == XImode
 	      || VALID_AVX512F_REG_MODE (mode)
@@ -19833,7 +19879,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->mulsd : cost->mulss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fmul;
@@ -19885,7 +19931,7 @@ ix86_division_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->divsd : cost->divss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fdiv;
@@ -20305,7 +20351,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	  return true;
 	}
 
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->addss;
 	  return false;
@@ -20338,7 +20384,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* FALLTHRU */
 
     case NEG:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->sse_op;
 	  return false;
@@ -20420,14 +20466,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case FLOAT_EXTEND:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = 0;
       else
         *total = ix86_vec_cost (mode, cost->addss);
       return false;
 
     case FLOAT_TRUNCATE:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->fadd;
       else
         *total = ix86_vec_cost (mode, cost->addss);
@@ -20437,7 +20483,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* SSE requires memory load for the constant operand. It may make
 	 sense to account for this.  Of course the constant operand may or
 	 may not be reused. */
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->sse_op;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fabs;
@@ -20446,7 +20492,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case SQRT:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fsqrt;
@@ -21930,6 +21976,10 @@ ix86_mangle_type (const_tree type)
 
   switch (TYPE_MODE (type))
     {
+    case E_HFmode:
+      /* _Float16 is "DF16_".
+	 Align with clang's decision in https://reviews.llvm.org/D33719. */
+      return "DF16_";
     case E_TFmode:
       /* __float128 is "g".  */
       return "g";
@@ -22553,7 +22603,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case MINUS_EXPR:
 	  if (kind == scalar_stmt)
 	    {
-	      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 		stmt_cost = ix86_cost->addss;
 	      else if (X87_FLOAT_MODE_P (mode))
 		stmt_cost = ix86_cost->fadd;
@@ -22571,7 +22621,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
 	  break;
 	case NEGATE_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (X87_FLOAT_MODE_P (mode))
 	    stmt_cost = ix86_cost->fchs;
@@ -22627,7 +22677,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case BIT_XOR_EXPR:
 	case BIT_AND_EXPR:
 	case BIT_NOT_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (VECTOR_MODE_P (mode))
 	    stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
@@ -23233,8 +23283,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
       return opt_type == OPTIMIZE_FOR_SPEED;
 
     case rint_optab:
-      if (SSE_FLOAT_MODE_P (mode1)
-	  && TARGET_SSE_MATH
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
 	  && !flag_trapping_math
 	  && !TARGET_SSE4_1)
 	return opt_type == OPTIMIZE_FOR_SPEED;
@@ -23243,8 +23292,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
     case floor_optab:
     case ceil_optab:
     case btrunc_optab:
-      if (SSE_FLOAT_MODE_P (mode1)
-	  && TARGET_SSE_MATH
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
 	  && !flag_trapping_math
 	  && TARGET_SSE4_1)
 	return true;
@@ -23329,7 +23377,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	/* The fastest type to promote to will always be the native type,
 	   whether that occurs with implicit excess precision or
 	   otherwise.  */
-	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	return TARGET_AVX512FP16
+	       ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+	       : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
       case EXCESS_PRECISION_TYPE_STANDARD:
       case EXCESS_PRECISION_TYPE_IMPLICIT:
 	/* Otherwise, the excess precision we want when we are
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e21922e8782..dca2ad32ed4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
   ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
-   || (MODE) == SFmode)
+   || (MODE) == SFmode							\
+   || (((MODE) == HImode || (MODE) == HFmode) && TARGET_AVX512FP16))
 
 #define VALID_AVX512F_REG_MODE(MODE)					\
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode	\
@@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_FP_MODE_P(MODE)						\
   ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode		\
-   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)		\
+   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
 
 #define VALID_INT_MODE_P(MODE)						\
   ((MODE) == QImode || (MODE) == HImode					\
@@ -1071,6 +1072,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
 
+#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)				\
+  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)				\
+   || (TARGET_AVX512FP16 && (MODE) == HFmode))
+
 #define FMA4_VEC_FLOAT_MODE_P(MODE) \
   (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
 		  || (MODE) == V8SFmode || (MODE) == V4DFmode))
@@ -2264,7 +2269,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
 constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
   | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
-  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
+  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
 constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
   | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
 constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index dd991c3ffdf..8f11cbcf28b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,7 +496,7 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
   V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
@@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,
-		    avx512vl,noavx512vl,
-		    avxvnni,avx512vnnivl"
+		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
   (const_string "base"))
 
 ;; Define instruction set of MMX instructions
@@ -885,7 +884,8 @@ (define_attr "enabled" ""
 	 (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
 	 (eq_attr "isa" "avx512vnnivl")
 	   (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
-
+	 (eq_attr "isa" "avx512fp16")
+	   (symbol_ref "TARGET_AVX512FP16")
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
 	 (eq_attr "mmx_isa" "sse")
@@ -1089,8 +1089,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
 ;; compile time constant, it is faster to use <MODE_SIZE> than
 ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
 ;; command line options just use GET_MODE_SIZE macro.
-(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
-			     (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
+(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
+			     (TI "16") (HF "2") (SF "4") (DF "8")
+			     (XF "GET_MODE_SIZE (XFmode)")
 			     (V16QI "16") (V32QI "32") (V64QI "64")
 			     (V8HI "16") (V16HI "32") (V32HI "64")
 			     (V4SI "16") (V8SI "32") (V16SI "64")
@@ -1222,8 +1223,11 @@ (define_mode_iterator MODEF [SF DF])
 ;; All x87 floating point modes
 (define_mode_iterator X87MODEF [SF DF XF])
 
-;; All x87 floating point modes plus HF
-(define_mode_iterator X87MODEFH [SF DF XF HF])
+;; SSE and x87 SFmode and DFmode floating point modes plus HFmode
+(define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
+
+;; All x87 floating point modes plus HFmode
+(define_mode_iterator X87MODEFH [HF SF DF XF])
 
 ;; All SSE floating point modes
 (define_mode_iterator SSEMODEF [SF DF TF])
@@ -1231,7 +1235,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
-  [(SF "ss") (DF "sd")
+  [(HF "sh") (SF "ss") (DF "sd")
    (V16SF "ps") (V8DF "pd")
    (V8SF "ps") (V4DF "pd")
    (V4SF "ps") (V2DF "pd")
@@ -1498,15 +1502,15 @@ (define_expand "cstorexf4"
 
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
-	(compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
-		    (match_operand:MODEF 2 "cmp_fp_expander_operand")))
+	(compare:CC (match_operand:MODEFH 1 "cmp_fp_expander_operand")
+		    (match_operand:MODEFH 2 "cmp_fp_expander_operand")))
    (set (pc) (if_then_else
               (match_operator 0 "ix86_fp_comparison_operator"
                [(reg:CC FLAGS_REG)
                 (const_int 0)])
               (label_ref (match_operand 3))
               (pc)))]
-  "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+  "TARGET_80387 || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
 {
   ix86_expand_branch (GET_CODE (operands[0]),
 		      operands[1], operands[2], operands[3]);
@@ -1705,6 +1709,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
 	 (eq_attr "alternative" "0")
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
+
+(define_insn "*cmpi<unord>hf"
+  [(set (reg:CCFP FLAGS_REG)
+	(compare:CCFP
+	  (match_operand:HF 0 "register_operand" "v")
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "v<unord>comish\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecomi")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Push/pop instructions.
 
@@ -2436,8 +2451,8 @@ (define_insn "*movsi_internal"
 	   (symbol_ref "true")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
-	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 
@@ -2463,6 +2478,9 @@ (define_insn "*movhi_internal"
 	  gcc_unreachable ();
 	}
 
+    case TYPE_SSEMOV:
+      return ix86_output_ssemov (insn, operands);
+
     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
 	return "kxorw\t%0, %0, %0";
@@ -2478,7 +2496,9 @@ (define_insn "*movhi_internal"
     }
 }
   [(set (attr "type")
-     (cond [(eq_attr "alternative" "4,5,6,7")
+     (cond [(eq_attr "alternative" "9,10,11,12,13")
+	      (const_string "ssemov")
+	    (eq_attr "alternative" "4,5,6,7")
 	      (const_string "mskmov")
 	    (eq_attr "alternative" "8")
 	      (const_string "msklog")
@@ -2503,6 +2523,8 @@ (define_insn "*movhi_internal"
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
+	     (eq_attr "alternative" "11")
+	       (const_string "HF")
 	     (and (eq_attr "alternative" "1,2")
 		  (match_operand:HI 1 "aligned_operand"))
 	       (const_string "SI")
@@ -2511,7 +2533,12 @@ (define_insn "*movhi_internal"
 		       (not (match_test "TARGET_HIMODE_MATH"))))
 	       (const_string "SI")
 	    ]
-	    (const_string "HI")))])
+	    (const_string "HI")))
+    (set (attr "isa")
+	 (cond [(eq_attr "alternative" "9,10,11,12,13")
+		(const_string "avx512fp16")
+	       ]
+	       (const_string "*")))])
 
 ;; Situation is quite tricky about when to choose full sized (SImode) move
 ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
@@ -3727,7 +3754,10 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "sselog1")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "sselog")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "ssemov")
+		   (const_string "sselog"))
 	      ]
 	      (const_string "ssemov")))
    (set (attr "memory")
@@ -3750,9 +3780,15 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "V4SF")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "TI")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HI")
+		   (const_string "TI"))
 	       (eq_attr "alternative" "3")
-		 (const_string "SF")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HF")
+		   (const_string "SF"))
 	      ]
 	      (const_string "*")))])
 
@@ -4493,6 +4529,17 @@ (define_split
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+        (float_extend:MODEF
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+
 (define_expand "extend<mode>xf2"
   [(set (match_operand:XF 0 "nonimmediate_operand")
         (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
@@ -4670,6 +4717,18 @@ (define_insn "truncxf<mode>2"
 	      (symbol_ref "flag_unsafe_math_optimizations")
 	   ]
 	   (symbol_ref "true")))])
+
+;; Conversion from {SF,DF}mode to HFmode.
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+       (float_truncate:HF
+         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Signed conversion to DImode.
 
@@ -5046,6 +5105,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
 	      (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
 	   (symbol_ref "true")))])
 
+(define_insn "float<floatunssuffix><mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(any_float:HF
+	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*floatdi<MODEF:mode>2_i387"
   [(set (match_operand:MODEF 0 "register_operand" "=f")
 	(float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
@@ -7627,12 +7696,12 @@ (define_expand "<insn>xf3"
   "TARGET_80387")
 
 (define_expand "<insn><mode>3"
-  [(set (match_operand:MODEF 0 "register_operand")
-	(plusminus:MODEF
-	  (match_operand:MODEF 1 "register_operand")
-	  (match_operand:MODEF 2 "nonimmediate_operand")))]
+  [(set (match_operand:MODEFH 0 "register_operand")
+	(plusminus:MODEFH
+	  (match_operand:MODEFH 1 "register_operand")
+	  (match_operand:MODEFH 2 "nonimmediate_operand")))]
   "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
-    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
+    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
 \f
 ;; Multiply instructions
 
@@ -8204,11 +8273,11 @@ (define_expand "mulxf3"
   "TARGET_80387")
 
 (define_expand "mul<mode>3"
-  [(set (match_operand:MODEF 0 "register_operand")
-	(mult:MODEF (match_operand:MODEF 1 "register_operand")
-		    (match_operand:MODEF 2 "nonimmediate_operand")))]
+  [(set (match_operand:MODEFH 0 "register_operand")
+	(mult:MODEFH (match_operand:MODEFH 1 "register_operand")
+		    (match_operand:MODEFH 2 "nonimmediate_operand")))]
   "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
-    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
+    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
 \f
 ;; Divide instructions
 
@@ -8221,11 +8290,11 @@ (define_expand "divxf3"
   "TARGET_80387")
 
 (define_expand "div<mode>3"
-  [(set (match_operand:MODEF 0 "register_operand")
-	(div:MODEF (match_operand:MODEF 1 "register_operand")
-		   (match_operand:MODEF 2 "nonimmediate_operand")))]
+  [(set (match_operand:MODEFH 0 "register_operand")
+	(div:MODEFH (match_operand:MODEFH 1 "register_operand")
+		   (match_operand:MODEFH 2 "nonimmediate_operand")))]
   "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
-    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
 {
   if (<MODE>mode == SFmode
       && TARGET_SSE && TARGET_SSE_MATH
@@ -16312,6 +16381,22 @@ (define_insn "*fop_<mode>_comm"
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
 
+(define_insn "*fop_hf_comm"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(match_operator:HF 3 "binary_fp_operator"
+	  [(match_operand:HF 1 "nonimmediate_operand" "%v")
+	   (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
+  "TARGET_AVX512FP16
+   && COMMUTATIVE_ARITH_P (operands[3])
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "* return output_387_binary_op (insn, operands);"
+  [(set (attr "type")
+	(if_then_else (match_operand:HF 3 "mult_operator")
+	  (const_string "ssemul")
+	  (const_string "sseadd")))
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*rcpsf2_sse"
   [(set (match_operand:SF 0 "register_operand" "=x,x,x")
 	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
@@ -16385,6 +16470,22 @@ (define_insn "*fop_<mode>_1"
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
 
+(define_insn "*fop_hf_1"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(match_operator:HF 3 "binary_fp_operator"
+	  [(match_operand:HF 1 "nonimmediate_operand" "v")
+	   (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
+  "TARGET_AVX512FP16
+   && !COMMUTATIVE_ARITH_P (operands[3])
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "* return output_387_binary_op (insn, operands);"
+  [(set (attr "type")
+	(if_then_else (match_operand:HF 3 "div_operator")
+	  (const_string "ssediv")
+	  (const_string "sseadd")))
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*fop_<X87MODEF:mode>_2_i387"
   [(set (match_operand:X87MODEF 0 "register_operand" "=f")
 	(match_operator:X87MODEF 3 "binary_fp_operator"
@@ -19179,13 +19280,13 @@ (define_peephole2
 })
 
 (define_expand "mov<mode>cc"
-  [(set (match_operand:X87MODEF 0 "register_operand")
-	(if_then_else:X87MODEF
+  [(set (match_operand:X87MODEFH 0 "register_operand")
+	(if_then_else:X87MODEFH
 	  (match_operand 1 "comparison_operator")
-	  (match_operand:X87MODEF 2 "register_operand")
-	  (match_operand:X87MODEF 3 "register_operand")))]
+	  (match_operand:X87MODEFH 2 "register_operand")
+	  (match_operand:X87MODEFH 3 "register_operand")))]
   "(TARGET_80387 && TARGET_CMOVE)
-   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
+   || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
   "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
 
 (define_insn "*movxfcc_1"
@@ -19347,12 +19448,12 @@ (define_insn "<code><mode>3"
 ;; presence of -0.0 and NaN.
 
 (define_insn "*ieee_s<ieee_maxmin><mode>3"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,v")
-	(unspec:MODEF
-	  [(match_operand:MODEF 1 "register_operand" "0,v")
-	   (match_operand:MODEF 2 "nonimmediate_operand" "xm,vm")]
+  [(set (match_operand:MODEFH 0 "register_operand" "=x,v")
+	(unspec:MODEFH
+	  [(match_operand:MODEFH 1 "register_operand" "0,v")
+	   (match_operand:MODEFH 2 "nonimmediate_operand" "xm,vm")]
 	  IEEE_MAXMIN))]
-  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
+  "SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
   "@
    <ieee_maxmin><ssemodesuffix>\t{%2, %0|%0, %2}
    v<ieee_maxmin><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7b8547bb1c3..ad366974b5b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
 mmwait
 Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
 Support MWAIT and MONITOR built-in functions and code generation.
+
+mavx512fp16
+Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index f129de4bbe5..2421a78637b 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -94,6 +94,10 @@
 
 #include <avx512vp2intersectvlintrin.h>
 
+#ifdef __SSE2__
+#include <avx512fp16intrin.h>
+#endif
+
 #include <shaintrin.h>
 
 #include <fmaintrin.h>
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c..bb9f7ca956e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
+-mavx512fp16 @gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mkl -mwidekl @gol
@@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @itemx -mavx512bf16
 @opindex mavx512bf16
 @need 200
+@itemx -mavx512fp16
+@opindex mavx512fp16
+@need 200
 @itemx -mgfni
 @opindex mgfni
 @need 200
@@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
 XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
 GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
-UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
-extended instruction sets. Each has a corresponding @option{-mno-} option to
-disable use of these instructions.
+UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
+or CLDEMOTE extended instruction sets. Each has a corresponding
+@option{-mno-} option to disable use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 62b2132957a..fba3d1ac684 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 843aa2bdb2f..5cc0fa83457 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
new file mode 100644
index 00000000000..95d1ac27c4f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-1.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "does not name a type" } */
+foo (_Float16 x) 
+{
+  return x;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
new file mode 100644
index 00000000000..99eb797eff1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-2.C
@@ -0,0 +1,14 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+union flt
+{
+  _Float16 flt;
+  short s;
+};
+
+_Float16
+foo (union flt x)
+{
+  return x.flt;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
new file mode 100644
index 00000000000..940878503f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-3.C
@@ -0,0 +1,10 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O0 -mavx512fp16" } */
+
+template <typename> void a(char *) {}
+char b, d;
+void c()
+{
+  a<unsigned char>(&d);
+  a<_Float16>(&b);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 6178e38ce02..f3676077743 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index 986fbd819e4..1751c52565c 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..0ad9064f637 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -87,6 +87,9 @@ main ()
 #ifdef AVX512VNNI
       && (ecx & bit_AVX512VNNI)
 #endif
+#ifdef AVX512FP16
+      && (edx & bit_AVX512FP16)
+#endif
 #ifdef VAES
       && (ecx & bit_VAES)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
new file mode 100644
index 00000000000..88887556d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_max (_Float16 __A, _Float16 __B)
+{
+  return __A > __B ? __A : __B;
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_min (_Float16 __A, _Float16 __B)
+{
+  return __A < __B ? __A : __B;
+}
+
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
new file mode 100644
index 00000000000..c9e23bf95c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-12a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 0.1f;
+  _Float16 y = -3.2f;
+  _Float16 z;
+
+  z = do_max (x, y);
+  if (z != x)
+    abort ();
+
+  z = do_min (x, y);
+  if (z != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
new file mode 100644
index 00000000000..3846c8e9b6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
new file mode 100644
index 00000000000..247dd6e7e33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
new file mode 100644
index 00000000000..631082581f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
new file mode 100644
index 00000000000..828d8530769
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 79265c7c94f..8499fdf2db9 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -79,6 +79,7 @@ extern void test_hreset (void)			__attribute__((__target__("hreset")));
 extern void test_keylocker (void)		__attribute__((__target__("kl")));
 extern void test_widekl (void)			__attribute__((__target__("widekl")));
 extern void test_avxvnni (void)			__attribute__((__target__("avxvnni")));
+extern void test_avx512fp16 (void)		__attribute__((__target__("avx512fp16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -159,6 +160,7 @@ extern void test_no_hreset (void)		__attribute__((__target__("no-hreset")));
 extern void test_no_keylocker (void)		__attribute__((__target__("no-kl")));
 extern void test_no_widekl (void)		__attribute__((__target__("no-widekl")));
 extern void test_no_avxvnni (void)		__attribute__((__target__("no-avxvnni")));
+extern void test_no_avx512fp16 (void)		__attribute__((__target__("no-avx512fp16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
new file mode 100644
index 00000000000..2f8af392c83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+
+#include <immintrin.h>
+
+_Float16
+foo (_Float16 x, _Float16 y)
+{
+  x = x > y ? x : y;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 7029771334b..f5f5c113612 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 4ce0ffffaf3..747d504cedb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 6e8b6f3fa1b..33411969901 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 7faa053ace8..86590ca5ffb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -708,6 +708,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 42ac9d0ac1a..10765365d7b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
 
 proc check_effective_target_float16 {} {
     return [check_no_compiler_messages_nocache float16 object {
-        _Float16 x;
+        _Float16 foo (_Float16 x) { return x; }
     } [add_options_for_float16 ""]]
 }
 
@@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
 }
 
 
+# Return 1 if avx512fp16 instructions can be compiled.
+
+proc check_effective_target_avx512fp16 { } {
+    return [check_no_compiler_messages avx512fp16 object {
+	void foo (void)
+	{
+	  asm volatile ("vmovw %edi, %xmm0");
+	}
+    } "-O2 -mavx512fp16" ]
+}
+
 # Return 1 if avx512f instructions can be compiled.
 
 proc check_effective_target_avx512f { } {
-- 
2.18.1



* [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (3 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-22  5:24           ` Hongtao Liu
  2021-07-21  7:43         ` [PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
                           ` (5 subsequent siblings)
  10 siblings, 1 reply; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
	(_mm256_set_ph): Likewise.
	(_mm512_set_ph): Likewise.
	(_mm_setr_ph): Likewise.
	(_mm256_setr_ph): Likewise.
	(_mm512_setr_ph): Likewise.
	(_mm_set1_ph): Likewise.
	(_mm256_set1_ph): Likewise.
	(_mm512_set1_ph): Likewise.
	(_mm_setzero_ph): Likewise.
	(_mm256_setzero_ph): Likewise.
	(_mm512_setzero_ph): Likewise.
	(_mm_set_sh): Likewise.
	(_mm_load_sh): Likewise.
	(_mm_store_sh): Likewise.
	* config/i386/i386-builtin-types.def (V8HF): New type.
	(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type.
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	Support vector HFmodes.
	(ix86_expand_vector_init_one_nonzero): Likewise.
	(ix86_expand_vector_init_one_var): Likewise.
	(ix86_expand_vector_init_interleave): Likewise.
	(ix86_expand_vector_init_general): Likewise.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_vector_extract): Likewise.
	(ix86_expand_vector_init_concat): Likewise.
	(ix86_expand_sse_movcc): Handle vector HFmodes.
	(ix86_expand_vector_set_var): Ditto.
	* config/i386/i386-modes.def: Add HF vector modes in comment.
	* config/i386/i386.c (classify_argument): Add HF vector modes.
	(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
	(ix86_vector_mode_supported_p): Likewise.
	(ix86_set_reg_reg_cost): Handle vector HFmode.
	(ix86_get_ssemov): Handle vector HFmode.
	(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
	on the stack.
	* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
	(VALID_AVX256_REG_OR_OI_MODE): Rename to ...
	(VALID_AVX256_REG_OR_OI_VHF_MODE): ... this, and add V16HF.
	(VALID_SSE2_REG_VHF_MODE): New.
	(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
	(SSE_REG_MODE_P): Add vector HFmode.
	* config/i386/i386.md (mode): Add HF vector modes.
	(MODE_SIZE): Likewise.
	(ssemodesuffix): Add ph suffix for HF vector modes.
	* config/i386/sse.md (VFH_128): New mode iterator.
	(VMOVE): Adjust for HF vector modes.
	(V): Likewise.
	(V_256_512): Likewise.
	(avx512): Likewise.
	(avx512fmaskmode): Likewise.
	(shuffletype): Likewise.
	(sseinsnmode): Likewise.
	(ssedoublevecmode): Likewise.
	(ssehalfvecmode): Likewise.
	(ssehalfvecmodelower): Likewise.
	(ssePScmode): Likewise.
	(ssescalarmode): Likewise.
	(ssescalarmodelower): Likewise.
	(sseintprefix): Likewise.
	(i128): Likewise.
	(bcstscalarsuff): Likewise.
	(xtg_mode): Likewise.
	(VI12HF_AVX512VL): New mode_iterator.
	(VF_AVX512FP16): Likewise.
	(VIHF): Likewise.
	(VIHF_256): Likewise.
	(VIHF_AVX512BW): Likewise.
	(V16_256): Likewise.
	(V32_512): Likewise.
	(sseintmodesuffix): New mode_attr.
	(sse): Add scalar and vector HFmodes.
	(ssescalarmode): Add vector HFmode mapping.
	(ssescalarmodesuffix): Add sh suffix for HFmode.
	(*<sse>_vm<insn><mode>3): Use VFH_128.
	(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
	(*ieee_<ieee_maxmin><mode>3): Likewise.
	(<avx512>_blendm<mode>): New define_insn.
	(vec_setv8hf): New define_expand.
	(vec_set<mode>_0): New define_insn for HF vector set.
	(*avx512fp16_movsh): Likewise.
	(avx512fp16_movsh): Likewise.
	(vec_extract_lo_v32hi): Rename to ...
	(vec_extract_lo_<mode>): ... this, and adjust to allow HF
	vector modes.
	(vec_extract_hi_v32hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_<mode>): Likewise.
	(vec_extract_hi_v16hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_set_hi_v16hi): Likewise.
	(vec_set_hi_<mode>): Likewise.
	(vec_set_lo_v16hi): Likewise.
	(vec_set_lo_<mode>): Likewise.
	(*vec_extract<mode>_0): New define_insn_and_split for HF
	vector extract.
	(*vec_extracthf): New define_insn.
	(VEC_EXTRACT_MODE): Add HF vector modes.
	(PINSR_MODE): Add V8HF.
	(sse2p4_1): Likewise.
	(pinsr_evex_isa): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
	insert for V8HFmode.
	(pbroadcast_evex_isa): Add HF vector modes.
	(AVX2_VEC_DUP_MODE): Likewise.
	(VEC_INIT_MODE): Likewise.
	(VEC_INIT_HALF_MODE): Likewise.
	(avx2_pbroadcast<mode>): Adjust to support HF vector mode
	broadcast.
	(avx2_pbroadcast<mode>_1): Likewise.
	(<avx512>_vec_dup<mode>_1): Likewise.
	(<avx512>_vec_dup<mode><mask_name>): Likewise.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
	Likewise.
---
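Note on argument order (illustration only, not part of the patch).  The
sketch below models the lane layout of _mm_set_ph, _mm_setr_ph, _mm_set_sh
and _mm_store_sh as added to avx512fp16intrin.h.  It is a minimal model
under stated assumptions: the model_* functions and the model_v8 struct are
hypothetical names used only here, and float stands in for _Float16 so that
the code builds without -mavx512fp16.

```c
#include <assert.h>

/* Hypothetical stand-in for __m128h: eight lanes, lane 0 lowest.
   float replaces _Float16 so no AVX512FP16 support is needed.  */
typedef struct { float lane[8]; } model_v8;

/* Models _mm_set_ph: arguments run from the highest lane down to lane 0.  */
static model_v8
model_set_ph (float a7, float a6, float a5, float a4,
              float a3, float a2, float a1, float a0)
{
  model_v8 r = { { a0, a1, a2, a3, a4, a5, a6, a7 } };
  return r;
}

/* Models _mm_setr_ph: same result, arguments given from lane 0 upwards.  */
static model_v8
model_setr_ph (float a0, float a1, float a2, float a3,
               float a4, float a5, float a6, float a7)
{
  return model_set_ph (a7, a6, a5, a4, a3, a2, a1, a0);
}

/* Models _mm_set_sh: lane 0 holds F, every other lane is zero.  */
static model_v8
model_set_sh (float f)
{
  return model_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, f);
}

/* Models _mm_store_sh: only lane 0 is written to *P.  */
static void
model_store_sh (float *p, model_v8 a)
{
  *p = a.lane[0];
}

/* Checks that set and setr agree once their arguments are mirrored,
   and that set_sh/store_sh round-trip the scalar through lane 0.  */
static void
model_self_check (void)
{
  model_v8 a = model_set_ph (7, 6, 5, 4, 3, 2, 1, 0);
  model_v8 b = model_setr_ph (0, 1, 2, 3, 4, 5, 6, 7);
  for (int i = 0; i < 8; i++)
    assert (a.lane[i] == (float) i && b.lane[i] == (float) i);

  model_v8 s = model_set_sh (1.5f);
  float out = 0.0f;
  model_store_sh (&out, s);
  assert (out == 1.5f);
  for (int i = 1; i < 8; i++)
    assert (s.lane[i] == 0.0f);
}
```

Calling model_self_check () from a test driver exercises all four helpers.
The real intrinsics build the vector with a (__v8hf) initializer rather
than a struct, but the lane ordering shown here is the same.
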
 gcc/config/i386/avx512fp16intrin.h     | 172 +++++++++++
 gcc/config/i386/i386-builtin-types.def |   6 +-
 gcc/config/i386/i386-expand.c          | 124 +++++++-
 gcc/config/i386/i386-modes.def         |  12 +-
 gcc/config/i386/i386.c                 |  69 ++---
 gcc/config/i386/i386.h                 |  15 +-
 gcc/config/i386/i386.md                |  13 +-
 gcc/config/i386/sse.md                 | 395 +++++++++++++++++++------
 8 files changed, 652 insertions(+), 154 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38d63161ba6..3fc0770986e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -45,6 +45,178 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
 typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
+	    _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	    _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3,
+					  __A4, __A5, __A6, __A7 };
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
+	       _Float16 __A12, _Float16 __A11, _Float16 __A10,
+	       _Float16 __A9, _Float16 __A8, _Float16 __A7,
+	       _Float16 __A6, _Float16 __A5, _Float16 __A4,
+	       _Float16 __A3, _Float16 __A2, _Float16 __A1,
+	       _Float16 __A0)
+{
+  return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15 };
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
+	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
+	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
+	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
+	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
+	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
+	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
+	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
+	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	       _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15,
+					   __A16, __A17, __A18, __A19,
+					   __A20, __A21, __A22, __A23,
+					   __A24, __A25, __A26, __A27,
+					   __A28, __A29, __A30, __A31 };
+}
+
+/* Create vectors of elements in the reversed order from _mm_set_ph,
+   _mm256_set_ph and _mm512_set_ph functions.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+	     _Float16 __A3, _Float16 __A4, _Float16 __A5,
+	     _Float16 __A6, _Float16 __A7)
+{
+  return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15)
+{
+  return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9,
+			__A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1,
+			__A0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15, _Float16 __A16, _Float16 __A17,
+		_Float16 __A18, _Float16 __A19, _Float16 __A20,
+		_Float16 __A21, _Float16 __A22, _Float16 __A23,
+		_Float16 __A24, _Float16 __A25, _Float16 __A26,
+		_Float16 __A27, _Float16 __A28, _Float16 __A29,
+		_Float16 __A30, _Float16 __A31)
+
+{
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
+			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
+			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+			__A2, __A1, __A0);
+}
+
+/* Broadcast _Float16 to vector.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set1_ph (_Float16 __A)
+{
+  return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set1_ph (_Float16 __A)
+{
+  return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ph (_Float16 __A)
+{
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+/* Create a vector with all zeros.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setzero_ph (void)
+{
+  return _mm_set1_ph (0.0f);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setzero_ph (void)
+{
+  return _mm256_set1_ph (0.0f);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ph (void)
+{
+  return _mm512_set1_ph (0.0f);
+}
+
+/* Create a vector with element 0 as F and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_sh (_Float16 __F)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F);
+}
+
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_load_sh (void const *__P)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     *(_Float16 const *) __P);
+}
+
+/* Stores the lower _Float16 value.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_sh (void *__P, __m128h __A)
+{
+  *(_Float16 *) __P = ((__v8hf)__A)[0];
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 1768b88d748..4df6ee1009d 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -85,6 +85,7 @@ DEF_VECTOR_TYPE (V8QI, QI)
 # SSE vectors
 DEF_VECTOR_TYPE (V2DF, DOUBLE)
 DEF_VECTOR_TYPE (V4SF, FLOAT)
+DEF_VECTOR_TYPE (V8HF, FLOAT16)
 DEF_VECTOR_TYPE (V2DI, DI)
 DEF_VECTOR_TYPE (V4SI, SI)
 DEF_VECTOR_TYPE (V8HI, HI)
@@ -1297,4 +1298,7 @@ DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID)
 DEF_FUNCTION_TYPE (UINT, UINT, V2DI, PVOID)
 DEF_FUNCTION_TYPE (VOID, V2DI, V2DI, V2DI, UINT)
 DEF_FUNCTION_TYPE (UINT8, PV2DI, V2DI, PCVOID)
-DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
\ No newline at end of file
+DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
+
+# FP16 builtins
+DEF_FUNCTION_TYPE (V8HF, V8HI)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index b7d050a1e42..bb965ca0e9b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3952,6 +3952,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
       break;
     case E_V16QImode:
     case E_V8HImode:
+    case E_V8HFmode:
     case E_V4SImode:
     case E_V2DImode:
       if (TARGET_SSE4_1)
@@ -3974,6 +3975,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
       break;
     case E_V32QImode:
     case E_V16HImode:
+    case E_V16HFmode:
     case E_V8SImode:
     case E_V4DImode:
       if (TARGET_AVX2)
@@ -3993,6 +3995,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
     case E_V32HImode:
       gen = gen_avx512bw_blendmv32hi;
       break;
+    case E_V32HFmode:
+      gen = gen_avx512bw_blendmv32hf;
+      break;
     case E_V16SImode:
       gen = gen_avx512f_blendmv16si;
       break;
@@ -14144,6 +14149,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 	}
       return true;
 
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
+      return ix86_vector_duplicate_value (mode, target, val);
+
     default:
       return false;
     }
@@ -14228,6 +14238,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0;
       gen_vec_set_0 = gen_vec_setv8di_0;
       break;
+    case E_V8HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv8hf_0;
+      break;
+    case E_V16HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv16hf_0;
+      break;
+    case E_V32HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32hf_0;
+      break;
     default:
       break;
     }
@@ -14377,6 +14399,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
       if (!TARGET_64BIT)
 	return false;
       /* FALLTHRU */
+    case E_V8HFmode:
+    case E_V16HFmode:
     case E_V4DFmode:
     case E_V8SFmode:
     case E_V8SImode:
@@ -14457,6 +14481,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
     case 2:
       switch (mode)
 	{
+	case E_V32HFmode:
+	  half_mode = V16HFmode;
+	  break;
 	case E_V16SImode:
 	  half_mode = V8SImode;
 	  break;
@@ -14469,6 +14496,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
 	case E_V8DFmode:
 	  half_mode = V4DFmode;
 	  break;
+	case E_V16HFmode:
+	  half_mode = V8HFmode;
+	  break;
 	case E_V8SImode:
 	  half_mode = V4SImode;
 	  break;
@@ -14611,13 +14641,22 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 {
   machine_mode first_imode, second_imode, third_imode, inner_mode;
   int i, j;
-  rtx op0, op1;
+  rtx op, op0, op1;
   rtx (*gen_load_even) (rtx, rtx, rtx);
   rtx (*gen_interleave_first_low) (rtx, rtx, rtx);
   rtx (*gen_interleave_second_low) (rtx, rtx, rtx);
 
   switch (mode)
     {
+    case E_V8HFmode:
+      gen_load_even = gen_vec_setv8hf;
+      gen_interleave_first_low = gen_vec_interleave_lowv4si;
+      gen_interleave_second_low = gen_vec_interleave_lowv2di;
+      inner_mode = HFmode;
+      first_imode = V4SImode;
+      second_imode = V2DImode;
+      third_imode = VOIDmode;
+      break;
     case E_V8HImode:
       gen_load_even = gen_vec_setv8hi;
       gen_interleave_first_low = gen_vec_interleave_lowv4si;
@@ -14642,9 +14681,19 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 
   for (i = 0; i < n; i++)
     {
+      op = ops [i + i];
+      if (inner_mode == HFmode)
+	{
+	  /* Convert HFmode to HImode.  */
+	  op1 = gen_reg_rtx (HImode);
+	  op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
+	  op = gen_reg_rtx (HImode);
+	  emit_move_insn (op, op1);
+	}
+
       /* Extend the odd elment to SImode using a paradoxical SUBREG.  */
       op0 = gen_reg_rtx (SImode);
-      emit_move_insn (op0, gen_lowpart (SImode, ops [i + i]));
+      emit_move_insn (op0, gen_lowpart (SImode, op));
 
       /* Insert the SImode value as low element of V4SImode vector. */
       op1 = gen_reg_rtx (V4SImode);
@@ -14781,6 +14830,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
       half_mode = V8HImode;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      goto half;
+
 half:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14804,6 +14857,11 @@ half:
       half_mode = V16HImode;
       goto quarter;
 
+    case E_V32HFmode:
+      quarter_mode = V8HFmode;
+      half_mode = V16HFmode;
+      goto quarter;
+
 quarter:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14840,6 +14898,9 @@ quarter:
 	 move from GPR to SSE register directly.  */
       if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
 	break;
+      /* FALLTHRU */
+
+    case E_V8HFmode:
 
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15087,6 +15148,16 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
 	case E_V16SFmode:
 	  cmp_mode = V16SImode;
 	  break;
+	/* TARGET_AVX512FP16 implies TARGET_AVX512BW.  */
+	case E_V8HFmode:
+	  cmp_mode = V8HImode;
+	  break;
+	case E_V16HFmode:
+	  cmp_mode = V16HImode;
+	  break;
+	case E_V32HFmode:
+	  cmp_mode = V32HImode;
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -15123,23 +15194,25 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
   machine_mode half_mode;
   bool use_vec_merge = false;
   rtx tmp;
-  static rtx (*gen_extract[6][2]) (rtx, rtx)
+  static rtx (*gen_extract[7][2]) (rtx, rtx)
     = {
 	{ gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi },
 	{ gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi },
 	{ gen_vec_extract_lo_v8si, gen_vec_extract_hi_v8si },
 	{ gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di },
 	{ gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf },
-	{ gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df }
+	{ gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df },
+	{ gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf }
       };
-  static rtx (*gen_insert[6][2]) (rtx, rtx, rtx)
+  static rtx (*gen_insert[7][2]) (rtx, rtx, rtx)
     = {
 	{ gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi },
 	{ gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi },
 	{ gen_vec_set_lo_v8si, gen_vec_set_hi_v8si },
 	{ gen_vec_set_lo_v4di, gen_vec_set_hi_v4di },
 	{ gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf },
-	{ gen_vec_set_lo_v4df, gen_vec_set_hi_v4df }
+	{ gen_vec_set_lo_v4df, gen_vec_set_hi_v4df },
+	{ gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf },
       };
   int i, j, n;
   machine_mode mmode = VOIDmode;
@@ -15306,6 +15379,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 	}
       return;
 
+    case E_V8HFmode:
+      use_vec_merge = true;
+      break;
+
     case E_V8HImode:
     case E_V2HImode:
       use_vec_merge = TARGET_SSE2;
@@ -15329,6 +15406,12 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
       n = 16;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      j = 6;
+      n = 8;
+      goto half;
+
     case E_V16HImode:
       half_mode = V8HImode;
       j = 1;
@@ -15409,6 +15492,13 @@ half:
 	}
       break;
 
+    case E_V32HFmode:
+      if (TARGET_AVX512BW)
+	{
+	  mmode = SImode;
+	  gen_blendm = gen_avx512bw_blendmv32hf;
+	}
+      break;
     case E_V32HImode:
       if (TARGET_AVX512BW)
 	{
@@ -15780,6 +15870,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       ix86_expand_vector_extract (false, target, tmp, elt & 3);
       return;
 
+    case E_V32HFmode:
+      tmp = gen_reg_rtx (V16HFmode);
+      if (elt < 16)
+	emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 15);
+      return;
+
+    case E_V16HFmode:
+      tmp = gen_reg_rtx (V8HFmode);
+      if (elt < 8)
+	emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 7);
+      return;
+
+    case E_V8HFmode:
+      use_vec_extr = true;
+      break;
+
     case E_V8QImode:
       use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
       /* ??? Could extract the appropriate HImode element and shift.  */
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 9232f59a925..fcadfcd4c94 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
 VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
 VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
 VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
-VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
-VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
-VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
-VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
-VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
+VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
+VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*   V16HF V8SF V4DF V2TF */
+VECTOR_MODES (FLOAT, 64);     /*  V32HF V16SF V8DF V4TF */
+VECTOR_MODES (FLOAT, 128);    /* V64HF V32SF V16DF V8TF */
+VECTOR_MODES (FLOAT, 256);    /* V128HF V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e826484a4f4..9fd36ff4c59 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2418,6 +2418,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
+    case E_V16HFmode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
@@ -2428,6 +2429,7 @@ classify_argument (machine_mode mode, const_tree type,
       return 4;
     case E_V8DFmode:
     case E_V16SFmode:
+    case E_V32HFmode:
     case E_V8DImode:
     case E_V16SImode:
     case E_V32HImode:
@@ -2445,6 +2447,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V4SImode:
     case E_V16QImode:
     case E_V8HImode:
+    case E_V8HFmode:
     case E_V2DFmode:
     case E_V2DImode:
       classes[0] = X86_64_SSE_CLASS;
@@ -2929,7 +2932,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode,
 
   /* Unnamed 512 and 256bit vector mode parameters are passed on stack.  */
   if (!named && (VALID_AVX512F_REG_MODE (mode)
-		 || VALID_AVX256_REG_MODE (mode)))
+		 || VALID_AVX256_REG_MODE (mode)
+		 || mode == V16HFmode
+		 || mode == V32HFmode))
     return 0;
 
   if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
@@ -3176,12 +3181,14 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
     default:
       break;
 
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -4676,12 +4683,14 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   nat_mode = type_natural_mode (type, NULL, false);
   switch (nat_mode)
     {
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -5348,7 +5357,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
       switch (type)
 	{
 	case opcode_int:
-	  opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  if (scalar_mode == E_HFmode)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
+		      : "vmovdqa64");
+	  else
+	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
 	  break;
 	case opcode_float:
 	  opcode = misaligned_p ? "vmovups" : "vmovaps";
@@ -5362,6 +5376,11 @@ ix86_get_ssemov (rtx *operands, unsigned size,
     {
       switch (scalar_mode)
 	{
+	case E_HFmode:
+	  opcode = (misaligned_p
+		    ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
+		    : "vmovdqa64");
+	  break;
 	case E_SFmode:
 	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  break;
@@ -19293,7 +19312,6 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
       int index;
       switch (mode)
 	{
-	  case E_HFmode:
 	  case E_SFmode:
 	    index = 0;
 	    break;
@@ -19394,31 +19412,12 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
 	  }
 	break;
       case 2:
-	{
-	  int cost;
-	  if (in == 2)
-	    cost = MAX (ix86_cost->hard_register.int_load[1],
-			ix86_cost->hard_register.int_store[1]);
-	  else
-	    cost = in ? ix86_cost->hard_register.int_load[1]
-		      : ix86_cost->hard_register.int_store[1];
-	  if (mode == E_HFmode)
-	    {
-	      /* Prefer SSE over GPR for HFmode.  */
-	      int sse_cost;
-	      int index = sse_store_index (mode);
-	      if (in == 2)
-		sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
-				ix86_cost->hard_register.sse_store[index]);
-	      else
-		sse_cost = (in
-			    ? ix86_cost->hard_register.sse_load [index]
-			    : ix86_cost->hard_register.sse_store [index]);
-	      if (sse_cost >= cost)
-		cost = sse_cost + 1;
-	    }
-	  return cost;
-	}
+	if (in == 2)
+	  return MAX (ix86_cost->hard_register.int_load[1],
+		      ix86_cost->hard_register.int_store[1]);
+	else
+	  return in ? ix86_cost->hard_register.int_load[1]
+		    : ix86_cost->hard_register.int_store[1];
       default:
 	if (in == 2)
 	  cost = MAX (ix86_cost->hard_register.int_load[2],
@@ -19596,6 +19595,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	 between gpr and sse registser.  */
       if (TARGET_AVX512F
 	  && (mode == XImode
+	      || mode == V32HFmode
 	      || VALID_AVX512F_REG_MODE (mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
@@ -19610,9 +19610,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
-	  && (mode == OImode
-	      || mode == TImode
-	      || VALID_AVX256_REG_MODE (mode)
+	  && (VALID_AVX256_REG_OR_OI_VHF_MODE (mode)
 	      || VALID_AVX512VL_128_REG_MODE (mode)))
 	return true;
 
@@ -19622,9 +19620,9 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 
       /* OImode and AVX modes are available only when AVX is enabled.  */
       return ((TARGET_AVX
-	       && VALID_AVX256_REG_OR_OI_MODE (mode))
+	       && VALID_AVX256_REG_OR_OI_VHF_MODE (mode))
 	      || VALID_SSE_REG_MODE (mode)
-	      || VALID_SSE2_REG_MODE (mode)
+	      || VALID_SSE2_REG_VHF_MODE (mode)
 	      || VALID_MMX_REG_MODE (mode)
 	      || VALID_MMX_REG_MODE_3DNOW (mode));
     }
@@ -19837,7 +19835,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
 
     case MODE_VECTOR_INT:
     case MODE_VECTOR_FLOAT:
-      if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+      if ((TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+	  || (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 	  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
 	  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
 	  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
@@ -21703,6 +21702,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
   if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE (mode))
     return true;
+  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+    return true;
   if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE_3DNOW (mode))
     return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index dca2ad32ed4..086dbafbcee 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -995,8 +995,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode	\
    || (MODE) == V4DFmode)
 
-#define VALID_AVX256_REG_OR_OI_MODE(MODE)		\
-  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
+#define VALID_AVX256_REG_OR_OI_VHF_MODE(MODE)		\
+  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode || (MODE) == V16HFmode)
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
   ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
@@ -1014,13 +1014,20 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_AVX512VL_128_REG_MODE(MODE)				\
   ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode	\
    || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode	\
-   || (MODE) == TFmode || (MODE) == V1TImode)
+   || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode	\
+   || (MODE) == TImode)
+
+#define VALID_AVX512FP16_REG_MODE(MODE)					\
+  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
 
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
 
+#define VALID_SSE2_REG_VHF_MODE(MODE)			\
+  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode)
+
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
    || (MODE) == V4SFmode || (MODE) == V4SImode				\
@@ -1064,7 +1071,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode	\
    || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode	\
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
-   || (MODE) == V16SFmode)
+   || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE))
 
 #define X87_FLOAT_MODE_P(MODE)	\
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8f11cbcf28b..20945fabb2c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,8 +496,8 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
-  V2DF,V2SF,V1DF,V8DF"
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
+   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
@@ -1098,7 +1098,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
 			     (V2DI "16") (V4DI "32") (V8DI "64")
 			     (V1TI "16") (V2TI "32") (V4TI "64")
 			     (V2DF "16") (V4DF "32") (V8DF "64")
-			     (V4SF "16") (V8SF "32") (V16SF "64")])
+			     (V4SF "16") (V8SF "32") (V16SF "64")
+			     (V8HF "16") (V16HF "32") (V32HF "64")])
 
 ;; Double word integer modes as mode attribute.
 (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
@@ -1236,9 +1237,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
   [(HF "sh") (SF "ss") (DF "sd")
-   (V16SF "ps") (V8DF "pd")
-   (V8SF "ps") (V4DF "pd")
-   (V4SF "ps") (V2DF "pd")
+   (V32HF "ph") (V16SF "ps") (V8DF "pd")
+   (V16HF "ph") (V8SF "ps") (V4DF "pd")
+   (V8HF "ph") (V4SF "ps") (V2DF "pd")
    (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
    (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
    (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab29999023d..b004b5eee74 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -225,6 +225,7 @@ (define_mode_iterator VMOVE
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
@@ -240,6 +241,13 @@ (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
+(define_mode_iterator VI12HF_AVX512VL
+  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
+
 ;; Same iterator, but without supposed TARGET_AVX512BW
 (define_mode_iterator VI12_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
@@ -255,6 +263,8 @@ (define_mode_iterator V
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
@@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF])
 (define_mode_iterator V_256_512
   [V32QI V16HI V8SI V4DI V8SF V4DF
    (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")
-   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+   (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")])
 
 ;; All vector float modes
 (define_mode_iterator VF
@@ -321,6 +332,11 @@ (define_mode_iterator VF2_512_256VL
 (define_mode_iterator VF_128
   [V4SF (V2DF "TARGET_SSE2")])
 
+;; All 128bit vector HF/SF/DF modes
+(define_mode_iterator VFH_128
+  [(V8HF "TARGET_AVX512FP16")
+   V4SF (V2DF "TARGET_SSE2")])
+
 ;; All 256bit vector float modes
 (define_mode_iterator VF_256
   [V8SF V4DF])
@@ -347,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF_AVX512FP16
+  [V32HF V16HF V8HF])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
@@ -355,6 +374,16 @@ (define_mode_iterator VI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
+;; All vector integer and HF modes
+(define_mode_iterator VIHF
+  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+   (V8SI "TARGET_AVX") V4SI
+   (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")])
+
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
@@ -557,6 +586,7 @@ (define_mode_attr avx512
    (V8HI  "avx512vl") (V16HI  "avx512vl") (V32HI "avx512bw")
    (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
    (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
+   (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
    (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
    (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
 
@@ -617,12 +647,13 @@ (define_mode_attr avx2_avx512
    (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")])
 
 (define_mode_attr shuffletype
-  [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
-  (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
-  (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
-  (V32HI "i") (V16HI "i") (V8HI "i")
-  (V64QI "i") (V32QI "i") (V16QI "i")
-  (V4TI "i") (V2TI "i") (V1TI "i")])
+  [(V32HF "f") (V16HF "f") (V8HF "f")
+   (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
+   (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
+   (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
+   (V32HI "i") (V16HI "i") (V8HI "i")
+   (V64QI "i") (V32QI "i") (V16QI "i")
+   (V4TI "i") (V2TI "i") (V1TI "i")])
 
 (define_mode_attr ssequartermode
   [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")])
@@ -659,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
 ;; All 128 and 256bit vector integer modes
 (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+;; All 256bit vector integer and HF modes
+(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
 
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
@@ -680,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI])
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+(define_mode_iterator VIHF_AVX512BW
+  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
+  (V32HF "TARGET_AVX512FP16")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -720,6 +756,9 @@ (define_mode_iterator VF_AVX512
    (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    V16SF V8DF])
 
+(define_mode_iterator V16_256 [V16HI V16HF])
+(define_mode_iterator V32_512 [V32HI V32HF])
+
 (define_mode_attr avx512bcst
   [(V4SI "%{1to4%}") (V2DI "%{1to2%}")
    (V8SI "%{1to8%}") (V4DI "%{1to4%}")
@@ -730,8 +769,10 @@ (define_mode_attr avx512bcst
 
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
-  [(SF "sse") (DF "sse2")
+  [(SF "sse") (DF "sse2") (HF "avx512fp16")
    (V4SF "sse") (V2DF "sse2")
+   (V32HF "avx512fp16") (V16HF "avx512fp16")
+   (V8HF "avx512fp16")
    (V16SF "avx512f") (V8SF "avx")
    (V8DF "avx512f") (V4DF "avx")])
 
@@ -767,14 +808,23 @@ (define_mode_attr sseinsnmode
    (V16SF "V16SF") (V8DF "V8DF")
    (V8SF "V8SF") (V4DF "V4DF")
    (V4SF "V4SF") (V2DF "V2DF")
+   (V8HF "TI") (V16HF "OI") (V32HF "XI")
    (TI "TI")])
 
+;; SSE integer instruction suffix for various modes
+(define_mode_attr sseintmodesuffix
+  [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
+   (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
+   (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
+   (V8HF "w") (V16HF "w") (V32HF "w")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
   [(V64QI "DI") (V32QI "SI") (V16QI "HI")
    (V32HI "SI") (V16HI "HI") (V8HI  "QI") (V4HI "QI")
    (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
    (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
+   (V32HF "SI") (V16HF "HI") (V8HF  "QI")
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
@@ -784,6 +834,7 @@ (define_mode_attr avx512fmaskmodelower
    (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
    (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
    (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V32HF "si") (V16HF "hi") (V8HF  "qi")
    (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
    (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
 
@@ -828,7 +879,8 @@ (define_mode_attr ssedoublevecmode
    (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI")
    (V16SF "V32SF") (V8DF "V16DF")
    (V8SF "V16SF") (V4DF "V8DF")
-   (V4SF "V8SF") (V2DF "V4DF")])
+   (V4SF "V8SF") (V2DF "V4DF")
+   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
 
 ;; Mapping of vector modes to a vector mode of half size
 ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
@@ -838,7 +890,8 @@ (define_mode_attr ssehalfvecmode
    (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI") (V2DI "DI")
    (V16SF "V8SF") (V8DF "V4DF")
    (V8SF  "V4SF") (V4DF "V2DF")
-   (V4SF  "V2SF") (V2DF "DF")])
+   (V4SF  "V2SF") (V2DF "DF")
+   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
 
 (define_mode_attr ssehalfvecmodelower
   [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
@@ -846,9 +899,10 @@ (define_mode_attr ssehalfvecmodelower
    (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
    (V16SF "v8sf") (V8DF "v4df")
    (V8SF  "v4sf") (V4DF "v2df")
-   (V4SF  "v2sf")])
+   (V4SF  "v2sf")
+   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
 
-;; Mapping of vector modes ti packed single mode of the same size
+;; Mapping of vector modes to packed single mode of the same size
 (define_mode_attr ssePSmode
   [(V16SI "V16SF") (V8DF "V16SF")
    (V16SF "V16SF") (V8DI "V16SF")
@@ -858,7 +912,8 @@ (define_mode_attr ssePSmode
    (V4DI "V8SF") (V2DI "V4SF")
    (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF")
    (V8SF "V8SF") (V4SF "V4SF")
-   (V4DF "V8SF") (V2DF "V4SF")])
+   (V4DF "V8SF") (V2DF "V4SF")
+   (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")])
 
 (define_mode_attr ssePSmode2
   [(V8DI "V8SF") (V4DI "V4SF")])
@@ -869,6 +924,7 @@ (define_mode_attr ssescalarmode
    (V32HI "HI") (V16HI "HI") (V8HI "HI")
    (V16SI "SI") (V8SI "SI")  (V4SI "SI")
    (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
+   (V32HF "HF") (V16HF "HF") (V8HF "HF")
    (V16SF "SF") (V8SF "SF")  (V4SF "SF")
    (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
    (V4TI "TI")  (V2TI "TI")])
@@ -879,6 +935,7 @@ (define_mode_attr ssescalarmodelower
    (V32HI "hi") (V16HI "hi") (V8HI "hi")
    (V16SI "si") (V8SI "si")  (V4SI "si")
    (V8DI "di")  (V4DI "di")  (V2DI "di")
+   (V32HF "hf") (V16HF "hf")  (V8HF "hf")
    (V16SF "sf") (V8SF "sf")  (V4SF "sf")
    (V8DF "df")  (V4DF "df")  (V2DF "df")
    (V4TI "ti")  (V2TI "ti")])
@@ -889,6 +946,7 @@ (define_mode_attr ssexmmmode
    (V32HI "V8HI")  (V16HI "V8HI") (V8HI "V8HI")
    (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
    (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
+   (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
    (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
    (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
 
@@ -931,10 +989,11 @@ (define_mode_attr ssescalarsize
    (V64QI "8") (V32QI "8") (V16QI "8")
    (V32HI "16") (V16HI "16") (V8HI "16")
    (V16SI "32") (V8SI "32") (V4SI "32")
+   (V32HF "16") (V16HF "16") (V8HF "16")
    (V16SF "32") (V8SF "32") (V4SF "32")
    (V8DF "64") (V4DF "64") (V2DF "64")])
 
-;; SSE prefix for integer vector modes
+;; SSE prefix for integer and HF vector modes
 (define_mode_attr sseintprefix
   [(V2DI  "p") (V2DF  "")
    (V4DI  "p") (V4DF  "")
@@ -942,16 +1001,16 @@ (define_mode_attr sseintprefix
    (V4SI  "p") (V4SF  "")
    (V8SI  "p") (V8SF  "")
    (V16SI "p") (V16SF "")
-   (V16QI "p") (V8HI "p")
-   (V32QI "p") (V16HI "p")
-   (V64QI "p") (V32HI "p")])
+   (V16QI "p") (V8HI "p") (V8HF "p")
+   (V32QI "p") (V16HI "p") (V16HF "p")
+   (V64QI "p") (V32HI "p") (V32HF "p")])
 
 ;; SSE scalar suffix for vector modes
 (define_mode_attr ssescalarmodesuffix
-  [(SF "ss") (DF "sd")
-   (V16SF "ss") (V8DF "sd")
-   (V8SF "ss") (V4DF "sd")
-   (V4SF "ss") (V2DF "sd")
+  [(HF "sh") (SF "ss") (DF "sd")
+   (V32HF "sh") (V16SF "ss") (V8DF "sd")
+   (V16HF "sh") (V8SF "ss") (V4DF "sd")
+   (V8HF "sh") (V4SF "ss") (V2DF "sd")
    (V16SI "d") (V8DI "q")
    (V8SI "d") (V4DI "q")
    (V4SI "d") (V2DI "q")])
@@ -979,7 +1038,8 @@ (define_mode_attr castmode
 ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
 ;; i64x4 or f64x4 for 512bit modes.
 (define_mode_attr i128
-  [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128")
+  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
+   (V8DF "f64x4") (V4DF "f128")
    (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
    (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
 
@@ -1003,14 +1063,18 @@ (define_mode_attr bcstscalarsuff
    (V32HI "w")  (V16HI "w") (V8HI "w")
    (V16SI "d")  (V8SI "d")  (V4SI "d")
    (V8DI "q")   (V4DI "q")  (V2DI "q")
+   (V32HF "w")  (V16HF "w") (V8HF "w")
    (V16SF "ss") (V8SF "ss") (V4SF "ss")
    (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
 
 ;; Tie mode of assembler operand to mode iterator
 (define_mode_attr xtg_mode
-  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x")
-   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
-   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
+  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
+   (V8HF "x") (V4SF "x") (V2DF "x")
+   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
+   (V16HF "t") (V8SF "t") (V4DF "t")
+   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
+   (V32HF "g") (V16SF "g") (V8DF "g")])
 
 ;; Half mask mode for unpacks
 (define_mode_attr HALFMASKMODE
@@ -1306,6 +1370,20 @@ (define_insn "<avx512>_blendm<mode>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "<avx512>_blendm<mode>"
+  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
+	(vec_merge:VF_AVX512FP16
+	  (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm")
+	  (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512BW"
+  "@
+    vmovdqu<ssescalarsize>\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2}
+    vpblendmw\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<avx512>_store<mode>_mask"
   [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
 	(vec_merge:V48_AVX512VL
@@ -1903,12 +1981,12 @@ (define_insn "*<insn><mode>3<mask_name><round_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*<sse>_vm<insn><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (plusminus:<ssescalarmode>
 	      (vec_select:<ssescalarmode>
-	        (match_operand:VF_128 1 "register_operand" "0,v")
+		(match_operand:VFH_128 1 "register_operand" "0,v")
 		(parallel [(const_int 0)]))
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
 	  (match_dup 1)
@@ -1919,7 +1997,16 @@ (define_insn "*<sse>_vm<insn><mode>3"
    v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
@@ -1966,12 +2053,12 @@ (define_insn "*mul<mode>3<mask_name><round_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (multdiv:<ssescalarmode>
 	      (vec_select:<ssescalarmode>
-	        (match_operand:VF_128 1 "register_operand" "0,v")
+		(match_operand:VFH_128 1 "register_operand" "0,v")
 		(parallel [(const_int 0)]))
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
 	  (match_dup 1)
@@ -1982,7 +2069,16 @@ (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
    v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -2368,12 +2464,12 @@ (define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*ieee_<ieee_maxmin><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
 	      [(vec_select:<ssescalarmode>
-	         (match_operand:VF_128 1 "register_operand" "0,v")
+		 (match_operand:VFH_128 1 "register_operand" "0,v")
 		 (parallel [(const_int 0)]))
 	       (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")]
 	       IEEE_MAXMIN))
@@ -2386,7 +2482,16 @@ (define_insn "*ieee_<ieee_maxmin><mode>3"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
@@ -8364,6 +8469,45 @@ (define_insn "vec_set<mode>_0"
 	   ]
 	   (symbol_ref "true")))])
 
+;; vmovw also clears the higher bits
+(define_insn "vec_set<mode>_0"
+  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v")
+	(vec_merge:VF_AVX512FP16
+	  (vec_duplicate:VF_AVX512FP16
+	    (match_operand:HF 2 "nonimmediate_operand" "rm"))
+	  (match_operand:VF_AVX512FP16 1 "const0_operand" "C")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovw\t{%2, %x0|%x0, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "*avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (match_operand:HF 2 "register_operand" "v"))
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (match_operand:V8HF 2 "register_operand" "v")
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
   [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
@@ -8499,6 +8643,20 @@ (define_expand "vec_set<mode>"
   DONE;
 })
 
+(define_expand "vec_setv8hf"
+  [(match_operand:V8HF 0 "register_operand")
+   (match_operand:HF 1 "register_operand")
+   (match_operand 2 "vec_setm_sse41_operand")]
+  "TARGET_SSE"
+{
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+			    INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
+  DONE;
+})
+
 (define_expand "vec_set<mode>"
   [(match_operand:V_256_512 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
@@ -9214,10 +9372,10 @@ (define_insn "vec_extract_hi_<mode>"
    (set_attr "length_immediate" "1")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn_and_split "vec_extract_lo_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9244,9 +9402,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
   if (!TARGET_AVX512VL
       && REG_P (operands[0])
       && EXT_REX_SSE_REG_P (operands[1]))
-    operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
+    operands[0] = lowpart_subreg (<MODE>mode, operands[0],
+				  <ssehalfvecmode>mode);
   else
-    operands[1] = gen_lowpart (V16HImode, operands[1]);
+    operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -9255,10 +9414,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_extract_hi_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "register_operand" "v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "register_operand" "v")
 	  (parallel [(const_int 16) (const_int 17)
 		     (const_int 18) (const_int 19)
 		     (const_int 20) (const_int 21)
@@ -9275,10 +9434,10 @@ (define_insn "vec_extract_hi_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn_and_split "vec_extract_lo_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9287,12 +9446,12 @@ (define_insn_and_split "vec_extract_lo_v16hi"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V8HImode, operands[1]);")
+  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
 
-(define_insn "vec_extract_hi_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "register_operand" "x,v,v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "register_operand" "x,v,v")
 	  (parallel [(const_int 8) (const_int 9)
 		     (const_int 10) (const_int 11)
 		     (const_int 12) (const_int 13)
@@ -9428,12 +9587,41 @@ (define_insn "vec_extract_hi_v32qi"
    (set_attr "prefix" "vex,evex,evex")
    (set_attr "mode" "OI")])
 
+;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
+;; Otherwise, it will be ignored.
+(define_insn_and_split "*vec_extract<mode>_0"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
+	(vec_select:HF
+	  (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
+	  (parallel [(const_int 0)])))]
+  "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+  "operands[1] = gen_lowpart (HFmode, operands[1]);")
+
+(define_insn "*vec_extracthf"
+  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m")
+	(vec_select:HF
+	  (match_operand:V8HF 1 "register_operand" "v,v")
+	  (parallel
+	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
+  "TARGET_AVX512FP16"
+  "@
+   vpextrw\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; Modes handled by vec_extract patterns.
 (define_mode_iterator VEC_EXTRACT_MODE
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -14666,16 +14854,16 @@ (define_expand "vec_interleave_low<mode>"
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI
+  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
 (define_mode_attr sse2p4_1
-  [(V16QI "sse4_1") (V8HI "sse2")
+  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
    (V4SI "sse4_1") (V2DI "sse4_1")])
 
 (define_mode_attr pinsr_evex_isa
-  [(V16QI "avx512bw") (V8HI "avx512bw")
+  [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
    (V4SI "avx512dq") (V2DI "avx512dq")])
 
 ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
@@ -14703,11 +14891,19 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
     case 2:
     case 4:
       if (GET_MODE_SIZE (<ssescalarmode>mode) < GET_MODE_SIZE (SImode))
-	return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	{
+	  if (<MODE>mode == V8HFmode)
+	    return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	  else
+	    return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	}
       /* FALLTHRU */
     case 3:
     case 5:
-      return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      if (<MODE>mode == V8HFmode)
+	return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      else
+	return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
@@ -21122,16 +21318,17 @@ (define_mode_attr pbroadcast_evex_isa
   [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
    (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
    (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
-   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")])
+   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
+   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
 
 (define_insn "avx2_pbroadcast<mode>"
-  [(set (match_operand:VI 0 "register_operand" "=x,v")
-	(vec_duplicate:VI
+  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
+	(vec_duplicate:VIHF
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
-  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}"
   [(set_attr "isa" "*,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21139,17 +21336,17 @@ (define_insn "avx2_pbroadcast<mode>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_pbroadcast<mode>_1"
-  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
-	(vec_duplicate:VI_256
+  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
+	(vec_duplicate:VIHF_256
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
+	    (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
   "@
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}"
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}"
   [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21503,15 +21700,15 @@ (define_insn "avx2_vec_dupv4df"
    (set_attr "mode" "V4DF")])
 
 (define_insn "<avx512>_vec_dup<mode>_1"
-  [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
-	(vec_duplicate:VI_AVX512BW
+  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
+	(vec_duplicate:VIHF_AVX512BW
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
+	    (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
   "@
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}"
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %<iptr>1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -21536,8 +21733,8 @@ (define_insn "<avx512>_vec_dup<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
@@ -21572,8 +21769,8 @@ (define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512BW"
   "@
@@ -21668,7 +21865,7 @@ (define_mode_attr vecdupssescalarmodesuffix
   [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
 ;; Modes handled by AVX2 vec_dup patterns.
 (define_mode_iterator AVX2_VEC_DUP_MODE
-  [V32QI V16QI V16HI V8HI V8SI V4SI])
+  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
 
 (define_insn "*vec_dup<mode>"
   [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
@@ -22224,12 +22421,12 @@ (define_insn "vec_set_hi_<mode><mask_name>"
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_set_lo_v16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-	(vec_concat:V16HI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")
-	  (vec_select:V8HI
-	    (match_operand:V16HI 1 "register_operand" "x,v")
+(define_insn "vec_set_lo_<mode>"
+  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
+	(vec_concat:V16_256
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 8) (const_int 9)
 		       (const_int 10) (const_int 11)
 		       (const_int 12) (const_int 13)
@@ -22244,16 +22441,16 @@ (define_insn "vec_set_lo_v16hi"
    (set_attr "prefix" "vex,evex")
    (set_attr "mode" "OI")])
 
-(define_insn "vec_set_hi_v16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-	(vec_concat:V16HI
-	  (vec_select:V8HI
-	    (match_operand:V16HI 1 "register_operand" "x,v")
+(define_insn "vec_set_hi_<mode>"
+  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
+	(vec_concat:V16_256
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
@@ -22430,6 +22627,8 @@ (define_mode_iterator VEC_INIT_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -22441,6 +22640,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
-- 
2.18.1
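
For readers following the vec_merge-based patterns in the sse.md hunks of
this patch (vec_set<mode>_0 and *avx512fp16_movsh), here is a minimal scalar
model of the RTL semantics. It is an illustrative sketch, not code from the
patch: it assumes 8 lanes, stores each HF lane as a raw uint16_t bit pattern,
and the helper names (vec_merge_u16, vec_duplicate_u16, model_vec_set_0,
model_movsh) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8

/* Model of RTL (vec_merge a b mask): lane i is taken from A when bit i
   of MASK is set, otherwise from B.  A temporary is used so that DST may
   alias either input.  */
static void
vec_merge_u16 (uint16_t *dst, const uint16_t *a, const uint16_t *b,
	       unsigned mask)
{
  uint16_t tmp[LANES];
  assert (mask < (1u << LANES));
  for (int i = 0; i < LANES; i++)
    tmp[i] = ((mask >> i) & 1) ? a[i] : b[i];
  memcpy (dst, tmp, sizeof tmp);
}

/* Model of RTL (vec_duplicate x): every lane holds X.  */
static void
vec_duplicate_u16 (uint16_t *dst, uint16_t x)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = x;
}

/* Shape of vec_set<mode>_0: merge a broadcast of X into an all-zero
   vector with mask 1, so lane 0 is X and the upper lanes are cleared.  */
static void
model_vec_set_0 (uint16_t *dst, uint16_t x)
{
  uint16_t dup[LANES], zero[LANES] = { 0 };
  vec_duplicate_u16 (dup, x);
  vec_merge_u16 (dst, dup, zero, 1);
}

/* Shape of *avx512fp16_movsh: merge a broadcast of X into SRC with
   mask 1, so lane 0 is X and lanes 1..7 are copied from SRC.  */
static void
model_movsh (uint16_t *dst, const uint16_t *src, uint16_t x)
{
  uint16_t dup[LANES];
  vec_duplicate_u16 (dup, x);
  vec_merge_u16 (dst, dup, src, 1);
}
```

In this model, model_vec_set_0 (v, 0x3c00) leaves v as
{0x3c00, 0, 0, 0, 0, 0, 0, 0} (0x3c00 is the half-precision encoding of 1.0),
while model_movsh replaces only lane 0 and keeps the remaining lanes of src.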


* [PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (4 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21  7:43         ` [PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments liuhongt
                           ` (4 subsequent siblings)
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/testsuite/ChangeLog:

	* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
	* gcc.target/i386/avx512fp16-10a.c: New test.
	* gcc.target/i386/avx512fp16-10b.c: Ditto.
	* gcc.target/i386/avx512fp16-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-1c.c: Ditto.
	* gcc.target/i386/avx512fp16-1d.c: Ditto.
	* gcc.target/i386/avx512fp16-1e.c: Ditto.
	* gcc.target/i386/avx512fp16-2a.c: Ditto.
	* gcc.target/i386/avx512fp16-2b.c: Ditto.
	* gcc.target/i386/avx512fp16-2c.c: Ditto.
	* gcc.target/i386/avx512fp16-3a.c: Ditto.
	* gcc.target/i386/avx512fp16-3b.c: Ditto.
	* gcc.target/i386/avx512fp16-3c.c: Ditto.
	* gcc.target/i386/avx512fp16-4.c: Ditto.
	* gcc.target/i386/avx512fp16-5.c: Ditto.
	* gcc.target/i386/avx512fp16-6.c: Ditto.
	* gcc.target/i386/avx512fp16-7.c: Ditto.
	* gcc.target/i386/avx512fp16-8.c: Ditto.
	* gcc.target/i386/avx512fp16-9a.c: Ditto.
	* gcc.target/i386/avx512fp16-9b.c: Ditto.
	* gcc.target/i386/pr54855-13.c: Ditto.
	* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-10a.c          |  14 ++
 .../gcc.target/i386/avx512fp16-10b.c          |  25 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |  24 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |  32 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |  26 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |  33 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |  30 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |  28 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |  33 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |  36 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |  36 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |  35 +++++
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |  40 ++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |  31 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  | 133 ++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |  57 ++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |  86 +++++++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |  53 +++++++
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |  27 ++++
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |  49 +++++++
 .../gcc.target/i386/avx512fp16-vec_set_var.c  |  30 ++++
 gcc/testsuite/gcc.target/i386/m512-check.h    |  38 ++++-
 gcc/testsuite/gcc.target/i386/pr54855-13.c    |  14 ++
 23 files changed, 909 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-13.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
new file mode 100644
index 00000000000..f06ffffa822
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+
+__m128h
+__attribute__ ((noinline, noclone))
+set_128 (_Float16 x)
+{
+  return _mm_set_sh (x);
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 1 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 2 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
new file mode 100644
index 00000000000..055edd7aaf5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-10a.c"
+
+union128h u128 = { ESP_FLOAT16, 0.0f, 0.0f, 0.0f,
+		   0.0f, 0.0f, 0.0f, 0.0f };
+
+static void
+do_test (void)
+{
+  __m128h v128 = set_128 (ESP_FLOAT16);
+  union128h a128;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
new file mode 100644
index 00000000000..45c7bddeba5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f,
+                                           0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m128h)(__v8hf) { *x, 0.0f, 0.0f, 0.0f,
+                                           0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
new file mode 100644
index 00000000000..7560c625e25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
@@ -0,0 +1,32 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m128h v;
+  union128h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union128h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union128h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
new file mode 100644
index 00000000000..49fc2aa42e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "(?:vmovsh|vmovw)" 2 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "vpinsrw" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpinsrw" 2 { target { ia32 } } } } */
+
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (__m128h a, _Float16 f)
+{
+  __v8hf x = (__v8hf) a;
+  x[2] = f;
+  return (__m128h) x;
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (__m128h a, _Float16 f)
+{
+  __v8hf x = (__v8hf) a;
+  x[0] = f;
+  return (__m128h) x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
new file mode 100644
index 00000000000..cdaf656eb48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1c.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u = { -1.2f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
+  __m128h v;
+  union128h a, b;
+  v = foo1 (u.x, x);
+  a.x = v;
+  b = u;
+  b.a[2] = x;
+  if (check_union128h (a, b.a))
+    abort ();
+  x = 33.3;
+  b = u;
+  b.a[0] = x;
+  v = foo2 (u.x, x);
+  a.x = v;
+  if (check_union128h (a, b.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
new file mode 100644
index 00000000000..04d33cfcf2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-1a.c"
+
+__m128h
+__attribute__ ((noinline,noclone))
+foo3 (__m128h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  union128h u = { -1.2f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
+  union128h a, b = { -1.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
+  __m128h v;
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union128h (a, b.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
new file mode 100644
index 00000000000..c03138fb13d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
new file mode 100644
index 00000000000..100afd0f49c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-2a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union256h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m256h v;
+  union256h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union256h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union256h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
new file mode 100644
index 00000000000..cf4b42a4021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
@@ -0,0 +1,36 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-2a.c"
+
+__m256h
+__attribute__ ((noinline,noclone))
+foo3 (__m256h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union256h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 7.7f, 0.0f,
+		  4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f };
+
+  union256h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m256h v;
+  union256h a;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union256h (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
new file mode 100644
index 00000000000..126e7d9ee36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 *x)
+{
+  return __extension__ (__m512h)(__v32hf) { *x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[^\n\r]*xmm0" 2 { target { ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
new file mode 100644
index 00000000000..291db066bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
@@ -0,0 +1,35 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-3a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union512h u = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m512h v;
+  union512h a;
+  memset (&v, -1, sizeof (v));
+  v = foo1 (x);
+  a.x = v;
+  if (check_union512h (a, u.a))
+    abort ();
+  x = 33.3;
+  u.a[0] = x;
+  memset (&v, -1, sizeof (v));
+  v = foo2 (&x);
+  a.x = v;
+  if (check_union512h (a, u.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
new file mode 100644
index 00000000000..21f9e16434a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-3a.c"
+
+__m512h
+__attribute__ ((noinline,noclone))
+foo3 (__m512h x)
+{
+  return foo1(x[0]);
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union512h u = { x, 3.5f, -5.9f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  2.0f, -2.3f, 0.0f, 0.0f, 10.4f, 0.0f, 0.0f, 0.0f,
+		  3.0f, -3.2f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		  4.0f, -4.20f, 0.0f, 0.0f, 0.0f, -8.7f, 0.0f, 0.0f };
+
+  union512h exp = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		    0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m512h v;
+  union512h a;
+  memset (&v, -1, sizeof (v));
+  v = foo3 (u.x);
+  a.x = v;
+  if (check_union512h (a, exp.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c
new file mode 100644
index 00000000000..1329a0434a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-4.c
@@ -0,0 +1,31 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+
+extern __m128h x128, y128;
+extern __m256h x256, y256;
+extern __m512h x512, y512;
+
+__m128h
+foo1 (float f1, __m128h f2)
+{
+  x128 = y128;
+  return f2;
+}
+
+__m256h
+foo2 (float f1, __m256h f2)
+{
+  x256 = y256;
+  return f2;
+}
+
+__m512h
+foo3 (float f1, __m512h f2)
+{
+  x512 = y512;
+  return f2;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-5.c b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c
new file mode 100644
index 00000000000..d28b9651b8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-5.c
@@ -0,0 +1,133 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo1 (_Float16 x)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, 0.0f,
+                                           1.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+foo2 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m128h)(__v8hf) { x, 0.0f, 0.0f, y,
+                                           3.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo3 (_Float16 x)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            1.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+foo4 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, y,
+                                            3.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo5 (_Float16 x)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            1.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+foo6 (_Float16 x, _Float16 y)
+{
+  return __extension__ (__m512h)(__v32hf) { x, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, y,
+                                            3.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f,
+                                            0.0f, 0.0f, 0.0f, 0.0f };
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  _Float16 y = -35.7;
+  union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  __m128h v128;
+  __m256h v256;
+  __m512h v512;
+  union128h a128;
+  union256h a256;
+  union512h a512;
+
+  memset (&v128, -1, sizeof (v128));
+  v128 = foo1 (x);
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+  memset (&v128, -1, sizeof (v128));
+  u128.a[3] = y;
+  u128.a[4] = 3.0f;
+  v128 = foo2 (x, y);
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  memset (&v256, -1, sizeof (v256));
+  v256 = foo3 (x);
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+  memset (&v256, -1, sizeof (v256));
+  u256.a[7] = y;
+  u256.a[8] = 3.0f;
+  v256 = foo4 (x, y);
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  memset (&v512, -1, sizeof (v512));
+  v512 = foo5 (x);
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+  memset (&v512, -1, sizeof (v512));
+  u512.a[15] = y;
+  u512.a[16] = 3.0f;
+  v512 = foo6 (x, y);
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-6.c b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c
new file mode 100644
index 00000000000..d85a6c40603
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-6.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__ ((noinline, noclone))
+foo128 (_Float16 *p, __m128h x)
+{
+  *p = ((__v8hf)x)[0];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo256 (_Float16 *p, __m256h x)
+{
+  *p = ((__v16hf)x)[0];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo512 (_Float16 *p, __m512h x)
+{
+  *p = ((__v32hf)x)[0];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { x, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  _Float16 y;
+
+  foo128 (&y, u128.x);
+  if (x != y)
+    abort ();
+
+  foo256 (&y, u256.x);
+  if (x != y)
+    abort ();
+
+  foo512 (&y, u512.x);
+  if (x != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-7.c b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c
new file mode 100644
index 00000000000..26ae25fc0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-7.c
@@ -0,0 +1,86 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__ ((noinline, noclone))
+foo128 (_Float16 *p, __m128h x)
+{
+  *p = ((__v8hf)x)[4];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo256 (_Float16 *p, __m256h x)
+{
+  *p = ((__v16hf)x)[10];
+}
+
+void
+__attribute__ ((noinline, noclone))
+foo512 (_Float16 *p, __m512h x)
+{
+  *p = ((__v32hf)x)[30];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f, x };
+  union256h u256 = { x, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f, 0.0f };
+  union512h u512 = { x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, x, 0.0f };
+  __m128h v128 = _mm_setr_ph (0.0f, x, 0.0f, 0.0f,
+			      x, 0.0f, 0.0f, x);
+  __m256h v256 = _mm256_setr_ph (x, 0.0f, 0.0f, 0.0f,
+				 x, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, x, 0.0f,
+				 0.0f, x, 0.0f, 0.0f);
+  __m512h v512 = _mm512_setr_ph (x, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, x, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, x, 0.0f,
+				 0.0f, 0.0f, 0.0f, 0.0f,
+				 0.0f, 0.0f, 0.0f, x,
+				 0.0f, 0.0f, x, 0.0f);
+  union128h a128;
+  union256h a256;
+  union512h a512;
+  _Float16 y;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+
+  foo128 (&y, u128.x);
+  if (x != y)
+    abort ();
+
+  foo256 (&y, u256.x);
+  if (x != y)
+    abort ();
+
+  foo512 (&y, u512.x);
+  if (x != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-8.c b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c
new file mode 100644
index 00000000000..8f103751c2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-8.c
@@ -0,0 +1,53 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo128 (__m128h x)
+{
+  return ((__v8hf)x)[4];
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo256 (__m256h x)
+{
+  return ((__v16hf)x)[10];
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+foo512 (__m512h x)
+{
+  return ((__v32hf)x)[30];
+}
+
+static void
+do_test (void)
+{
+  _Float16 x = 25.3;
+  union128h u128 = { 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f };
+  union256h u256 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, x, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
+  union512h u512 = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, x, 0.0f };
+
+  if (foo128 (u128.x) != x)
+    abort ();
+
+  if (foo256 (u256.x) != x)
+    abort ();
+
+  if (foo512 (u512.x) != x)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
new file mode 100644
index 00000000000..580ffb51e45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+
+__m128h
+__attribute__ ((noinline, noclone))
+set1_128 (_Float16 x)
+{
+  return _mm_set1_ph (x);
+}
+
+__m256h
+__attribute__ ((noinline, noclone))
+set1_256 (_Float16 x)
+{
+  return _mm256_set1_ph (x);
+}
+
+__m512h
+__attribute__ ((noinline, noclone))
+set1_512 (_Float16 x)
+{
+  return _mm512_set1_ph (x);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[ \t]\+\[^\n\r]*\[xyz\]mm0" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
new file mode 100644
index 00000000000..198b23e64b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-9a.c"
+
+union128h u128 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+union256h u256 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+union512h u512 = { ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16,
+		   ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16, ESP_FLOAT16 };
+
+static void
+do_test (void)
+{
+  __m128h v128 = set1_128 (ESP_FLOAT16);
+  __m256h v256 = set1_256 (ESP_FLOAT16);
+  __m512h v512 = set1_512 (ESP_FLOAT16);
+  union128h a128;
+  union256h a256;
+  union512h a512;
+
+  a128.x = v128;
+  if (check_union128h (a128, u128.a))
+    abort ();
+
+  a256.x = v256;
+  if (check_union256h (a256, u256.a))
+    abort ();
+
+  a512.x = v512;
+  if (check_union512h (a512, u512.a))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
new file mode 100644
index 00000000000..d948f253cc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mno-avx512vl -O2" } */
+/* { dg-final { scan-assembler-times {(?n)vpblendvb[ \t]+%xmm[0-9]} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vpblendvb[ \t]+%ymm[0-9]} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vpbroadcastw[ \t].*%zmm[0-9].*%k[0-7]} 1 } } */
+
+typedef _Float16 v32hf __attribute__((vector_size(64)));
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef _Float16 v8hf __attribute__((vector_size(16)));
+
+v8hf
+foo1 (v8hf a, _Float16 b, int c)
+{
+  a[c] = b;
+  return a;
+}
+
+v16hf
+foo2 (v16hf a, _Float16 b, int c)
+{
+  a[c] = b;
+  return a;
+}
+
+v32hf
+foo3 (v32hf a, _Float16 b, int c)
+{
+  a[c] = b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
index 6befaf0a9ba..68e74fce68d 100644
--- a/gcc/testsuite/gcc.target/i386/m512-check.h
+++ b/gcc/testsuite/gcc.target/i386/m512-check.h
@@ -60,7 +60,24 @@ typedef union
  __m512i x;
  unsigned long long a[8];
 } union512i_uq;
-                                    
+
+typedef union
+{
+  __m128h x;
+  _Float16 a[8];
+} union128h;
+
+typedef union
+{
+  __m256h x;
+  _Float16 a[16];
+} union256h;
+
+typedef union
+{
+  __m512h x;
+  _Float16 a[32];
+} union512h;
 
 CHECK_EXP (union512i_b, char, "%d")
 CHECK_EXP (union512i_w, short, "%d")
@@ -115,3 +132,22 @@ CHECK_ROUGH_EXP (union256, float, "%f")
 CHECK_ROUGH_EXP (union256d, double, "%f")
 CHECK_ROUGH_EXP (union128, float, "%f")
 CHECK_ROUGH_EXP (union128d, double, "%f")
+
+#ifdef AVX512FP16
+
+CHECK_EXP (union128h, _Float16, "%f")
+CHECK_EXP (union256h, _Float16, "%f")
+CHECK_EXP (union512h, _Float16, "%f")
+
+#ifndef ESP_FLOAT16
+#define ESP_FLOAT16 0.27
+#endif
+
+CHECK_FP_EXP (union128h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union256h, _Float16, ESP_FLOAT16, "%f")
+CHECK_FP_EXP (union512h, _Float16, ESP_FLOAT16, "%f")
+
+CHECK_ROUGH_EXP (union128h, _Float16, "%f")
+CHECK_ROUGH_EXP (union256h, _Float16, "%f")
+CHECK_ROUGH_EXP (union512h, _Float16, "%f")
+#endif
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
new file mode 100644
index 00000000000..87b4f459a5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+
+#include <immintrin.h>
+
+__m128h
+foo (__m128h x, __m128h y)
+{
+  x[0] = x[0] > y[0] ? x[0] : y[0];
+  return x;
+}
-- 
2.18.1


* [PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (5 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21  7:43         ` [PATCH 08/10] AVX512FP16: Add ABI tests for xmm liuhongt
                           ` (3 subsequent siblings)
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

From: "H.J. Lu" <hjl.tools@gmail.com>

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vararg-1.c: New test.
	* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vararg-1.c     | 122 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-2.c     | 107 +++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-3.c     | 114 ++++++++++++++++
 .../gcc.target/i386/avx512fp16-vararg-4.c     | 115 +++++++++++++++++
 4 files changed, 458 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
new file mode 100644
index 00000000000..9bd366838b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
@@ -0,0 +1,122 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+struct m256h
+{
+  __m256h  v;
+};
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+struct m256h e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+struct m256h e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, struct m256h);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+__attribute__((noinline))
+test (__m128 a1, struct m256h a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
new file mode 100644
index 00000000000..043f1c75d00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
@@ -0,0 +1,107 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m256d e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+test (__m128 a1, __m256d a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
new file mode 100644
index 00000000000..cb414a97753
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
@@ -0,0 +1,114 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+struct m256h
+{
+  __m256h  v;
+};
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+		      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+struct m256h e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+struct m256h e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+test (__m128 a1, struct m256h a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, struct m256h);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
new file mode 100644
index 00000000000..962c2bf031d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
@@ -0,0 +1,115 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include <stdarg.h>
+#include <assert.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+__m256d n2 = { -93.83, 893.318, 3994.3, -39484.0 };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+__m256d n8 = { -123.3, 2.3, 3.4, -10.03 };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+		131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+		 238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+		 82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+		 23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+__m256d e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+__m256d e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, __m256d);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+__attribute__((noinline))
+test (__m128 a1, __m256d a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+	n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeof (e10)) == 0);
+  assert (n11 == e11);
+  assert (n12 == e12);
+  assert (__builtin_memcmp (&e13, &n13, sizeof (e13)) == 0);
+  assert (__builtin_memcmp (&e14, &n14, sizeof (e14)) == 0);
+  assert (__builtin_memcmp (&e15, &n15, sizeof (e15)) == 0);
+  assert (__builtin_memcmp (&e16, &n16, sizeof (e16)) == 0);
+  assert (__builtin_memcmp (&e17, &n17, sizeof (e17)) == 0);
+  assert (__builtin_memcmp (&e18, &n18, sizeof (e18)) == 0);
+}
-- 
2.18.1



* [PATCH 08/10] AVX512FP16: Add ABI tests for xmm.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (6 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21  7:43         ` [PATCH 09/10] AVX512FP16: Add ABI test for ymm liuhongt
                           ` (2 subsequent siblings)
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

From: "H.J. Lu" <hjl.tools@gmail.com>

Copied from the regular XMM ABI tests.  The AVX512FP16 ABI tests are run
only on ELF targets.

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
	file for abi test.
	* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check.
	* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.
---
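Note for reviewers: the runtime gate in avx512fp16-check.h compares XCR0
(read with xgetbv) against XSTATE_MASK, which avx512fp16-xmm-check.h
defines as XSTATE_SSE | XSTATE_OPMASK.  A minimal sketch of that
comparison follows.  The helper xcr0_has_state, the function xstate_demo
and the sample XCR0 values are hypothetical, made up for illustration; the
helper takes XCR0 as a parameter instead of executing xgetbv, so it is not
the code in the patch.

```c
#include <assert.h>

/* XCR0 state-component bits, copied from avx512fp16-check.h.  */
#define XSTATE_FP	0x1
#define XSTATE_SSE	0x2
#define XSTATE_YMM	0x4
#define XSTATE_OPMASK	0x20
#define XSTATE_ZMM	0x40
#define XSTATE_HI_ZMM	0x80

/* Mask used by the XMM variant (avx512fp16-xmm-check.h).  */
#define XSTATE_MASK (XSTATE_SSE | XSTATE_OPMASK)

/* Hypothetical helper: the same comparison as avx512fp16_os_support,
   applied to a caller-supplied XCR0 value.  */
static int
xcr0_has_state (unsigned int xcr0, unsigned int mask)
{
  return (xcr0 & mask) == mask;
}

/* Hypothetical demo with made-up XCR0 values.  */
static void
xstate_demo (void)
{
  /* OS saves x87 and SSE state only: the opmask bit is missing.  */
  unsigned int sse_only = XSTATE_FP | XSTATE_SSE;
  /* OS saves the full AVX-512 state.  */
  unsigned int avx512 = (XSTATE_FP | XSTATE_SSE | XSTATE_YMM
			 | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM);

  assert (!xcr0_has_state (sse_only, XSTATE_MASK));
  assert (xcr0_has_state (avx512, XSTATE_MASK));
}
```

Calling xstate_demo () from a main function exercises both cases.  Every
bit of the mask has to be set in XCR0, which is why the test harness
returns 0 without running do_test when the OS does not save opmask state.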
 .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
 .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +++
 .../x86_64/abi/avx512fp16/asm-support.S       |   81 ++
 .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 ++
 .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
 .../x86_64/abi/avx512fp16/defines.h           |  150 +++
 .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
 .../test_3_element_struct_and_unions.c        |  692 +++++++++++
 .../abi/avx512fp16/test_basic_alignment.c     |   45 +
 .../test_basic_array_size_and_align.c         |   43 +
 .../abi/avx512fp16/test_basic_returning.c     |   87 ++
 .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
 .../test_basic_struct_size_and_align.c        |   42 +
 .../test_basic_union_size_and_align.c         |   40 +
 .../abi/avx512fp16/test_complex_returning.c   |  104 ++
 .../abi/avx512fp16/test_m64m128_returning.c   |   73 ++
 .../abi/avx512fp16/test_passing_floats.c      | 1066 +++++++++++++++++
 .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++++++++
 .../abi/avx512fp16/test_passing_structs.c     |  332 +++++
 .../abi/avx512fp16/test_passing_unions.c      |  335 ++++++
 .../abi/avx512fp16/test_struct_returning.c    |  274 +++++
 .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +++
 22 files changed, 4449 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
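
Note for reviewers: check_struct_and_union3 in macros.h asserts the size
of a three-member struct and of the matching union.  The sketch below
restates the two sizes as expressions so they can be inspected directly.
STRUCT3_SIZE, UNION3_SIZE and layout_demo are hypothetical names that do
not appear in the patch, and the expected values assume the x86-64 psABI
layout rules; they are taken from test_3_element_struct_and_unions.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression forms of check_struct_and_union3.  */
#define STRUCT3_SIZE(t1, t2, t3) \
  sizeof (struct { t1 a; t2 b; t3 c; })
#define UNION3_SIZE(t1, t2, t3) \
  sizeof (union { t1 a; t2 b; t3 c; })

/* Hypothetical demo; the expected values assume the x86-64 psABI.  */
static void
layout_demo (void)
{
  /* Two chars pack together and the short needs no padding: 4 bytes.
     The union is as large as its largest member: 2 bytes.  */
  assert (STRUCT3_SIZE (char, char, short) == 4);
  assert (UNION3_SIZE (char, char, short) == 2);

  /* The int forces 3 bytes of padding after the first char and 3 bytes
     of tail padding after the last one: 12 bytes.  */
  assert (STRUCT3_SIZE (char, int, char) == 12);
  assert (UNION3_SIZE (char, int, char) == 4);

  /* The double is 8-byte aligned, so 6 bytes of padding follow the two
     chars: 16 bytes.  */
  assert (STRUCT3_SIZE (char, char, double) == 16);
  assert (UNION3_SIZE (char, char, double) == 8);
}
```

The three cases match the (4, 2), (12, 4) and (16, 8) entries in the
generated test; the last macro argument there is the expected union size.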

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
new file mode 100644
index 00000000000..33d24762788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
@@ -0,0 +1,48 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
new file mode 100644
index 00000000000..4a7b9a90fbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
@@ -0,0 +1,190 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <string.h>
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 xmm0
+#define F1 xmm1
+#define F2 xmm2
+#define F3 xmm3
+#define F4 xmm4
+#define F5 xmm5
+#define F6 xmm6
+#define F7 xmm7
+
+typedef union {
+  _Float16 __Float16[8];
+  float _float[4];
+  double _double[2];
+  long _long[2];
+  int _int[4];
+  unsigned long _ulong[2];
+#ifdef CHECK_M64_M128
+  __m64 _m64[2];
+  __m128 _m128[1];
+  __m128h _m128h[1];
+#endif
+} XMM_T;
+
+typedef union {
+  _Float16 __Float16;
+  float _float;
+  double _double;
+  ldouble _ldouble;
+  ulong _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+XMM_T xmm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  ldouble st0, st1, st2, st3, st4, st5, st6, st7;
+  XMM_T xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9,
+        xmm10, xmm11, xmm12, xmm13, xmm14, xmm15;
+};
+
+/* Implemented in scalarargs.c  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (xmm_regs, 0, sizeof (xmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* TODO: Do the checking.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.xmm0._ ## T [0] == xmm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.xmm1._ ## T [0] == xmm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.xmm2._ ## T [0] == xmm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.xmm3._ ## T [0] == xmm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.xmm4._ ## T [0] == xmm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.xmm5._ ## T [0] == xmm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.xmm6._ ## T [0] == xmm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.xmm7._ ## T [0] == xmm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float16_arguments check_f_arguments(_Float16)
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.xmm0) + (O), \
+		     &xmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.xmm1) + (O), \
+		     &xmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.xmm2) + (O), \
+		     &xmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.xmm3) + (O), \
+		     &xmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.xmm4) + (O), \
+		     &xmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.xmm5) + (O), \
+		     &xmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.xmm6) + (O), \
+		     &xmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.xmm7) + (O), \
+		     &xmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+
+/* ldoubles are not passed in registers */
+#define check_ldouble_arguments
+
+/* TODO: Do the clearing.  */
+#define clear_float_hardware_registers
+#define clear_x87_hardware_registers
+
+#define clear_float_registers \
+  clear_struct_registers \
+  clear_float_hardware_registers
+
+#define clear_x87_registers \
+  clear_struct_registers \
+  clear_x87_hardware_registers
+
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
new file mode 100644
index 00000000000..7849acd2649
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
@@ -0,0 +1,81 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	vmovdqu	%xmm2, xmm_regs+32(%rip)
+	vmovdqu	%xmm3, xmm_regs+48(%rip)
+	vmovdqu	%xmm4, xmm_regs+64(%rip)
+	vmovdqu	%xmm5, xmm_regs+80(%rip)
+	vmovdqu	%xmm6, xmm_regs+96(%rip)
+	vmovdqu	%xmm7, xmm_regs+112(%rip)
+	vmovdqu	%xmm8, xmm_regs+128(%rip)
+	vmovdqu	%xmm9, xmm_regs+144(%rip)
+	vmovdqu	%xmm10, xmm_regs+160(%rip)
+	vmovdqu	%xmm11, xmm_regs+176(%rip)
+	vmovdqu	%xmm12, xmm_regs+192(%rip)
+	vmovdqu	%xmm13, xmm_regs+208(%rip)
+	vmovdqu	%xmm14, xmm_regs+224(%rip)
+	vmovdqu	%xmm15, xmm_regs+240(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%xmm0, xmm_regs+0(%rip)
+	vmovdqu	%xmm1, xmm_regs+16(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	xmm_regs,256,32
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
new file mode 100644
index 00000000000..9fbec9d03ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
@@ -0,0 +1,74 @@
+#include <stdlib.h>
+#include <cpuid.h>
+
+/* Check if the OS supports executing AVX512FP16 instructions.  */
+
+#define XCR_XFEATURE_ENABLED_MASK	0x0
+
+#define XSTATE_FP	0x1
+#define XSTATE_SSE	0x2
+#define XSTATE_YMM	0x4
+#define XSTATE_OPMASK	0x20
+#define XSTATE_ZMM	0x40
+#define XSTATE_HI_ZMM	0x80
+
+static int
+check_osxsave (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  return (ecx & bit_OSXSAVE) != 0;
+}
+
+static int
+avx512fp16_os_support (void)
+{
+  unsigned int eax, edx;
+  unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
+  unsigned int mask = XSTATE_MASK;
+
+  if (!check_osxsave ())
+    return 0;
+
+  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
+
+  return ((eax & mask) == mask);
+}
+
+static void do_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+
+  if (!avx512fp16_os_support ())
+    return 0;
+
+  if (__get_cpuid_max (0, NULL) < 7)
+    return 0;
+
+  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+  /* Run the AVX512FP16 test only if the host has ISA support.  */
+  if (((ebx & (bit_AVX512F | bit_AVX512BW))
+       == (bit_AVX512F | bit_AVX512BW))
+      && (edx & bit_AVX512FP16)
+      && AVX512VL (ebx))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+      return 0;
+    }
+
+#ifdef DEBUG
+  printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
new file mode 100644
index 00000000000..0abe09f1166
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
@@ -0,0 +1,3 @@
+#define AVX512VL(ebx) (ebx & bit_AVX512VL)
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_OPMASK)
+#include "avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
new file mode 100644
index 00000000000..17f2c27edc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
@@ -0,0 +1,150 @@
+#ifndef DEFINED_DEFINES_H
+#define DEFINED_DEFINES_H
+
+/* Get __m64 and __m128. */
+#include <immintrin.h>
+
+typedef unsigned long ulong;
+typedef long double ldouble;
+
+/* These defines determine which parts of the test are run.  When
+   GCC implements these parts, the defines should be uncommented to
+   enable testing.  */
+
+/* Scalar type __int128.  */
+/* #define CHECK_INT128 */
+
+/* Scalar type long double.  */
+#define CHECK_LONG_DOUBLE
+
+/* Scalar type __float128.  */
+/* #define CHECK_FLOAT128 */
+
+/* Scalar types __m64 and __m128.  */
+#define CHECK_M64_M128
+
+/* Returning of complex type.  */
+#define CHECK_COMPLEX
+
+/* Structs with size >= 16.  */
+#define CHECK_LARGER_STRUCTS
+
+/* Checks for passing floats and doubles.  */
+#define CHECK_FLOAT_DOUBLE_PASSING
+
+/* Union passing with not-extremely-simple unions.  */
+#define CHECK_LARGER_UNION_PASSING
+
+/* Variable args.  */
+#define CHECK_VARARGS
+
+/* Check argument passing and returning for scalar types with sizeof = 16.  */
+/* TODO: Implement these tests. Don't activate them for now.  */
+#define CHECK_LARGE_SCALAR_PASSING
+
+/* Defines for sizing and alignment.  */
+
+#define TYPE_SIZE_CHAR         1
+#define TYPE_SIZE_SHORT        2
+#define TYPE_SIZE_INT          4
+#define TYPE_SIZE_LONG         8
+#define TYPE_SIZE_LONG_LONG    8
+#define TYPE_SIZE_INT128       16
+#define TYPE_SIZE_FLOAT16      2
+#define TYPE_SIZE_FLOAT        4
+#define TYPE_SIZE_DOUBLE       8
+#define TYPE_SIZE_LONG_DOUBLE  16
+#define TYPE_SIZE_FLOAT128     16
+#define TYPE_SIZE_M64          8
+#define TYPE_SIZE_M128         16
+#define TYPE_SIZE_ENUM         4
+#define TYPE_SIZE_POINTER      8
+
+#define TYPE_ALIGN_CHAR        1
+#define TYPE_ALIGN_SHORT       2
+#define TYPE_ALIGN_INT         4
+#define TYPE_ALIGN_LONG        8
+#define TYPE_ALIGN_LONG_LONG   8
+#define TYPE_ALIGN_INT128      16
+#define TYPE_ALIGN_FLOAT16     2
+#define TYPE_ALIGN_FLOAT       4
+#define TYPE_ALIGN_DOUBLE      8
+#define TYPE_ALIGN_LONG_DOUBLE 16
+#define TYPE_ALIGN_FLOAT128    16
+#define TYPE_ALIGN_M64         8
+#define TYPE_ALIGN_M128        16
+#define TYPE_ALIGN_ENUM        4
+#define TYPE_ALIGN_POINTER     8
+
+/* These defines control the building of the list of types to check. There
+   is a string identifying the type (with a comma after), a size of the type
+   (also with a comma and an integer for adding to the total amount of types)
+   and an alignment of the type (which is currently not really needed since
+   the abi specifies that alignof == sizeof for all scalar types).  */
+#ifdef CHECK_INT128
+#define CI128_STR "__int128",
+#define CI128_SIZ TYPE_SIZE_INT128,
+#define CI128_ALI TYPE_ALIGN_INT128,
+#define CI128_RET "???",
+#else
+#define CI128_STR
+#define CI128_SIZ
+#define CI128_ALI
+#define CI128_RET
+#endif
+#ifdef CHECK_LONG_DOUBLE
+#define CLD_STR "long double",
+#define CLD_SIZ TYPE_SIZE_LONG_DOUBLE,
+#define CLD_ALI TYPE_ALIGN_LONG_DOUBLE,
+#define CLD_RET "x87_regs[0]._ldouble",
+#else
+#define CLD_STR
+#define CLD_SIZ
+#define CLD_ALI
+#define CLD_RET
+#endif
+#ifdef CHECK_FLOAT128
+#define CF128_STR "__float128",
+#define CF128_SIZ TYPE_SIZE_FLOAT128,
+#define CF128_ALI TYPE_ALIGN_FLOAT128,
+#define CF128_RET "???",
+#else
+#define CF128_STR
+#define CF128_SIZ
+#define CF128_ALI
+#define CF128_RET
+#endif
+#ifdef CHECK_M64_M128
+#define CMM_STR "__m64", "__m128",
+#define CMM_SIZ TYPE_SIZE_M64, TYPE_SIZE_M128,
+#define CMM_ALI TYPE_ALIGN_M64, TYPE_ALIGN_M128,
+#define CMM_RET "???", "???",
+#else
+#define CMM_STR
+#define CMM_SIZ
+#define CMM_ALI
+#define CMM_RET
+#endif
+
+/* Used in size and alignment tests.  */
+enum dummytype { enumtype };
+
+extern void abort (void);
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+#ifdef __GNUC__
+#define PACKED __attribute__((__packed__))
+#else
+#warning Some tests will fail due to missing __packed__ support
+#define PACKED
+#endif
+
+#endif /* DEFINED_DEFINES_H */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
new file mode 100644
index 00000000000..98fbc660f27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
@@ -0,0 +1,53 @@
+#ifndef MACROS_H
+#define MACROS_H
+#define check_size(_t, _size) assert(sizeof(_t) == (_size))
+
+#define check_align(_t, _align) assert(__alignof__(_t) == (_align))
+
+#define check_align_lv(_t, _align) assert(__alignof__(_t) == (_align) \
+					  && (((unsigned long)&(_t)) & ((_align) - 1) ) == 0)
+
+#define check_basic_struct_size_and_align(_type, _size, _align) { \
+  struct _str { _type dummy; } _t; \
+  check_size(_t, _size); \
+  check_align_lv(_t, _align); \
+}
+
+#define check_array_size_and_align(_type, _size, _align) { \
+  _type _a[1]; _type _b[2]; _type _c[16]; \
+  struct _str { _type _a[1]; } _s; \
+  check_align_lv(_a[0], _align); \
+  check_size(_a, _size); \
+  check_size(_b, (_size*2)); \
+  check_size(_c, (_size*16)); \
+  check_size(_s, _size); \
+  check_align_lv(_s._a[0], _align); \
+}
+
+#define check_basic_union_size_and_align(_type, _size, _align) { \
+  union _union { _type dummy; } _u; \
+  check_size(_u, _size); \
+  check_align_lv(_u, _align); \
+}
+
+#define run_signed_tests2(_function, _arg1, _arg2) \
+  _function(_arg1, _arg2); \
+  _function(signed _arg1, _arg2); \
+  _function(unsigned _arg1, _arg2);
+
+#define run_signed_tests3(_function, _arg1, _arg2, _arg3) \
+  _function(_arg1, _arg2, _arg3); \
+  _function(signed _arg1, _arg2, _arg3); \
+  _function(unsigned _arg1, _arg2, _arg3);
+
+/* Check size of a struct and a union of three types.  */
+
+#define check_struct_and_union3(type1, type2, type3, struct_size, align_size) \
+{ \
+  struct _str { type1 t1; type2 t2; type3 t3; } _t; \
+  union _uni { type1 t1; type2 t2; type3 t3; } _u; \
+  check_size(_t, struct_size); \
+  check_size(_u, align_size); \
+}
+
+#endif // MACROS_H
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
new file mode 100644
index 00000000000..cc94e0fe0e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
@@ -0,0 +1,692 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "defines.h"
+#include "macros.h"
+
+/* Check structs and unions of all permutations of 3 basic types.  */
+int
+main (void)
+{
+  check_struct_and_union3(char, char, char, 3, 1);
+  check_struct_and_union3(char, char, short, 4, 2);
+  check_struct_and_union3(char, char, int, 8, 4);
+  check_struct_and_union3(char, char, long, 16, 8);
+  check_struct_and_union3(char, char, long long, 16, 8);
+  check_struct_and_union3(char, char, float, 8, 4);
+  check_struct_and_union3(char, char, double, 16, 8);
+  check_struct_and_union3(char, char, long double, 32, 16);
+  check_struct_and_union3(char, short, char, 6, 2);
+  check_struct_and_union3(char, short, short, 6, 2);
+  check_struct_and_union3(char, short, int, 8, 4);
+  check_struct_and_union3(char, short, long, 16, 8);
+  check_struct_and_union3(char, short, long long, 16, 8);
+  check_struct_and_union3(char, short, float, 8, 4);
+  check_struct_and_union3(char, short, double, 16, 8);
+  check_struct_and_union3(char, short, long double, 32, 16);
+  check_struct_and_union3(char, int, char, 12, 4);
+  check_struct_and_union3(char, int, short, 12, 4);
+  check_struct_and_union3(char, int, int, 12, 4);
+  check_struct_and_union3(char, int, long, 16, 8);
+  check_struct_and_union3(char, int, long long, 16, 8);
+  check_struct_and_union3(char, int, float, 12, 4);
+  check_struct_and_union3(char, int, double, 16, 8);
+  check_struct_and_union3(char, int, long double, 32, 16);
+  check_struct_and_union3(char, long, char, 24, 8);
+  check_struct_and_union3(char, long, short, 24, 8);
+  check_struct_and_union3(char, long, int, 24, 8);
+  check_struct_and_union3(char, long, long, 24, 8);
+  check_struct_and_union3(char, long, long long, 24, 8);
+  check_struct_and_union3(char, long, float, 24, 8);
+  check_struct_and_union3(char, long, double, 24, 8);
+  check_struct_and_union3(char, long, long double, 32, 16);
+  check_struct_and_union3(char, long long, char, 24, 8);
+  check_struct_and_union3(char, long long, short, 24, 8);
+  check_struct_and_union3(char, long long, int, 24, 8);
+  check_struct_and_union3(char, long long, long, 24, 8);
+  check_struct_and_union3(char, long long, long long, 24, 8);
+  check_struct_and_union3(char, long long, float, 24, 8);
+  check_struct_and_union3(char, long long, double, 24, 8);
+  check_struct_and_union3(char, long long, long double, 32, 16);
+  check_struct_and_union3(char, float, char, 12, 4);
+  check_struct_and_union3(char, float, short, 12, 4);
+  check_struct_and_union3(char, float, int, 12, 4);
+  check_struct_and_union3(char, float, long, 16, 8);
+  check_struct_and_union3(char, float, long long, 16, 8);
+  check_struct_and_union3(char, float, float, 12, 4);
+  check_struct_and_union3(char, float, double, 16, 8);
+  check_struct_and_union3(char, float, long double, 32, 16);
+  check_struct_and_union3(char, double, char, 24, 8);
+  check_struct_and_union3(char, double, short, 24, 8);
+  check_struct_and_union3(char, double, int, 24, 8);
+  check_struct_and_union3(char, double, long, 24, 8);
+  check_struct_and_union3(char, double, long long, 24, 8);
+  check_struct_and_union3(char, double, float, 24, 8);
+  check_struct_and_union3(char, double, double, 24, 8);
+  check_struct_and_union3(char, double, long double, 32, 16);
+  check_struct_and_union3(char, long double, char, 48, 16);
+  check_struct_and_union3(char, long double, short, 48, 16);
+  check_struct_and_union3(char, long double, int, 48, 16);
+  check_struct_and_union3(char, long double, long, 48, 16);
+  check_struct_and_union3(char, long double, long long, 48, 16);
+  check_struct_and_union3(char, long double, float, 48, 16);
+  check_struct_and_union3(char, long double, double, 48, 16);
+  check_struct_and_union3(char, long double, long double, 48, 16);
+  check_struct_and_union3(short, char, char, 4, 2);
+  check_struct_and_union3(short, char, short, 6, 2);
+  check_struct_and_union3(short, char, int, 8, 4);
+  check_struct_and_union3(short, char, long, 16, 8);
+  check_struct_and_union3(short, char, long long, 16, 8);
+  check_struct_and_union3(short, char, float, 8, 4);
+  check_struct_and_union3(short, char, double, 16, 8);
+  check_struct_and_union3(short, char, long double, 32, 16);
+  check_struct_and_union3(short, short, char, 6, 2);
+  check_struct_and_union3(short, short, short, 6, 2);
+  check_struct_and_union3(short, short, int, 8, 4);
+  check_struct_and_union3(short, short, long, 16, 8);
+  check_struct_and_union3(short, short, long long, 16, 8);
+  check_struct_and_union3(short, short, float, 8, 4);
+  check_struct_and_union3(short, short, double, 16, 8);
+  check_struct_and_union3(short, short, long double, 32, 16);
+  check_struct_and_union3(short, int, char, 12, 4);
+  check_struct_and_union3(short, int, short, 12, 4);
+  check_struct_and_union3(short, int, int, 12, 4);
+  check_struct_and_union3(short, int, long, 16, 8);
+  check_struct_and_union3(short, int, long long, 16, 8);
+  check_struct_and_union3(short, int, float, 12, 4);
+  check_struct_and_union3(short, int, double, 16, 8);
+  check_struct_and_union3(short, int, long double, 32, 16);
+  check_struct_and_union3(short, long, char, 24, 8);
+  check_struct_and_union3(short, long, short, 24, 8);
+  check_struct_and_union3(short, long, int, 24, 8);
+  check_struct_and_union3(short, long, long, 24, 8);
+  check_struct_and_union3(short, long, long long, 24, 8);
+  check_struct_and_union3(short, long, float, 24, 8);
+  check_struct_and_union3(short, long, double, 24, 8);
+  check_struct_and_union3(short, long, long double, 32, 16);
+  check_struct_and_union3(short, long long, char, 24, 8);
+  check_struct_and_union3(short, long long, short, 24, 8);
+  check_struct_and_union3(short, long long, int, 24, 8);
+  check_struct_and_union3(short, long long, long, 24, 8);
+  check_struct_and_union3(short, long long, long long, 24, 8);
+  check_struct_and_union3(short, long long, float, 24, 8);
+  check_struct_and_union3(short, long long, double, 24, 8);
+  check_struct_and_union3(short, long long, long double, 32, 16);
+  check_struct_and_union3(short, float, char, 12, 4);
+  check_struct_and_union3(short, float, short, 12, 4);
+  check_struct_and_union3(short, float, int, 12, 4);
+  check_struct_and_union3(short, float, long, 16, 8);
+  check_struct_and_union3(short, float, long long, 16, 8);
+  check_struct_and_union3(short, float, float, 12, 4);
+  check_struct_and_union3(short, float, double, 16, 8);
+  check_struct_and_union3(short, float, long double, 32, 16);
+  check_struct_and_union3(short, double, char, 24, 8);
+  check_struct_and_union3(short, double, short, 24, 8);
+  check_struct_and_union3(short, double, int, 24, 8);
+  check_struct_and_union3(short, double, long, 24, 8);
+  check_struct_and_union3(short, double, long long, 24, 8);
+  check_struct_and_union3(short, double, float, 24, 8);
+  check_struct_and_union3(short, double, double, 24, 8);
+  check_struct_and_union3(short, double, long double, 32, 16);
+  check_struct_and_union3(short, long double, char, 48, 16);
+  check_struct_and_union3(short, long double, short, 48, 16);
+  check_struct_and_union3(short, long double, int, 48, 16);
+  check_struct_and_union3(short, long double, long, 48, 16);
+  check_struct_and_union3(short, long double, long long, 48, 16);
+  check_struct_and_union3(short, long double, float, 48, 16);
+  check_struct_and_union3(short, long double, double, 48, 16);
+  check_struct_and_union3(short, long double, long double, 48, 16);
+  check_struct_and_union3(int, char, char, 8, 4);
+  check_struct_and_union3(int, char, short, 8, 4);
+  check_struct_and_union3(int, char, int, 12, 4);
+  check_struct_and_union3(int, char, long, 16, 8);
+  check_struct_and_union3(int, char, long long, 16, 8);
+  check_struct_and_union3(int, char, float, 12, 4);
+  check_struct_and_union3(int, char, double, 16, 8);
+  check_struct_and_union3(int, char, long double, 32, 16);
+  check_struct_and_union3(int, short, char, 8, 4);
+  check_struct_and_union3(int, short, short, 8, 4);
+  check_struct_and_union3(int, short, int, 12, 4);
+  check_struct_and_union3(int, short, long, 16, 8);
+  check_struct_and_union3(int, short, long long, 16, 8);
+  check_struct_and_union3(int, short, float, 12, 4);
+  check_struct_and_union3(int, short, double, 16, 8);
+  check_struct_and_union3(int, short, long double, 32, 16);
+  check_struct_and_union3(int, int, char, 12, 4);
+  check_struct_and_union3(int, int, short, 12, 4);
+  check_struct_and_union3(int, int, int, 12, 4);
+  check_struct_and_union3(int, int, long, 16, 8);
+  check_struct_and_union3(int, int, long long, 16, 8);
+  check_struct_and_union3(int, int, float, 12, 4);
+  check_struct_and_union3(int, int, double, 16, 8);
+  check_struct_and_union3(int, int, long double, 32, 16);
+  check_struct_and_union3(int, long, char, 24, 8);
+  check_struct_and_union3(int, long, short, 24, 8);
+  check_struct_and_union3(int, long, int, 24, 8);
+  check_struct_and_union3(int, long, long, 24, 8);
+  check_struct_and_union3(int, long, long long, 24, 8);
+  check_struct_and_union3(int, long, float, 24, 8);
+  check_struct_and_union3(int, long, double, 24, 8);
+  check_struct_and_union3(int, long, long double, 32, 16);
+  check_struct_and_union3(int, long long, char, 24, 8);
+  check_struct_and_union3(int, long long, short, 24, 8);
+  check_struct_and_union3(int, long long, int, 24, 8);
+  check_struct_and_union3(int, long long, long, 24, 8);
+  check_struct_and_union3(int, long long, long long, 24, 8);
+  check_struct_and_union3(int, long long, float, 24, 8);
+  check_struct_and_union3(int, long long, double, 24, 8);
+  check_struct_and_union3(int, long long, long double, 32, 16);
+  check_struct_and_union3(int, float, char, 12, 4);
+  check_struct_and_union3(int, float, short, 12, 4);
+  check_struct_and_union3(int, float, int, 12, 4);
+  check_struct_and_union3(int, float, long, 16, 8);
+  check_struct_and_union3(int, float, long long, 16, 8);
+  check_struct_and_union3(int, float, float, 12, 4);
+  check_struct_and_union3(int, float, double, 16, 8);
+  check_struct_and_union3(int, float, long double, 32, 16);
+  check_struct_and_union3(int, double, char, 24, 8);
+  check_struct_and_union3(int, double, short, 24, 8);
+  check_struct_and_union3(int, double, int, 24, 8);
+  check_struct_and_union3(int, double, long, 24, 8);
+  check_struct_and_union3(int, double, long long, 24, 8);
+  check_struct_and_union3(int, double, float, 24, 8);
+  check_struct_and_union3(int, double, double, 24, 8);
+  check_struct_and_union3(int, double, long double, 32, 16);
+  check_struct_and_union3(int, long double, char, 48, 16);
+  check_struct_and_union3(int, long double, short, 48, 16);
+  check_struct_and_union3(int, long double, int, 48, 16);
+  check_struct_and_union3(int, long double, long, 48, 16);
+  check_struct_and_union3(int, long double, long long, 48, 16);
+  check_struct_and_union3(int, long double, float, 48, 16);
+  check_struct_and_union3(int, long double, double, 48, 16);
+  check_struct_and_union3(int, long double, long double, 48, 16);
+  check_struct_and_union3(long, char, char, 16, 8);
+  check_struct_and_union3(long, char, short, 16, 8);
+  check_struct_and_union3(long, char, int, 16, 8);
+  check_struct_and_union3(long, char, long, 24, 8);
+  check_struct_and_union3(long, char, long long, 24, 8);
+  check_struct_and_union3(long, char, float, 16, 8);
+  check_struct_and_union3(long, char, double, 24, 8);
+  check_struct_and_union3(long, char, long double, 32, 16);
+  check_struct_and_union3(long, short, char, 16, 8);
+  check_struct_and_union3(long, short, short, 16, 8);
+  check_struct_and_union3(long, short, int, 16, 8);
+  check_struct_and_union3(long, short, long, 24, 8);
+  check_struct_and_union3(long, short, long long, 24, 8);
+  check_struct_and_union3(long, short, float, 16, 8);
+  check_struct_and_union3(long, short, double, 24, 8);
+  check_struct_and_union3(long, short, long double, 32, 16);
+  check_struct_and_union3(long, int, char, 16, 8);
+  check_struct_and_union3(long, int, short, 16, 8);
+  check_struct_and_union3(long, int, int, 16, 8);
+  check_struct_and_union3(long, int, long, 24, 8);
+  check_struct_and_union3(long, int, long long, 24, 8);
+  check_struct_and_union3(long, int, float, 16, 8);
+  check_struct_and_union3(long, int, double, 24, 8);
+  check_struct_and_union3(long, int, long double, 32, 16);
+  check_struct_and_union3(long, long, char, 24, 8);
+  check_struct_and_union3(long, long, short, 24, 8);
+  check_struct_and_union3(long, long, int, 24, 8);
+  check_struct_and_union3(long, long, long, 24, 8);
+  check_struct_and_union3(long, long, long long, 24, 8);
+  check_struct_and_union3(long, long, float, 24, 8);
+  check_struct_and_union3(long, long, double, 24, 8);
+  check_struct_and_union3(long, long, long double, 32, 16);
+  check_struct_and_union3(long, long long, char, 24, 8);
+  check_struct_and_union3(long, long long, short, 24, 8);
+  check_struct_and_union3(long, long long, int, 24, 8);
+  check_struct_and_union3(long, long long, long, 24, 8);
+  check_struct_and_union3(long, long long, long long, 24, 8);
+  check_struct_and_union3(long, long long, float, 24, 8);
+  check_struct_and_union3(long, long long, double, 24, 8);
+  check_struct_and_union3(long, long long, long double, 32, 16);
+  check_struct_and_union3(long, float, char, 16, 8);
+  check_struct_and_union3(long, float, short, 16, 8);
+  check_struct_and_union3(long, float, int, 16, 8);
+  check_struct_and_union3(long, float, long, 24, 8);
+  check_struct_and_union3(long, float, long long, 24, 8);
+  check_struct_and_union3(long, float, float, 16, 8);
+  check_struct_and_union3(long, float, double, 24, 8);
+  check_struct_and_union3(long, float, long double, 32, 16);
+  check_struct_and_union3(long, double, char, 24, 8);
+  check_struct_and_union3(long, double, short, 24, 8);
+  check_struct_and_union3(long, double, int, 24, 8);
+  check_struct_and_union3(long, double, long, 24, 8);
+  check_struct_and_union3(long, double, long long, 24, 8);
+  check_struct_and_union3(long, double, float, 24, 8);
+  check_struct_and_union3(long, double, double, 24, 8);
+  check_struct_and_union3(long, double, long double, 32, 16);
+  check_struct_and_union3(long, long double, char, 48, 16);
+  check_struct_and_union3(long, long double, short, 48, 16);
+  check_struct_and_union3(long, long double, int, 48, 16);
+  check_struct_and_union3(long, long double, long, 48, 16);
+  check_struct_and_union3(long, long double, long long, 48, 16);
+  check_struct_and_union3(long, long double, float, 48, 16);
+  check_struct_and_union3(long, long double, double, 48, 16);
+  check_struct_and_union3(long, long double, long double, 48, 16);
+  check_struct_and_union3(long long, char, char, 16, 8);
+  check_struct_and_union3(long long, char, short, 16, 8);
+  check_struct_and_union3(long long, char, int, 16, 8);
+  check_struct_and_union3(long long, char, long, 24, 8);
+  check_struct_and_union3(long long, char, long long, 24, 8);
+  check_struct_and_union3(long long, char, float, 16, 8);
+  check_struct_and_union3(long long, char, double, 24, 8);
+  check_struct_and_union3(long long, char, long double, 32, 16);
+  check_struct_and_union3(long long, short, char, 16, 8);
+  check_struct_and_union3(long long, short, short, 16, 8);
+  check_struct_and_union3(long long, short, int, 16, 8);
+  check_struct_and_union3(long long, short, long, 24, 8);
+  check_struct_and_union3(long long, short, long long, 24, 8);
+  check_struct_and_union3(long long, short, float, 16, 8);
+  check_struct_and_union3(long long, short, double, 24, 8);
+  check_struct_and_union3(long long, short, long double, 32, 16);
+  check_struct_and_union3(long long, int, char, 16, 8);
+  check_struct_and_union3(long long, int, short, 16, 8);
+  check_struct_and_union3(long long, int, int, 16, 8);
+  check_struct_and_union3(long long, int, long, 24, 8);
+  check_struct_and_union3(long long, int, long long, 24, 8);
+  check_struct_and_union3(long long, int, float, 16, 8);
+  check_struct_and_union3(long long, int, double, 24, 8);
+  check_struct_and_union3(long long, int, long double, 32, 16);
+  check_struct_and_union3(long long, long, char, 24, 8);
+  check_struct_and_union3(long long, long, short, 24, 8);
+  check_struct_and_union3(long long, long, int, 24, 8);
+  check_struct_and_union3(long long, long, long, 24, 8);
+  check_struct_and_union3(long long, long, long long, 24, 8);
+  check_struct_and_union3(long long, long, float, 24, 8);
+  check_struct_and_union3(long long, long, double, 24, 8);
+  check_struct_and_union3(long long, long, long double, 32, 16);
+  check_struct_and_union3(long long, long long, char, 24, 8);
+  check_struct_and_union3(long long, long long, short, 24, 8);
+  check_struct_and_union3(long long, long long, int, 24, 8);
+  check_struct_and_union3(long long, long long, long, 24, 8);
+  check_struct_and_union3(long long, long long, long long, 24, 8);
+  check_struct_and_union3(long long, long long, float, 24, 8);
+  check_struct_and_union3(long long, long long, double, 24, 8);
+  check_struct_and_union3(long long, long long, long double, 32, 16);
+  check_struct_and_union3(long long, float, char, 16, 8);
+  check_struct_and_union3(long long, float, short, 16, 8);
+  check_struct_and_union3(long long, float, int, 16, 8);
+  check_struct_and_union3(long long, float, long, 24, 8);
+  check_struct_and_union3(long long, float, long long, 24, 8);
+  check_struct_and_union3(long long, float, float, 16, 8);
+  check_struct_and_union3(long long, float, double, 24, 8);
+  check_struct_and_union3(long long, float, long double, 32, 16);
+  check_struct_and_union3(long long, double, char, 24, 8);
+  check_struct_and_union3(long long, double, short, 24, 8);
+  check_struct_and_union3(long long, double, int, 24, 8);
+  check_struct_and_union3(long long, double, long, 24, 8);
+  check_struct_and_union3(long long, double, long long, 24, 8);
+  check_struct_and_union3(long long, double, float, 24, 8);
+  check_struct_and_union3(long long, double, double, 24, 8);
+  check_struct_and_union3(long long, double, long double, 32, 16);
+  check_struct_and_union3(long long, long double, char, 48, 16);
+  check_struct_and_union3(long long, long double, short, 48, 16);
+  check_struct_and_union3(long long, long double, int, 48, 16);
+  check_struct_and_union3(long long, long double, long, 48, 16);
+  check_struct_and_union3(long long, long double, long long, 48, 16);
+  check_struct_and_union3(long long, long double, float, 48, 16);
+  check_struct_and_union3(long long, long double, double, 48, 16);
+  check_struct_and_union3(long long, long double, long double, 48, 16);
+  check_struct_and_union3(float, char, char, 8, 4);
+  check_struct_and_union3(float, char, short, 8, 4);
+  check_struct_and_union3(float, char, int, 12, 4);
+  check_struct_and_union3(float, char, long, 16, 8);
+  check_struct_and_union3(float, char, long long, 16, 8);
+  check_struct_and_union3(float, char, float, 12, 4);
+  check_struct_and_union3(float, char, double, 16, 8);
+  check_struct_and_union3(float, char, long double, 32, 16);
+  check_struct_and_union3(float, short, char, 8, 4);
+  check_struct_and_union3(float, short, short, 8, 4);
+  check_struct_and_union3(float, short, int, 12, 4);
+  check_struct_and_union3(float, short, long, 16, 8);
+  check_struct_and_union3(float, short, long long, 16, 8);
+  check_struct_and_union3(float, short, float, 12, 4);
+  check_struct_and_union3(float, short, double, 16, 8);
+  check_struct_and_union3(float, short, long double, 32, 16);
+  check_struct_and_union3(float, int, char, 12, 4);
+  check_struct_and_union3(float, int, short, 12, 4);
+  check_struct_and_union3(float, int, int, 12, 4);
+  check_struct_and_union3(float, int, long, 16, 8);
+  check_struct_and_union3(float, int, long long, 16, 8);
+  check_struct_and_union3(float, int, float, 12, 4);
+  check_struct_and_union3(float, int, double, 16, 8);
+  check_struct_and_union3(float, int, long double, 32, 16);
+  check_struct_and_union3(float, long, char, 24, 8);
+  check_struct_and_union3(float, long, short, 24, 8);
+  check_struct_and_union3(float, long, int, 24, 8);
+  check_struct_and_union3(float, long, long, 24, 8);
+  check_struct_and_union3(float, long, long long, 24, 8);
+  check_struct_and_union3(float, long, float, 24, 8);
+  check_struct_and_union3(float, long, double, 24, 8);
+  check_struct_and_union3(float, long, long double, 32, 16);
+  check_struct_and_union3(float, long long, char, 24, 8);
+  check_struct_and_union3(float, long long, short, 24, 8);
+  check_struct_and_union3(float, long long, int, 24, 8);
+  check_struct_and_union3(float, long long, long, 24, 8);
+  check_struct_and_union3(float, long long, long long, 24, 8);
+  check_struct_and_union3(float, long long, float, 24, 8);
+  check_struct_and_union3(float, long long, double, 24, 8);
+  check_struct_and_union3(float, long long, long double, 32, 16);
+  check_struct_and_union3(float, float, char, 12, 4);
+  check_struct_and_union3(float, float, short, 12, 4);
+  check_struct_and_union3(float, float, int, 12, 4);
+  check_struct_and_union3(float, float, long, 16, 8);
+  check_struct_and_union3(float, float, long long, 16, 8);
+  check_struct_and_union3(float, float, float, 12, 4);
+  check_struct_and_union3(float, float, double, 16, 8);
+  check_struct_and_union3(float, float, long double, 32, 16);
+  check_struct_and_union3(float, double, char, 24, 8);
+  check_struct_and_union3(float, double, short, 24, 8);
+  check_struct_and_union3(float, double, int, 24, 8);
+  check_struct_and_union3(float, double, long, 24, 8);
+  check_struct_and_union3(float, double, long long, 24, 8);
+  check_struct_and_union3(float, double, float, 24, 8);
+  check_struct_and_union3(float, double, double, 24, 8);
+  check_struct_and_union3(float, double, long double, 32, 16);
+  check_struct_and_union3(float, long double, char, 48, 16);
+  check_struct_and_union3(float, long double, short, 48, 16);
+  check_struct_and_union3(float, long double, int, 48, 16);
+  check_struct_and_union3(float, long double, long, 48, 16);
+  check_struct_and_union3(float, long double, long long, 48, 16);
+  check_struct_and_union3(float, long double, float, 48, 16);
+  check_struct_and_union3(float, long double, double, 48, 16);
+  check_struct_and_union3(float, long double, long double, 48, 16);
+  check_struct_and_union3(double, char, char, 16, 8);
+  check_struct_and_union3(double, char, short, 16, 8);
+  check_struct_and_union3(double, char, int, 16, 8);
+  check_struct_and_union3(double, char, long, 24, 8);
+  check_struct_and_union3(double, char, long long, 24, 8);
+  check_struct_and_union3(double, char, float, 16, 8);
+  check_struct_and_union3(double, char, double, 24, 8);
+  check_struct_and_union3(double, char, long double, 32, 16);
+  check_struct_and_union3(double, short, char, 16, 8);
+  check_struct_and_union3(double, short, short, 16, 8);
+  check_struct_and_union3(double, short, int, 16, 8);
+  check_struct_and_union3(double, short, long, 24, 8);
+  check_struct_and_union3(double, short, long long, 24, 8);
+  check_struct_and_union3(double, short, float, 16, 8);
+  check_struct_and_union3(double, short, double, 24, 8);
+  check_struct_and_union3(double, short, long double, 32, 16);
+  check_struct_and_union3(double, int, char, 16, 8);
+  check_struct_and_union3(double, int, short, 16, 8);
+  check_struct_and_union3(double, int, int, 16, 8);
+  check_struct_and_union3(double, int, long, 24, 8);
+  check_struct_and_union3(double, int, long long, 24, 8);
+  check_struct_and_union3(double, int, float, 16, 8);
+  check_struct_and_union3(double, int, double, 24, 8);
+  check_struct_and_union3(double, int, long double, 32, 16);
+  check_struct_and_union3(double, long, char, 24, 8);
+  check_struct_and_union3(double, long, short, 24, 8);
+  check_struct_and_union3(double, long, int, 24, 8);
+  check_struct_and_union3(double, long, long, 24, 8);
+  check_struct_and_union3(double, long, long long, 24, 8);
+  check_struct_and_union3(double, long, float, 24, 8);
+  check_struct_and_union3(double, long, double, 24, 8);
+  check_struct_and_union3(double, long, long double, 32, 16);
+  check_struct_and_union3(double, long long, char, 24, 8);
+  check_struct_and_union3(double, long long, short, 24, 8);
+  check_struct_and_union3(double, long long, int, 24, 8);
+  check_struct_and_union3(double, long long, long, 24, 8);
+  check_struct_and_union3(double, long long, long long, 24, 8);
+  check_struct_and_union3(double, long long, float, 24, 8);
+  check_struct_and_union3(double, long long, double, 24, 8);
+  check_struct_and_union3(double, long long, long double, 32, 16);
+  check_struct_and_union3(double, float, char, 16, 8);
+  check_struct_and_union3(double, float, short, 16, 8);
+  check_struct_and_union3(double, float, int, 16, 8);
+  check_struct_and_union3(double, float, long, 24, 8);
+  check_struct_and_union3(double, float, long long, 24, 8);
+  check_struct_and_union3(double, float, float, 16, 8);
+  check_struct_and_union3(double, float, double, 24, 8);
+  check_struct_and_union3(double, float, long double, 32, 16);
+  check_struct_and_union3(double, double, char, 24, 8);
+  check_struct_and_union3(double, double, short, 24, 8);
+  check_struct_and_union3(double, double, int, 24, 8);
+  check_struct_and_union3(double, double, long, 24, 8);
+  check_struct_and_union3(double, double, long long, 24, 8);
+  check_struct_and_union3(double, double, float, 24, 8);
+  check_struct_and_union3(double, double, double, 24, 8);
+  check_struct_and_union3(double, double, long double, 32, 16);
+  check_struct_and_union3(double, long double, char, 48, 16);
+  check_struct_and_union3(double, long double, short, 48, 16);
+  check_struct_and_union3(double, long double, int, 48, 16);
+  check_struct_and_union3(double, long double, long, 48, 16);
+  check_struct_and_union3(double, long double, long long, 48, 16);
+  check_struct_and_union3(double, long double, float, 48, 16);
+  check_struct_and_union3(double, long double, double, 48, 16);
+  check_struct_and_union3(double, long double, long double, 48, 16);
+  check_struct_and_union3(long double, char, char, 32, 16);
+  check_struct_and_union3(long double, char, short, 32, 16);
+  check_struct_and_union3(long double, char, int, 32, 16);
+  check_struct_and_union3(long double, char, long, 32, 16);
+  check_struct_and_union3(long double, char, long long, 32, 16);
+  check_struct_and_union3(long double, char, float, 32, 16);
+  check_struct_and_union3(long double, char, double, 32, 16);
+  check_struct_and_union3(long double, char, long double, 48, 16);
+  check_struct_and_union3(long double, short, char, 32, 16);
+  check_struct_and_union3(long double, short, short, 32, 16);
+  check_struct_and_union3(long double, short, int, 32, 16);
+  check_struct_and_union3(long double, short, long, 32, 16);
+  check_struct_and_union3(long double, short, long long, 32, 16);
+  check_struct_and_union3(long double, short, float, 32, 16);
+  check_struct_and_union3(long double, short, double, 32, 16);
+  check_struct_and_union3(long double, short, long double, 48, 16);
+  check_struct_and_union3(long double, int, char, 32, 16);
+  check_struct_and_union3(long double, int, short, 32, 16);
+  check_struct_and_union3(long double, int, int, 32, 16);
+  check_struct_and_union3(long double, int, long, 32, 16);
+  check_struct_and_union3(long double, int, long long, 32, 16);
+  check_struct_and_union3(long double, int, float, 32, 16);
+  check_struct_and_union3(long double, int, double, 32, 16);
+  check_struct_and_union3(long double, int, long double, 48, 16);
+  check_struct_and_union3(long double, long, char, 32, 16);
+  check_struct_and_union3(long double, long, short, 32, 16);
+  check_struct_and_union3(long double, long, int, 32, 16);
+  check_struct_and_union3(long double, long, long, 32, 16);
+  check_struct_and_union3(long double, long, long long, 32, 16);
+  check_struct_and_union3(long double, long, float, 32, 16);
+  check_struct_and_union3(long double, long, double, 32, 16);
+  check_struct_and_union3(long double, long, long double, 48, 16);
+  check_struct_and_union3(long double, long long, char, 32, 16);
+  check_struct_and_union3(long double, long long, short, 32, 16);
+  check_struct_and_union3(long double, long long, int, 32, 16);
+  check_struct_and_union3(long double, long long, long, 32, 16);
+  check_struct_and_union3(long double, long long, long long, 32, 16);
+  check_struct_and_union3(long double, long long, float, 32, 16);
+  check_struct_and_union3(long double, long long, double, 32, 16);
+  check_struct_and_union3(long double, long long, long double, 48, 16);
+  check_struct_and_union3(long double, float, char, 32, 16);
+  check_struct_and_union3(long double, float, short, 32, 16);
+  check_struct_and_union3(long double, float, int, 32, 16);
+  check_struct_and_union3(long double, float, long, 32, 16);
+  check_struct_and_union3(long double, float, long long, 32, 16);
+  check_struct_and_union3(long double, float, float, 32, 16);
+  check_struct_and_union3(long double, float, double, 32, 16);
+  check_struct_and_union3(long double, float, long double, 48, 16);
+  check_struct_and_union3(long double, double, char, 32, 16);
+  check_struct_and_union3(long double, double, short, 32, 16);
+  check_struct_and_union3(long double, double, int, 32, 16);
+  check_struct_and_union3(long double, double, long, 32, 16);
+  check_struct_and_union3(long double, double, long long, 32, 16);
+  check_struct_and_union3(long double, double, float, 32, 16);
+  check_struct_and_union3(long double, double, double, 32, 16);
+  check_struct_and_union3(long double, double, long double, 48, 16);
+  check_struct_and_union3(long double, long double, char, 48, 16);
+  check_struct_and_union3(long double, long double, short, 48, 16);
+  check_struct_and_union3(long double, long double, int, 48, 16);
+  check_struct_and_union3(long double, long double, long, 48, 16);
+  check_struct_and_union3(long double, long double, long long, 48, 16);
+  check_struct_and_union3(long double, long double, float, 48, 16);
+  check_struct_and_union3(long double, long double, double, 48, 16);
+  check_struct_and_union3(long double, long double, long double, 48, 16);
+  check_struct_and_union3(char, char, _Float16, 4, 2);
+  check_struct_and_union3(char, _Float16, char, 6, 2);
+  check_struct_and_union3(char, _Float16, _Float16, 6, 2);
+  check_struct_and_union3(char, _Float16, int, 8, 4);
+  check_struct_and_union3(char, _Float16, long, 16, 8);
+  check_struct_and_union3(char, _Float16, long long, 16, 8);
+  check_struct_and_union3(char, _Float16, float, 8, 4);
+  check_struct_and_union3(char, _Float16, double, 16, 8);
+  check_struct_and_union3(char, _Float16, long double, 32, 16);
+  check_struct_and_union3(char, int, _Float16, 12, 4);
+  check_struct_and_union3(char, long, _Float16, 24, 8);
+  check_struct_and_union3(char, long long, _Float16, 24, 8);
+  check_struct_and_union3(char, float, _Float16, 12, 4);
+  check_struct_and_union3(char, double, _Float16, 24, 8);
+  check_struct_and_union3(char, long double, _Float16, 48, 16);
+  check_struct_and_union3(_Float16, char, char, 4, 2);
+  check_struct_and_union3(_Float16, char, _Float16, 6, 2);
+  check_struct_and_union3(_Float16, char, int, 8, 4);
+  check_struct_and_union3(_Float16, char, long, 16, 8);
+  check_struct_and_union3(_Float16, char, long long, 16, 8);
+  check_struct_and_union3(_Float16, char, float, 8, 4);
+  check_struct_and_union3(_Float16, char, double, 16, 8);
+  check_struct_and_union3(_Float16, char, long double, 32, 16);
+  check_struct_and_union3(_Float16, _Float16, char, 6, 2);
+  check_struct_and_union3(_Float16, _Float16, _Float16, 6, 2);
+  check_struct_and_union3(_Float16, _Float16, int, 8, 4);
+  check_struct_and_union3(_Float16, _Float16, long, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, long long, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, float, 8, 4);
+  check_struct_and_union3(_Float16, _Float16, double, 16, 8);
+  check_struct_and_union3(_Float16, _Float16, long double, 32, 16);
+  check_struct_and_union3(_Float16, int, char, 12, 4);
+  check_struct_and_union3(_Float16, int, _Float16, 12, 4);
+  check_struct_and_union3(_Float16, int, int, 12, 4);
+  check_struct_and_union3(_Float16, int, long, 16, 8);
+  check_struct_and_union3(_Float16, int, long long, 16, 8);
+  check_struct_and_union3(_Float16, int, float, 12, 4);
+  check_struct_and_union3(_Float16, int, double, 16, 8);
+  check_struct_and_union3(_Float16, int, long double, 32, 16);
+  check_struct_and_union3(_Float16, long, char, 24, 8);
+  check_struct_and_union3(_Float16, long, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, long, int, 24, 8);
+  check_struct_and_union3(_Float16, long, long, 24, 8);
+  check_struct_and_union3(_Float16, long, long long, 24, 8);
+  check_struct_and_union3(_Float16, long, float, 24, 8);
+  check_struct_and_union3(_Float16, long, double, 24, 8);
+  check_struct_and_union3(_Float16, long, long double, 32, 16);
+  check_struct_and_union3(_Float16, long long, char, 24, 8);
+  check_struct_and_union3(_Float16, long long, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, long long, int, 24, 8);
+  check_struct_and_union3(_Float16, long long, long, 24, 8);
+  check_struct_and_union3(_Float16, long long, long long, 24, 8);
+  check_struct_and_union3(_Float16, long long, float, 24, 8);
+  check_struct_and_union3(_Float16, long long, double, 24, 8);
+  check_struct_and_union3(_Float16, long long, long double, 32, 16);
+  check_struct_and_union3(_Float16, float, char, 12, 4);
+  check_struct_and_union3(_Float16, float, _Float16, 12, 4);
+  check_struct_and_union3(_Float16, float, int, 12, 4);
+  check_struct_and_union3(_Float16, float, long, 16, 8);
+  check_struct_and_union3(_Float16, float, long long, 16, 8);
+  check_struct_and_union3(_Float16, float, float, 12, 4);
+  check_struct_and_union3(_Float16, float, double, 16, 8);
+  check_struct_and_union3(_Float16, float, long double, 32, 16);
+  check_struct_and_union3(_Float16, double, char, 24, 8);
+  check_struct_and_union3(_Float16, double, _Float16, 24, 8);
+  check_struct_and_union3(_Float16, double, int, 24, 8);
+  check_struct_and_union3(_Float16, double, long, 24, 8);
+  check_struct_and_union3(_Float16, double, long long, 24, 8);
+  check_struct_and_union3(_Float16, double, float, 24, 8);
+  check_struct_and_union3(_Float16, double, double, 24, 8);
+  check_struct_and_union3(_Float16, double, long double, 32, 16);
+  check_struct_and_union3(_Float16, long double, char, 48, 16);
+  check_struct_and_union3(_Float16, long double, _Float16, 48, 16);
+  check_struct_and_union3(_Float16, long double, int, 48, 16);
+  check_struct_and_union3(_Float16, long double, long, 48, 16);
+  check_struct_and_union3(_Float16, long double, long long, 48, 16);
+  check_struct_and_union3(_Float16, long double, float, 48, 16);
+  check_struct_and_union3(_Float16, long double, double, 48, 16);
+  check_struct_and_union3(_Float16, long double, long double, 48, 16);
+  check_struct_and_union3(int, char, _Float16, 8, 4);
+  check_struct_and_union3(int, _Float16, char, 8, 4);
+  check_struct_and_union3(int, _Float16, _Float16, 8, 4);
+  check_struct_and_union3(int, _Float16, int, 12, 4);
+  check_struct_and_union3(int, _Float16, long, 16, 8);
+  check_struct_and_union3(int, _Float16, long long, 16, 8);
+  check_struct_and_union3(int, _Float16, float, 12, 4);
+  check_struct_and_union3(int, _Float16, double, 16, 8);
+  check_struct_and_union3(int, _Float16, long double, 32, 16);
+  check_struct_and_union3(int, int, _Float16, 12, 4);
+  check_struct_and_union3(int, long, _Float16, 24, 8);
+  check_struct_and_union3(int, long long, _Float16, 24, 8);
+  check_struct_and_union3(int, float, _Float16, 12, 4);
+  check_struct_and_union3(int, double, _Float16, 24, 8);
+  check_struct_and_union3(int, long double, _Float16, 48, 16);
+  check_struct_and_union3(long, char, _Float16, 16, 8);
+  check_struct_and_union3(long, _Float16, char, 16, 8);
+  check_struct_and_union3(long, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(long, _Float16, int, 16, 8);
+  check_struct_and_union3(long, _Float16, long, 24, 8);
+  check_struct_and_union3(long, _Float16, long long, 24, 8);
+  check_struct_and_union3(long, _Float16, float, 16, 8);
+  check_struct_and_union3(long, _Float16, double, 24, 8);
+  check_struct_and_union3(long, _Float16, long double, 32, 16);
+  check_struct_and_union3(long, int, _Float16, 16, 8);
+  check_struct_and_union3(long, long, _Float16, 24, 8);
+  check_struct_and_union3(long, long long, _Float16, 24, 8);
+  check_struct_and_union3(long, float, _Float16, 16, 8);
+  check_struct_and_union3(long, double, _Float16, 24, 8);
+  check_struct_and_union3(long, long double, _Float16, 48, 16);
+  check_struct_and_union3(long long, char, _Float16, 16, 8);
+  check_struct_and_union3(long long, _Float16, char, 16, 8);
+  check_struct_and_union3(long long, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(long long, _Float16, int, 16, 8);
+  check_struct_and_union3(long long, _Float16, long, 24, 8);
+  check_struct_and_union3(long long, _Float16, long long, 24, 8);
+  check_struct_and_union3(long long, _Float16, float, 16, 8);
+  check_struct_and_union3(long long, _Float16, double, 24, 8);
+  check_struct_and_union3(long long, _Float16, long double, 32, 16);
+  check_struct_and_union3(long long, int, _Float16, 16, 8);
+  check_struct_and_union3(long long, long, _Float16, 24, 8);
+  check_struct_and_union3(long long, long long, _Float16, 24, 8);
+  check_struct_and_union3(long long, float, _Float16, 16, 8);
+  check_struct_and_union3(long long, double, _Float16, 24, 8);
+  check_struct_and_union3(long long, long double, _Float16, 48, 16);
+  check_struct_and_union3(float, char, _Float16, 8, 4);
+  check_struct_and_union3(float, _Float16, char, 8, 4);
+  check_struct_and_union3(float, _Float16, _Float16, 8, 4);
+  check_struct_and_union3(float, _Float16, int, 12, 4);
+  check_struct_and_union3(float, _Float16, long, 16, 8);
+  check_struct_and_union3(float, _Float16, long long, 16, 8);
+  check_struct_and_union3(float, _Float16, float, 12, 4);
+  check_struct_and_union3(float, _Float16, double, 16, 8);
+  check_struct_and_union3(float, _Float16, long double, 32, 16);
+  check_struct_and_union3(float, int, _Float16, 12, 4);
+  check_struct_and_union3(float, long, _Float16, 24, 8);
+  check_struct_and_union3(float, long long, _Float16, 24, 8);
+  check_struct_and_union3(float, float, _Float16, 12, 4);
+  check_struct_and_union3(float, double, _Float16, 24, 8);
+  check_struct_and_union3(float, long double, _Float16, 48, 16);
+  check_struct_and_union3(double, char, _Float16, 16, 8);
+  check_struct_and_union3(double, _Float16, char, 16, 8);
+  check_struct_and_union3(double, _Float16, _Float16, 16, 8);
+  check_struct_and_union3(double, _Float16, int, 16, 8);
+  check_struct_and_union3(double, _Float16, long, 24, 8);
+  check_struct_and_union3(double, _Float16, long long, 24, 8);
+  check_struct_and_union3(double, _Float16, float, 16, 8);
+  check_struct_and_union3(double, _Float16, double, 24, 8);
+  check_struct_and_union3(double, _Float16, long double, 32, 16);
+  check_struct_and_union3(double, int, _Float16, 16, 8);
+  check_struct_and_union3(double, long, _Float16, 24, 8);
+  check_struct_and_union3(double, long long, _Float16, 24, 8);
+  check_struct_and_union3(double, float, _Float16, 16, 8);
+  check_struct_and_union3(double, double, _Float16, 24, 8);
+  check_struct_and_union3(double, long double, _Float16, 48, 16);
+  check_struct_and_union3(long double, char, _Float16, 32, 16);
+  check_struct_and_union3(long double, _Float16, char, 32, 16);
+  check_struct_and_union3(long double, _Float16, _Float16, 32, 16);
+  check_struct_and_union3(long double, _Float16, int, 32, 16);
+  check_struct_and_union3(long double, _Float16, long, 32, 16);
+  check_struct_and_union3(long double, _Float16, long long, 32, 16);
+  check_struct_and_union3(long double, _Float16, float, 32, 16);
+  check_struct_and_union3(long double, _Float16, double, 32, 16);
+  check_struct_and_union3(long double, _Float16, long double, 48, 16);
+  check_struct_and_union3(long double, int, _Float16, 32, 16);
+  check_struct_and_union3(long double, long, _Float16, 32, 16);
+  check_struct_and_union3(long double, long long, _Float16, 32, 16);
+  check_struct_and_union3(long double, float, _Float16, 32, 16);
+  check_struct_and_union3(long double, double, _Float16, 32, 16);
+  check_struct_and_union3(long double, long double, _Float16, 48, 16);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
new file mode 100644
index 00000000000..2a72b5c9e18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
@@ -0,0 +1,45 @@
+/* This checks alignment of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests2(check_align, char, TYPE_ALIGN_CHAR);
+  run_signed_tests2(check_align, short, TYPE_ALIGN_SHORT);
+  run_signed_tests2(check_align, int, TYPE_ALIGN_INT);
+  run_signed_tests2(check_align, long, TYPE_ALIGN_LONG);
+  run_signed_tests2(check_align, long long, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests2(check_align, __int128, TYPE_ALIGN_INT128);
+#endif
+  check_align(enumtype, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_align(float, TYPE_ALIGN_FLOAT);
+  check_align(double, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_align(long double, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_align(__float128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_align(__m64, TYPE_ALIGN_M64);
+  check_align(__m128, TYPE_ALIGN_M128);
+#endif
+
+  /* _Float16 type.  */
+  check_align(_Float16, TYPE_ALIGN_FLOAT16);
+
+  /* Pointer types.  */
+  check_align(void *, TYPE_ALIGN_POINTER);
+  check_align(void (*)(), TYPE_ALIGN_POINTER);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
new file mode 100644
index 00000000000..d58b9d1c43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
@@ -0,0 +1,43 @@
+/* This checks size and alignment of arrays of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_array_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_array_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_array_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_array_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_array_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_array_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_array_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_array_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_array_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_array_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_array_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_array_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_array_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_array_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+
+  check_array_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
new file mode 100644
index 00000000000..36fb24e6250
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
@@ -0,0 +1,87 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+char
+fun_test_returning_char (void)
+{
+  volatile_var++;
+  return 64;
+}
+
+short
+fun_test_returning_short (void)
+{
+  volatile_var++;
+  return 65;
+}
+
+int
+fun_test_returning_int (void)
+{
+  volatile_var++;
+  return 66;
+}
+
+long
+fun_test_returning_long (void)
+{
+  volatile_var++;
+  return 67;
+}
+
+long long
+fun_test_returning_long_long (void)
+{
+  volatile_var++;
+  return 68;
+}
+
+float
+fun_test_returning_float (void)
+{
+  volatile_var++;
+  return 69;
+}
+
+double
+fun_test_returning_double (void)
+{
+  volatile_var++;
+  return 70;
+}
+
+long double
+fun_test_returning_long_double (void)
+{
+  volatile_var++;
+  return 71;
+}
+
+_Float16
+fun_test_returning_float16 (void)
+{
+  volatile_var++;
+  return 72;
+}
+
+#define def_test_returning_type_xmm(fun, type, ret, reg) \
+  { type var = WRAP_RET (fun) (); \
+  assert (ret == (type) reg && ret == var); }
+
+static void
+do_test (void)
+{
+  def_test_returning_type_xmm(fun_test_returning_char, char, 64, rax);
+  def_test_returning_type_xmm(fun_test_returning_short, short, 65, rax);
+  def_test_returning_type_xmm(fun_test_returning_int, int, 66, rax);
+  def_test_returning_type_xmm(fun_test_returning_long, long, 67, rax);
+  def_test_returning_type_xmm(fun_test_returning_long_long, long long, 68, rax);
+  def_test_returning_type_xmm(fun_test_returning_float, float, 69, xmm_regs[0]._float[0]);
+  def_test_returning_type_xmm(fun_test_returning_double, double, 70, xmm_regs[0]._double[0]);
+  def_test_returning_type_xmm(fun_test_returning_long_double, long double, 71, x87_regs[0]._ldouble);
+  def_test_returning_type_xmm(fun_test_returning_float16, _Float16, 72, xmm_regs[0].__Float16[0]);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
new file mode 100644
index 00000000000..47f3a5e87ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
@@ -0,0 +1,43 @@
+/* This checks sizes of basic types.  */
+
+#include "defines.h"
+#include "macros.h"
+
+
+int
+main (void)
+{
+  /* Integral types.  */
+  run_signed_tests2(check_size, char, TYPE_SIZE_CHAR);
+  run_signed_tests2(check_size, short, TYPE_SIZE_SHORT);
+  run_signed_tests2(check_size, int, TYPE_SIZE_INT);
+  run_signed_tests2(check_size, long, TYPE_SIZE_LONG);
+  run_signed_tests2(check_size, long long, TYPE_SIZE_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests2(check_size, __int128, TYPE_SIZE_INT128);
+#endif
+  check_size(enumtype, TYPE_SIZE_ENUM);
+
+  /* Floating point types.  */
+  check_size(_Float16, TYPE_SIZE_FLOAT16);
+  check_size(float, TYPE_SIZE_FLOAT);
+  check_size(double, TYPE_SIZE_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_size(long double, TYPE_SIZE_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_size(__float128, TYPE_SIZE_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_size(__m64, TYPE_SIZE_M64);
+  check_size(__m128, TYPE_SIZE_M128);
+#endif
+
+  /* Pointer types.  */
+  check_size(void *, TYPE_SIZE_POINTER);
+  check_size(void (*)(), TYPE_SIZE_POINTER);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
new file mode 100644
index 00000000000..3d1add464a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
@@ -0,0 +1,42 @@
+/* This checks size and alignment of structs with a single basic type
+   element. All basic types are checked.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+
+
+static void
+do_test (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_basic_struct_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_basic_struct_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_basic_struct_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_basic_struct_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_basic_struct_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_basic_struct_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_basic_struct_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_basic_struct_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+  check_basic_struct_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_basic_struct_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_basic_struct_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_basic_struct_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_basic_struct_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_basic_struct_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_basic_struct_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
new file mode 100644
index 00000000000..632feebe920
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
@@ -0,0 +1,40 @@
+/* Test of simple unions, size and alignment.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+
+static void
+do_test (void)
+{
+  /* Integral types.  */
+  run_signed_tests3(check_basic_union_size_and_align, char, TYPE_SIZE_CHAR, TYPE_ALIGN_CHAR);
+  run_signed_tests3(check_basic_union_size_and_align, short, TYPE_SIZE_SHORT, TYPE_ALIGN_SHORT);
+  run_signed_tests3(check_basic_union_size_and_align, int, TYPE_SIZE_INT, TYPE_ALIGN_INT);
+  run_signed_tests3(check_basic_union_size_and_align, long, TYPE_SIZE_LONG, TYPE_ALIGN_LONG);
+  run_signed_tests3(check_basic_union_size_and_align, long long, TYPE_SIZE_LONG_LONG, TYPE_ALIGN_LONG_LONG);
+#ifdef CHECK_INT128
+  run_signed_tests3(check_basic_union_size_and_align, __int128, TYPE_SIZE_INT128, TYPE_ALIGN_INT128);
+#endif
+  check_basic_union_size_and_align(enum dummytype, TYPE_SIZE_ENUM, TYPE_ALIGN_ENUM);
+
+  /* Floating point types.  */
+  check_basic_union_size_and_align(_Float16, TYPE_SIZE_FLOAT16, TYPE_ALIGN_FLOAT16);
+  check_basic_union_size_and_align(float, TYPE_SIZE_FLOAT, TYPE_ALIGN_FLOAT);
+  check_basic_union_size_and_align(double, TYPE_SIZE_DOUBLE, TYPE_ALIGN_DOUBLE);
+#ifdef CHECK_LONG_DOUBLE
+  check_basic_union_size_and_align(long double, TYPE_SIZE_LONG_DOUBLE, TYPE_ALIGN_LONG_DOUBLE);
+#endif
+#ifdef CHECK_FLOAT128
+  check_basic_union_size_and_align(__float128, TYPE_SIZE_FLOAT128, TYPE_ALIGN_FLOAT128);
+#endif
+
+  /* Packed types - MMX, 3DNow!, SSE and SSE2.  */
+#ifdef CHECK_M64_M128
+  check_basic_union_size_and_align(__m64, TYPE_SIZE_M64, TYPE_ALIGN_M64);
+  check_basic_union_size_and_align(__m128, TYPE_SIZE_M128, TYPE_ALIGN_M128);
+#endif
+
+  /* Pointer types. The function pointer doesn't work with these macros.  */
+  check_basic_union_size_and_align(void *, TYPE_SIZE_POINTER, TYPE_ALIGN_POINTER);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
new file mode 100644
index 00000000000..829d86e9ee7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
@@ -0,0 +1,104 @@
+/* This is a small test case for returning a complex number. Written by
+   Andreas Jaeger.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+
+#define BUILD_F16_COMPLEX(real, imag) \
+  ({ __complex__ _Float16 __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+__complex__ _Float16
+aj_f16_times2 (__complex__ _Float16 x)
+{
+  __complex__ _Float16 res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+#define BUILD_F_COMPLEX(real, imag) \
+  ({ __complex__ float __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+#define BUILD_D_COMPLEX(real, imag) \
+  ({ __complex__ double __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+#define BUILD_LD_COMPLEX(real, imag) \
+  ({ __complex__ long double __retval; \
+     __real__ __retval = (real); \
+     __imag__ __retval = (imag); \
+     __retval; })
+
+__complex__ float
+aj_f_times2 (__complex__ float x)
+{
+  __complex__ float res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+__complex__ double
+aj_d_times2 (__complex__ double x)
+{
+  __complex__ double res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+__complex__ long double
+aj_ld_times2 (__complex__ long double x)
+{
+  __complex__ long double res;
+
+  __real__ res = (2.0 * __real__ x);
+  __imag__ res = (2.0 * __imag__ x);
+
+  return res;
+}
+
+static void
+do_test (void)
+{
+#ifdef CHECK_COMPLEX
+  _Complex _Float16 f16c, f16d;
+  _Complex float fc, fd;
+  _Complex double dc, dd;
+  _Complex long double ldc, ldd;
+
+  f16c = BUILD_F16_COMPLEX (2.0, 3.0);
+  f16d = aj_f16_times2 (f16c);
+
+  assert (__real__ f16d == 4.0f16 && __imag__ f16d == 6.0f16);
+
+  fc = BUILD_F_COMPLEX (2.0f, 3.0f);
+  fd = aj_f_times2 (fc);
+
+  assert (__real__ fd == 4.0f && __imag__ fd == 6.0f);
+
+  dc = BUILD_D_COMPLEX (2.0, 3.0);
+  dd = aj_d_times2 (dc);
+
+  assert (__real__ dd == 4.0 && __imag__ dd == 6.0);
+
+  ldc = BUILD_LD_COMPLEX (2.0L, 3.0L);
+  ldd = aj_ld_times2 (ldc);
+
+  assert (__real__ ldd == 4.0L && __imag__ ldd == 6.0L);
+#endif
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
new file mode 100644
index 00000000000..34afee66586
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
@@ -0,0 +1,73 @@
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m64
+fun_test_returning___m64 (void)
+{
+  volatile_var++;
+  return (__m64){72,0};
+}
+
+__m128
+fun_test_returning___m128 (void)
+{
+  volatile_var++;
+  return (__m128){73,0,0,0};
+}
+
+__m128h
+fun_test_returning___m128h (void)
+{
+  volatile_var++;
+  return (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+                   6.6f16, 7.7f16, 8.8f16};
+}
+
+__m64 test_64;
+__m128 test_128;
+__m128h test_128h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  XMM_T xmmt1, xmmt2;
+
+  /* We jump through hoops to compare the results as gcc 3.3 does throw
+     an ICE when trying to generate a compare for a == b, when a and b
+     are of __m64 or __m128 type :-(  */
+  clear_struct_registers;
+  test_64 = (__m64){72,0};
+  xmmt1._m64[0] = test_64;
+  xmmt2._m64[0] = WRAP_RET (fun_test_returning___m64)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m64\n"), failed++;
+
+  clear_struct_registers;
+  test_128 = (__m128){73,0};
+  xmmt1._m128[0] = test_128;
+  xmmt2._m128[0] = WRAP_RET (fun_test_returning___m128)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m128\n"), failed++;
+
+  clear_struct_registers;
+  test_128h = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+                        6.6f16, 7.7f16, 8.8f16};
+  xmmt1._m128h[0] = test_128h;
+  xmmt2._m128h[0] = WRAP_RET (fun_test_returning___m128h)();
+  if (xmmt1._long[0] != xmmt2._long[0]
+      || xmmt1._long[0] != xmm_regs[0]._long[0])
+    printf ("fail m128h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
new file mode 100644
index 00000000000..678b25c14d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
@@ -0,0 +1,1066 @@
+/* This is an autogenerated file. Do not edit.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  _Float16 f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
+    f15, f16, f17, f18, f19, f20, f21, f22, f23;
+} values__Float16;
+
+struct
+{
+  float f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15,
+    f16, f17, f18, f19, f20, f21, f22, f23;
+} values_float;
+
+struct
+{
+  double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15,
+    f16, f17, f18, f19, f20, f21, f22, f23;
+} values_double;
+
+struct
+{
+  ldouble f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14,
+    f15, f16, f17, f18, f19, f20, f21, f22, f23;
+} values_ldouble;
+
+void
+fun_check_float16_passing_8_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				    _Float16 f1 ATTRIBUTE_UNUSED,
+				    _Float16 f2 ATTRIBUTE_UNUSED,
+				    _Float16 f3 ATTRIBUTE_UNUSED,
+				    _Float16 f4 ATTRIBUTE_UNUSED,
+				    _Float16 f5 ATTRIBUTE_UNUSED,
+				    _Float16 f6 ATTRIBUTE_UNUSED,
+				    _Float16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+}
+
+void
+fun_check_float16_passing_8_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				  _Float16 f1 ATTRIBUTE_UNUSED,
+				  _Float16 f2 ATTRIBUTE_UNUSED,
+				  _Float16 f3 ATTRIBUTE_UNUSED,
+				  _Float16 f4 ATTRIBUTE_UNUSED,
+				  _Float16 f5 ATTRIBUTE_UNUSED,
+				  _Float16 f6 ATTRIBUTE_UNUSED,
+				  _Float16 f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float16_passing_16_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				     _Float16 f1 ATTRIBUTE_UNUSED,
+				     _Float16 f2 ATTRIBUTE_UNUSED,
+				     _Float16 f3 ATTRIBUTE_UNUSED,
+				     _Float16 f4 ATTRIBUTE_UNUSED,
+				     _Float16 f5 ATTRIBUTE_UNUSED,
+				     _Float16 f6 ATTRIBUTE_UNUSED,
+				     _Float16 f7 ATTRIBUTE_UNUSED,
+				     _Float16 f8 ATTRIBUTE_UNUSED,
+				     _Float16 f9 ATTRIBUTE_UNUSED,
+				     _Float16 f10 ATTRIBUTE_UNUSED,
+				     _Float16 f11 ATTRIBUTE_UNUSED,
+				     _Float16 f12 ATTRIBUTE_UNUSED,
+				     _Float16 f13 ATTRIBUTE_UNUSED,
+				     _Float16 f14 ATTRIBUTE_UNUSED,
+				     _Float16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+  assert (values__Float16.f8 == f8);
+  assert (values__Float16.f9 == f9);
+  assert (values__Float16.f10 == f10);
+  assert (values__Float16.f11 == f11);
+  assert (values__Float16.f12 == f12);
+  assert (values__Float16.f13 == f13);
+  assert (values__Float16.f14 == f14);
+  assert (values__Float16.f15 == f15);
+}
+
+void
+fun_check_float16_passing_16_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				   _Float16 f1 ATTRIBUTE_UNUSED,
+				   _Float16 f2 ATTRIBUTE_UNUSED,
+				   _Float16 f3 ATTRIBUTE_UNUSED,
+				   _Float16 f4 ATTRIBUTE_UNUSED,
+				   _Float16 f5 ATTRIBUTE_UNUSED,
+				   _Float16 f6 ATTRIBUTE_UNUSED,
+				   _Float16 f7 ATTRIBUTE_UNUSED,
+				   _Float16 f8 ATTRIBUTE_UNUSED,
+				   _Float16 f9 ATTRIBUTE_UNUSED,
+				   _Float16 f10 ATTRIBUTE_UNUSED,
+				   _Float16 f11 ATTRIBUTE_UNUSED,
+				   _Float16 f12 ATTRIBUTE_UNUSED,
+				   _Float16 f13 ATTRIBUTE_UNUSED,
+				   _Float16 f14 ATTRIBUTE_UNUSED,
+				   _Float16 f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float16_passing_20_values (_Float16 f0 ATTRIBUTE_UNUSED,
+				     _Float16 f1 ATTRIBUTE_UNUSED,
+				     _Float16 f2 ATTRIBUTE_UNUSED,
+				     _Float16 f3 ATTRIBUTE_UNUSED,
+				     _Float16 f4 ATTRIBUTE_UNUSED,
+				     _Float16 f5 ATTRIBUTE_UNUSED,
+				     _Float16 f6 ATTRIBUTE_UNUSED,
+				     _Float16 f7 ATTRIBUTE_UNUSED,
+				     _Float16 f8 ATTRIBUTE_UNUSED,
+				     _Float16 f9 ATTRIBUTE_UNUSED,
+				     _Float16 f10 ATTRIBUTE_UNUSED,
+				     _Float16 f11 ATTRIBUTE_UNUSED,
+				     _Float16 f12 ATTRIBUTE_UNUSED,
+				     _Float16 f13 ATTRIBUTE_UNUSED,
+				     _Float16 f14 ATTRIBUTE_UNUSED,
+				     _Float16 f15 ATTRIBUTE_UNUSED,
+				     _Float16 f16 ATTRIBUTE_UNUSED,
+				     _Float16 f17 ATTRIBUTE_UNUSED,
+				     _Float16 f18 ATTRIBUTE_UNUSED,
+				     _Float16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values__Float16.f0 == f0);
+  assert (values__Float16.f1 == f1);
+  assert (values__Float16.f2 == f2);
+  assert (values__Float16.f3 == f3);
+  assert (values__Float16.f4 == f4);
+  assert (values__Float16.f5 == f5);
+  assert (values__Float16.f6 == f6);
+  assert (values__Float16.f7 == f7);
+  assert (values__Float16.f8 == f8);
+  assert (values__Float16.f9 == f9);
+  assert (values__Float16.f10 == f10);
+  assert (values__Float16.f11 == f11);
+  assert (values__Float16.f12 == f12);
+  assert (values__Float16.f13 == f13);
+  assert (values__Float16.f14 == f14);
+  assert (values__Float16.f15 == f15);
+  assert (values__Float16.f16 == f16);
+  assert (values__Float16.f17 == f17);
+  assert (values__Float16.f18 == f18);
+  assert (values__Float16.f19 == f19);
+}
+
+void
+fun_check_float16_passing_20_regs (_Float16 f0 ATTRIBUTE_UNUSED,
+				   _Float16 f1 ATTRIBUTE_UNUSED,
+				   _Float16 f2 ATTRIBUTE_UNUSED,
+				   _Float16 f3 ATTRIBUTE_UNUSED,
+				   _Float16 f4 ATTRIBUTE_UNUSED,
+				   _Float16 f5 ATTRIBUTE_UNUSED,
+				   _Float16 f6 ATTRIBUTE_UNUSED,
+				   _Float16 f7 ATTRIBUTE_UNUSED,
+				   _Float16 f8 ATTRIBUTE_UNUSED,
+				   _Float16 f9 ATTRIBUTE_UNUSED,
+				   _Float16 f10 ATTRIBUTE_UNUSED,
+				   _Float16 f11 ATTRIBUTE_UNUSED,
+				   _Float16 f12 ATTRIBUTE_UNUSED,
+				   _Float16 f13 ATTRIBUTE_UNUSED,
+				   _Float16 f14 ATTRIBUTE_UNUSED,
+				   _Float16 f15 ATTRIBUTE_UNUSED,
+				   _Float16 f16 ATTRIBUTE_UNUSED,
+				   _Float16 f17 ATTRIBUTE_UNUSED,
+				   _Float16 f18 ATTRIBUTE_UNUSED,
+				   _Float16 f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float16_arguments;
+}
+
+void
+fun_check_float_passing_float8_values (float f0 ATTRIBUTE_UNUSED,
+				       float f1 ATTRIBUTE_UNUSED,
+				       float f2 ATTRIBUTE_UNUSED,
+				       float f3 ATTRIBUTE_UNUSED,
+				       float f4 ATTRIBUTE_UNUSED,
+				       float f5 ATTRIBUTE_UNUSED,
+				       float f6 ATTRIBUTE_UNUSED,
+				       float f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+
+}
+
+void
+fun_check_float_passing_float8_regs (float f0 ATTRIBUTE_UNUSED,
+				     float f1 ATTRIBUTE_UNUSED,
+				     float f2 ATTRIBUTE_UNUSED,
+				     float f3 ATTRIBUTE_UNUSED,
+				     float f4 ATTRIBUTE_UNUSED,
+				     float f5 ATTRIBUTE_UNUSED,
+				     float f6 ATTRIBUTE_UNUSED,
+				     float f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_float16_values (float f0 ATTRIBUTE_UNUSED,
+					float f1 ATTRIBUTE_UNUSED,
+					float f2 ATTRIBUTE_UNUSED,
+					float f3 ATTRIBUTE_UNUSED,
+					float f4 ATTRIBUTE_UNUSED,
+					float f5 ATTRIBUTE_UNUSED,
+					float f6 ATTRIBUTE_UNUSED,
+					float f7 ATTRIBUTE_UNUSED,
+					float f8 ATTRIBUTE_UNUSED,
+					float f9 ATTRIBUTE_UNUSED,
+					float f10 ATTRIBUTE_UNUSED,
+					float f11 ATTRIBUTE_UNUSED,
+					float f12 ATTRIBUTE_UNUSED,
+					float f13 ATTRIBUTE_UNUSED,
+					float f14 ATTRIBUTE_UNUSED,
+					float f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+  assert (values_float.f8 == f8);
+  assert (values_float.f9 == f9);
+  assert (values_float.f10 == f10);
+  assert (values_float.f11 == f11);
+  assert (values_float.f12 == f12);
+  assert (values_float.f13 == f13);
+  assert (values_float.f14 == f14);
+  assert (values_float.f15 == f15);
+
+}
+
+void
+fun_check_float_passing_float16_regs (float f0 ATTRIBUTE_UNUSED,
+				      float f1 ATTRIBUTE_UNUSED,
+				      float f2 ATTRIBUTE_UNUSED,
+				      float f3 ATTRIBUTE_UNUSED,
+				      float f4 ATTRIBUTE_UNUSED,
+				      float f5 ATTRIBUTE_UNUSED,
+				      float f6 ATTRIBUTE_UNUSED,
+				      float f7 ATTRIBUTE_UNUSED,
+				      float f8 ATTRIBUTE_UNUSED,
+				      float f9 ATTRIBUTE_UNUSED,
+				      float f10 ATTRIBUTE_UNUSED,
+				      float f11 ATTRIBUTE_UNUSED,
+				      float f12 ATTRIBUTE_UNUSED,
+				      float f13 ATTRIBUTE_UNUSED,
+				      float f14 ATTRIBUTE_UNUSED,
+				      float f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_float20_values (float f0 ATTRIBUTE_UNUSED,
+					float f1 ATTRIBUTE_UNUSED,
+					float f2 ATTRIBUTE_UNUSED,
+					float f3 ATTRIBUTE_UNUSED,
+					float f4 ATTRIBUTE_UNUSED,
+					float f5 ATTRIBUTE_UNUSED,
+					float f6 ATTRIBUTE_UNUSED,
+					float f7 ATTRIBUTE_UNUSED,
+					float f8 ATTRIBUTE_UNUSED,
+					float f9 ATTRIBUTE_UNUSED,
+					float f10 ATTRIBUTE_UNUSED,
+					float f11 ATTRIBUTE_UNUSED,
+					float f12 ATTRIBUTE_UNUSED,
+					float f13 ATTRIBUTE_UNUSED,
+					float f14 ATTRIBUTE_UNUSED,
+					float f15 ATTRIBUTE_UNUSED,
+					float f16 ATTRIBUTE_UNUSED,
+					float f17 ATTRIBUTE_UNUSED,
+					float f18 ATTRIBUTE_UNUSED,
+					float f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_float.f0 == f0);
+  assert (values_float.f1 == f1);
+  assert (values_float.f2 == f2);
+  assert (values_float.f3 == f3);
+  assert (values_float.f4 == f4);
+  assert (values_float.f5 == f5);
+  assert (values_float.f6 == f6);
+  assert (values_float.f7 == f7);
+  assert (values_float.f8 == f8);
+  assert (values_float.f9 == f9);
+  assert (values_float.f10 == f10);
+  assert (values_float.f11 == f11);
+  assert (values_float.f12 == f12);
+  assert (values_float.f13 == f13);
+  assert (values_float.f14 == f14);
+  assert (values_float.f15 == f15);
+  assert (values_float.f16 == f16);
+  assert (values_float.f17 == f17);
+  assert (values_float.f18 == f18);
+  assert (values_float.f19 == f19);
+
+}
+
+void
+fun_check_float_passing_float20_regs (float f0 ATTRIBUTE_UNUSED,
+				      float f1 ATTRIBUTE_UNUSED,
+				      float f2 ATTRIBUTE_UNUSED,
+				      float f3 ATTRIBUTE_UNUSED,
+				      float f4 ATTRIBUTE_UNUSED,
+				      float f5 ATTRIBUTE_UNUSED,
+				      float f6 ATTRIBUTE_UNUSED,
+				      float f7 ATTRIBUTE_UNUSED,
+				      float f8 ATTRIBUTE_UNUSED,
+				      float f9 ATTRIBUTE_UNUSED,
+				      float f10 ATTRIBUTE_UNUSED,
+				      float f11 ATTRIBUTE_UNUSED,
+				      float f12 ATTRIBUTE_UNUSED,
+				      float f13 ATTRIBUTE_UNUSED,
+				      float f14 ATTRIBUTE_UNUSED,
+				      float f15 ATTRIBUTE_UNUSED,
+				      float f16 ATTRIBUTE_UNUSED,
+				      float f17 ATTRIBUTE_UNUSED,
+				      float f18 ATTRIBUTE_UNUSED,
+				      float f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_float_arguments;
+}
+
+void
+fun_check_float_passing_double8_values (double f0 ATTRIBUTE_UNUSED,
+					double f1 ATTRIBUTE_UNUSED,
+					double f2 ATTRIBUTE_UNUSED,
+					double f3 ATTRIBUTE_UNUSED,
+					double f4 ATTRIBUTE_UNUSED,
+					double f5 ATTRIBUTE_UNUSED,
+					double f6 ATTRIBUTE_UNUSED,
+					double f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+
+}
+
+void
+fun_check_float_passing_double8_regs (double f0 ATTRIBUTE_UNUSED,
+				      double f1 ATTRIBUTE_UNUSED,
+				      double f2 ATTRIBUTE_UNUSED,
+				      double f3 ATTRIBUTE_UNUSED,
+				      double f4 ATTRIBUTE_UNUSED,
+				      double f5 ATTRIBUTE_UNUSED,
+				      double f6 ATTRIBUTE_UNUSED,
+				      double f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_float_passing_double16_values (double f0 ATTRIBUTE_UNUSED,
+					 double f1 ATTRIBUTE_UNUSED,
+					 double f2 ATTRIBUTE_UNUSED,
+					 double f3 ATTRIBUTE_UNUSED,
+					 double f4 ATTRIBUTE_UNUSED,
+					 double f5 ATTRIBUTE_UNUSED,
+					 double f6 ATTRIBUTE_UNUSED,
+					 double f7 ATTRIBUTE_UNUSED,
+					 double f8 ATTRIBUTE_UNUSED,
+					 double f9 ATTRIBUTE_UNUSED,
+					 double f10 ATTRIBUTE_UNUSED,
+					 double f11 ATTRIBUTE_UNUSED,
+					 double f12 ATTRIBUTE_UNUSED,
+					 double f13 ATTRIBUTE_UNUSED,
+					 double f14 ATTRIBUTE_UNUSED,
+					 double f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+  assert (values_double.f8 == f8);
+  assert (values_double.f9 == f9);
+  assert (values_double.f10 == f10);
+  assert (values_double.f11 == f11);
+  assert (values_double.f12 == f12);
+  assert (values_double.f13 == f13);
+  assert (values_double.f14 == f14);
+  assert (values_double.f15 == f15);
+
+}
+
+void
+fun_check_float_passing_double16_regs (double f0 ATTRIBUTE_UNUSED,
+				       double f1 ATTRIBUTE_UNUSED,
+				       double f2 ATTRIBUTE_UNUSED,
+				       double f3 ATTRIBUTE_UNUSED,
+				       double f4 ATTRIBUTE_UNUSED,
+				       double f5 ATTRIBUTE_UNUSED,
+				       double f6 ATTRIBUTE_UNUSED,
+				       double f7 ATTRIBUTE_UNUSED,
+				       double f8 ATTRIBUTE_UNUSED,
+				       double f9 ATTRIBUTE_UNUSED,
+				       double f10 ATTRIBUTE_UNUSED,
+				       double f11 ATTRIBUTE_UNUSED,
+				       double f12 ATTRIBUTE_UNUSED,
+				       double f13 ATTRIBUTE_UNUSED,
+				       double f14 ATTRIBUTE_UNUSED,
+				       double f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_float_passing_double20_values (double f0 ATTRIBUTE_UNUSED,
+					 double f1 ATTRIBUTE_UNUSED,
+					 double f2 ATTRIBUTE_UNUSED,
+					 double f3 ATTRIBUTE_UNUSED,
+					 double f4 ATTRIBUTE_UNUSED,
+					 double f5 ATTRIBUTE_UNUSED,
+					 double f6 ATTRIBUTE_UNUSED,
+					 double f7 ATTRIBUTE_UNUSED,
+					 double f8 ATTRIBUTE_UNUSED,
+					 double f9 ATTRIBUTE_UNUSED,
+					 double f10 ATTRIBUTE_UNUSED,
+					 double f11 ATTRIBUTE_UNUSED,
+					 double f12 ATTRIBUTE_UNUSED,
+					 double f13 ATTRIBUTE_UNUSED,
+					 double f14 ATTRIBUTE_UNUSED,
+					 double f15 ATTRIBUTE_UNUSED,
+					 double f16 ATTRIBUTE_UNUSED,
+					 double f17 ATTRIBUTE_UNUSED,
+					 double f18 ATTRIBUTE_UNUSED,
+					 double f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_double.f0 == f0);
+  assert (values_double.f1 == f1);
+  assert (values_double.f2 == f2);
+  assert (values_double.f3 == f3);
+  assert (values_double.f4 == f4);
+  assert (values_double.f5 == f5);
+  assert (values_double.f6 == f6);
+  assert (values_double.f7 == f7);
+  assert (values_double.f8 == f8);
+  assert (values_double.f9 == f9);
+  assert (values_double.f10 == f10);
+  assert (values_double.f11 == f11);
+  assert (values_double.f12 == f12);
+  assert (values_double.f13 == f13);
+  assert (values_double.f14 == f14);
+  assert (values_double.f15 == f15);
+  assert (values_double.f16 == f16);
+  assert (values_double.f17 == f17);
+  assert (values_double.f18 == f18);
+  assert (values_double.f19 == f19);
+
+}
+
+void
+fun_check_float_passing_double20_regs (double f0 ATTRIBUTE_UNUSED,
+				       double f1 ATTRIBUTE_UNUSED,
+				       double f2 ATTRIBUTE_UNUSED,
+				       double f3 ATTRIBUTE_UNUSED,
+				       double f4 ATTRIBUTE_UNUSED,
+				       double f5 ATTRIBUTE_UNUSED,
+				       double f6 ATTRIBUTE_UNUSED,
+				       double f7 ATTRIBUTE_UNUSED,
+				       double f8 ATTRIBUTE_UNUSED,
+				       double f9 ATTRIBUTE_UNUSED,
+				       double f10 ATTRIBUTE_UNUSED,
+				       double f11 ATTRIBUTE_UNUSED,
+				       double f12 ATTRIBUTE_UNUSED,
+				       double f13 ATTRIBUTE_UNUSED,
+				       double f14 ATTRIBUTE_UNUSED,
+				       double f15 ATTRIBUTE_UNUSED,
+				       double f16 ATTRIBUTE_UNUSED,
+				       double f17 ATTRIBUTE_UNUSED,
+				       double f18 ATTRIBUTE_UNUSED,
+				       double f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_double_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble8_values (ldouble f0 ATTRIBUTE_UNUSED,
+				       ldouble f1 ATTRIBUTE_UNUSED,
+				       ldouble f2 ATTRIBUTE_UNUSED,
+				       ldouble f3 ATTRIBUTE_UNUSED,
+				       ldouble f4 ATTRIBUTE_UNUSED,
+				       ldouble f5 ATTRIBUTE_UNUSED,
+				       ldouble f6 ATTRIBUTE_UNUSED,
+				       ldouble f7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+
+}
+
+void
+fun_check_x87_passing_ldouble8_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				     ldouble f1 ATTRIBUTE_UNUSED,
+				     ldouble f2 ATTRIBUTE_UNUSED,
+				     ldouble f3 ATTRIBUTE_UNUSED,
+				     ldouble f4 ATTRIBUTE_UNUSED,
+				     ldouble f5 ATTRIBUTE_UNUSED,
+				     ldouble f6 ATTRIBUTE_UNUSED,
+				     ldouble f7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble16_values (ldouble f0 ATTRIBUTE_UNUSED,
+					ldouble f1 ATTRIBUTE_UNUSED,
+					ldouble f2 ATTRIBUTE_UNUSED,
+					ldouble f3 ATTRIBUTE_UNUSED,
+					ldouble f4 ATTRIBUTE_UNUSED,
+					ldouble f5 ATTRIBUTE_UNUSED,
+					ldouble f6 ATTRIBUTE_UNUSED,
+					ldouble f7 ATTRIBUTE_UNUSED,
+					ldouble f8 ATTRIBUTE_UNUSED,
+					ldouble f9 ATTRIBUTE_UNUSED,
+					ldouble f10 ATTRIBUTE_UNUSED,
+					ldouble f11 ATTRIBUTE_UNUSED,
+					ldouble f12 ATTRIBUTE_UNUSED,
+					ldouble f13 ATTRIBUTE_UNUSED,
+					ldouble f14 ATTRIBUTE_UNUSED,
+					ldouble f15 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+  assert (values_ldouble.f8 == f8);
+  assert (values_ldouble.f9 == f9);
+  assert (values_ldouble.f10 == f10);
+  assert (values_ldouble.f11 == f11);
+  assert (values_ldouble.f12 == f12);
+  assert (values_ldouble.f13 == f13);
+  assert (values_ldouble.f14 == f14);
+  assert (values_ldouble.f15 == f15);
+
+}
+
+void
+fun_check_x87_passing_ldouble16_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				      ldouble f1 ATTRIBUTE_UNUSED,
+				      ldouble f2 ATTRIBUTE_UNUSED,
+				      ldouble f3 ATTRIBUTE_UNUSED,
+				      ldouble f4 ATTRIBUTE_UNUSED,
+				      ldouble f5 ATTRIBUTE_UNUSED,
+				      ldouble f6 ATTRIBUTE_UNUSED,
+				      ldouble f7 ATTRIBUTE_UNUSED,
+				      ldouble f8 ATTRIBUTE_UNUSED,
+				      ldouble f9 ATTRIBUTE_UNUSED,
+				      ldouble f10 ATTRIBUTE_UNUSED,
+				      ldouble f11 ATTRIBUTE_UNUSED,
+				      ldouble f12 ATTRIBUTE_UNUSED,
+				      ldouble f13 ATTRIBUTE_UNUSED,
+				      ldouble f14 ATTRIBUTE_UNUSED,
+				      ldouble f15 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+void
+fun_check_x87_passing_ldouble20_values (ldouble f0 ATTRIBUTE_UNUSED,
+					ldouble f1 ATTRIBUTE_UNUSED,
+					ldouble f2 ATTRIBUTE_UNUSED,
+					ldouble f3 ATTRIBUTE_UNUSED,
+					ldouble f4 ATTRIBUTE_UNUSED,
+					ldouble f5 ATTRIBUTE_UNUSED,
+					ldouble f6 ATTRIBUTE_UNUSED,
+					ldouble f7 ATTRIBUTE_UNUSED,
+					ldouble f8 ATTRIBUTE_UNUSED,
+					ldouble f9 ATTRIBUTE_UNUSED,
+					ldouble f10 ATTRIBUTE_UNUSED,
+					ldouble f11 ATTRIBUTE_UNUSED,
+					ldouble f12 ATTRIBUTE_UNUSED,
+					ldouble f13 ATTRIBUTE_UNUSED,
+					ldouble f14 ATTRIBUTE_UNUSED,
+					ldouble f15 ATTRIBUTE_UNUSED,
+					ldouble f16 ATTRIBUTE_UNUSED,
+					ldouble f17 ATTRIBUTE_UNUSED,
+					ldouble f18 ATTRIBUTE_UNUSED,
+					ldouble f19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  assert (values_ldouble.f0 == f0);
+  assert (values_ldouble.f1 == f1);
+  assert (values_ldouble.f2 == f2);
+  assert (values_ldouble.f3 == f3);
+  assert (values_ldouble.f4 == f4);
+  assert (values_ldouble.f5 == f5);
+  assert (values_ldouble.f6 == f6);
+  assert (values_ldouble.f7 == f7);
+  assert (values_ldouble.f8 == f8);
+  assert (values_ldouble.f9 == f9);
+  assert (values_ldouble.f10 == f10);
+  assert (values_ldouble.f11 == f11);
+  assert (values_ldouble.f12 == f12);
+  assert (values_ldouble.f13 == f13);
+  assert (values_ldouble.f14 == f14);
+  assert (values_ldouble.f15 == f15);
+  assert (values_ldouble.f16 == f16);
+  assert (values_ldouble.f17 == f17);
+  assert (values_ldouble.f18 == f18);
+  assert (values_ldouble.f19 == f19);
+
+}
+
+void
+fun_check_x87_passing_ldouble20_regs (ldouble f0 ATTRIBUTE_UNUSED,
+				      ldouble f1 ATTRIBUTE_UNUSED,
+				      ldouble f2 ATTRIBUTE_UNUSED,
+				      ldouble f3 ATTRIBUTE_UNUSED,
+				      ldouble f4 ATTRIBUTE_UNUSED,
+				      ldouble f5 ATTRIBUTE_UNUSED,
+				      ldouble f6 ATTRIBUTE_UNUSED,
+				      ldouble f7 ATTRIBUTE_UNUSED,
+				      ldouble f8 ATTRIBUTE_UNUSED,
+				      ldouble f9 ATTRIBUTE_UNUSED,
+				      ldouble f10 ATTRIBUTE_UNUSED,
+				      ldouble f11 ATTRIBUTE_UNUSED,
+				      ldouble f12 ATTRIBUTE_UNUSED,
+				      ldouble f13 ATTRIBUTE_UNUSED,
+				      ldouble f14 ATTRIBUTE_UNUSED,
+				      ldouble f15 ATTRIBUTE_UNUSED,
+				      ldouble f16 ATTRIBUTE_UNUSED,
+				      ldouble f17 ATTRIBUTE_UNUSED,
+				      ldouble f18 ATTRIBUTE_UNUSED,
+				      ldouble f19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_ldouble_arguments;
+}
+
+#define def_check_float16_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				   _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_float16_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, _f13, \
+				    _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_float16_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, \
+				    _f7, _f8, _f9, _f10, _f11, _f12, \
+				    _f13, _f14, _f15, _f16, _f17, \
+				    _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, \
+		     _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, \
+		     _f17, _f18, _f19); \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, \
+		     _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, \
+		     _f18, _f19);
+
+
+#define def_check_float_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_float_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_float_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \
+  \
+  clear_float_registers; \
+  fregs.F0._ ## TYPE [0] = _f0; \
+  fregs.F1._ ## TYPE [0] = _f1; \
+  fregs.F2._ ## TYPE [0] = _f2; \
+  fregs.F3._ ## TYPE [0] = _f3; \
+  fregs.F4._ ## TYPE [0] = _f4; \
+  fregs.F5._ ## TYPE [0] = _f5; \
+  fregs.F6._ ## TYPE [0] = _f6; \
+  fregs.F7._ ## TYPE [0] = _f7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19);
+
+#define def_check_x87_passing8(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7);
+
+#define def_check_x87_passing16(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15);
+
+#define def_check_x87_passing20(_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19, _func1, _func2, TYPE) \
+  values_ ## TYPE .f0 = _f0; \
+  values_ ## TYPE .f1 = _f1; \
+  values_ ## TYPE .f2 = _f2; \
+  values_ ## TYPE .f3 = _f3; \
+  values_ ## TYPE .f4 = _f4; \
+  values_ ## TYPE .f5 = _f5; \
+  values_ ## TYPE .f6 = _f6; \
+  values_ ## TYPE .f7 = _f7; \
+  values_ ## TYPE .f8 = _f8; \
+  values_ ## TYPE .f9 = _f9; \
+  values_ ## TYPE .f10 = _f10; \
+  values_ ## TYPE .f11 = _f11; \
+  values_ ## TYPE .f12 = _f12; \
+  values_ ## TYPE .f13 = _f13; \
+  values_ ## TYPE .f14 = _f14; \
+  values_ ## TYPE .f15 = _f15; \
+  values_ ## TYPE .f16 = _f16; \
+  values_ ## TYPE .f17 = _f17; \
+  values_ ## TYPE .f18 = _f18; \
+  values_ ## TYPE .f19 = _f19; \
+  WRAP_CALL(_func1) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19); \
+  \
+  clear_x87_registers; \
+  num_fregs = 0; \
+  WRAP_CALL(_func2) (_f0, _f1, _f2, _f3, _f4, _f5, _f6, _f7, _f8, _f9, _f10, _f11, _f12, _f13, _f14, _f15, _f16, _f17, _f18, _f19);
+
+void
+test_float16_on_stack ()
+{
+  def_check_float16_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			      fun_check_float16_passing_8_values,
+			      fun_check_float16_passing_8_regs, _Float16);
+
+  def_check_float16_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			       44, 45, 46, 47,
+			       fun_check_float16_passing_16_values,
+			       fun_check_float16_passing_16_regs, _Float16);
+}
+
+void
+test_too_many_float16 ()
+{
+  def_check_float16_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			       44, 45, 46, 47, 48, 49, 50, 51,
+			       fun_check_float16_passing_20_values,
+			       fun_check_float16_passing_20_regs, _Float16);
+}
+
+void
+test_floats_on_stack ()
+{
+  def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			    fun_check_float_passing_float8_values,
+			    fun_check_float_passing_float8_regs, float);
+
+  def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47,
+			     fun_check_float_passing_float16_values,
+			     fun_check_float_passing_float16_regs, float);
+}
+
+void
+test_too_many_floats ()
+{
+  def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47, 48, 49, 50, 51,
+			     fun_check_float_passing_float20_values,
+			     fun_check_float_passing_float20_regs, float);
+}
+
+void
+test_doubles_on_stack ()
+{
+  def_check_float_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			    fun_check_float_passing_double8_values,
+			    fun_check_float_passing_double8_regs, double);
+
+  def_check_float_passing16 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47,
+			     fun_check_float_passing_double16_values,
+			     fun_check_float_passing_double16_regs, double);
+}
+
+void
+test_too_many_doubles ()
+{
+  def_check_float_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
+			     44, 45, 46, 47, 48, 49, 50, 51,
+			     fun_check_float_passing_double20_values,
+			     fun_check_float_passing_double20_regs, double);
+}
+
+void
+test_long_doubles_on_stack ()
+{
+  def_check_x87_passing8 (32, 33, 34, 35, 36, 37, 38, 39,
+			  fun_check_x87_passing_ldouble8_values,
+			  fun_check_x87_passing_ldouble8_regs, ldouble);
+}
+
+void
+test_too_many_long_doubles ()
+{
+  def_check_x87_passing20 (32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+			   45, 46, 47, 48, 49, 50, 51,
+			   fun_check_x87_passing_ldouble20_values,
+			   fun_check_x87_passing_ldouble20_regs, ldouble);
+}
+
+void
+test_float128s_on_stack ()
+{
+}
+
+void
+test_too_many_float128s ()
+{
+}
+
+
+static void
+do_test (void)
+{
+  test_float16_on_stack ();
+  test_too_many_float16 ();
+  test_floats_on_stack ();
+  test_too_many_floats ();
+  test_doubles_on_stack ();
+  test_too_many_doubles ();
+  test_long_doubles_on_stack ();
+  test_too_many_long_doubles ();
+  test_float128s_on_stack ();
+  test_too_many_float128s ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
new file mode 100644
index 00000000000..66c27aef7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
@@ -0,0 +1,510 @@
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m64_8_values (__m64 i0 ATTRIBUTE_UNUSED,
+				__m64 i1 ATTRIBUTE_UNUSED,
+				__m64 i2 ATTRIBUTE_UNUSED,
+				__m64 i3 ATTRIBUTE_UNUSED,
+				__m64 i4 ATTRIBUTE_UNUSED,
+				__m64 i5 ATTRIBUTE_UNUSED,
+				__m64 i6 ATTRIBUTE_UNUSED,
+				__m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m64);
+  compare (values.i1, i1, __m64);
+  compare (values.i2, i2, __m64);
+  compare (values.i3, i3, __m64);
+  compare (values.i4, i4, __m64);
+  compare (values.i5, i5, __m64);
+  compare (values.i6, i6, __m64);
+  compare (values.i7, i7, __m64);
+}
+
+void
+fun_check_passing_m64_8_regs (__m64 i0 ATTRIBUTE_UNUSED,
+			      __m64 i1 ATTRIBUTE_UNUSED,
+			      __m64 i2 ATTRIBUTE_UNUSED,
+			      __m64 i3 ATTRIBUTE_UNUSED,
+			      __m64 i4 ATTRIBUTE_UNUSED,
+			      __m64 i5 ATTRIBUTE_UNUSED,
+			      __m64 i6 ATTRIBUTE_UNUSED,
+			      __m64 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m64_arguments;
+}
+
+void
+fun_check_passing_m64_20_values (__m64 i0 ATTRIBUTE_UNUSED,
+				 __m64 i1 ATTRIBUTE_UNUSED,
+				 __m64 i2 ATTRIBUTE_UNUSED,
+				 __m64 i3 ATTRIBUTE_UNUSED,
+				 __m64 i4 ATTRIBUTE_UNUSED,
+				 __m64 i5 ATTRIBUTE_UNUSED,
+				 __m64 i6 ATTRIBUTE_UNUSED,
+				 __m64 i7 ATTRIBUTE_UNUSED,
+				 __m64 i8 ATTRIBUTE_UNUSED,
+				 __m64 i9 ATTRIBUTE_UNUSED,
+				 __m64 i10 ATTRIBUTE_UNUSED,
+				 __m64 i11 ATTRIBUTE_UNUSED,
+				 __m64 i12 ATTRIBUTE_UNUSED,
+				 __m64 i13 ATTRIBUTE_UNUSED,
+				 __m64 i14 ATTRIBUTE_UNUSED,
+				 __m64 i15 ATTRIBUTE_UNUSED,
+				 __m64 i16 ATTRIBUTE_UNUSED,
+				 __m64 i17 ATTRIBUTE_UNUSED,
+				 __m64 i18 ATTRIBUTE_UNUSED,
+				 __m64 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m64);
+  compare (values.i1, i1, __m64);
+  compare (values.i2, i2, __m64);
+  compare (values.i3, i3, __m64);
+  compare (values.i4, i4, __m64);
+  compare (values.i5, i5, __m64);
+  compare (values.i6, i6, __m64);
+  compare (values.i7, i7, __m64);
+  compare (values.i8, i8, __m64);
+  compare (values.i9, i9, __m64);
+  compare (values.i10, i10, __m64);
+  compare (values.i11, i11, __m64);
+  compare (values.i12, i12, __m64);
+  compare (values.i13, i13, __m64);
+  compare (values.i14, i14, __m64);
+  compare (values.i15, i15, __m64);
+  compare (values.i16, i16, __m64);
+  compare (values.i17, i17, __m64);
+  compare (values.i18, i18, __m64);
+  compare (values.i19, i19, __m64);
+}
+
+void
+fun_check_passing_m64_20_regs (__m64 i0 ATTRIBUTE_UNUSED,
+			       __m64 i1 ATTRIBUTE_UNUSED,
+			       __m64 i2 ATTRIBUTE_UNUSED,
+			       __m64 i3 ATTRIBUTE_UNUSED,
+			       __m64 i4 ATTRIBUTE_UNUSED,
+			       __m64 i5 ATTRIBUTE_UNUSED,
+			       __m64 i6 ATTRIBUTE_UNUSED,
+			       __m64 i7 ATTRIBUTE_UNUSED,
+			       __m64 i8 ATTRIBUTE_UNUSED,
+			       __m64 i9 ATTRIBUTE_UNUSED,
+			       __m64 i10 ATTRIBUTE_UNUSED,
+			       __m64 i11 ATTRIBUTE_UNUSED,
+			       __m64 i12 ATTRIBUTE_UNUSED,
+			       __m64 i13 ATTRIBUTE_UNUSED,
+			       __m64 i14 ATTRIBUTE_UNUSED,
+			       __m64 i15 ATTRIBUTE_UNUSED,
+			       __m64 i16 ATTRIBUTE_UNUSED,
+			       __m64 i17 ATTRIBUTE_UNUSED,
+			       __m64 i18 ATTRIBUTE_UNUSED,
+			       __m64 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m64_arguments;
+}
+
+void
+fun_check_passing_m128_8_values (__m128 i0 ATTRIBUTE_UNUSED,
+				 __m128 i1 ATTRIBUTE_UNUSED,
+				 __m128 i2 ATTRIBUTE_UNUSED,
+				 __m128 i3 ATTRIBUTE_UNUSED,
+				 __m128 i4 ATTRIBUTE_UNUSED,
+				 __m128 i5 ATTRIBUTE_UNUSED,
+				 __m128 i6 ATTRIBUTE_UNUSED,
+				 __m128 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+  compare (values.i4, i4, __m128);
+  compare (values.i5, i5, __m128);
+  compare (values.i6, i6, __m128);
+  compare (values.i7, i7, __m128);
+}
+
+void
+fun_check_passing_m128h_8_values (__m128h i0 ATTRIBUTE_UNUSED,
+				  __m128h i1 ATTRIBUTE_UNUSED,
+				  __m128h i2 ATTRIBUTE_UNUSED,
+				  __m128h i3 ATTRIBUTE_UNUSED,
+				  __m128h i4 ATTRIBUTE_UNUSED,
+				  __m128h i5 ATTRIBUTE_UNUSED,
+				  __m128h i6 ATTRIBUTE_UNUSED,
+				  __m128h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+  compare (values.i4, i4, __m128h);
+  compare (values.i5, i5, __m128h);
+  compare (values.i6, i6, __m128h);
+  compare (values.i7, i7, __m128h);
+}
+
+void
+fun_check_passing_m128_8_regs (__m128 i0 ATTRIBUTE_UNUSED,
+			       __m128 i1 ATTRIBUTE_UNUSED,
+			       __m128 i2 ATTRIBUTE_UNUSED,
+			       __m128 i3 ATTRIBUTE_UNUSED,
+			       __m128 i4 ATTRIBUTE_UNUSED,
+			       __m128 i5 ATTRIBUTE_UNUSED,
+			       __m128 i6 ATTRIBUTE_UNUSED,
+			       __m128 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128h_8_regs (__m128h i0 ATTRIBUTE_UNUSED,
+				__m128h i1 ATTRIBUTE_UNUSED,
+				__m128h i2 ATTRIBUTE_UNUSED,
+				__m128h i3 ATTRIBUTE_UNUSED,
+				__m128h i4 ATTRIBUTE_UNUSED,
+				__m128h i5 ATTRIBUTE_UNUSED,
+				__m128h i6 ATTRIBUTE_UNUSED,
+				__m128h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128_20_values (__m128 i0 ATTRIBUTE_UNUSED,
+				  __m128 i1 ATTRIBUTE_UNUSED,
+				  __m128 i2 ATTRIBUTE_UNUSED,
+				  __m128 i3 ATTRIBUTE_UNUSED,
+				  __m128 i4 ATTRIBUTE_UNUSED,
+				  __m128 i5 ATTRIBUTE_UNUSED,
+				  __m128 i6 ATTRIBUTE_UNUSED,
+				  __m128 i7 ATTRIBUTE_UNUSED,
+				  __m128 i8 ATTRIBUTE_UNUSED,
+				  __m128 i9 ATTRIBUTE_UNUSED,
+				  __m128 i10 ATTRIBUTE_UNUSED,
+				  __m128 i11 ATTRIBUTE_UNUSED,
+				  __m128 i12 ATTRIBUTE_UNUSED,
+				  __m128 i13 ATTRIBUTE_UNUSED,
+				  __m128 i14 ATTRIBUTE_UNUSED,
+				  __m128 i15 ATTRIBUTE_UNUSED,
+				  __m128 i16 ATTRIBUTE_UNUSED,
+				  __m128 i17 ATTRIBUTE_UNUSED,
+				  __m128 i18 ATTRIBUTE_UNUSED,
+				  __m128 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+  compare (values.i4, i4, __m128);
+  compare (values.i5, i5, __m128);
+  compare (values.i6, i6, __m128);
+  compare (values.i7, i7, __m128);
+  compare (values.i8, i8, __m128);
+  compare (values.i9, i9, __m128);
+  compare (values.i10, i10, __m128);
+  compare (values.i11, i11, __m128);
+  compare (values.i12, i12, __m128);
+  compare (values.i13, i13, __m128);
+  compare (values.i14, i14, __m128);
+  compare (values.i15, i15, __m128);
+  compare (values.i16, i16, __m128);
+  compare (values.i17, i17, __m128);
+  compare (values.i18, i18, __m128);
+  compare (values.i19, i19, __m128);
+}
+
+void
+fun_check_passing_m128h_20_values (__m128h i0 ATTRIBUTE_UNUSED,
+				   __m128h i1 ATTRIBUTE_UNUSED,
+				   __m128h i2 ATTRIBUTE_UNUSED,
+				   __m128h i3 ATTRIBUTE_UNUSED,
+				   __m128h i4 ATTRIBUTE_UNUSED,
+				   __m128h i5 ATTRIBUTE_UNUSED,
+				   __m128h i6 ATTRIBUTE_UNUSED,
+				   __m128h i7 ATTRIBUTE_UNUSED,
+				   __m128h i8 ATTRIBUTE_UNUSED,
+				   __m128h i9 ATTRIBUTE_UNUSED,
+				   __m128h i10 ATTRIBUTE_UNUSED,
+				   __m128h i11 ATTRIBUTE_UNUSED,
+				   __m128h i12 ATTRIBUTE_UNUSED,
+				   __m128h i13 ATTRIBUTE_UNUSED,
+				   __m128h i14 ATTRIBUTE_UNUSED,
+				   __m128h i15 ATTRIBUTE_UNUSED,
+				   __m128h i16 ATTRIBUTE_UNUSED,
+				   __m128h i17 ATTRIBUTE_UNUSED,
+				   __m128h i18 ATTRIBUTE_UNUSED,
+				   __m128h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+  compare (values.i4, i4, __m128h);
+  compare (values.i5, i5, __m128h);
+  compare (values.i6, i6, __m128h);
+  compare (values.i7, i7, __m128h);
+  compare (values.i8, i8, __m128h);
+  compare (values.i9, i9, __m128h);
+  compare (values.i10, i10, __m128h);
+  compare (values.i11, i11, __m128h);
+  compare (values.i12, i12, __m128h);
+  compare (values.i13, i13, __m128h);
+  compare (values.i14, i14, __m128h);
+  compare (values.i15, i15, __m128h);
+  compare (values.i16, i16, __m128h);
+  compare (values.i17, i17, __m128h);
+  compare (values.i18, i18, __m128h);
+  compare (values.i19, i19, __m128h);
+}
+
+void
+fun_check_passing_m128_20_regs (__m128 i0 ATTRIBUTE_UNUSED,
+				__m128 i1 ATTRIBUTE_UNUSED,
+				__m128 i2 ATTRIBUTE_UNUSED,
+				__m128 i3 ATTRIBUTE_UNUSED,
+				__m128 i4 ATTRIBUTE_UNUSED,
+				__m128 i5 ATTRIBUTE_UNUSED,
+				__m128 i6 ATTRIBUTE_UNUSED,
+				__m128 i7 ATTRIBUTE_UNUSED,
+				__m128 i8 ATTRIBUTE_UNUSED,
+				__m128 i9 ATTRIBUTE_UNUSED,
+				__m128 i10 ATTRIBUTE_UNUSED,
+				__m128 i11 ATTRIBUTE_UNUSED,
+				__m128 i12 ATTRIBUTE_UNUSED,
+				__m128 i13 ATTRIBUTE_UNUSED,
+				__m128 i14 ATTRIBUTE_UNUSED,
+				__m128 i15 ATTRIBUTE_UNUSED,
+				__m128 i16 ATTRIBUTE_UNUSED,
+				__m128 i17 ATTRIBUTE_UNUSED,
+				__m128 i18 ATTRIBUTE_UNUSED,
+				__m128 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+void
+fun_check_passing_m128h_20_regs (__m128h i0 ATTRIBUTE_UNUSED,
+				 __m128h i1 ATTRIBUTE_UNUSED,
+				 __m128h i2 ATTRIBUTE_UNUSED,
+				 __m128h i3 ATTRIBUTE_UNUSED,
+				 __m128h i4 ATTRIBUTE_UNUSED,
+				 __m128h i5 ATTRIBUTE_UNUSED,
+				 __m128h i6 ATTRIBUTE_UNUSED,
+				 __m128h i7 ATTRIBUTE_UNUSED,
+				 __m128h i8 ATTRIBUTE_UNUSED,
+				 __m128h i9 ATTRIBUTE_UNUSED,
+				 __m128h i10 ATTRIBUTE_UNUSED,
+				 __m128h i11 ATTRIBUTE_UNUSED,
+				 __m128h i12 ATTRIBUTE_UNUSED,
+				 __m128h i13 ATTRIBUTE_UNUSED,
+				 __m128h i14 ATTRIBUTE_UNUSED,
+				 __m128h i15 ATTRIBUTE_UNUSED,
+				 __m128h i16 ATTRIBUTE_UNUSED,
+				 __m128h i17 ATTRIBUTE_UNUSED,
+				 __m128h i18 ATTRIBUTE_UNUSED,
+				 __m128h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m128_arguments;
+}
+
+#define def_check_int_passing8(_i0, _i1, _i2, _i3, \
+			       _i4, _i5, _i6, _i7, \
+			       _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_int_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, \
+				_i7, _i8, _i9, _i10, _i11, _i12, _i13, \
+				_i14, _i15, _i16, _i17, _i18, _i19, \
+				_func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19); \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, _i16, \
+		     _i17, _i18, _i19);
+
+void
+test_m64_on_stack ()
+{
+  __m64 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m64){32 + i, 0};
+  pass = "m64-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m64_8_values,
+			  fun_check_passing_m64_8_regs, _m64);
+}
+
+void
+test_too_many_m64 ()
+{
+  __m64 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m64){32 + i, 0};
+  pass = "m64-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m64_20_values,
+			   fun_check_passing_m64_20_regs, _m64);
+}
+
+void
+test_m128_on_stack ()
+{
+  __m128 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m128){32 + i, 0, 0, 0};
+  pass = "m128-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128_8_values,
+			  fun_check_passing_m128_8_regs, _m128);
+}
+
+void
+test_m128h_on_stack ()
+{
+  __m128h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+	             6.6f16, 7.7f16, 8.8f16};
+  pass = "m128h-8";
+  def_check_int_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			  fun_check_passing_m128h_8_values,
+			  fun_check_passing_m128h_8_regs, _m128h);
+}
+
+void
+test_too_many_m128 ()
+{
+  __m128 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128){32 + i, 0, 0, 0};
+  pass = "m128-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128_20_values,
+			   fun_check_passing_m128_20_regs, _m128);
+}
+
+void
+test_too_many_m128h ()
+{
+  __m128h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m128h){1.1f16, 2.2f16, 3.3f16, 4.4f16, 5.5f16,
+	             6.6f16, 7.7f16, 8.8f16};
+  pass = "m128h-20";
+  def_check_int_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+			   x[8], x[9], x[10], x[11], x[12], x[13], x[14],
+			   x[15], x[16], x[17], x[18], x[19],
+			   fun_check_passing_m128h_20_values,
+			   fun_check_passing_m128h_20_regs, _m128h);
+}
+
+static void
+do_test (void)
+{
+  test_m64_on_stack ();
+  test_too_many_m64 ();
+  test_m128_on_stack ();
+  test_too_many_m128 ();
+  test_m128h_on_stack ();
+  test_too_many_m128h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
new file mode 100644
index 00000000000..4d1956a846d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
@@ -0,0 +1,332 @@
+/* This tests passing of structs.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "args.h"
+#include <complex.h>
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct int_struct
+{
+  int i;
+};
+
+struct long_struct
+{
+  long long l;
+};
+
+struct long2_struct
+{
+  long long l1, l2;
+};
+
+struct long3_struct
+{
+  long long l1, l2, l3;
+};
+
+
+/* Check that the struct is passed as the individual members in iregs.  */
+void
+check_struct_passing1 (struct int_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing2 (struct long_struct ls ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing3 (struct long2_struct ls ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing4 (struct long3_struct ls ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ls.l1 == rsp+8);
+  assert ((unsigned long)&ls.l2 == rsp+16);
+  assert ((unsigned long)&ls.l3 == rsp+24);
+}
+
+#ifdef CHECK_M64_M128
+struct m128_struct
+{
+  __m128 x;
+};
+
+struct m128_2_struct
+{
+  __m128 x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing5 (struct m128_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m128_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_struct_passing6 (struct m128_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+24);
+}
+#endif
+
+struct flex1_struct
+{
+  long long i;
+  long long flex[];
+};
+
+struct flex2_struct
+{
+  long long i;
+  long long flex[0];
+};
+
+void
+check_struct_passing7 (struct flex1_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_struct_passing8 (struct flex2_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+struct complex1_struct
+{
+  int c;
+  __complex__ float x;
+};
+
+struct complex1a_struct
+{
+  long long l;
+  float f;
+};
+
+struct complex2_struct
+{
+  int c;
+  __complex__ float x;
+  float y;
+};
+
+struct complex2a_struct
+{
+  long long l;
+  double d;
+};
+
+struct complex3_struct
+{
+  int c;
+  __complex__ _Float16 x;
+};
+
+struct complex3a_struct
+{
+  long long l;
+  _Float16 f;
+};
+
+struct complex4_struct
+{
+  int c;
+  __complex__ _Float16 x;
+  _Float16 y;
+};
+
+struct complex4a_struct
+{
+  long long l;
+  _Float16 f;
+};
+
+void
+check_struct_passing9 (struct complex1_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float_arguments;
+}
+
+void
+check_struct_passing10 (struct complex2_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_double_arguments;
+}
+
+void
+check_struct_passing11 (struct complex3_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float16_arguments;
+}
+
+void
+check_struct_passing12 (struct complex4_struct is ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_float16_arguments;
+}
+
+static struct flex1_struct f1s = { 60, { } };
+static struct flex2_struct f2s = { 61, { } };
+
+static void
+do_test (void)
+{
+  struct int_struct is = { 48 };
+  struct long_struct ls = { 49 };
+#ifdef CHECK_LARGER_STRUCTS
+  struct long2_struct l2s = { 50, 51 };
+  struct long3_struct l3s = { 52, 53, 54 };
+#endif
+#ifdef CHECK_M64_M128
+  struct m128_struct m128s[8];
+  struct m128_2_struct m128_2s = {
+      { 48.394, 39.3, -397.9, 3484.9 },
+      { -8.394, -93.3, 7.9, 84.94 }
+  };
+  int i;
+#endif
+  struct complex1_struct c1s = { 4, ( -13.4 + 3.5*I ) };
+  union
+    {
+      struct complex1_struct c;
+      struct complex1a_struct u;
+    } c1u;
+  struct complex2_struct c2s = { 4, ( -13.4 + 3.5*I ), -34.5 };
+  union
+    {
+      struct complex2_struct c;
+      struct complex2a_struct u;
+    } c2u;
+
+  struct complex3_struct c3s = { 4, ( -13.4 + 3.5*I ) };
+  union
+    {
+      struct complex3_struct c;
+      struct complex3a_struct u;
+    } c3u;
+
+  struct complex4_struct c4s = { 4, ( -13.4 + 3.5*I ), -34.5 };
+  union
+    {
+      struct complex4_struct c;
+      struct complex4a_struct u;
+    } c4u;
+
+  clear_struct_registers;
+  iregs.I0 = is.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing1)(is);
+
+  clear_struct_registers;
+  iregs.I0 = ls.l;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing2)(ls);
+
+#ifdef CHECK_LARGER_STRUCTS
+  clear_struct_registers;
+  iregs.I0 = l2s.l1;
+  iregs.I1 = l2s.l2;
+  num_iregs = 2;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing3)(l2s);
+  WRAP_CALL (check_struct_passing4)(l3s);
+#endif
+
+#ifdef CHECK_M64_M128
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      m128s[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = m128s[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing5)(m128s[0], m128s[1], m128s[2], m128s[3],
+				    m128s[4], m128s[5], m128s[6], m128s[7]);
+  WRAP_CALL (check_struct_passing6)(m128_2s);
+#endif
+
+  clear_struct_registers;
+  iregs.I0 = f1s.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing7)(f1s);
+
+  clear_struct_registers;
+  iregs.I0 = f2s.i;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  WRAP_CALL (check_struct_passing8)(f2s);
+
+  clear_struct_registers;
+  c1u.c = c1s;
+  iregs.I0 = c1u.u.l;
+  num_iregs = 1;
+  fregs.xmm0._float [0] = c1u.u.f;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing9)(c1s);
+
+  clear_struct_registers;
+  c2u.c = c2s;
+  iregs.I0 = c2u.u.l;
+  num_iregs = 1;
+  fregs.xmm0._double[0] = c2u.u.d;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing10)(c2s);
+
+  clear_struct_registers;
+  c3u.c = c3s;
+  iregs.I0 = c3u.u.l;
+  num_iregs = 1;
+  num_fregs = 0;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing11)(c3s);
+
+  clear_struct_registers;
+  c4u.c = c4s;
+  iregs.I0 = c4u.u.l;
+  num_iregs = 1;
+  fregs.xmm0.__Float16 [0] = c4u.u.f;
+  num_fregs = 1;
+  clear_int_hardware_registers;
+  clear_float_hardware_registers;
+  WRAP_CALL (check_struct_passing12)(c4s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
new file mode 100644
index 00000000000..640b3057f93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
@@ -0,0 +1,335 @@
+/* This tests passing of unions.  */
+
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct int_struct
+{
+  int i;
+};
+
+struct long_struct
+{
+  long l;
+};
+
+union un1
+{
+  char c;
+  int i;
+};
+
+union un2
+{
+  char c1;
+  long l;
+  char c2;
+};
+
+union un3
+{
+  struct int_struct is;
+  struct long_struct ls;
+  union un1 un;
+};
+
+
+void
+check_union_passing1(union un1 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+void
+check_union_passing3(union un3 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+
+#ifdef CHECK_M64_M128
+union un4
+{
+  __m128 x;
+  float f;
+};
+
+union un5
+{
+  __m128 x;
+  long i;
+};
+
+void
+check_union_passing4(union un4 u1 ATTRIBUTE_UNUSED,
+		     union un4 u2 ATTRIBUTE_UNUSED,
+		     union un4 u3 ATTRIBUTE_UNUSED,
+		     union un4 u4 ATTRIBUTE_UNUSED,
+		     union un4 u5 ATTRIBUTE_UNUSED,
+		     union un4 u6 ATTRIBUTE_UNUSED,
+		     union un4 u7 ATTRIBUTE_UNUSED,
+		     union un4 u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+  check_vector_arguments(m128, 8);
+}
+
+union un4a
+{
+  __m128 x;
+  _Float16 f;
+};
+
+void
+check_union_passing4a(union un4a u1 ATTRIBUTE_UNUSED,
+		      union un4a u2 ATTRIBUTE_UNUSED,
+		      union un4a u3 ATTRIBUTE_UNUSED,
+		      union un4a u4 ATTRIBUTE_UNUSED,
+		      union un4a u5 ATTRIBUTE_UNUSED,
+		      union un4a u6 ATTRIBUTE_UNUSED,
+		      union un4a u7 ATTRIBUTE_UNUSED,
+		      union un4a u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+union un4b
+{
+  __m128h x;
+  _Float16 f;
+};
+
+void
+check_union_passing4b(union un4b u1 ATTRIBUTE_UNUSED,
+		      union un4b u2 ATTRIBUTE_UNUSED,
+		      union un4b u3 ATTRIBUTE_UNUSED,
+		      union un4b u4 ATTRIBUTE_UNUSED,
+		      union un4b u5 ATTRIBUTE_UNUSED,
+		      union un4b u6 ATTRIBUTE_UNUSED,
+		      union un4b u7 ATTRIBUTE_UNUSED,
+		      union un4b u8 ATTRIBUTE_UNUSED)
+{
+  check_m128_arguments;
+}
+
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing4a WRAP_CALL(check_union_passing4a)
+#define check_union_passing4b WRAP_CALL(check_union_passing4b)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+#endif
+
+union un6
+{
+  long double ld;
+  int i;
+};
+
+
+void
+check_union_passing6(union un6 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.ld == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+#define check_union_passing6 WRAP_CALL(check_union_passing6)
+
+union un7
+{
+  long double ld;
+  _Float16 f;
+};
+
+void
+check_union_passing7(union un7 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.ld == rsp+8);
+  assert ((unsigned long)&u.f == rsp+8);
+}
+
+#define check_union_passing7 WRAP_CALL(check_union_passing7)
+
+union un8
+{
+  _Float16 f;
+  int i;
+};
+
+void
+check_union_passing8(union un8 u ATTRIBUTE_UNUSED)
+{
+  check_int_arguments;
+}
+
+#define check_union_passing8 WRAP_CALL(check_union_passing8)
+
+static void
+do_test (void)
+{
+  union un1 u1;
+#ifdef CHECK_LARGER_UNION_PASSING
+  union un2 u2;
+  union un3 u3;
+  struct int_struct is;
+  struct long_struct ls;
+#endif /* CHECK_LARGER_UNION_PASSING */
+#ifdef CHECK_M64_M128
+  union un4 u4[8];
+  union un4a u4a[8];
+  union un4b u4b[8];
+  union un5 u5 = { { 48.394, 39.3, -397.9, 3484.9 } };
+  int i;
+#endif
+  union un6 u6;
+  union un7 u7;
+  union un8 u8;
+
+  /* Check a union with char, int.  */
+  clear_struct_registers;
+  u1.i = 0;  /* clear the struct to not have high bits left */
+  u1.c = 32;
+  iregs.I0 = 32;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing1(u1);
+  u1.i = 0;  /* clear the struct to not have high bits left */
+  u1.i = 33;
+  iregs.I0 = 33;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing1(u1);
+
+  /* Check a union with char, long, char.  */
+#ifdef CHECK_LARGER_UNION_PASSING
+  clear_struct_registers;
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.c1 = 34;
+  iregs.I0 = 34;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.l = 35;
+  iregs.I0 = 35;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+  u2.l = 0;  /* clear the struct to not have high bits left */
+  u2.c2 = 36;
+  iregs.I0 = 36;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing2(u2);
+
+  /* Check a union containing two structs and a union.  */
+  clear_struct_registers;
+  is.i = 37;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.is = is;
+  iregs.I0 = 37;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  ls.l = 38;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.ls = ls;
+  iregs.I0 = 38;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  u1.c = 39;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.un = u1;
+  iregs.I0 = 39;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+  u1.i = 40;
+  u3.ls.l = 0;  /* clear the struct to not have high bits left */
+  u3.un = u1;
+  iregs.I0 = 40;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing3(u3);
+#endif /* CHECK_LARGER_UNION_PASSING */
+
+#ifdef CHECK_M64_M128
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = u4[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4(u4[0], u4[1], u4[2], u4[3],
+		       u4[4], u4[5], u4[6], u4[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4a[i].x = (__m128){32+i, 0, i, 0};
+      (&fregs.xmm0)[i]._m128[0] = u4a[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4a(u4a[0], u4a[1], u4a[2], u4a[3],
+			u4a[4], u4a[5], u4a[6], u4a[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u4b[i].x = (__m128h){33+i, 0, i, 0, -i, 1, 2 * i, i + 8};
+      (&fregs.xmm0)[i]._m128h[0] = u4b[i].x;
+    }
+  num_fregs = 8;
+  clear_float_hardware_registers;
+  check_union_passing4b(u4b[0], u4b[1], u4b[2], u4b[3],
+		        u4b[4], u4b[5], u4b[6], u4b[7]);
+
+  clear_struct_registers;
+  fregs.xmm0._m128[0] = u5.x;
+  num_fregs = 1;
+  num_iregs = 1;
+  iregs.I0 = u5.i;
+  clear_float_hardware_registers;
+  check_union_passing5(u5);
+#endif
+
+  u6.i = 2;
+  check_union_passing6(u6);
+
+  u7.f = 2.0f16;
+  check_union_passing7(u7);
+
+  clear_struct_registers;
+  u8.i = 8;
+  num_iregs = 1;
+  iregs.I0 = u8.i;
+  clear_int_hardware_registers;
+  check_union_passing8(u8);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
new file mode 100644
index 00000000000..92578127be7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
@@ -0,0 +1,274 @@
+/* This tests returning of structures.  */
+
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+int current_test;
+int num_failed = 0;
+
+#undef assert
+#define assert(test) do { if (!(test)) {fprintf (stderr, "failed in test %d\n", current_test); num_failed++; } } while (0)
+
+#define xmm0h xmm_regs[0].__Float16
+#define xmm1h xmm_regs[1].__Float16
+#define xmm0f xmm_regs[0]._float
+#define xmm0d xmm_regs[0]._double
+#define xmm1f xmm_regs[1]._float
+#define xmm1d xmm_regs[1]._double
+
+typedef enum {
+  INT = 0,
+  SSE_H,
+  SSE_F,
+  SSE_D,
+  X87,
+  MEM,
+  INT_SSE,
+  SSE_INT,
+  SSE_F_V,
+  SSE_F_H,
+  SSE_F_H8
+} Type;
+
+/* Structures which should be returned in INTEGER.  */
+#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
+
+D(1,char m1, s.m1=42)
+D(2,short m1, s.m1=42)
+D(3,int m1, s.m1=42)
+D(4,long m1, s.m1=42)
+D(5,long long m1, s.m1=42)
+D(6,char m1;short s, s.m1=42)
+D(7,char m1;int i, s.m1=42)
+D(8,char m1; long l, s.m1=42)
+D(9,char m1; long long l, s.m1=42)
+D(10,char m1[16], s.m1[0]=42)
+D(11,short m1[8], s.m1[0]=42)
+D(12,int m1[4], s.m1[0]=42)
+D(13,long m1[2], s.m1[0]=42)
+D(14,long long m1[2], s.m1[0]=42)
+
+#undef D
+
+/* Structures which should be returned in SSE.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s)); B; return s; }
+
+D(100,float f,SSE_F, s.f=42)
+D(101,double d,SSE_D, s.d=42)
+D(102,float f;float f2,SSE_F, s.f=42)
+D(103,float f;double d,SSE_F, s.f=42)
+D(104,double d; float f,SSE_D, s.d=42)
+D(105,double d; double d2,SSE_D, s.d=42)
+D(106,float f[2],SSE_F, s.f[0]=42)
+D(107,float f[3],SSE_F, s.f[0]=42)
+D(108,float f[4],SSE_F, s.f[0]=42)
+D(109,double d[2],SSE_D, s.d[0]=42)
+D(110,float f[2]; double d,SSE_F, s.f[0]=42)
+D(111,double d;float f[2],SSE_D, s.d=42)
+
+D(120,_Float16 f,SSE_H, s.f=42)
+D(121,_Float16 f;_Float16 f2,SSE_H, s.f=42)
+D(122,_Float16 f;float d,SSE_H, s.f=42)
+D(123,_Float16 f;double d,SSE_H, s.f=42)
+D(124,double d; _Float16 f,SSE_D, s.d=42)
+D(125,_Float16 f[2],SSE_H, s.f[0]=42)
+D(126,_Float16 f[3],SSE_H, s.f[0]=42)
+D(127,_Float16 f[4],SSE_H, s.f[0]=42)
+D(128,_Float16 f[2]; double d,SSE_H, s.f[0]=42)
+D(129,double d;_Float16 f[2],SSE_D, s.d=42)
+
+#undef D
+
+/* Structures which should be returned on x87 stack.  */
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = X87; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42 }; return s; }
+
+/* The only struct containing a long double, which is returned in
+   registers at all, is the singleton struct.  All others are too large.
+   This includes a struct containing complex long double, which is passed
+   in memory, although a complex long double type itself is returned in
+   two registers.  */
+D(200,long double ld)
+
+#undef D
+
+/* Structures which should be returned in INT (low) and SSE (high).  */
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = INT_SSE; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s = { 42,43 }; return s; }
+
+D(300,char m1; float m2)
+D(301,char m1; double m2)
+D(302,short m1; float m2)
+D(303,short m1; double m2)
+D(304,int m1; float m2)
+D(305,int m1; double m2)
+D(306,long long m1; float m2)
+D(307,long long m1; double m2)
+
+D(310,char m1; _Float16 m2)
+D(311,short m1; _Float16 m2)
+D(312,int m1; _Float16 m2)
+D(313,long long m1; _Float16 m2)
+
+#undef D
+
+void check_300 (void)
+{
+  XMM_T x;
+  x._ulong[0] = rax;
+  switch (current_test) {
+    case 300: assert ((rax & 0xff) == 42 && x._float[1] == 43); break;
+    case 301: assert ((rax & 0xff) == 42 && xmm0d[0] == 43); break;
+    case 302: assert ((rax & 0xffff) == 42 && x._float[1] == 43); break;
+    case 303: assert ((rax & 0xffff) == 42 && xmm0d[0] == 43); break;
+    case 304: assert ((rax & 0xffffffff) == 42 && x._float[1] == 43); break;
+    case 305: assert ((rax & 0xffffffff) == 42 && xmm0d[0] == 43); break;
+    case 306: assert (rax == 42 && xmm0f[0] == 43); break;
+    case 307: assert (rax == 42 && xmm0d[0] == 43); break;
+    case 310: assert ((rax & 0xff) == 42 && x.__Float16[1] == 43); break;
+    case 311: assert ((rax & 0xffff) == 42 && x.__Float16[1] == 43); break;
+    case 312: assert ((rax & 0xffffffff) == 42 && x.__Float16[2] == 43); break;
+    case 313: assert (rax == 42 && xmm0h[0] == 43); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in SSE (low) and INT (high).  */
+#define D(I,MEMBERS,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = SSE_INT; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; memset (&s, 0, sizeof(s));  B; return s; }
+
+D(400,float f[2];char c, s.f[0]=42; s.c=43)
+D(401,double d;char c, s.d=42; s.c=43)
+
+D(402,_Float16 f[4];char c, s.f[0]=42; s.c=43)
+
+#undef D
+
+void check_400 (void)
+{
+  switch (current_test) {
+    case 400: assert (xmm0f[0] == 42 && (rax & 0xff) == 43); break;
+    case 401: assert (xmm0d[0] == 42 && (rax & 0xff) == 43); break;
+    case 402: assert (xmm0h[0] == 42 && (rax & 0xff) == 43); break;
+
+    default: assert (0); break;
+  }
+}
+
+/* Structures which should be returned in MEM.  */
+void *struct_addr;
+#define D(I,MEMBERS) struct S_ ## I { MEMBERS ; }; Type class_ ## I = MEM; \
+struct S_ ## I f_ ## I (void) { union {unsigned char c; struct S_ ## I s;} u; memset (&u.s, 0, sizeof(u.s)); u.c = 42; return u.s; }
+
+/* Too large.  */
+D(500,char m1[17])
+D(501,short m1[9])
+D(502,int m1[5])
+D(503,long m1[3])
+D(504,short m1[8];char c)
+D(505,char m1[1];int i[4])
+D(506,float m1[5])
+D(507,double m1[3])
+D(508,char m1[1];float f[4])
+D(509,char m1[1];double d[2])
+D(510,__complex long double m1[1])
+
+/* Too large due to padding.  */
+D(520,char m1[1];int i;char c2; int i2; char c3)
+
+/* Unnaturally aligned members.  */
+D(530,short m1[1];int i PACKED)
+
+D(540,_Float16 m1[10])
+D(541,char m1[1];_Float16 f[8])
+
+#undef D
+
+
+/* Special tests.  */
+#define D(I,MEMBERS,C,B) struct S_ ## I { MEMBERS ; }; Type class_ ## I = C; \
+struct S_ ## I f_ ## I (void) { struct S_ ## I s; B; return s; }
+D(600,float f[4], SSE_F_V, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42)
+D(601,_Float16 f[4], SSE_F_H, s.f[0] = s.f[1] = s.f[2] = s.f[3] = 42)
+D(602,_Float16 f[8], SSE_F_H8,
+  s.f[0] = s.f[1] = s.f[2] = s.f[3] = s.f[4] = s.f[5] = s.f[6] = s.f[7] = 42)
+#undef D
+
+void clear_all (void)
+{
+  clear_int_registers;
+  clear_float_registers;
+  clear_x87_registers;
+}
+
+void check_all (Type class, unsigned long size)
+{
+  switch (class) {
+    case INT: if (size < 8) rax &= ~0UL >> (64-8*size); assert (rax == 42); break;
+    case SSE_H: assert (xmm0h[0] == 42); break;
+    case SSE_F: assert (xmm0f[0] == 42); break;
+    case SSE_D: assert (xmm0d[0] == 42); break;
+    case SSE_F_V: assert (xmm0f[0] == 42 && xmm0f[1]==42 && xmm1f[0] == 42 && xmm1f[1] == 42); break;
+    case SSE_F_H: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42); break;
+    case SSE_F_H8: assert (xmm0h[0] == 42 && xmm0h[1]==42 && xmm0h[2] == 42 && xmm0h[3] == 42
+			   && xmm1h[0] == 42 && xmm1h[1]==42 && xmm1h[2] == 42 && xmm1h[3] == 42); break;
+    case X87: assert (x87_regs[0]._ldouble == 42); break;
+    case INT_SSE: check_300(); break;
+    case SSE_INT: check_400(); break;
+    /* Ideally we would like to check that rax == struct_addr.
+       Unfortunately the address of the target struct escapes (for setting
+       struct_addr), so the return struct is a temporary one whose address
+       is given to the f_* functions, otherwise a conforming program
+       could notice the struct changing already before the function returns.
+       This temporary struct could be anywhere.  For GCC it will be on
+       stack, but no one is forbidding that it could be a static variable
+       if there's no threading or proper locking.  Anybody in their right
+       mind will use the stack for that.  */
+    case MEM: assert (*(unsigned char*)struct_addr == 42 && rdi == rax); break;
+  }
+}
+
+#define D(I) { struct S_ ## I s; current_test = I; struct_addr = (void*)&s; \
+  clear_all(); \
+  s = WRAP_RET(f_ ## I) (); \
+  check_all(class_ ## I, sizeof(s)); \
+}
+
+static void
+do_test (void)
+{
+  D(1) D(2) D(3) D(4) D(5) D(6) D(7) D(8) D(9) D(10) D(11) D(12) D(13) D(14)
+
+  D(100) D(101) D(102) D(103) D(104) D(105) D(106) D(107) D(108) D(109) D(110)
+  D(111)
+
+  D(120) D(121) D(122) D(123) D(124) D(125) D(126) D(127) D(128) D(129)
+
+  D(200)
+
+  D(300) D(301) D(302) D(303) D(304) D(305) D(306) D(307)
+  D(310) D(311) D(312) D(313)
+
+  D(400) D(401) D(402)
+
+  D(500) D(501) D(502) D(503) D(504) D(505) D(506) D(507) D(508) D(509)
+  D(520)
+  D(530)
+
+  D(540) D(541)
+
+  D(600) D(601) D(602)
+  if (num_failed)
+    abort ();
+}
+#undef D
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
new file mode 100644
index 00000000000..5bdc44db5f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
@@ -0,0 +1,164 @@
+/* Test variable number of 128-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-xmm-check.h"
+#include "defines.h"
+#include "macros.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  XMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m128_varargs (__m128 i0, __m128 i1, __m128 i2,
+				__m128 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m128 *argp;
+
+  compare (values.i0, i0, __m128);
+  compare (values.i1, i1, __m128);
+  compare (values.i2, i2, __m128);
+  compare (values.i3, i3, __m128);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m128 *) (((char *) fp) + 8);
+
+  /* Check __m128 arguments passed on stack.  */
+  compare (values.i8, argp[0], __m128);
+  compare (values.i9, argp[1], __m128);
+
+  /* Check register contents.  */
+  compare (fregs.xmm0, xmm_regs[0], __m128);
+  compare (fregs.xmm1, xmm_regs[1], __m128);
+  compare (fregs.xmm2, xmm_regs[2], __m128);
+  compare (fregs.xmm3, xmm_regs[3], __m128);
+  compare (fregs.xmm4, xmm_regs[4], __m128);
+  compare (fregs.xmm5, xmm_regs[5], __m128);
+  compare (fregs.xmm6, xmm_regs[6], __m128);
+  compare (fregs.xmm7, xmm_regs[7], __m128);
+}
+
+void
+fun_check_passing_m128h_varargs (__m128h i0, __m128h i1, __m128h i2,
+				 __m128h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m128h *argp;
+
+  compare (values.i0, i0, __m128h);
+  compare (values.i1, i1, __m128h);
+  compare (values.i2, i2, __m128h);
+  compare (values.i3, i3, __m128h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m128h *) (((char *) fp) + 8);
+
+  /* Check __m128h arguments passed on stack.  */
+  compare (values.i8, argp[0], __m128h);
+  compare (values.i9, argp[1], __m128h);
+
+  /* Check register contents.  */
+  compare (fregs.xmm0, xmm_regs[0], __m128h);
+  compare (fregs.xmm1, xmm_regs[1], __m128h);
+  compare (fregs.xmm2, xmm_regs[2], __m128h);
+  compare (fregs.xmm3, xmm_regs[3], __m128h);
+  compare (fregs.xmm4, xmm_regs[4], __m128h);
+  compare (fregs.xmm5, xmm_regs[5], __m128h);
+  compare (fregs.xmm6, xmm_regs[6], __m128h);
+  compare (fregs.xmm7, xmm_regs[7], __m128h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_float_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m128_varargs (void)
+{
+  __m128 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m128){32+i, 0, 0, 0};
+  pass = "m128-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m128_varargs,
+				 _m128);
+}
+
+void
+test_m128h_varargs (void)
+{
+  __m128h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m128h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i
+    };
+  pass = "m128h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m128h_varargs,
+				 _m128h);
+}
+
+static void
+do_test (void)
+{
+  test_m128_varargs ();
+  test_m128h_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1



* [PATCH 09/10] AVX512FP16: Add ABI test for ymm.
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (7 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 08/10] AVX512FP16: Add ABI tests for xmm liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-07-21  7:43         ` [PATCH 10/10] AVX512FP16: Add abi test for zmm liuhongt
  2021-09-08  2:54         ` [PATCH V2 00/10] Initial support for AVX512FP16 Hongtao Liu
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
	New exp file.
	* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
	* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.
---
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |  45 +++
 .../x86_64/abi/avx512fp16/m256h/args.h        | 182 +++++++++
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |  81 ++++
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |   3 +
 .../avx512fp16/m256h/test_m256_returning.c    |  54 +++
 .../abi/avx512fp16/m256h/test_passing_m256.c  | 370 ++++++++++++++++++
 .../avx512fp16/m256h/test_passing_structs.c   | 113 ++++++
 .../avx512fp16/m256h/test_passing_unions.c    | 337 ++++++++++++++++
 .../abi/avx512fp16/m256h/test_varargs-m256.c  | 160 ++++++++
 9 files changed, 1345 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
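
For readers unfamiliar with the harness: the value-checking half of these
tests follows the pattern sketched below. The caller records the expected
arguments in a global, the callee compares what it received bytewise with
memcmp, and failures are counted rather than aborting on the first one.
This is a minimal portable sketch under stated assumptions, not code from
the patch. REG_T, check, make_reg, callee, run_sketch and
run_sketch_mismatch are hypothetical names; the real tests use YMM_T with
__m256/__m256h members and route the call through WRAP_CALL so that
snapshot can save the registers first, which plain C cannot show.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for YMM_T: 32 bytes viewed as floats or ints.  */
typedef union {
  float _float[8];
  int _int[8];
} REG_T;

static const char *pass = "sketch";
static int failed;

/* Count failures instead of aborting, like the tests' assert override.  */
#define check(c) do { \
  if (!(c)) { failed++; printf ("failed %s\n", pass); } \
} while (0)

/* Bytewise comparison of the first sizeof (T) bytes of X1 and X2.  */
#define compare(X1, X2, T) do { \
  check (memcmp (&(X1), &(X2), sizeof (T)) == 0); \
} while (0)

/* Expected values, recorded by the caller before the call.  */
static struct { REG_T i0, i1; } values;

/* Build a fully zeroed register image whose first float is FIRST.  */
static REG_T
make_reg (float first)
{
  REG_T r;
  memset (&r, 0, sizeof (r));
  r._float[0] = first;
  return r;
}

/* Hypothetical callee: checks that it received what was recorded.  */
static void
callee (REG_T a0, REG_T a1)
{
  compare (values.i0, a0, REG_T);
  compare (values.i1, a1, REG_T);
}

/* Matching expectations: returns the number of failures.  */
int
run_sketch (void)
{
  REG_T x0 = make_reg (32.0f), x1 = make_reg (33.0f);

  assert (sizeof (REG_T) == 32);
  failed = 0;
  values.i0 = x0;
  values.i1 = x1;
  callee (x0, x1);
  return failed;
}

/* Deliberately wrong expectation for the second argument.  */
int
run_sketch_mismatch (void)
{
  REG_T x0 = make_reg (32.0f), x1 = make_reg (33.0f);

  failed = 0;
  values.i0 = x0;
  values.i1 = x0;	/* Wrong on purpose: the callee receives x1.  */
  callee (x0, x1);
  return failed;
}
```

Called from a main function, run_sketch () returns 0, and
run_sketch_mismatch () prints "failed sketch" once and returns 1.
Comparing with memcmp rather than == sidesteps the question of how to
compare vector or _Float16 values and checks every byte of the argument.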

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
new file mode 100644
index 00000000000..ecf673bf796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
@@ -0,0 +1,45 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
new file mode 100644
index 00000000000..136db48c144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
@@ -0,0 +1,182 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 ymm0
+#define F1 ymm1
+#define F2 ymm2
+#define F3 ymm3
+#define F4 ymm4
+#define F5 ymm5
+#define F6 ymm6
+#define F7 ymm7
+
+typedef union {
+  _Float16 __Float16[16];
+  float _float[8];
+  double _double[4];
+  long _long[4];
+  int _int[8];
+  unsigned long _ulong[4];
+  __m64 _m64[4];
+  __m128 _m128[2];
+  __m256 _m256[1];
+  __m256h _m256h[1];
+} YMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+YMM_T ymm_regs[16];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments. Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  YMM_T ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9,
+        ymm10, ymm11, ymm12, ymm13, ymm14, ymm15;
+};
+
+/* Defined in each test file.  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (ymm_regs, 0, sizeof (ymm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* TODO: Do the checking.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.ymm0._ ## T [0] == ymm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.ymm1._ ## T [0] == ymm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.ymm2._ ## T [0] == ymm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.ymm3._ ## T [0] == ymm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.ymm4._ ## T [0] == ymm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.ymm5._ ## T [0] == ymm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.ymm6._ ## T [0] == ymm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.ymm7._ ## T [0] == ymm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.ymm0) + (O), \
+		     &ymm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.ymm1) + (O), \
+		     &ymm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.ymm2) + (O), \
+		     &ymm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.ymm3) + (O), \
+		     &ymm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.ymm4) + (O), \
+		     &ymm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.ymm5) + (O), \
+		     &ymm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.ymm6) + (O), \
+		     &ymm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.ymm7) + (O), \
+		     &ymm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+#define check_m256_arguments check_vector_arguments(m256, 0)
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
new file mode 100644
index 00000000000..73a59191d6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
@@ -0,0 +1,81 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	vmovdqu	%ymm2, ymm_regs+64(%rip)
+	vmovdqu	%ymm3, ymm_regs+96(%rip)
+	vmovdqu	%ymm4, ymm_regs+128(%rip)
+	vmovdqu	%ymm5, ymm_regs+160(%rip)
+	vmovdqu	%ymm6, ymm_regs+192(%rip)
+	vmovdqu	%ymm7, ymm_regs+224(%rip)
+	vmovdqu	%ymm8, ymm_regs+256(%rip)
+	vmovdqu	%ymm9, ymm_regs+288(%rip)
+	vmovdqu	%ymm10, ymm_regs+320(%rip)
+	vmovdqu	%ymm11, ymm_regs+352(%rip)
+	vmovdqu	%ymm12, ymm_regs+384(%rip)
+	vmovdqu	%ymm13, ymm_regs+416(%rip)
+	vmovdqu	%ymm14, ymm_regs+448(%rip)
+	vmovdqu	%ymm15, ymm_regs+480(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu	%ymm0, ymm_regs+0(%rip)
+	vmovdqu	%ymm1, ymm_regs+32(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	ymm_regs,512,32
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
new file mode 100644
index 00000000000..6a55030c0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
@@ -0,0 +1,3 @@
+#define AVX512VL(ebx) (ebx & bit_AVX512VL)
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK)
+#include "../avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
new file mode 100644
index 00000000000..48e0139f416
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
@@ -0,0 +1,54 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m256
+fun_test_returning___m256 (void)
+{
+  volatile_var++;
+  return (__m256){73,0,0,0,0,0,0,0};
+}
+
+__m256h
+fun_test_returning___m256h (void)
+{
+  volatile_var++;
+  return (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+                   5.1f16,6.1f16,7.1f16,8.1f16,
+                   9.1f16,10.1f16,11.1f16,12.1f16,
+		   13.1f16,14.1f16,15.1f16,16.1f16};
+}
+
+__m256 test_256;
+__m256h test_256h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  YMM_T ymmt1, ymmt2;
+
+  clear_struct_registers;
+  test_256 = (__m256){73,0,0,0,0,0,0,0};
+  ymmt1._m256[0] = test_256;
+  ymmt2._m256[0] = WRAP_RET (fun_test_returning___m256)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256\n"), failed++;
+
+  clear_struct_registers;
+  test_256h = (__m256h){1.1f16,2.1f16,3.1f16,4.1f16,
+                        5.1f16,6.1f16,7.1f16,8.1f16,
+                        9.1f16,10.1f16,11.1f16,12.1f16,
+			13.1f16,14.1f16,15.1f16,16.1f16};
+  ymmt1._m256h[0] = test_256h;
+  ymmt2._m256h[0] = WRAP_RET (fun_test_returning___m256h)();
+  if (memcmp (&ymmt1, &ymmt2, sizeof (ymmt2)) != 0)
+    printf ("fail m256h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
new file mode 100644
index 00000000000..bfa80d616ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
@@ -0,0 +1,370 @@
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void fun_check_passing_m256_8_values (__m256 i0 ATTRIBUTE_UNUSED,
+				      __m256 i1 ATTRIBUTE_UNUSED,
+				      __m256 i2 ATTRIBUTE_UNUSED,
+				      __m256 i3 ATTRIBUTE_UNUSED,
+				      __m256 i4 ATTRIBUTE_UNUSED,
+				      __m256 i5 ATTRIBUTE_UNUSED,
+				      __m256 i6 ATTRIBUTE_UNUSED,
+				      __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+  compare (values.i4, i4, __m256);
+  compare (values.i5, i5, __m256);
+  compare (values.i6, i6, __m256);
+  compare (values.i7, i7, __m256);
+}
+
+void fun_check_passing_m256h_8_values (__m256h i0 ATTRIBUTE_UNUSED,
+				       __m256h i1 ATTRIBUTE_UNUSED,
+				       __m256h i2 ATTRIBUTE_UNUSED,
+				       __m256h i3 ATTRIBUTE_UNUSED,
+				       __m256h i4 ATTRIBUTE_UNUSED,
+				       __m256h i5 ATTRIBUTE_UNUSED,
+				       __m256h i6 ATTRIBUTE_UNUSED,
+				       __m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+  compare (values.i4, i4, __m256h);
+  compare (values.i5, i5, __m256h);
+  compare (values.i6, i6, __m256h);
+  compare (values.i7, i7, __m256h);
+}
+
+void
+fun_check_passing_m256_8_regs (__m256 i0 ATTRIBUTE_UNUSED,
+			       __m256 i1 ATTRIBUTE_UNUSED,
+			       __m256 i2 ATTRIBUTE_UNUSED,
+			       __m256 i3 ATTRIBUTE_UNUSED,
+			       __m256 i4 ATTRIBUTE_UNUSED,
+			       __m256 i5 ATTRIBUTE_UNUSED,
+			       __m256 i6 ATTRIBUTE_UNUSED,
+			       __m256 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256h_8_regs (__m256h i0 ATTRIBUTE_UNUSED,
+				__m256h i1 ATTRIBUTE_UNUSED,
+				__m256h i2 ATTRIBUTE_UNUSED,
+				__m256h i3 ATTRIBUTE_UNUSED,
+				__m256h i4 ATTRIBUTE_UNUSED,
+				__m256h i5 ATTRIBUTE_UNUSED,
+				__m256h i6 ATTRIBUTE_UNUSED,
+				__m256h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256_20_values (__m256 i0 ATTRIBUTE_UNUSED,
+				  __m256 i1 ATTRIBUTE_UNUSED,
+				  __m256 i2 ATTRIBUTE_UNUSED,
+				  __m256 i3 ATTRIBUTE_UNUSED,
+				  __m256 i4 ATTRIBUTE_UNUSED,
+				  __m256 i5 ATTRIBUTE_UNUSED,
+				  __m256 i6 ATTRIBUTE_UNUSED,
+				  __m256 i7 ATTRIBUTE_UNUSED,
+				  __m256 i8 ATTRIBUTE_UNUSED,
+				  __m256 i9 ATTRIBUTE_UNUSED,
+				  __m256 i10 ATTRIBUTE_UNUSED,
+				  __m256 i11 ATTRIBUTE_UNUSED,
+				  __m256 i12 ATTRIBUTE_UNUSED,
+				  __m256 i13 ATTRIBUTE_UNUSED,
+				  __m256 i14 ATTRIBUTE_UNUSED,
+				  __m256 i15 ATTRIBUTE_UNUSED,
+				  __m256 i16 ATTRIBUTE_UNUSED,
+				  __m256 i17 ATTRIBUTE_UNUSED,
+				  __m256 i18 ATTRIBUTE_UNUSED,
+				  __m256 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+  compare (values.i4, i4, __m256);
+  compare (values.i5, i5, __m256);
+  compare (values.i6, i6, __m256);
+  compare (values.i7, i7, __m256);
+  compare (values.i8, i8, __m256);
+  compare (values.i9, i9, __m256);
+  compare (values.i10, i10, __m256);
+  compare (values.i11, i11, __m256);
+  compare (values.i12, i12, __m256);
+  compare (values.i13, i13, __m256);
+  compare (values.i14, i14, __m256);
+  compare (values.i15, i15, __m256);
+  compare (values.i16, i16, __m256);
+  compare (values.i17, i17, __m256);
+  compare (values.i18, i18, __m256);
+  compare (values.i19, i19, __m256);
+}
+
+void
+fun_check_passing_m256h_20_values (__m256h i0 ATTRIBUTE_UNUSED,
+				   __m256h i1 ATTRIBUTE_UNUSED,
+				   __m256h i2 ATTRIBUTE_UNUSED,
+				   __m256h i3 ATTRIBUTE_UNUSED,
+				   __m256h i4 ATTRIBUTE_UNUSED,
+				   __m256h i5 ATTRIBUTE_UNUSED,
+				   __m256h i6 ATTRIBUTE_UNUSED,
+				   __m256h i7 ATTRIBUTE_UNUSED,
+				   __m256h i8 ATTRIBUTE_UNUSED,
+				   __m256h i9 ATTRIBUTE_UNUSED,
+				   __m256h i10 ATTRIBUTE_UNUSED,
+				   __m256h i11 ATTRIBUTE_UNUSED,
+				   __m256h i12 ATTRIBUTE_UNUSED,
+				   __m256h i13 ATTRIBUTE_UNUSED,
+				   __m256h i14 ATTRIBUTE_UNUSED,
+				   __m256h i15 ATTRIBUTE_UNUSED,
+				   __m256h i16 ATTRIBUTE_UNUSED,
+				   __m256h i17 ATTRIBUTE_UNUSED,
+				   __m256h i18 ATTRIBUTE_UNUSED,
+				   __m256h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+  compare (values.i4, i4, __m256h);
+  compare (values.i5, i5, __m256h);
+  compare (values.i6, i6, __m256h);
+  compare (values.i7, i7, __m256h);
+  compare (values.i8, i8, __m256h);
+  compare (values.i9, i9, __m256h);
+  compare (values.i10, i10, __m256h);
+  compare (values.i11, i11, __m256h);
+  compare (values.i12, i12, __m256h);
+  compare (values.i13, i13, __m256h);
+  compare (values.i14, i14, __m256h);
+  compare (values.i15, i15, __m256h);
+  compare (values.i16, i16, __m256h);
+  compare (values.i17, i17, __m256h);
+  compare (values.i18, i18, __m256h);
+  compare (values.i19, i19, __m256h);
+}
+
+void
+fun_check_passing_m256_20_regs (__m256 i0 ATTRIBUTE_UNUSED,
+				__m256 i1 ATTRIBUTE_UNUSED,
+				__m256 i2 ATTRIBUTE_UNUSED,
+				__m256 i3 ATTRIBUTE_UNUSED,
+				__m256 i4 ATTRIBUTE_UNUSED,
+				__m256 i5 ATTRIBUTE_UNUSED,
+				__m256 i6 ATTRIBUTE_UNUSED,
+				__m256 i7 ATTRIBUTE_UNUSED,
+				__m256 i8 ATTRIBUTE_UNUSED,
+				__m256 i9 ATTRIBUTE_UNUSED,
+				__m256 i10 ATTRIBUTE_UNUSED,
+				__m256 i11 ATTRIBUTE_UNUSED,
+				__m256 i12 ATTRIBUTE_UNUSED,
+				__m256 i13 ATTRIBUTE_UNUSED,
+				__m256 i14 ATTRIBUTE_UNUSED,
+				__m256 i15 ATTRIBUTE_UNUSED,
+				__m256 i16 ATTRIBUTE_UNUSED,
+				__m256 i17 ATTRIBUTE_UNUSED,
+				__m256 i18 ATTRIBUTE_UNUSED,
+				__m256 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+void
+fun_check_passing_m256h_20_regs (__m256h i0 ATTRIBUTE_UNUSED,
+				 __m256h i1 ATTRIBUTE_UNUSED,
+				 __m256h i2 ATTRIBUTE_UNUSED,
+				 __m256h i3 ATTRIBUTE_UNUSED,
+				 __m256h i4 ATTRIBUTE_UNUSED,
+				 __m256h i5 ATTRIBUTE_UNUSED,
+				 __m256h i6 ATTRIBUTE_UNUSED,
+				 __m256h i7 ATTRIBUTE_UNUSED,
+				 __m256h i8 ATTRIBUTE_UNUSED,
+				 __m256h i9 ATTRIBUTE_UNUSED,
+				 __m256h i10 ATTRIBUTE_UNUSED,
+				 __m256h i11 ATTRIBUTE_UNUSED,
+				 __m256h i12 ATTRIBUTE_UNUSED,
+				 __m256h i13 ATTRIBUTE_UNUSED,
+				 __m256h i14 ATTRIBUTE_UNUSED,
+				 __m256h i15 ATTRIBUTE_UNUSED,
+				 __m256h i16 ATTRIBUTE_UNUSED,
+				 __m256h i17 ATTRIBUTE_UNUSED,
+				 __m256h i18 ATTRIBUTE_UNUSED,
+				 __m256h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m256_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, \
+			    _i8, _i9, _i10, _i11, _i12, _i13, _i14, \
+			    _i15, _i16, _i17, _i18, _i19, _func1, \
+			    _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19); \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, \
+		     _i9, _i10, _i11, _i12, _i13, _i14, _i15, \
+		     _i16, _i17, _i18, _i19);
+
+void
+test_m256_on_stack ()
+{
+  __m256 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m256_8_values,
+		      fun_check_passing_m256_8_regs, _m256);
+}
+
+void
+test_m256h_on_stack ()
+{
+  __m256h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i,
+	             5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i,
+	             9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i,
+	             13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i};
+  pass = "m256h-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m256h_8_values,
+		      fun_check_passing_m256h_8_regs, _m256h);
+}
+
+void
+test_too_many_m256 ()
+{
+  __m256 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m256){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m256_20_values,
+		       fun_check_passing_m256_20_regs, _m256);
+}
+
+void
+test_too_many_m256h ()
+{
+  __m256h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m256h){1.1f16 + i, 2.1f16 + i, 3.1f16 + i, 4.1f16 + i,
+	             5.1f16 + i, 6.1f16 + i, 7.1f16 + i, 8.1f16 + i,
+	             9.1f16 + i, 10.1f16 + i, 11.1f16 + i, 12.1f16 + i,
+	             13.1f16 + i, 14.1f16 + i, 15.1f16 + i, 16.1f16 + i};
+  pass = "m256h-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m256h_20_values,
+		       fun_check_passing_m256h_20_regs, _m256h);
+}
+
+static void
+do_test (void)
+{
+  test_m256_on_stack ();
+  test_too_many_m256 ();
+  test_m256h_on_stack ();
+  test_too_many_m256h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
new file mode 100644
index 00000000000..eff10badd6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
@@ -0,0 +1,113 @@
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct m256_struct
+{
+  __m256 x;
+};
+
+struct m256_2_struct
+{
+  __m256 x1, x2;
+};
+
+struct m256h_struct
+{
+  __m256h x;
+};
+
+struct m256h_2_struct
+{
+  __m256h x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1 (struct m256_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m256_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_struct_passing2 (struct m256_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+40);
+}
+
+void
+check_struct_passing1h (struct m256h_struct ms1 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms2 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms3 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms4 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms5 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms6 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms7 ATTRIBUTE_UNUSED,
+		        struct m256h_struct ms8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_struct_passing2h (struct m256h_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+40);
+}
+
+static void
+do_test (void)
+{
+  struct m256_struct m256s [8];
+  struct m256h_struct m256hs [8];
+  struct m256_2_struct m256_2s = {
+      { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94 },
+      { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3 }
+  };
+  struct m256h_2_struct m256h_2s = {
+      { 47.364f16, 36.3f16, -367.6f16, 3474.6f16, -7.364f16, -63.3f16, 7.6f16, 74.64f16,
+        57.865f16, 86.8f16, -867.6f16, 8575.6f16, -7.865f16, -68.8f16, 7.6f16, 75.65f16  },
+      { -7.364f16, -3.3f16, -36.6f16, 34.6f16, 7.6f16, 74.64f16, -47.364f16, 36.3f16,
+        -8.364f16, -3.3f16, -36.6f16, 34.6f16, 8.6f16, 84.64f16, -48.364f16, 36.3f16  }
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m256s[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+
+      m256hs[i].x = (__m256h){33+i, 0, i, 0, -i, 0, i - 11, i + 9,
+                              31+i, 2, i, 3, -i, 4, i - 10, i + 7};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256[0] = m256s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1)(m256s[0], m256s[1], m256s[2], m256s[3],
+				    m256s[4], m256s[5], m256s[6], m256s[7]);
+  WRAP_CALL (check_struct_passing2)(m256_2s);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256h[0] = m256hs[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1h)(m256hs[0], m256hs[1], m256hs[2], m256hs[3],
+				    m256hs[4], m256hs[5], m256hs[6], m256hs[7]);
+  WRAP_CALL (check_struct_passing2h)(m256h_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
new file mode 100644
index 00000000000..76f300c3e5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
@@ -0,0 +1,337 @@
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+union un1
+{
+  __m256 x;
+  float f;
+};
+
+union un2
+{
+  __m256 x;
+  double d;
+};
+
+union un3
+{
+  __m256 x;
+  __m128 v;
+};
+
+union un4
+{
+  __m256 x;
+  long double ld;
+};
+
+union un5
+{
+  __m256 x;
+  int i;
+};
+
+union un1a
+{
+  __m256 x;
+  _Float16 f;
+};
+
+union un1h
+{
+  __m256h x;
+  float f;
+};
+
+union un1hh
+{
+  __m256h x;
+  _Float16 f;
+};
+
+union un2h
+{
+  __m256h x;
+  double d;
+};
+
+union un3h
+{
+  __m256h x;
+  __m128 v;
+};
+
+union un4h
+{
+  __m256h x;
+  long double ld;
+};
+
+union un5h
+{
+  __m256h x;
+  int i;
+};
+
+void
+check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED,
+		     union un1 u2 ATTRIBUTE_UNUSED,
+		     union un1 u3 ATTRIBUTE_UNUSED,
+		     union un1 u4 ATTRIBUTE_UNUSED,
+		     union un1 u5 ATTRIBUTE_UNUSED,
+		     union un1 u6 ATTRIBUTE_UNUSED,
+		     union un1 u7 ATTRIBUTE_UNUSED,
+		     union un1 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1a(union un1a u1 ATTRIBUTE_UNUSED,
+		      union un1a u2 ATTRIBUTE_UNUSED,
+		      union un1a u3 ATTRIBUTE_UNUSED,
+		      union un1a u4 ATTRIBUTE_UNUSED,
+		      union un1a u5 ATTRIBUTE_UNUSED,
+		      union un1a u6 ATTRIBUTE_UNUSED,
+		      union un1a u7 ATTRIBUTE_UNUSED,
+		      union un1a u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED,
+		      union un1h u2 ATTRIBUTE_UNUSED,
+		      union un1h u3 ATTRIBUTE_UNUSED,
+		      union un1h u4 ATTRIBUTE_UNUSED,
+		      union un1h u5 ATTRIBUTE_UNUSED,
+		      union un1h u6 ATTRIBUTE_UNUSED,
+		      union un1h u7 ATTRIBUTE_UNUSED,
+		      union un1h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED,
+		       union un1hh u2 ATTRIBUTE_UNUSED,
+		       union un1hh u3 ATTRIBUTE_UNUSED,
+		       union un1hh u4 ATTRIBUTE_UNUSED,
+		       union un1hh u5 ATTRIBUTE_UNUSED,
+		       union un1hh u6 ATTRIBUTE_UNUSED,
+		       union un1hh u7 ATTRIBUTE_UNUSED,
+		       union un1hh u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED,
+		     union un2 u2 ATTRIBUTE_UNUSED,
+		     union un2 u3 ATTRIBUTE_UNUSED,
+		     union un2 u4 ATTRIBUTE_UNUSED,
+		     union un2 u5 ATTRIBUTE_UNUSED,
+		     union un2 u6 ATTRIBUTE_UNUSED,
+		     union un2 u7 ATTRIBUTE_UNUSED,
+		     union un2 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED,
+		      union un2h u2 ATTRIBUTE_UNUSED,
+		      union un2h u3 ATTRIBUTE_UNUSED,
+		      union un2h u4 ATTRIBUTE_UNUSED,
+		      union un2h u5 ATTRIBUTE_UNUSED,
+		      union un2h u6 ATTRIBUTE_UNUSED,
+		      union un2h u7 ATTRIBUTE_UNUSED,
+		      union un2h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED,
+		     union un3 u2 ATTRIBUTE_UNUSED,
+		     union un3 u3 ATTRIBUTE_UNUSED,
+		     union un3 u4 ATTRIBUTE_UNUSED,
+		     union un3 u5 ATTRIBUTE_UNUSED,
+		     union un3 u6 ATTRIBUTE_UNUSED,
+		     union un3 u7 ATTRIBUTE_UNUSED,
+		     union un3 u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED,
+		      union un3h u2 ATTRIBUTE_UNUSED,
+		      union un3h u3 ATTRIBUTE_UNUSED,
+		      union un3h u4 ATTRIBUTE_UNUSED,
+		      union un3h u5 ATTRIBUTE_UNUSED,
+		      union un3h u6 ATTRIBUTE_UNUSED,
+		      union un3h u7 ATTRIBUTE_UNUSED,
+		      union un3h u8 ATTRIBUTE_UNUSED)
+{
+  check_m256_arguments;
+}
+
+void
+check_union_passing4(union un4 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing4h(union un4h u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing5h(union un5h u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+
+#define check_union_passing1h WRAP_CALL(check_union_passing1h)
+#define check_union_passing1a WRAP_CALL(check_union_passing1a)
+#define check_union_passing1hh WRAP_CALL(check_union_passing1hh)
+#define check_union_passing2h WRAP_CALL(check_union_passing2h)
+#define check_union_passing3h WRAP_CALL(check_union_passing3h)
+#define check_union_passing4h WRAP_CALL(check_union_passing4h)
+#define check_union_passing5h WRAP_CALL(check_union_passing5h)
+
+static void
+do_test (void)
+{
+  union un1 u1[8];
+  union un2 u2[8];
+  union un3 u3[8];
+  union un4 u4;
+  union un5 u5;
+  union un1a u1a[8];
+  union un1h u1h[8];
+  union un1hh u1hh[8];
+  union un2h u2h[8];
+  union un3h u3h[8];
+  union un4h u4h;
+  union un5h u5h;
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1[i].x = (__m256){32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+      u1h[i].x = (__m256h){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+                           33+i, 1, i, 2, -i, 4, i - 11, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256[0] = u1[i].x;
+  num_fregs = 8;
+  check_union_passing1(u1[0], u1[1], u1[2], u1[3],
+		       u1[4], u1[5], u1[6], u1[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1a[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u1a[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1a(u1a[0], u1a[1], u1a[2], u1a[3],
+		        u1a[4], u1a[5], u1a[6], u1a[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.ymm0)[i]._m256h[0] = u1h[i].x;
+  num_fregs = 8;
+  check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3],
+		        u1h[4], u1h[5], u1h[6], u1h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1hh[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u1hh[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3],
+		         u1hh[4], u1hh[5], u1hh[6], u1hh[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u2[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2(u2[0], u2[1], u2[2], u2[3],
+		       u2[4], u2[5], u2[6], u2[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2h[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u2h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3],
+		        u2h[4], u2h[5], u2h[6], u2h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3[i].x = u1[i].x;
+      (&fregs.ymm0)[i]._m256[0] = u3[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3(u3[0], u3[1], u3[2], u3[3],
+		       u3[4], u3[5], u3[6], u3[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3h[i].x = u1h[i].x;
+      (&fregs.ymm0)[i]._m256h[0] = u3h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3],
+		        u3h[4], u3h[5], u3h[6], u3h[7]);
+
+  check_union_passing4(u4);
+  check_union_passing5(u5);
+
+  check_union_passing4h(u4h);
+  check_union_passing5h(u5h);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
new file mode 100644
index 00000000000..f15adb4a33b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
@@ -0,0 +1,160 @@
+/* Test variable number of 256-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-ymm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m256_varargs (__m256 i0, __m256 i1, __m256 i2,
+				__m256 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256 *argp;
+
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m256 *)(((char *) fp) + 8);
+
+  /* Check __m256 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m256);
+  compare (values.i5, argp[1], __m256);
+  compare (values.i6, argp[2], __m256);
+  compare (values.i7, argp[3], __m256);
+  compare (values.i8, argp[4], __m256);
+  compare (values.i9, argp[5], __m256);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256);
+  compare (fregs.ymm1, ymm_regs[1], __m256);
+  compare (fregs.ymm2, ymm_regs[2], __m256);
+  compare (fregs.ymm3, ymm_regs[3], __m256);
+}
+
+void
+fun_check_passing_m256h_varargs (__m256h i0, __m256h i1, __m256h i2,
+				 __m256h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256h *argp;
+
+  compare (values.i0, i0, __m256h);
+  compare (values.i1, i1, __m256h);
+  compare (values.i2, i2, __m256h);
+  compare (values.i3, i3, __m256h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m256h *)(((char *) fp) + 8);
+
+  /* Check __m256h arguments passed on stack.  */
+  compare (values.i4, argp[0], __m256h);
+  compare (values.i5, argp[1], __m256h);
+  compare (values.i6, argp[2], __m256h);
+  compare (values.i7, argp[3], __m256h);
+  compare (values.i8, argp[4], __m256h);
+  compare (values.i9, argp[5], __m256h);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256h);
+  compare (fregs.ymm1, ymm_regs[1], __m256h);
+  compare (fregs.ymm2, ymm_regs[2], __m256h);
+  compare (fregs.ymm3, ymm_regs[3], __m256h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m256_varargs (void)
+{
+  __m256 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m256){32+i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m256_varargs,
+				 _m256);
+}
+
+void
+test_m256h_varargs (void)
+{
+  __m256h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m256h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+	9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+	13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i
+    };
+  pass = "m256h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m256h_varargs,
+				 _m256h);
+}
+
+void
+do_test (void)
+{
+  test_m256_varargs ();
+  test_m256h_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1



* [PATCH 10/10] AVX512FP16: Add ABI test for zmm
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (8 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 09/10] AVX512FP16: Add ABI test for ymm liuhongt
@ 2021-07-21  7:43         ` liuhongt
  2021-09-08  2:54         ` [PATCH V2 00/10] Initial support for AVX512FP16 Hongtao Liu
  10 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-07-21  7:43 UTC (permalink / raw)
  To: gcc-patches, ubizjak; +Cc: joseph, hjl.tools, richard.guenther, crazylht

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
	New file.
	* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
	Likewise.
---
 .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |  48 ++
 .../x86_64/abi/avx512fp16/m512h/args.h        | 186 ++++++++
 .../x86_64/abi/avx512fp16/m512h/asm-support.S |  97 ++++
 .../avx512fp16/m512h/avx512fp16-zmm-check.h   |   4 +
 .../avx512fp16/m512h/test_m512_returning.c    |  62 +++
 .../abi/avx512fp16/m512h/test_passing_m512.c  | 380 ++++++++++++++++
 .../avx512fp16/m512h/test_passing_structs.c   | 123 ++++++
 .../avx512fp16/m512h/test_passing_unions.c    | 415 ++++++++++++++++++
 .../abi/avx512fp16/m512h/test_varargs-m512.c  | 164 +++++++
 9 files changed, 1479 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
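
Note for reviewers less familiar with the x86-64 ABI harness: the
check_vector_arguments macro in args.h compares the register images the
test expects (fregs) against the images that snapshot captured
(zmm_regs), and only looks at the first num_fregs registers.  Below is a
minimal, portable sketch of that comparison.  It is not part of the
patch: reg_image, count_mismatches and demo are hypothetical names, the
"captured" array is filled by hand rather than by asm-support.S, and the
sketch counts mismatches where the real macro aborts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for ZMM_T: one 64-byte register image.  */
typedef union {
  float f32[16];
  unsigned char bytes[64];
} reg_image;

#define NUM_ARG_REGS 8

/* Return how many of the first NUM registers differ in their first
   SIZE bytes.  Registers past NUM are ignored, mirroring the
   "num_fregs <= k ||" guards in check_vector_arguments.  */
static unsigned
count_mismatches (const reg_image *expected, const reg_image *captured,
		  unsigned num, size_t size)
{
  unsigned i, bad = 0;

  assert (num <= NUM_ARG_REGS && size <= sizeof (reg_image));
  for (i = 0; i < num; i++)
    if (memcmp (&expected[i], &captured[i], size) != 0)
      bad++;
  return bad;
}

/* Fill both arrays identically, optionally corrupt one captured
   register, then compare the first NUM registers.  */
static unsigned
demo (unsigned corrupt_index, unsigned num)
{
  reg_image expected[NUM_ARG_REGS], captured[NUM_ARG_REGS];
  unsigned i;

  memset (expected, 0, sizeof expected);
  for (i = 0; i < NUM_ARG_REGS; i++)
    expected[i].f32[0] = 32.0f + i;
  memcpy (captured, expected, sizeof captured);
  if (corrupt_index < NUM_ARG_REGS)
    captured[corrupt_index].f32[0] = -1.0f;
  return count_mismatches (expected, captured, num, sizeof (reg_image));
}
```

Calling demo (3, 8) reports one mismatch, while demo (7, 4) reports none
because register 7 lies outside the four registers being checked.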

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
new file mode 100644
index 00000000000..33d24762788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
@@ -0,0 +1,48 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || [is-effective-target ia32]
+     || [gcc_target_object_format] != "elf"
+     || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+    if {[runtest_file_p $runtests $src]} {
+	c-torture-execute [list $src \
+				$srcdir/$subdir/asm-support.S] \
+				$additional_flags
+    }
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
new file mode 100644
index 00000000000..ec89fae4597
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
@@ -0,0 +1,186 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include <immintrin.h>
+#include <string.h>
+
+/* Assertion macro.  */
+#define assert(test) do { if (!(test)) abort (); } while (0)
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 zmm0
+#define F1 zmm1
+#define F2 zmm2
+#define F3 zmm3
+#define F4 zmm4
+#define F5 zmm5
+#define F6 zmm6
+#define F7 zmm7
+
+typedef union {
+  _Float16 __Float16[32];
+  float _float[16];
+  double _double[8];
+  long _long[8];
+  int _int[16];
+  unsigned long _ulong[8];
+  __m64 _m64[8];
+  __m128 _m128[4];
+  __m256 _m256[2];
+  __m512 _m512[1];
+  __m512h _m512h[1];
+} ZMM_T;
+
+typedef union {
+  float _float;
+  double _double;
+  long double _ldouble;
+  unsigned long _ulong[2];
+} X87_T;
+extern void (*callthis)(void);
+extern unsigned long rax,rbx,rcx,rdx,rsi,rdi,rsp,rbp,r8,r9,r10,r11,r12,r13,r14,r15;
+ZMM_T zmm_regs[32];
+X87_T x87_regs[8];
+extern volatile unsigned long volatile_var;
+extern void snapshot (void);
+extern void snapshot_ret (void);
+#define WRAP_CALL(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot)
+#define WRAP_RET(N) \
+  (callthis = (void (*)()) (N), (typeof (&N)) snapshot_ret)
+
+/* Clear all integer registers.  */
+#define clear_int_hardware_registers \
+  asm __volatile__ ("xor %%rax, %%rax\n\t" \
+		    "xor %%rbx, %%rbx\n\t" \
+		    "xor %%rcx, %%rcx\n\t" \
+		    "xor %%rdx, %%rdx\n\t" \
+		    "xor %%rsi, %%rsi\n\t" \
+		    "xor %%rdi, %%rdi\n\t" \
+		    "xor %%r8, %%r8\n\t" \
+		    "xor %%r9, %%r9\n\t" \
+		    "xor %%r10, %%r10\n\t" \
+		    "xor %%r11, %%r11\n\t" \
+		    "xor %%r12, %%r12\n\t" \
+		    "xor %%r13, %%r13\n\t" \
+		    "xor %%r14, %%r14\n\t" \
+		    "xor %%r15, %%r15\n\t" \
+		    ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", \
+		    "r9", "r10", "r11", "r12", "r13", "r14", "r15");
+
+/* This is the list of registers available for passing arguments.  Not all of
+   these are used or even really available.  */
+struct IntegerRegisters
+{
+  unsigned long rax, rbx, rcx, rdx, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15;
+};
+struct FloatRegisters
+{
+  double mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7;
+  long double st0, st1, st2, st3, st4, st5, st6, st7;
+  ZMM_T zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7, zmm8, zmm9,
+        zmm10, zmm11, zmm12, zmm13, zmm14, zmm15, zmm16, zmm17, zmm18,
+	zmm19, zmm20, zmm21, zmm22, zmm23, zmm24, zmm25, zmm26, zmm27,
+	zmm28, zmm29, zmm30, zmm31;
+};
+
+/* Defined in each test_*.c file.  */
+extern struct IntegerRegisters iregs;
+extern struct FloatRegisters fregs;
+extern unsigned int num_iregs, num_fregs;
+
+#define check_int_arguments do { \
+  assert (num_iregs <= 0 || iregs.I0 == I0); \
+  assert (num_iregs <= 1 || iregs.I1 == I1); \
+  assert (num_iregs <= 2 || iregs.I2 == I2); \
+  assert (num_iregs <= 3 || iregs.I3 == I3); \
+  assert (num_iregs <= 4 || iregs.I4 == I4); \
+  assert (num_iregs <= 5 || iregs.I5 == I5); \
+  } while (0)
+
+#define check_char_arguments check_int_arguments
+#define check_short_arguments check_int_arguments
+#define check_long_arguments check_int_arguments
+
+/* Clear register struct.  */
+#define clear_struct_registers \
+  rax = rbx = rcx = rdx = rdi = rsi = rbp = rsp \
+    = r8 = r9 = r10 = r11 = r12 = r13 = r14 = r15 = 0; \
+  memset (&iregs, 0, sizeof (iregs)); \
+  memset (&fregs, 0, sizeof (fregs)); \
+  memset (zmm_regs, 0, sizeof (zmm_regs)); \
+  memset (x87_regs, 0, sizeof (x87_regs));
+
+/* Clear both hardware and register structs for integers.  */
+#define clear_int_registers \
+  clear_struct_registers \
+  clear_int_hardware_registers
+
+/* Compare element 0 of each expected and captured zmm register.  */
+#define check_f_arguments(T) do { \
+  assert (num_fregs <= 0 || fregs.zmm0._ ## T [0] == zmm_regs[0]._ ## T [0]); \
+  assert (num_fregs <= 1 || fregs.zmm1._ ## T [0] == zmm_regs[1]._ ## T [0]); \
+  assert (num_fregs <= 2 || fregs.zmm2._ ## T [0] == zmm_regs[2]._ ## T [0]); \
+  assert (num_fregs <= 3 || fregs.zmm3._ ## T [0] == zmm_regs[3]._ ## T [0]); \
+  assert (num_fregs <= 4 || fregs.zmm4._ ## T [0] == zmm_regs[4]._ ## T [0]); \
+  assert (num_fregs <= 5 || fregs.zmm5._ ## T [0] == zmm_regs[5]._ ## T [0]); \
+  assert (num_fregs <= 6 || fregs.zmm6._ ## T [0] == zmm_regs[6]._ ## T [0]); \
+  assert (num_fregs <= 7 || fregs.zmm7._ ## T [0] == zmm_regs[7]._ ## T [0]); \
+  } while (0)
+
+#define check_float_arguments check_f_arguments(float)
+#define check_double_arguments check_f_arguments(double)
+
+#define check_vector_arguments(T,O) do { \
+  assert (num_fregs <= 0 \
+	  || memcmp (((char *) &fregs.zmm0) + (O), \
+		     &zmm_regs[0], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 1 \
+	  || memcmp (((char *) &fregs.zmm1) + (O), \
+		     &zmm_regs[1], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 2 \
+	  || memcmp (((char *) &fregs.zmm2) + (O), \
+		     &zmm_regs[2], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 3 \
+	  || memcmp (((char *) &fregs.zmm3) + (O), \
+		     &zmm_regs[3], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 4 \
+	  || memcmp (((char *) &fregs.zmm4) + (O), \
+		     &zmm_regs[4], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 5 \
+	  || memcmp (((char *) &fregs.zmm5) + (O), \
+		     &zmm_regs[5], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 6 \
+	  || memcmp (((char *) &fregs.zmm6) + (O), \
+		     &zmm_regs[6], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  assert (num_fregs <= 7 \
+	  || memcmp (((char *) &fregs.zmm7) + (O), \
+		     &zmm_regs[7], \
+		     sizeof (__ ## T) - (O)) == 0); \
+  } while (0)
+
+#define check_m64_arguments check_vector_arguments(m64, 0)
+#define check_m128_arguments check_vector_arguments(m128, 0)
+#define check_m256_arguments check_vector_arguments(m256, 0)
+#define check_m512_arguments check_vector_arguments(m512, 0)
+
+#endif /* INCLUDED_ARGS_H  */
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
new file mode 100644
index 00000000000..0ef82876dd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
@@ -0,0 +1,97 @@
+	.text
+	.p2align 4,,15
+.globl snapshot
+	.type	snapshot, @function
+snapshot:
+.LFB3:
+	movq	%rax, rax(%rip)
+	movq	%rbx, rbx(%rip)
+	movq	%rcx, rcx(%rip)
+	movq	%rdx, rdx(%rip)
+	movq	%rdi, rdi(%rip)
+	movq	%rsi, rsi(%rip)
+	movq	%rbp, rbp(%rip)
+	movq	%rsp, rsp(%rip)
+	movq	%r8, r8(%rip)
+	movq	%r9, r9(%rip)
+	movq	%r10, r10(%rip)
+	movq	%r11, r11(%rip)
+	movq	%r12, r12(%rip)
+	movq	%r13, r13(%rip)
+	movq	%r14, r14(%rip)
+	movq	%r15, r15(%rip)
+	vmovdqu32 %zmm0, zmm_regs+0(%rip)
+	vmovdqu32 %zmm1, zmm_regs+64(%rip)
+	vmovdqu32 %zmm2, zmm_regs+128(%rip)
+	vmovdqu32 %zmm3, zmm_regs+192(%rip)
+	vmovdqu32 %zmm4, zmm_regs+256(%rip)
+	vmovdqu32 %zmm5, zmm_regs+320(%rip)
+	vmovdqu32 %zmm6, zmm_regs+384(%rip)
+	vmovdqu32 %zmm7, zmm_regs+448(%rip)
+	vmovdqu32 %zmm8, zmm_regs+512(%rip)
+	vmovdqu32 %zmm9, zmm_regs+576(%rip)
+	vmovdqu32 %zmm10, zmm_regs+640(%rip)
+	vmovdqu32 %zmm11, zmm_regs+704(%rip)
+	vmovdqu32 %zmm12, zmm_regs+768(%rip)
+	vmovdqu32 %zmm13, zmm_regs+832(%rip)
+	vmovdqu32 %zmm14, zmm_regs+896(%rip)
+	vmovdqu32 %zmm15, zmm_regs+960(%rip)
+	vmovdqu32 %zmm16, zmm_regs+1024(%rip)
+	vmovdqu32 %zmm17, zmm_regs+1088(%rip)
+	vmovdqu32 %zmm18, zmm_regs+1152(%rip)
+	vmovdqu32 %zmm19, zmm_regs+1216(%rip)
+	vmovdqu32 %zmm20, zmm_regs+1280(%rip)
+	vmovdqu32 %zmm21, zmm_regs+1344(%rip)
+	vmovdqu32 %zmm22, zmm_regs+1408(%rip)
+	vmovdqu32 %zmm23, zmm_regs+1472(%rip)
+	vmovdqu32 %zmm24, zmm_regs+1536(%rip)
+	vmovdqu32 %zmm25, zmm_regs+1600(%rip)
+	vmovdqu32 %zmm26, zmm_regs+1664(%rip)
+	vmovdqu32 %zmm27, zmm_regs+1728(%rip)
+	vmovdqu32 %zmm28, zmm_regs+1792(%rip)
+	vmovdqu32 %zmm29, zmm_regs+1856(%rip)
+	vmovdqu32 %zmm30, zmm_regs+1920(%rip)
+	vmovdqu32 %zmm31, zmm_regs+1984(%rip)
+	jmp	*callthis(%rip)
+.LFE3:
+	.size	snapshot, .-snapshot
+
+	.p2align 4,,15
+.globl snapshot_ret
+	.type	snapshot_ret, @function
+snapshot_ret:
+	movq	%rdi, rdi(%rip)
+	subq	$8, %rsp
+	call	*callthis(%rip)
+	addq	$8, %rsp
+	movq	%rax, rax(%rip)
+	movq	%rdx, rdx(%rip)
+	vmovdqu32	%zmm0, zmm_regs+0(%rip)
+	vmovdqu32	%zmm1, zmm_regs+64(%rip)
+	fstpt	x87_regs(%rip)
+	fstpt	x87_regs+16(%rip)
+	fldt	x87_regs+16(%rip)
+	fldt	x87_regs(%rip)
+	ret
+	.size	snapshot_ret, .-snapshot_ret
+
+	.comm	callthis,8,8
+	.comm	rax,8,8
+	.comm	rbx,8,8
+	.comm	rcx,8,8
+	.comm	rdx,8,8
+	.comm	rsi,8,8
+	.comm	rdi,8,8
+	.comm	rsp,8,8
+	.comm	rbp,8,8
+	.comm	r8,8,8
+	.comm	r9,8,8
+	.comm	r10,8,8
+	.comm	r11,8,8
+	.comm	r12,8,8
+	.comm	r13,8,8
+	.comm	r14,8,8
+	.comm	r15,8,8
+	.comm	zmm_regs,2048,64
+	.comm	x87_regs,128,32
+	.comm   volatile_var,8,8
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
new file mode 100644
index 00000000000..4b882cc11fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
@@ -0,0 +1,4 @@
+#define AVX512VL(ebx) 1
+#define XSTATE_MASK (XSTATE_SSE | XSTATE_YMM | XSTATE_ZMM \
+		     | XSTATE_HI_ZMM | XSTATE_OPMASK)
+#include "../avx512fp16-check.h"
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
new file mode 100644
index 00000000000..5cb59436cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
@@ -0,0 +1,62 @@
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+__m512
+fun_test_returning___m512 (void)
+{
+  volatile_var++;
+  return (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+}
+
+__m512h
+fun_test_returning___m512h (void)
+{
+  volatile_var++;
+  return (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16,
+                    5.5f16, 6.6f16, 7.7f16, 8.8f16,
+                    9.9f16,  10.10f16,   11.11f16, 12.12f16,
+                    13.13f16, 14.14f16,  15.15f16, 16.16f16,
+                    17.17f16, 18.18f16,  19.19f16, 20.20f16,
+                    21.21f16, 22.22f16,  23.23f16, 24.24f16,
+                    25.25f16, 26.26f16,  27.27f16, 28.28f16,
+                    29.29f16, 30.30f16,  31.31f16, 32.32f16};
+}
+
+__m512 test_512;
+__m512h test_512h;
+
+static void
+do_test (void)
+{
+  unsigned failed = 0;
+  ZMM_T zmmt1, zmmt2;
+
+  clear_struct_registers;
+  test_512 = (__m512){73,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  zmmt1._m512[0] = test_512;
+  zmmt2._m512[0] = WRAP_RET (fun_test_returning___m512)();
+  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
+    printf ("fail m512\n"), failed++;
+
+  clear_struct_registers;
+  test_512h = (__m512h){ 1.1f16, 2.2f16, 3.3f16, 4.4f16,
+                         5.5f16, 6.6f16, 7.7f16, 8.8f16,
+                         9.9f16,  10.10f16,   11.11f16, 12.12f16,
+                         13.13f16, 14.14f16,  15.15f16, 16.16f16,
+                         17.17f16, 18.18f16,  19.19f16, 20.20f16,
+                         21.21f16, 22.22f16,  23.23f16, 24.24f16,
+                         25.25f16, 26.26f16,  27.27f16, 28.28f16,
+                         29.29f16, 30.30f16,  31.31f16, 32.32f16};
+  zmmt1._m512h[0] = test_512h;
+  zmmt2._m512h[0] = WRAP_RET (fun_test_returning___m512h)();
+  if (memcmp (&zmmt1, &zmmt2, sizeof (zmmt2)) != 0)
+    printf ("fail m512h\n"), failed++;
+
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
new file mode 100644
index 00000000000..ad5ba2e7f92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
@@ -0,0 +1,380 @@
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12, i13, i14, i15,
+    i16, i17, i18, i19, i20, i21, i22, i23;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void fun_check_passing_m512_8_values (__m512 i0 ATTRIBUTE_UNUSED,
+				      __m512 i1 ATTRIBUTE_UNUSED,
+				      __m512 i2 ATTRIBUTE_UNUSED,
+				      __m512 i3 ATTRIBUTE_UNUSED,
+				      __m512 i4 ATTRIBUTE_UNUSED,
+				      __m512 i5 ATTRIBUTE_UNUSED,
+				      __m512 i6 ATTRIBUTE_UNUSED,
+				      __m512 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+  compare (values.i4, i4, __m512);
+  compare (values.i5, i5, __m512);
+  compare (values.i6, i6, __m512);
+  compare (values.i7, i7, __m512);
+}
+
+void fun_check_passing_m512h_8_values (__m512h i0 ATTRIBUTE_UNUSED,
+				       __m512h i1 ATTRIBUTE_UNUSED,
+				       __m512h i2 ATTRIBUTE_UNUSED,
+				       __m512h i3 ATTRIBUTE_UNUSED,
+				       __m512h i4 ATTRIBUTE_UNUSED,
+				       __m512h i5 ATTRIBUTE_UNUSED,
+				       __m512h i6 ATTRIBUTE_UNUSED,
+				       __m512h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+  compare (values.i4, i4, __m512h);
+  compare (values.i5, i5, __m512h);
+  compare (values.i6, i6, __m512h);
+  compare (values.i7, i7, __m512h);
+}
+
+void
+fun_check_passing_m512_8_regs (__m512 i0 ATTRIBUTE_UNUSED,
+			       __m512 i1 ATTRIBUTE_UNUSED,
+			       __m512 i2 ATTRIBUTE_UNUSED,
+			       __m512 i3 ATTRIBUTE_UNUSED,
+			       __m512 i4 ATTRIBUTE_UNUSED,
+			       __m512 i5 ATTRIBUTE_UNUSED,
+			       __m512 i6 ATTRIBUTE_UNUSED,
+			       __m512 i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512h_8_regs (__m512h i0 ATTRIBUTE_UNUSED,
+				__m512h i1 ATTRIBUTE_UNUSED,
+				__m512h i2 ATTRIBUTE_UNUSED,
+				__m512h i3 ATTRIBUTE_UNUSED,
+				__m512h i4 ATTRIBUTE_UNUSED,
+				__m512h i5 ATTRIBUTE_UNUSED,
+				__m512h i6 ATTRIBUTE_UNUSED,
+				__m512h i7 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512_20_values (__m512 i0 ATTRIBUTE_UNUSED,
+				  __m512 i1 ATTRIBUTE_UNUSED,
+				  __m512 i2 ATTRIBUTE_UNUSED,
+				  __m512 i3 ATTRIBUTE_UNUSED,
+				  __m512 i4 ATTRIBUTE_UNUSED,
+				  __m512 i5 ATTRIBUTE_UNUSED,
+				  __m512 i6 ATTRIBUTE_UNUSED,
+				  __m512 i7 ATTRIBUTE_UNUSED,
+				  __m512 i8 ATTRIBUTE_UNUSED,
+				  __m512 i9 ATTRIBUTE_UNUSED,
+				  __m512 i10 ATTRIBUTE_UNUSED,
+				  __m512 i11 ATTRIBUTE_UNUSED,
+				  __m512 i12 ATTRIBUTE_UNUSED,
+				  __m512 i13 ATTRIBUTE_UNUSED,
+				  __m512 i14 ATTRIBUTE_UNUSED,
+				  __m512 i15 ATTRIBUTE_UNUSED,
+				  __m512 i16 ATTRIBUTE_UNUSED,
+				  __m512 i17 ATTRIBUTE_UNUSED,
+				  __m512 i18 ATTRIBUTE_UNUSED,
+				  __m512 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+  compare (values.i4, i4, __m512);
+  compare (values.i5, i5, __m512);
+  compare (values.i6, i6, __m512);
+  compare (values.i7, i7, __m512);
+  compare (values.i8, i8, __m512);
+  compare (values.i9, i9, __m512);
+  compare (values.i10, i10, __m512);
+  compare (values.i11, i11, __m512);
+  compare (values.i12, i12, __m512);
+  compare (values.i13, i13, __m512);
+  compare (values.i14, i14, __m512);
+  compare (values.i15, i15, __m512);
+  compare (values.i16, i16, __m512);
+  compare (values.i17, i17, __m512);
+  compare (values.i18, i18, __m512);
+  compare (values.i19, i19, __m512);
+}
+
+void
+fun_check_passing_m512h_20_values (__m512h i0 ATTRIBUTE_UNUSED,
+				   __m512h i1 ATTRIBUTE_UNUSED,
+				   __m512h i2 ATTRIBUTE_UNUSED,
+				   __m512h i3 ATTRIBUTE_UNUSED,
+				   __m512h i4 ATTRIBUTE_UNUSED,
+				   __m512h i5 ATTRIBUTE_UNUSED,
+				   __m512h i6 ATTRIBUTE_UNUSED,
+				   __m512h i7 ATTRIBUTE_UNUSED,
+				   __m512h i8 ATTRIBUTE_UNUSED,
+				   __m512h i9 ATTRIBUTE_UNUSED,
+				   __m512h i10 ATTRIBUTE_UNUSED,
+				   __m512h i11 ATTRIBUTE_UNUSED,
+				   __m512h i12 ATTRIBUTE_UNUSED,
+				   __m512h i13 ATTRIBUTE_UNUSED,
+				   __m512h i14 ATTRIBUTE_UNUSED,
+				   __m512h i15 ATTRIBUTE_UNUSED,
+				   __m512h i16 ATTRIBUTE_UNUSED,
+				   __m512h i17 ATTRIBUTE_UNUSED,
+				   __m512h i18 ATTRIBUTE_UNUSED,
+				   __m512h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check argument values.  */
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+  compare (values.i4, i4, __m512h);
+  compare (values.i5, i5, __m512h);
+  compare (values.i6, i6, __m512h);
+  compare (values.i7, i7, __m512h);
+  compare (values.i8, i8, __m512h);
+  compare (values.i9, i9, __m512h);
+  compare (values.i10, i10, __m512h);
+  compare (values.i11, i11, __m512h);
+  compare (values.i12, i12, __m512h);
+  compare (values.i13, i13, __m512h);
+  compare (values.i14, i14, __m512h);
+  compare (values.i15, i15, __m512h);
+  compare (values.i16, i16, __m512h);
+  compare (values.i17, i17, __m512h);
+  compare (values.i18, i18, __m512h);
+  compare (values.i19, i19, __m512h);
+}
+
+void
+fun_check_passing_m512_20_regs (__m512 i0 ATTRIBUTE_UNUSED,
+				__m512 i1 ATTRIBUTE_UNUSED,
+				__m512 i2 ATTRIBUTE_UNUSED,
+				__m512 i3 ATTRIBUTE_UNUSED,
+				__m512 i4 ATTRIBUTE_UNUSED,
+				__m512 i5 ATTRIBUTE_UNUSED,
+				__m512 i6 ATTRIBUTE_UNUSED,
+				__m512 i7 ATTRIBUTE_UNUSED,
+				__m512 i8 ATTRIBUTE_UNUSED,
+				__m512 i9 ATTRIBUTE_UNUSED,
+				__m512 i10 ATTRIBUTE_UNUSED,
+				__m512 i11 ATTRIBUTE_UNUSED,
+				__m512 i12 ATTRIBUTE_UNUSED,
+				__m512 i13 ATTRIBUTE_UNUSED,
+				__m512 i14 ATTRIBUTE_UNUSED,
+				__m512 i15 ATTRIBUTE_UNUSED,
+				__m512 i16 ATTRIBUTE_UNUSED,
+				__m512 i17 ATTRIBUTE_UNUSED,
+				__m512 i18 ATTRIBUTE_UNUSED,
+				__m512 i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+fun_check_passing_m512h_20_regs (__m512h i0 ATTRIBUTE_UNUSED,
+				 __m512h i1 ATTRIBUTE_UNUSED,
+				 __m512h i2 ATTRIBUTE_UNUSED,
+				 __m512h i3 ATTRIBUTE_UNUSED,
+				 __m512h i4 ATTRIBUTE_UNUSED,
+				 __m512h i5 ATTRIBUTE_UNUSED,
+				 __m512h i6 ATTRIBUTE_UNUSED,
+				 __m512h i7 ATTRIBUTE_UNUSED,
+				 __m512h i8 ATTRIBUTE_UNUSED,
+				 __m512h i9 ATTRIBUTE_UNUSED,
+				 __m512h i10 ATTRIBUTE_UNUSED,
+				 __m512h i11 ATTRIBUTE_UNUSED,
+				 __m512h i12 ATTRIBUTE_UNUSED,
+				 __m512h i13 ATTRIBUTE_UNUSED,
+				 __m512h i14 ATTRIBUTE_UNUSED,
+				 __m512h i15 ATTRIBUTE_UNUSED,
+				 __m512h i16 ATTRIBUTE_UNUSED,
+				 __m512h i17 ATTRIBUTE_UNUSED,
+				 __m512h i18 ATTRIBUTE_UNUSED,
+				 __m512h i19 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define def_check_passing8(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7);
+
+#define def_check_passing20(_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+			    _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+			    _i18, _i19, _func1, _func2, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  values.i10.TYPE[0] = _i10; \
+  values.i11.TYPE[0] = _i11; \
+  values.i12.TYPE[0] = _i12; \
+  values.i13.TYPE[0] = _i13; \
+  values.i14.TYPE[0] = _i14; \
+  values.i15.TYPE[0] = _i15; \
+  values.i16.TYPE[0] = _i16; \
+  values.i17.TYPE[0] = _i17; \
+  values.i18.TYPE[0] = _i18; \
+  values.i19.TYPE[0] = _i19; \
+  WRAP_CALL(_func1) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19); \
+  \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  fregs.F4.TYPE[0] = _i4; \
+  fregs.F5.TYPE[0] = _i5; \
+  fregs.F6.TYPE[0] = _i6; \
+  fregs.F7.TYPE[0] = _i7; \
+  num_fregs = 8; \
+  WRAP_CALL(_func2) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9, \
+		     _i10, _i11, _i12, _i13, _i14, _i15, _i16, _i17, \
+		     _i18, _i19);
+
+void
+test_m512_on_stack ()
+{
+  __m512 x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m512_8_values,
+		      fun_check_passing_m512_8_regs, _m512);
+}
+
+void
+test_m512h_on_stack ()
+{
+  __m512h x[8];
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] = (__m512h){1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+		     5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+		     9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+		     13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+		     17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+		     21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+		     25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+		     29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i};
+
+  pass = "m512h-8";
+  def_check_passing8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+		      fun_check_passing_m512h_8_values,
+		      fun_check_passing_m512h_8_regs, _m512h);
+}
+
+void
+test_too_many_m512 ()
+{
+  __m512 x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m512){32 + i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m512_20_values,
+		       fun_check_passing_m512_20_regs, _m512);
+}
+
+void
+test_too_many_m512h ()
+{
+  __m512h x[20];
+  int i;
+  for (i = 0; i < 20; i++)
+    x[i] = (__m512h){ 1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+		      5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+		      9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+		      13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+		      17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+		      21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+		      25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+		      29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i};
+  pass = "m512h-20";
+  def_check_passing20 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8],
+		       x[9], x[10], x[11], x[12], x[13], x[14], x[15], x[16],
+		       x[17], x[18], x[19], fun_check_passing_m512h_20_values,
+		       fun_check_passing_m512h_20_regs, _m512h);
+}
+
+static void
+do_test (void)
+{
+  test_m512_on_stack ();
+  test_too_many_m512 ();
+  test_m512h_on_stack ();
+  test_too_many_m512h ();
+  if (failed)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
new file mode 100644
index 00000000000..734e0f8e9e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
@@ -0,0 +1,123 @@
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+struct m512_struct
+{
+  __m512 x;
+};
+
+struct m512h_struct
+{
+  __m512h x;
+};
+
+struct m512_2_struct
+{
+  __m512 x1, x2;
+};
+
+struct m512h_2_struct
+{
+  __m512h x1, x2;
+};
+
+/* Check that the struct is passed as the individual members in fregs.  */
+void
+check_struct_passing1 (struct m512_struct ms1 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms2 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms3 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms4 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms5 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms6 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms7 ATTRIBUTE_UNUSED,
+		       struct m512_struct ms8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_struct_passing1h (struct m512h_struct ms1 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms2 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms3 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms4 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms5 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms6 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms7 ATTRIBUTE_UNUSED,
+			struct m512h_struct ms8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_struct_passing2 (struct m512_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+72);
+}
+
+void
+check_struct_passing2h (struct m512h_2_struct ms ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&ms.x1 == rsp+8);
+  assert ((unsigned long)&ms.x2 == rsp+72);
+}
+
+static void
+do_test (void)
+{
+  struct m512_struct m512s [8];
+  struct m512h_struct m512hs [8];
+  struct m512_2_struct m512_2s = {
+      { 48.394, 39.3, -397.9, 3484.9, -8.394, -93.3, 7.9, 84.94,
+	48.3941, 39.31, -397.91, 3484.91, -8.3941, -93.31, 7.91, 84.941 },
+      { -8.394, -3.3, -39.9, 34.9, 7.9, 84.94, -48.394, 39.3,
+	-8.3942, -3.32, -39.92, 34.92, 7.92, 84.942, -48.3942, 39.32 }
+  };
+  struct m512h_2_struct m512h_2s = {
+      { 58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+        58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+        58.395f16, 39.3f16, -397.9f16, 3585.9f16, -8.395f16, -93.3f16, 7.9f16, 85.95f16,
+	58.3951f16, 39.31f16, -397.91f16, 3585.91f16, -8.3951f16, -93.31f16, 7.91f16, 85.951f16},
+      { 67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+        67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+        67.396f16, 39.3f16, -397.9f16, 3676.9f16, -7.396f16, -93.3f16, 7.9f16, 76.96f16,
+	67.3961f16, 39.31f16, -397.91f16, 3676.91f16, -7.3961f16, -93.31f16, 7.91f16, 76.961f16},
+  };
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      m512s[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+			    32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+      m512hs[i].x = (__m512h){33+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      34+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      35+i, 1, i, 2, -i, 0, i - 15, i + 9,
+			      36+i, 1, i, 2, -i, 0, i - 15, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512[0] = m512s[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1)(m512s[0], m512s[1], m512s[2], m512s[3],
+				    m512s[4], m512s[5], m512s[6], m512s[7]);
+  WRAP_CALL (check_struct_passing2)(m512_2s);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512h[0] = m512hs[i].x;
+  num_fregs = 8;
+  WRAP_CALL (check_struct_passing1h)(m512hs[0], m512hs[1], m512hs[2], m512hs[3],
+				     m512hs[4], m512hs[5], m512hs[6], m512hs[7]);
+  WRAP_CALL (check_struct_passing2h)(m512h_2s);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
new file mode 100644
index 00000000000..fa801fbf7ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
@@ -0,0 +1,415 @@
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+unsigned int num_iregs, num_fregs;
+
+union un1
+{
+  __m512 x;
+  float f;
+};
+
+union un2
+{
+  __m512 x;
+  double d;
+};
+
+union un3
+{
+  __m512 x;
+  __m128 v;
+};
+
+union un4
+{
+  __m512 x;
+  long double ld;
+};
+
+union un5
+{
+  __m512 x;
+  int i;
+};
+
+union un6
+{
+  __m512 x;
+  __m256 v;
+};
+
+union un1h
+{
+  __m512 x;
+  _Float16 f;
+};
+
+union un1hf
+{
+  __m512h x;
+  float f;
+};
+
+union un1hh
+{
+  __m512h x;
+  _Float16 f;
+};
+
+union un2h
+{
+  __m512h x;
+  double d;
+};
+
+union un3h
+{
+  __m512h x;
+  __m128 v;
+};
+
+union un4h
+{
+  __m512h x;
+  long double ld;
+};
+
+union un5h
+{
+  __m512h x;
+  int i;
+};
+
+union un6h
+{
+  __m512h x;
+  __m256 v;
+};
+
+void
+check_union_passing1(union un1 u1 ATTRIBUTE_UNUSED,
+		     union un1 u2 ATTRIBUTE_UNUSED,
+		     union un1 u3 ATTRIBUTE_UNUSED,
+		     union un1 u4 ATTRIBUTE_UNUSED,
+		     union un1 u5 ATTRIBUTE_UNUSED,
+		     union un1 u6 ATTRIBUTE_UNUSED,
+		     union un1 u7 ATTRIBUTE_UNUSED,
+		     union un1 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1h(union un1h u1 ATTRIBUTE_UNUSED,
+		      union un1h u2 ATTRIBUTE_UNUSED,
+		      union un1h u3 ATTRIBUTE_UNUSED,
+		      union un1h u4 ATTRIBUTE_UNUSED,
+		      union un1h u5 ATTRIBUTE_UNUSED,
+		      union un1h u6 ATTRIBUTE_UNUSED,
+		      union un1h u7 ATTRIBUTE_UNUSED,
+		      union un1h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1hf(union un1hf u1 ATTRIBUTE_UNUSED,
+		       union un1hf u2 ATTRIBUTE_UNUSED,
+		       union un1hf u3 ATTRIBUTE_UNUSED,
+		       union un1hf u4 ATTRIBUTE_UNUSED,
+		       union un1hf u5 ATTRIBUTE_UNUSED,
+		       union un1hf u6 ATTRIBUTE_UNUSED,
+		       union un1hf u7 ATTRIBUTE_UNUSED,
+		       union un1hf u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing1hh(union un1hh u1 ATTRIBUTE_UNUSED,
+		       union un1hh u2 ATTRIBUTE_UNUSED,
+		       union un1hh u3 ATTRIBUTE_UNUSED,
+		       union un1hh u4 ATTRIBUTE_UNUSED,
+		       union un1hh u5 ATTRIBUTE_UNUSED,
+		       union un1hh u6 ATTRIBUTE_UNUSED,
+		       union un1hh u7 ATTRIBUTE_UNUSED,
+		       union un1hh u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+
+void
+check_union_passing2(union un2 u1 ATTRIBUTE_UNUSED,
+		     union un2 u2 ATTRIBUTE_UNUSED,
+		     union un2 u3 ATTRIBUTE_UNUSED,
+		     union un2 u4 ATTRIBUTE_UNUSED,
+		     union un2 u5 ATTRIBUTE_UNUSED,
+		     union un2 u6 ATTRIBUTE_UNUSED,
+		     union un2 u7 ATTRIBUTE_UNUSED,
+		     union un2 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing2h(union un2h u1 ATTRIBUTE_UNUSED,
+		      union un2h u2 ATTRIBUTE_UNUSED,
+		      union un2h u3 ATTRIBUTE_UNUSED,
+		      union un2h u4 ATTRIBUTE_UNUSED,
+		      union un2h u5 ATTRIBUTE_UNUSED,
+		      union un2h u6 ATTRIBUTE_UNUSED,
+		      union un2h u7 ATTRIBUTE_UNUSED,
+		      union un2h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing3(union un3 u1 ATTRIBUTE_UNUSED,
+		     union un3 u2 ATTRIBUTE_UNUSED,
+		     union un3 u3 ATTRIBUTE_UNUSED,
+		     union un3 u4 ATTRIBUTE_UNUSED,
+		     union un3 u5 ATTRIBUTE_UNUSED,
+		     union un3 u6 ATTRIBUTE_UNUSED,
+		     union un3 u7 ATTRIBUTE_UNUSED,
+		     union un3 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing3h(union un3h u1 ATTRIBUTE_UNUSED,
+		      union un3h u2 ATTRIBUTE_UNUSED,
+		      union un3h u3 ATTRIBUTE_UNUSED,
+		      union un3h u4 ATTRIBUTE_UNUSED,
+		      union un3h u5 ATTRIBUTE_UNUSED,
+		      union un3h u6 ATTRIBUTE_UNUSED,
+		      union un3h u7 ATTRIBUTE_UNUSED,
+		      union un3h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing4(union un4 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing4h(union un4h u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.ld == rsp+8);
+}
+
+void
+check_union_passing5(union un5 u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing5h(union un5h u ATTRIBUTE_UNUSED)
+{
+  /* Check the passing on the stack by comparing the address of the
+     stack elements to the expected place on the stack.  */
+  assert ((unsigned long)&u.x == rsp+8);
+  assert ((unsigned long)&u.i == rsp+8);
+}
+
+void
+check_union_passing6(union un6 u1 ATTRIBUTE_UNUSED,
+		     union un6 u2 ATTRIBUTE_UNUSED,
+		     union un6 u3 ATTRIBUTE_UNUSED,
+		     union un6 u4 ATTRIBUTE_UNUSED,
+		     union un6 u5 ATTRIBUTE_UNUSED,
+		     union un6 u6 ATTRIBUTE_UNUSED,
+		     union un6 u7 ATTRIBUTE_UNUSED,
+		     union un6 u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+void
+check_union_passing6h(union un6h u1 ATTRIBUTE_UNUSED,
+		      union un6h u2 ATTRIBUTE_UNUSED,
+		      union un6h u3 ATTRIBUTE_UNUSED,
+		      union un6h u4 ATTRIBUTE_UNUSED,
+		      union un6h u5 ATTRIBUTE_UNUSED,
+		      union un6h u6 ATTRIBUTE_UNUSED,
+		      union un6h u7 ATTRIBUTE_UNUSED,
+		      union un6h u8 ATTRIBUTE_UNUSED)
+{
+  /* Check register contents.  */
+  check_m512_arguments;
+}
+
+#define check_union_passing1 WRAP_CALL(check_union_passing1)
+#define check_union_passing2 WRAP_CALL(check_union_passing2)
+#define check_union_passing3 WRAP_CALL(check_union_passing3)
+#define check_union_passing4 WRAP_CALL(check_union_passing4)
+#define check_union_passing5 WRAP_CALL(check_union_passing5)
+#define check_union_passing6 WRAP_CALL(check_union_passing6)
+
+#define check_union_passing1h WRAP_CALL(check_union_passing1h)
+#define check_union_passing1hf WRAP_CALL(check_union_passing1hf)
+#define check_union_passing1hh WRAP_CALL(check_union_passing1hh)
+#define check_union_passing2h WRAP_CALL(check_union_passing2h)
+#define check_union_passing3h WRAP_CALL(check_union_passing3h)
+#define check_union_passing4h WRAP_CALL(check_union_passing4h)
+#define check_union_passing5h WRAP_CALL(check_union_passing5h)
+#define check_union_passing6h WRAP_CALL(check_union_passing6h)
+
+
+static void
+do_test (void)
+{
+  union un1 u1[8];
+  union un2 u2[8];
+  union un3 u3[8];
+  union un4 u4;
+  union un5 u5;
+  union un6 u6[8];
+  union un1h u1h[8];
+  union un1hf u1hf[8];
+  union un1hh u1hh[8];
+  union un2h u2h[8];
+  union un3h u3h[8];
+  union un4h u4h;
+  union un5h u5h;
+  union un6h u6h[8];
+  int i;
+
+  for (i = 0; i < 8; i++)
+    {
+      u1[i].x = (__m512){32+i, 0, i, 0, -i, 0, i - 12, i + 8,
+	                 32+i, 0, i, 0, -i, 0, i - 12, i + 8};
+
+      u1hf[i].x = (__m512h){ 33+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                             34+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                             35+i, 1, i, 2, -i, 0, i - 15, i + 9,
+                             36+i, 1, i, 2, -i, 0, i - 15, i + 9};
+    }
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512[0] = u1[i].x;
+  num_fregs = 8;
+  check_union_passing1(u1[0], u1[1], u1[2], u1[3],
+		       u1[4], u1[5], u1[6], u1[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1h[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u1h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1h(u1h[0], u1h[1], u1h[2], u1h[3],
+		        u1h[4], u1h[5], u1h[6], u1h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    (&fregs.zmm0)[i]._m512h[0] = u1hf[i].x;
+  num_fregs = 8;
+  check_union_passing1hf(u1hf[0], u1hf[1], u1hf[2], u1hf[3],
+		         u1hf[4], u1hf[5], u1hf[6], u1hf[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u1hh[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u1hh[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing1hh(u1hh[0], u1hh[1], u1hh[2], u1hh[3],
+		         u1hh[4], u1hh[5], u1hh[6], u1hh[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u2[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2(u2[0], u2[1], u2[2], u2[3],
+		       u2[4], u2[5], u2[6], u2[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u2h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u2h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing2h(u2h[0], u2h[1], u2h[2], u2h[3],
+		        u2h[4], u2h[5], u2h[6], u2h[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u3[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3(u3[0], u3[1], u3[2], u3[3],
+		       u3[4], u3[5], u3[6], u3[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u3h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u3h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing3h(u3h[0], u3h[1], u3h[2], u3h[3],
+		        u3h[4], u3h[5], u3h[6], u3h[7]);
+
+  check_union_passing4(u4);
+  check_union_passing5(u5);
+
+  check_union_passing4h(u4h);
+  check_union_passing5h(u5h);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u6[i].x = u1[i].x;
+      (&fregs.zmm0)[i]._m512[0] = u6[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing6(u6[0], u6[1], u6[2], u6[3],
+		       u6[4], u6[5], u6[6], u6[7]);
+
+  clear_struct_registers;
+  for (i = 0; i < 8; i++)
+    {
+      u6h[i].x = u1hf[i].x;
+      (&fregs.zmm0)[i]._m512h[0] = u6h[i].x;
+    }
+  num_fregs = 8;
+  check_union_passing6h(u6h[0], u6h[1], u6h[2], u6h[3],
+		        u6h[4], u6h[5], u6h[6], u6h[7]);
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
new file mode 100644
index 00000000000..e6d165a8247
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
@@ -0,0 +1,164 @@
+/* Test variable number of 512-bit vector arguments passed to functions.  */
+
+#include <stdio.h>
+#include "avx512fp16-zmm-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512_varargs (__m512 i0, __m512 i1, __m512 i2,
+				__m512 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m512 *argp;
+
+  compare (values.i0, i0, __m512);
+  compare (values.i1, i1, __m512);
+  compare (values.i2, i2, __m512);
+  compare (values.i3, i3, __m512);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m512 *)(((char *) fp) + 8);
+
+  /* Check __m512 arguments passed on stack.  */
+  compare (values.i4, argp[0], __m512);
+  compare (values.i5, argp[1], __m512);
+  compare (values.i6, argp[2], __m512);
+  compare (values.i7, argp[3], __m512);
+  compare (values.i8, argp[4], __m512);
+  compare (values.i9, argp[5], __m512);
+
+  /* Check register contents.  */
+  compare (fregs.zmm0, zmm_regs[0], __m512);
+  compare (fregs.zmm1, zmm_regs[1], __m512);
+  compare (fregs.zmm2, zmm_regs[2], __m512);
+  compare (fregs.zmm3, zmm_regs[3], __m512);
+}
+
+void
+fun_check_passing_m512h_varargs (__m512h i0, __m512h i1, __m512h i2,
+				 __m512h i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m512h *argp;
+
+  compare (values.i0, i0, __m512h);
+  compare (values.i1, i1, __m512h);
+  compare (values.i2, i2, __m512h);
+  compare (values.i3, i3, __m512h);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+    fp++;
+
+  /* Skip the return address stack slot.  */
+  argp = (__m512h *)(((char *) fp) + 8);
+
+  /* Check __m512h arguments passed on stack.  */
+  compare (values.i4, argp[0], __m512h);
+  compare (values.i5, argp[1], __m512h);
+  compare (values.i6, argp[2], __m512h);
+  compare (values.i7, argp[3], __m512h);
+  compare (values.i8, argp[4], __m512h);
+  compare (values.i9, argp[5], __m512h);
+
+  /* Check register contents.  */
+  compare (fregs.zmm0, zmm_regs[0], __m512h);
+  compare (fregs.zmm1, zmm_regs[1], __m512h);
+  compare (fregs.zmm2, zmm_regs[2], __m512h);
+  compare (fregs.zmm3, zmm_regs[3], __m512h);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+				      _i6, _i7, _i8, _i9, \
+				      _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m512_varargs (void)
+{
+  __m512 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m512){32+i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m512-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m512_varargs,
+				 _m512);
+}
+
+void
+test_m512h_varargs (void)
+{
+  __m512h x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+    x[i] = (__m512h) {
+        1.1f16 + i, 2.2f16 + i, 3.3f16 + i, 4.4f16 + i,
+	5.5f16 + i, 6.6f16 + i, 7.7f16 + i, 8.8f16 + i,
+	9.9f16 + i, 10.10f16 + i, 11.11f16 + i, 12.12f16 + i,
+	13.13f16 + i, 14.14f16 + i, 15.15f16 + i, 16.16f16 + i,
+	17.17f16 + i, 18.18f16 + i, 19.19f16 + i, 20.20f16 + i,
+	21.21f16 + i, 22.22f16 + i, 23.23f16 + i, 24.24f16 + i,
+	25.25f16 + i, 26.26f16 + i, 27.27f16 + i, 28.28f16 + i,
+	29.29f16 + i, 30.30f16 + i, 31.31f16 + i, 32.32f16 + i
+    };
+  pass = "m512h-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+				 x[6], x[7], x[8], x[9],
+				 fun_check_passing_m512h_varargs,
+				 _m512h);
+}
+
+void
+do_test (void)
+{
+  test_m512_varargs ();
+  test_m512h_varargs ();
+  if (failed)
+    abort ();
+}
-- 
2.18.1



* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
@ 2021-07-21 10:35           ` Uros Bizjak
  2021-07-22  5:21             ` Hongtao Liu
  2021-07-22 11:56           ` Richard Biener
  2021-07-28 21:56           ` Joseph Myers
  2 siblings, 1 reply; 138+ messages in thread
From: Uros Bizjak @ 2021-07-21 10:35 UTC (permalink / raw)
  To: liuhongt
  Cc: gcc-patches, Joseph S. Myers, H. J. Lu, Richard Biener, Hongtao Liu

On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
>         * config/i386/i386.c (enum x86_64_reg_class): Add
>         X86_64_SSEHF_CLASS.
>         (merge_classes): Handle X86_64_SSEHF_CLASS.
>         (examine_argument): Ditto.
>         (construct_container): Ditto.
>         (classify_argument): Ditto, and set HFmode/HCmode to
>         X86_64_SSEHF_CLASS.
>         (function_value_32): Return _Float16/Complex Float16 by
>         %xmm0/%xmm1.
>         (function_value_64): Return _Float16/Complex Float16 by SSE
>         register.
>         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
>         (ix86_secondary_reload): Require gpr as intermediate register
>         to store _Float16 from sse register when sse4 is not
>         available.
>         (ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
>         (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
>         sse2.
>         (ix86_scalar_mode_supported_p): Ditto.
>         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
>         (ix86_get_excess_precision): Return
>         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
>         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
>         * config/i386/i386.md (*pushhf_rex64): New define_insn.
>         (*pushhf): Ditto.
>         (*movhf_internal): Ditto.
>         * doc/extend.texi (Half-Precision Floating Point): Document
>         _Float16 for x86.
>
> gcc/lto/ChangeLog:
>
>         * lto-lang.c (lto_type_for_mode): Return float16_type_node
>         when mode == TYPE_MODE (float16_type_node).
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/sse2-float16-1.c: New test.
>         * gcc.target/i386/sse2-float16-2.c: Ditto.
>         * gcc.target/i386/sse2-float16-3.c: Ditto.

OK for the x86 part with some small changes inline.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-modes.def                |   1 +
>  gcc/config/i386/i386.c                        |  99 ++++++++++++++-
>  gcc/config/i386/i386.h                        |   2 +-
>  gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
>  gcc/doc/extend.texi                           |  16 +++
>  gcc/lto/lto-lang.c                            |   3 +
>  .../gcc.target/i386/sse2-float16-1.c          |   8 ++
>  .../gcc.target/i386/sse2-float16-2.c          |  16 +++
>  .../gcc.target/i386/sse2-float16-3.c          |  12 ++
>  9 files changed, 265 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 4e7014be034..9232f59a925 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> +FLOAT_MODE (HF, 2, ieee_half_format);
>
>  /* In ILP32 mode, XFmode has size 12 and alignment 4.
>     In LP64 mode, XFmode has size and alignment 16.  */
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ff96134fb37..02628d838fc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -387,6 +387,7 @@ enum x86_64_reg_class
>      X86_64_INTEGER_CLASS,
>      X86_64_INTEGERSI_CLASS,
>      X86_64_SSE_CLASS,
> +    X86_64_SSEHF_CLASS,
>      X86_64_SSESF_CLASS,
>      X86_64_SSEDF_CLASS,
>      X86_64_SSEUP_CLASS,
> @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
>      return X86_64_MEMORY_CLASS;
>
>    /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> -      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> +  if ((class1 == X86_64_INTEGERSI_CLASS
> +       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> +      || (class2 == X86_64_INTEGERSI_CLASS
> +         && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
>      return X86_64_INTEGERSI_CLASS;
>    if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
>        || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
>             /* The partial classes are now full classes.  */
>             if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
>               subclasses[0] = X86_64_SSE_CLASS;
> +           if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
> +             subclasses[0] = X86_64_SSE_CLASS;
>             if (subclasses[0] == X86_64_INTEGERSI_CLASS
>                 && !((bit_offset % 64) == 0 && bytes == 4))
>               subclasses[0] = X86_64_INTEGER_CLASS;
> @@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
>        gcc_unreachable ();
>      case E_CTImode:
>        return 0;
> +    case E_HFmode:
> +      if (!(bit_offset % 64))
> +       classes[0] = X86_64_SSEHF_CLASS;
> +      else
> +       classes[0] = X86_64_SSE_CLASS;
> +      return 1;
>      case E_SFmode:
>        if (!(bit_offset % 64))
>         classes[0] = X86_64_SSESF_CLASS;
> @@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
>        classes[0] = X86_64_SSE_CLASS;
>        classes[1] = X86_64_SSEUP_CLASS;
>        return 2;
> +    case E_HCmode:
> +      classes[0] = X86_64_SSE_CLASS;
> +      if (!(bit_offset % 64))
> +       return 1;
> +      else
> +       {
> +         classes[1] = X86_64_SSEHF_CLASS;
> +         return 2;
> +       }
>      case E_SCmode:
>        classes[0] = X86_64_SSE_CLASS;
>        if (!(bit_offset % 64))
> @@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
>         (*int_nregs)++;
>         break;
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         (*sse_nregs)++;
> @@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>
>    /* First construct simple cases.  Avoid SCmode, since we want to use
>       single register to pass this type.  */
> -  if (n == 1 && mode != SCmode)
> +  if (n == 1 && mode != SCmode && mode != HCmode)
>      switch (regclass[0])
>        {
>        case X86_64_INTEGER_CLASS:
>        case X86_64_INTEGERSI_CLASS:
>         return gen_rtx_REG (mode, intreg[0]);
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         if (mode != BLKmode)
> @@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>                                    GEN_INT (i*8));
>             intreg++;
>             break;
> +         case X86_64_SSEHF_CLASS:
> +           exp [nexps++]
> +             = gen_rtx_EXPR_LIST (VOIDmode,
> +                                  gen_rtx_REG (HFmode,
> +                                               GET_SSE_REGNO (sse_regno)),
> +                                  GEN_INT (i*8));
> +           sse_regno++;
> +           break;
>           case X86_64_SSESF_CLASS:
>             exp [nexps++]
>               = gen_rtx_EXPR_LIST (VOIDmode,
> @@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
>      /* Most things go in %eax.  */
>      regno = AX_REG;
>
> +  /* Return _Float16/_Complex _Float16 by sse register.  */
> +  if (mode == HFmode)
> +    regno = FIRST_SSE_REG;
> +  if (mode == HCmode)
> +    {
> +      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
> +      XVECEXP (ret, 0, 0)
> +       = gen_rtx_EXPR_LIST (VOIDmode,
> +                            gen_rtx_REG (SImode, FIRST_SSE_REG),
> +                            GEN_INT (0));
> +      return ret;
> +    }
> +
>    /* Override FP return register with %xmm0 for local functions when
>       SSE math is enabled or for functions with sseregparm attribute.  */
>    if ((fn || fntype) && (mode == SFmode || mode == DFmode))
> @@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
>
>        switch (mode)
>         {
> +       case E_HFmode:
> +       case E_HCmode:
>         case E_SFmode:
>         case E_SCmode:
>         case E_DFmode:
> @@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
>           (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
>      }
>
> +  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
> +    {
> +      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
> +                              REAL_MODE_FORMAT (HFmode));
> +      if (ASSEMBLER_DIALECT == ASM_ATT)
> +       putc ('$', file);
> +      fprintf (file, "0x%04x", (unsigned int) l);
> +    }
> +
>    else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
>      {
>        long l;
> @@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
>        return NO_REGS;
>      }
>
> +  /* Require movement to gpr, and then store to memory.  */
> +  if (mode == HFmode
> +      && !TARGET_SSE4_1
> +      && SSE_CLASS_P (rclass)
> +      && !in_p && MEM_P (x))
> +    {
> +      sri->extra_cost = 1;
> +      return GENERAL_REGS;
> +    }
> +
>    /* This condition handles corner case where an expression involving
>       pointers gets vectorized.  We're trying to use the address of a
>       stack slot as a vector initializer.
> @@ -19546,6 +19610,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>    else if (VALID_INT_MODE_P (mode)
>            || VALID_FP_MODE_P (mode))
>      return true;
> +  else if (mode == HFmode || mode == HCmode)
> +    return true;

Please add these two modes to VALID_INT_MODE_P instead.
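
As a rough standalone illustration of the idea (this is not the actual
i386.h text; the enum and the shortened mode list below are hypothetical
stand-ins), folding HFmode and HCmode into the predicate makes the extra
"else if" branch unnecessary:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's machine_mode; only a few modes are
   modeled here.  */
enum mode { QImode, HImode, SImode, DImode, HFmode, HCmode, SFmode, XFmode };

/* Simplified model of VALID_INT_MODE_P with HFmode/HCmode folded in.
   The real macro in i386.h lists more modes than shown here.  */
#define VALID_INT_MODE_P(MODE) \
  ((MODE) == QImode || (MODE) == HImode \
   || (MODE) == SImode || (MODE) == DImode \
   || (MODE) == HFmode || (MODE) == HCmode)

/* Models only the integer-mode half of the general-register test in
   ix86_hard_regno_mode_ok; no separate HFmode/HCmode check is needed.  */
static bool
int_mode_ok (enum mode m)
{
  return VALID_INT_MODE_P (m);
}
```

With that, int_mode_ok (HFmode) and int_mode_ok (HCmode) are both true,
while a mode that is not in the list (XFmode in this model) still falls
through to whatever tests follow.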

>    /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
>       on to use that value in smaller contexts, this can easily force a
>       pseudo to be allocated to GENERAL_REGS.  Since this is no worse than
> @@ -21555,10 +21621,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
>      return default_decimal_float_supported_p ();
>    else if (mode == TFmode)
>      return true;
> +  else if (mode == HFmode && TARGET_SSE2)
> +    return true;
>    else
>      return default_scalar_mode_supported_p (mode);
>  }
>
> +/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
> +   if MODE is HFmode, and punt to the generic implementation otherwise.  */
> +
> +static bool
> +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
> +{
> +  /* NB: Return TRUE for HFmode whenever SSE2 is enabled so that the
> +     _Float16 type will be defined by the C front-end.  Without
> +     AVX512FP16, _Float16 operations are emulated by soft-fp and
> +     float instructions.  */
> +  return ((mode == HFmode && TARGET_SSE2)
> +         ? true
> +         : default_libgcc_floating_mode_supported_p (mode));
> +}
> +
>  /* Implements target hook vector_mode_supported_p.  */
>  static bool
>  ix86_vector_mode_supported_p (machine_mode mode)
> @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
>            provide would be identical were it not for the unpredictable
>            cases.  */
>         if (!TARGET_80387)
> -         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +         return TARGET_SSE2
> +                ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +                : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>         else if (!TARGET_MIX_SSE_I387)
>           {
>             if (!(TARGET_SSE && TARGET_SSE_MATH))
>               return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
>             else if (TARGET_SSE2)
> -             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>           }
>
>         /* If we are in standards compliant mode, but we know we will
> @@ -23820,6 +23905,10 @@ ix86_run_selftests (void)
>  #undef TARGET_SCALAR_MODE_SUPPORTED_P
>  #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
>
> +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
> +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P        \
> +ix86_libgcc_floating_mode_supported_p
> +
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 0c2c93daf32..e21922e8782 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> -   || (MODE) == V2DImode || (MODE) == DFmode)
> +   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
>
>  #define VALID_SSE_REG_MODE(MODE)                                       \
>    ((MODE) == V1TImode || (MODE) == TImode                              \
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8b809c49fe0..dd991c3ffdf 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> +;; All x87 floating point modes plus HF
> +(define_mode_iterator X87MODEFH [SF DF XF HF])
> +
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
>  (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> @@ -3130,6 +3133,32 @@ (define_split
>    operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
>  })
>
> +(define_insn "*pushhf_rex64"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
> +  "TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{q}\t%q1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "DI,TI")
> +   (set_attr "isa"  "*,sse4")])

Please always put "isa" attribute first, as is the case with other
insn patterns.

> +(define_insn "*pushhf"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
> +  "!TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{l}\t%k1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "SI,TI")
> +   (set_attr "isa"  "*,sse4")])

Also here.

> +
>  (define_insn "*pushsf_rex64"
>    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
>         (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
> @@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
>     (set_attr "unit" "i387,*,*")
>     (set_attr "mode" "SF,SI,SF")])
>
> +(define_mode_iterator MODESH [SF HF])
>  ;; %%% Kill this when call knows how to work this out.
>  (define_split
> -  [(set (match_operand:SF 0 "push_operand")
> -       (match_operand:SF 1 "any_fp_register_operand"))]
> +  [(set (match_operand:MODESH 0 "push_operand")
> +       (match_operand:MODESH 1 "any_fp_register_operand"))]
>    "reload_completed"
>    [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
>     (set (match_dup 0) (match_dup 1))]
> @@ -3209,8 +3239,8 @@ (define_expand "movtf"
>    "ix86_expand_move (TFmode, operands); DONE;")
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
> -       (match_operand:X87MODEF 1 "general_operand"))]
> +  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
> +       (match_operand:X87MODEFH 1 "general_operand"))]
>    ""
>    "ix86_expand_move (<MODE>mode, operands); DONE;")
>
> @@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
>            ]
>            (const_string "*")))])
>
> +(define_insn "*movhf_internal"
> + [(set (match_operand:HF 0 "nonimmediate_operand"
> +        "=?r,?m,v,v,?r,m,?v,v")
> +       (match_operand:HF 1 "general_operand"
> +        "rmF,rF,C,v, v,v, r,m"))]
> + "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> +  && (lra_in_progress
> +      || reload_completed
> +      || !CONST_DOUBLE_P (operands[1])
> +      || (TARGET_SSE && TARGET_SSE_MATH
> +         && standard_sse_constant_p (operands[1], HFmode) == 1)
> +      || memory_operand (operands[0], HFmode))"
> +{
> +  switch (get_attr_type (insn))
> +    {
> +    case TYPE_IMOV:
> +      return "mov{w}\t{%1, %0|%0, %1}";
> +
> +    case TYPE_SSELOG1:
> +      return standard_sse_constant_opcode (insn, operands);
> +
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
> +    case TYPE_SSELOG:
> +      if (SSE_REG_P (operands[0]))
> +       return MEM_P (operands[1])
> +              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +      else
> +       return MEM_P (operands[1])
> +              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set (attr "isa")
> +       (cond [(eq_attr "alternative" "2,3,4,6,7")
> +                (const_string "sse2")
> +              (eq_attr "alternative" "5")
> +                (const_string "sse4")
> +             ]
> +             (const_string "*")))
> +   (set (attr "type")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "imov")
> +              (eq_attr "alternative" "2")
> +                (const_string "sselog1")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "sselog")
> +             ]
> +             (const_string "ssemov")))
> +   (set (attr "memory")
> +       (cond [(eq_attr "alternative" "4,6")
> +                (const_string "none")
> +              (eq_attr "alternative" "5")
> +                (const_string "store")
> +              (eq_attr "alternative" "7")
> +                (const_string "load")
> +             ]
> +             (const_string "*")))
> +   (set (attr "prefix")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "orig")
> +             ]
> +             (const_string "maybe_vex")))
> +   (set (attr "mode")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "HI")
> +              (eq_attr "alternative" "2")
> +                (const_string "V4SF")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "TI")
> +              (eq_attr "alternative" "3")
> +                (const_string "SF")
> +             ]
> +             (const_string "*")))])
> +
>  (define_split
>    [(set (match_operand 0 "any_fp_register_operand")
>         (match_operand 1 "memory_operand"))]
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b83cd4919bb..2cd0b38fe5b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
>  @section Half-Precision Floating Point
>  @cindex half-precision floating point
>  @cindex @code{__fp16} data type
> +@cindex @code{_Float16} data type
>
>  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> @@ -1150,6 +1151,21 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
>
> +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> +(16-bit) floating point via the @code{_Float16} type defined by ISO/IEC TS
> +18661-3:2015.  For C++, x86 provides a built-in type named @code{_Float16}
> +which uses the same data format as in C.
> +
> +Without @code{target("avx512fp16")}, the @code{_Float16} type is storage only,
> +and all operations are emulated by soft-fp and @code{float} instructions.
> +
> +By default, soft-fp keeps intermediate results at 32-bit precision, which may
> +give results that differ from those of AVX512FP16 instructions;
> +@option{-fexcess-precision=standard} forces rounding after every operation.
> +
> +With @option{-mavx512fp16}, GCC generates hardware instructions instead of
> +calling soft-fp.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types
> diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> index c13c7e45ac1..92f499643b5 100644
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>      return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
>  #endif
>
> +  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
> +    return float16_type_node;
> +
>    if (mode == TYPE_MODE (float_type_node))
>      return float_type_node;
>
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> new file mode 100644
> index 00000000000..1b645eb499d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "is not supported on this target" } */
> +foo (_Float16 x) /* { dg-error "is not supported on this target" } */
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> new file mode 100644
> index 00000000000..3da7683fc31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> new file mode 100644
> index 00000000000..60ff9d4ab80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +#include<complex.h>
> +
> +_Complex _Float16
> +foo (_Complex _Float16 x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
> --
> 2.18.1
>


* Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-21  7:43         ` [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
@ 2021-07-21 10:51           ` Uros Bizjak
  2021-07-22 12:14           ` Richard Biener
  1 sibling, 0 replies; 138+ messages in thread
From: Uros Bizjak @ 2021-07-21 10:51 UTC (permalink / raw)
  To: liuhongt
  Cc: gcc-patches, Joseph S. Myers, H. J. Lu, Richard Biener, Hongtao Liu

On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * optabs-query.c (get_best_extraction_insn): Use word_mode for
>         HF field.
>
> libgcc/ChangeLog:
>
>         * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
>         * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
>         * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
>         * config/i386/t-softfp: Add hf soft-fp.
>         * config.host: Add i386/64/t-softfp.
>         * config/i386/64/t-softfp: New file.

OK for the x86 part, but please take care of newline at the end of
files to avoid:

> \ No newline at end of file

Thanks,
Uros.

> ---
>  gcc/optabs-query.c                  | 10 +++++++++-
>  libgcc/config.host                  |  5 +----
>  libgcc/config/i386/32/sfp-machine.h |  1 +
>  libgcc/config/i386/64/sfp-machine.h |  1 +
>  libgcc/config/i386/64/t-softfp      |  1 +
>  libgcc/config/i386/sfp-machine.h    |  1 +
>  libgcc/config/i386/t-softfp         |  5 +++++
>  7 files changed, 19 insertions(+), 5 deletions(-)
>  create mode 100644 libgcc/config/i386/64/t-softfp
>
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 05ee5f517da..0438e451474 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
>                           machine_mode field_mode)
>  {
>    opt_scalar_int_mode mode_iter;
> -  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
> +  scalar_int_mode smallest_int_mode;
> +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
> +  if (FLOAT_MODE_P (field_mode)
> +      && known_eq (GET_MODE_SIZE (field_mode), 2))
> +    smallest_int_mode = word_mode;
> +  else
> +    smallest_int_mode = smallest_int_mode_for_size (struct_bits);
> +
> +  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
>      {
>        scalar_int_mode mode = mode_iter.require ();
>        if (get_extraction_insn (insn, pattern, type, mode))
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 50f00062232..96da9ef1cce 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
>         ;;
>  i[34567]86-*-* | x86_64-*-*)
>         tmake_file="${tmake_file} t-softfp-tf"
> -       if test "${host_address}" = 32; then
> -               tmake_file="${tmake_file} i386/${host_address}/t-softfp"
> -       fi
> -       tmake_file="${tmake_file} i386/t-softfp t-softfp"
> +       tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
>         ;;
>  esac
>
> diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
> index 1fa282d7afe..e24cbc8d180 100644
> --- a/libgcc/config/i386/32/sfp-machine.h
> +++ b/libgcc/config/i386/32/sfp-machine.h
> @@ -86,6 +86,7 @@
>  #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D, 0
>  /* Even if XFmode is 12byte,  we have to pad it to
> diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
> index 1ff94c23ea4..e1c616699bb 100644
> --- a/libgcc/config/i386/64/sfp-machine.h
> +++ b/libgcc/config/i386/64/sfp-machine.h
> @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
>
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D
>  #define _FP_NANFRAC_E          _FP_QNANBIT_E, 0
> diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
> new file mode 100644
> index 00000000000..d812bb120bd
> --- /dev/null
> +++ b/libgcc/config/i386/64/t-softfp
> @@ -0,0 +1 @@
> +softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> \ No newline at end of file
> diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> index 8319f0550bc..f15d29d3755 100644
> --- a/libgcc/config/i386/sfp-machine.h
> +++ b/libgcc/config/i386/sfp-machine.h
> @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
>  #define _FP_KEEPNANFRACP       1
>  #define _FP_QNANNEGATEDP 0
>
> +#define _FP_NANSIGN_H          1
>  #define _FP_NANSIGN_S          1
>  #define _FP_NANSIGN_D          1
>  #define _FP_NANSIGN_E          1
> diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
> index 685d9cf8502..4ac214eb0ce 100644
> --- a/libgcc/config/i386/t-softfp
> +++ b/libgcc/config/i386/t-softfp
> @@ -1 +1,6 @@
>  LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
> +
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +
> +softfp_extras += eqhf2
> \ No newline at end of file
> --
> 2.18.1
>


* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-21 10:35           ` Uros Bizjak
@ 2021-07-22  5:21             ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-22  5:21 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: liuhongt, gcc-patches, Joseph S. Myers, H. J. Lu, Richard Biener

On Wed, Jul 21, 2021 at 6:35 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> >         * config/i386/i386.c (enum x86_64_reg_class): Add
> >         X86_64_SSEHF_CLASS.
> >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> >         (examine_argument): Ditto.
> >         (construct_container): Ditto.
> >         (classify_argument): Ditto, and set HFmode/HCmode to
> >         X86_64_SSEHF_CLASS.
> >         (function_value_32): Return _Float16/_Complex _Float16 by
> >         %xmm0/%xmm1.
I forgot to update the ChangeLog entry here: _Complex _Float16 will be
returned in one SSE register. I will fix this in the next version.
> >         (function_value_64): Return _Float16/Complex Float16 by SSE
> >         register.
> >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> >         (ix86_secondary_reload): Require gpr as intermediate register
> >         to store _Float16 from sse register when sse4 is not
> >         available.
> >         (ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
> >         (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> >         sse2.
> >         (ix86_scalar_mode_supported_p): Ditto.
> >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> >         (ix86_get_excess_precision): Return
> >         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
> >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> >         (*pushhf): Ditto.
> >         (*movhf_internal): Ditto.
> >         * doc/extend.texi (Half-Precision Floating Point): Document
> >         _Float16 for x86.
> >
> > gcc/lto/ChangeLog:
> >
> >         * lto-lang.c (lto_type_for_mode): Return float16_type_node
> >         when mode == TYPE_MODE (float16_type_node).
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/sse2-float16-1.c: New test.
> >         * gcc.target/i386/sse2-float16-2.c: Ditto.
> >         * gcc.target/i386/sse2-float16-3.c: Ditto.
>
> OK for the x86 part with some small changes inline.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386-modes.def                |   1 +
> >  gcc/config/i386/i386.c                        |  99 ++++++++++++++-
> >  gcc/config/i386/i386.h                        |   2 +-
> >  gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
> >  gcc/doc/extend.texi                           |  16 +++
> >  gcc/lto/lto-lang.c                            |   3 +
> >  .../gcc.target/i386/sse2-float16-1.c          |   8 ++
> >  .../gcc.target/i386/sse2-float16-2.c          |  16 +++
> >  .../gcc.target/i386/sse2-float16-3.c          |  12 ++
> >  9 files changed, 265 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> >
> > diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> > index 4e7014be034..9232f59a925 100644
> > --- a/gcc/config/i386/i386-modes.def
> > +++ b/gcc/config/i386/i386-modes.def
> > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> >
> >  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> >  FLOAT_MODE (TF, 16, ieee_quad_format);
> > +FLOAT_MODE (HF, 2, ieee_half_format);
> >
> >  /* In ILP32 mode, XFmode has size 12 and alignment 4.
> >     In LP64 mode, XFmode has size and alignment 16.  */
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index ff96134fb37..02628d838fc 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -387,6 +387,7 @@ enum x86_64_reg_class
> >      X86_64_INTEGER_CLASS,
> >      X86_64_INTEGERSI_CLASS,
> >      X86_64_SSE_CLASS,
> > +    X86_64_SSEHF_CLASS,
> >      X86_64_SSESF_CLASS,
> >      X86_64_SSEDF_CLASS,
> >      X86_64_SSEUP_CLASS,
> > @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
> >      return X86_64_MEMORY_CLASS;
> >
> >    /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> > -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> > -      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> > +  if ((class1 == X86_64_INTEGERSI_CLASS
> > +       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> > +      || (class2 == X86_64_INTEGERSI_CLASS
> > +         && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
> >      return X86_64_INTEGERSI_CLASS;
> >    if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
> >        || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> > @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
> >             /* The partial classes are now full classes.  */
> >             if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
> >               subclasses[0] = X86_64_SSE_CLASS;
> > +           if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
> > +             subclasses[0] = X86_64_SSE_CLASS;
> >             if (subclasses[0] == X86_64_INTEGERSI_CLASS
> >                 && !((bit_offset % 64) == 0 && bytes == 4))
> >               subclasses[0] = X86_64_INTEGER_CLASS;
> > @@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
> >        gcc_unreachable ();
> >      case E_CTImode:
> >        return 0;
> > +    case E_HFmode:
> > +      if (!(bit_offset % 64))
> > +       classes[0] = X86_64_SSEHF_CLASS;
> > +      else
> > +       classes[0] = X86_64_SSE_CLASS;
> > +      return 1;
> >      case E_SFmode:
> >        if (!(bit_offset % 64))
> >         classes[0] = X86_64_SSESF_CLASS;
> > @@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
> >        classes[0] = X86_64_SSE_CLASS;
> >        classes[1] = X86_64_SSEUP_CLASS;
> >        return 2;
> > +    case E_HCmode:
> > +      classes[0] = X86_64_SSE_CLASS;
> > +      if (!(bit_offset % 64))
> > +       return 1;
> > +      else
> > +       {
> > +         classes[1] = X86_64_SSEHF_CLASS;
> > +         return 2;
> > +       }
> >      case E_SCmode:
> >        classes[0] = X86_64_SSE_CLASS;
> >        if (!(bit_offset % 64))
> > @@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
> >         (*int_nregs)++;
> >         break;
> >        case X86_64_SSE_CLASS:
> > +      case X86_64_SSEHF_CLASS:
> >        case X86_64_SSESF_CLASS:
> >        case X86_64_SSEDF_CLASS:
> >         (*sse_nregs)++;
> > @@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
> >
> >    /* First construct simple cases.  Avoid SCmode, since we want to use
> >       single register to pass this type.  */
> > -  if (n == 1 && mode != SCmode)
> > +  if (n == 1 && mode != SCmode && mode != HCmode)
> >      switch (regclass[0])
> >        {
> >        case X86_64_INTEGER_CLASS:
> >        case X86_64_INTEGERSI_CLASS:
> >         return gen_rtx_REG (mode, intreg[0]);
> >        case X86_64_SSE_CLASS:
> > +      case X86_64_SSEHF_CLASS:
> >        case X86_64_SSESF_CLASS:
> >        case X86_64_SSEDF_CLASS:
> >         if (mode != BLKmode)
> > @@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
> >                                    GEN_INT (i*8));
> >             intreg++;
> >             break;
> > +         case X86_64_SSEHF_CLASS:
> > +           exp [nexps++]
> > +             = gen_rtx_EXPR_LIST (VOIDmode,
> > +                                  gen_rtx_REG (HFmode,
> > +                                               GET_SSE_REGNO (sse_regno)),
> > +                                  GEN_INT (i*8));
> > +           sse_regno++;
> > +           break;
> >           case X86_64_SSESF_CLASS:
> >             exp [nexps++]
> >               = gen_rtx_EXPR_LIST (VOIDmode,
> > @@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
> >      /* Most things go in %eax.  */
> >      regno = AX_REG;
> >
> > +  /* Return _Float16/_Complex _Float16 by sse register.  */
> > +  if (mode == HFmode)
> > +    regno = FIRST_SSE_REG;
> > +  if (mode == HCmode)
> > +    {
> > +      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
> > +      XVECEXP (ret, 0, 0)
> > +       = gen_rtx_EXPR_LIST (VOIDmode,
> > +                            gen_rtx_REG (SImode, FIRST_SSE_REG),
> > +                            GEN_INT (0));
> > +      return ret;
> > +    }
> > +
> >    /* Override FP return register with %xmm0 for local functions when
> >       SSE math is enabled or for functions with sseregparm attribute.  */
> >    if ((fn || fntype) && (mode == SFmode || mode == DFmode))
> > @@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
> >
> >        switch (mode)
> >         {
> > +       case E_HFmode:
> > +       case E_HCmode:
> >         case E_SFmode:
> >         case E_SCmode:
> >         case E_DFmode:
> > @@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
> >           (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
> >      }
> >
> > +  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
> > +    {
> > +      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
> > +                              REAL_MODE_FORMAT (HFmode));
> > +      if (ASSEMBLER_DIALECT == ASM_ATT)
> > +       putc ('$', file);
> > +      fprintf (file, "0x%04x", (unsigned int) l);
> > +    }
> > +
> >    else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
> >      {
> >        long l;
> > @@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
> >        return NO_REGS;
> >      }
> >
> > +  /* Require movement to gpr, and then store to memory.  */
> > +  if (mode == HFmode
> > +      && !TARGET_SSE4_1
> > +      && SSE_CLASS_P (rclass)
> > +      && !in_p && MEM_P (x))
> > +    {
> > +      sri->extra_cost = 1;
> > +      return GENERAL_REGS;
> > +    }
> > +
> >    /* This condition handles corner case where an expression involving
> >       pointers gets vectorized.  We're trying to use the address of a
> >       stack slot as a vector initializer.
> > @@ -19546,6 +19610,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >    else if (VALID_INT_MODE_P (mode)
> >            || VALID_FP_MODE_P (mode))
> >      return true;
> > +  else if (mode == HFmode || mode == HCmode)
> > +    return true;
>
> Please add these two modes to VALID_INT_MODE_P instead.
>
> >    /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
> >       on to use that value in smaller contexts, this can easily force a
> >       pseudo to be allocated to GENERAL_REGS.  Since this is no worse than
> > @@ -21555,10 +21621,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
> >      return default_decimal_float_supported_p ();
> >    else if (mode == TFmode)
> >      return true;
> > +  else if (mode == HFmode && TARGET_SSE2)
> > +    return true;
> >    else
> >      return default_scalar_mode_supported_p (mode);
> >  }
> >
> > +/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE if MODE
> > +   is HFmode and SSE2 is enabled, and punt to the generic implementation otherwise.  */
> > +
> > +static bool
> > +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
> > +{
> > +  /* NB: Always return TRUE for HFmode so that the _Float16 type will
> > +     be defined by the C front-end for AVX512FP16 intrinsics.  We will
> > +     issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
> > +     enabled.  */
> > +  return ((mode == HFmode && TARGET_SSE2)
> > +         ? true
> > +         : default_libgcc_floating_mode_supported_p (mode));
> > +}
> > +
> >  /* Implements target hook vector_mode_supported_p.  */
> >  static bool
> >  ix86_vector_mode_supported_p (machine_mode mode)
> > @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >            provide would be identical were it not for the unpredictable
> >            cases.  */
> >         if (!TARGET_80387)
> > -         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > +         return TARGET_SSE2
> > +                ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > +                : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> >         else if (!TARGET_MIX_SSE_I387)
> >           {
> >             if (!(TARGET_SSE && TARGET_SSE_MATH))
> >               return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
> >             else if (TARGET_SSE2)
> > -             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > +             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> >           }
> >
> >         /* If we are in standards compliant mode, but we know we will
> > @@ -23820,6 +23905,10 @@ ix86_run_selftests (void)
> >  #undef TARGET_SCALAR_MODE_SUPPORTED_P
> >  #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
> >
> > +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
> > +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P        \
> > +ix86_libgcc_floating_mode_supported_p
> > +
> >  #undef TARGET_VECTOR_MODE_SUPPORTED_P
> >  #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 0c2c93daf32..e21922e8782 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >  #define VALID_SSE2_REG_MODE(MODE)                                      \
> >    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
> >     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> > -   || (MODE) == V2DImode || (MODE) == DFmode)
> > +   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
> >
> >  #define VALID_SSE_REG_MODE(MODE)                                       \
> >    ((MODE) == V1TImode || (MODE) == TImode                              \
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 8b809c49fe0..dd991c3ffdf 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
> >  ;; All x87 floating point modes
> >  (define_mode_iterator X87MODEF [SF DF XF])
> >
> > +;; All x87 floating point modes plus HF
> > +(define_mode_iterator X87MODEFH [SF DF XF HF])
> > +
> >  ;; All SSE floating point modes
> >  (define_mode_iterator SSEMODEF [SF DF TF])
> >  (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> > @@ -3130,6 +3133,32 @@ (define_split
> >    operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
> >  })
> >
> > +(define_insn "*pushhf_rex64"
> > +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> > +       (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
> > +  "TARGET_64BIT"
> > +{
> > +  /* Anything else should be already split before reg-stack.  */
> > +  gcc_assert (which_alternative == 0);
> > +  return "push{q}\t%q1";
> > +}
> > +  [(set_attr "type" "push,multi")
> > +   (set_attr "mode" "DI,TI")
> > +   (set_attr "isa"  "*,sse4")])
>
> Please always put "isa" attribute first, as is the case with other
> insn patterns.
>
> > +(define_insn "*pushhf"
> > +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> > +       (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
> > +  "!TARGET_64BIT"
> > +{
> > +  /* Anything else should be already split before reg-stack.  */
> > +  gcc_assert (which_alternative == 0);
> > +  return "push{l}\t%k1";
> > +}
> > +  [(set_attr "type" "push,multi")
> > +   (set_attr "mode" "SI,TI")
> > +   (set_attr "isa"  "*,sse4")])
>
> Also here.
>
> > +
> >  (define_insn "*pushsf_rex64"
> >    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
> >         (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
> > @@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
> >     (set_attr "unit" "i387,*,*")
> >     (set_attr "mode" "SF,SI,SF")])
> >
> > +(define_mode_iterator MODESH [SF HF])
> >  ;; %%% Kill this when call knows how to work this out.
> >  (define_split
> > -  [(set (match_operand:SF 0 "push_operand")
> > -       (match_operand:SF 1 "any_fp_register_operand"))]
> > +  [(set (match_operand:MODESH 0 "push_operand")
> > +       (match_operand:MODESH 1 "any_fp_register_operand"))]
> >    "reload_completed"
> >    [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
> >     (set (match_dup 0) (match_dup 1))]
> > @@ -3209,8 +3239,8 @@ (define_expand "movtf"
> >    "ix86_expand_move (TFmode, operands); DONE;")
> >
> >  (define_expand "mov<mode>"
> > -  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
> > -       (match_operand:X87MODEF 1 "general_operand"))]
> > +  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
> > +       (match_operand:X87MODEFH 1 "general_operand"))]
> >    ""
> >    "ix86_expand_move (<MODE>mode, operands); DONE;")
> >
> > @@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
> >            ]
> >            (const_string "*")))])
> >
> > +(define_insn "*movhf_internal"
> > + [(set (match_operand:HF 0 "nonimmediate_operand"
> > +        "=?r,?m,v,v,?r,m,?v,v")
> > +       (match_operand:HF 1 "general_operand"
> > +        "rmF,rF,C,v, v,v, r,m"))]
> > + "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> > +  && (lra_in_progress
> > +      || reload_completed
> > +      || !CONST_DOUBLE_P (operands[1])
> > +      || (TARGET_SSE && TARGET_SSE_MATH
> > +         && standard_sse_constant_p (operands[1], HFmode) == 1)
> > +      || memory_operand (operands[0], HFmode))"
> > +{
> > +  switch (get_attr_type (insn))
> > +    {
> > +    case TYPE_IMOV:
> > +      return "mov{w}\t{%1, %0|%0, %1}";
> > +
> > +    case TYPE_SSELOG1:
> > +      return standard_sse_constant_opcode (insn, operands);
> > +
> > +    case TYPE_SSEMOV:
> > +      return ix86_output_ssemov (insn, operands);
> > +
> > +    case TYPE_SSELOG:
> > +      if (SSE_REG_P (operands[0]))
> > +       return MEM_P (operands[1])
> > +              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> > +              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> > +      else
> > +       return MEM_P (operands[1])
> > +              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> > +              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> > +
> > +    default:
> > +      gcc_unreachable ();
> > +    }
> > +}
> > +  [(set (attr "isa")
> > +       (cond [(eq_attr "alternative" "2,3,4,6,7")
> > +                (const_string "sse2")
> > +              (eq_attr "alternative" "5")
> > +                (const_string "sse4")
> > +             ]
> > +             (const_string "*")))
> > +   (set (attr "type")
> > +       (cond [(eq_attr "alternative" "0,1")
> > +                (const_string "imov")
> > +              (eq_attr "alternative" "2")
> > +                (const_string "sselog1")
> > +              (eq_attr "alternative" "4,5,6,7")
> > +                (const_string "sselog")
> > +             ]
> > +             (const_string "ssemov")))
> > +   (set (attr "memory")
> > +       (cond [(eq_attr "alternative" "4,6")
> > +                (const_string "none")
> > +              (eq_attr "alternative" "5")
> > +                (const_string "store")
> > +              (eq_attr "alternative" "7")
> > +                (const_string "load")
> > +             ]
> > +             (const_string "*")))
> > +   (set (attr "prefix")
> > +       (cond [(eq_attr "alternative" "0,1")
> > +                (const_string "orig")
> > +             ]
> > +             (const_string "maybe_vex")))
> > +   (set (attr "mode")
> > +       (cond [(eq_attr "alternative" "0,1")
> > +                (const_string "HI")
> > +              (eq_attr "alternative" "2")
> > +                (const_string "V4SF")
> > +              (eq_attr "alternative" "4,5,6,7")
> > +                (const_string "TI")
> > +              (eq_attr "alternative" "3")
> > +                (const_string "SF")
> > +             ]
> > +             (const_string "*")))])
> > +
> >  (define_split
> >    [(set (match_operand 0 "any_fp_register_operand")
> >         (match_operand 1 "memory_operand"))]
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index b83cd4919bb..2cd0b38fe5b 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
> >  @section Half-Precision Floating Point
> >  @cindex half-precision floating point
> >  @cindex @code{__fp16} data type
> > +@cindex @code{_Float16} data type
> >
> >  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> >  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> > @@ -1150,6 +1151,21 @@ calls.
> >  It is recommended that portable code use the @code{_Float16} type defined
> >  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> >
> > +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> > +(16-bit) floating point via the @code{_Float16} type, which is defined by
> > +ISO/IEC TS 18661-3:2015.  For C++, x86 provides a built-in type named
> > +@code{_Float16} that uses the same data format as in C.
> > +
> > +Without @code{target("avx512fp16")}, the @code{_Float16} type is storage only, and
> > +all operations are emulated by soft-fp and @code{float} instructions.
> > +
> > +By default, soft-fp keeps the intermediate result of an operation at 32-bit precision,
> > +which may lead to inconsistent results between soft-fp and avx512fp16 instructions;
> > +using @option{-fexcess-precision=standard} forces rounding back after every operation.
> > +
> > +With @option{-mavx512fp16}, instead of calling soft-fp, GCC automatically generates
> > +hardware instructions.
> > +
> >  @node Decimal Float
> >  @section Decimal Floating Types
> >  @cindex decimal floating types
> > diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> > index c13c7e45ac1..92f499643b5 100644
> > --- a/gcc/lto/lto-lang.c
> > +++ b/gcc/lto/lto-lang.c
> > @@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
> >      return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
> >  #endif
> >
> > +  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
> > +    return float16_type_node;
> > +
> >    if (mode == TYPE_MODE (float_type_node))
> >      return float_type_node;
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> > new file mode 100644
> > index 00000000000..1b645eb499d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mno-sse2" } */
> > +
> > +_Float16/* { dg-error "is not supported on this target" } */
> > +foo (_Float16 x) /* { dg-error "is not supported on this target" } */
> > +{
> > +  return x;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> > new file mode 100644
> > index 00000000000..3da7683fc31
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> > +
> > +union flt
> > +{
> > +  _Float16 flt;
> > +  short s;
> > +};
> > +
> > +_Float16
> > +foo (union flt x)
> > +{
> > +  return x.flt;
> > +}
> > +
> > +/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> > new file mode 100644
> > index 00000000000..60ff9d4ab80
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> > +
> > +#include<complex.h>
> > +
> > +_Complex _Float16
> > +foo (_Complex _Float16 x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
> > --
> > 2.18.1
> >
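
To illustrate the excess-precision note in the extend.texi hunk above, here
is a rough, portable sketch (not code from the patch) of why keeping the
intermediate result at float precision can give a different answer than
rounding back to half precision after every operation. The helper names
round_to_half_precision, sum_float_intermediate and sum_round_each_op are
hypothetical, and the helper only models the 11-bit significand of IEEE
binary16; it ignores the narrower exponent range, so it assumes values that
are normal in binary16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to an 11-bit significand (the precision of IEEE binary16)
   using round-to-nearest-even.  Hypothetical helper for illustration only:
   it ignores the exponent range of binary16, so it is meaningful only for
   finite values that are normal in binary16.  */
static float
round_to_half_precision (float x)
{
  uint32_t bits;

  assert (sizeof bits == sizeof x);
  memcpy (&bits, &x, sizeof bits);
  /* Drop the low 13 of the 23 stored significand bits; ties go to even.  */
  bits += 0xFFFu + ((bits >> 13) & 1u);
  bits &= ~(uint32_t) 0x1FFFu;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* (a + b) + c with the intermediate sum kept at float precision and a
   single rounding at the end.  */
static float
sum_float_intermediate (float a, float b, float c)
{
  return round_to_half_precision ((a + b) + c);
}

/* (a + b) + c with a rounding back to half precision after every
   operation.  */
static float
sum_round_each_op (float a, float b, float c)
{
  float t = round_to_half_precision (a + b);
  return round_to_half_precision (t + c);
}
```

With a = 2048 and b = c = 1, sum_float_intermediate gives 2050 while
sum_round_each_op gives 2048: 2049 lies exactly halfway between the two
neighbouring binary16 values 2048 and 2050, and the tie rounds to 2048.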



-- 
BR,
Hongtao


* Re: [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16.
  2021-07-21  7:43         ` [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
@ 2021-07-22  5:24           ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-22  5:24 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, ubizjak, joseph, hjl.tools, richard.guenther

On Wed, Jul 21, 2021 at 3:44 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
>         (_mm256_set_ph): Likewise.
>         (_mm512_set_ph): Likewise.
>         (_mm_setr_ph): Likewise.
>         (_mm256_setr_ph): Likewise.
>         (_mm512_setr_ph): Likewise.
>         (_mm_set1_ph): Likewise.
>         (_mm256_set1_ph): Likewise.
>         (_mm512_set1_ph): Likewise.
>         (_mm_setzero_ph): Likewise.
>         (_mm256_setzero_ph): Likewise.
>         (_mm512_setzero_ph): Likewise.
>         (_mm_set_sh): Likewise.
>         (_mm_load_sh): Likewise.
>         (_mm_store_sh): Likewise.
>         * config/i386/i386-builtin-types.def (V8HF): New type.
>         (DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type
>         * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
>         Support vector HFmodes.
>         (ix86_expand_vector_init_one_nonzero): Likewise.
>         (ix86_expand_vector_init_one_var): Likewise.
>         (ix86_expand_vector_init_interleave): Likewise.
>         (ix86_expand_vector_init_general): Likewise.
>         (ix86_expand_vector_set): Likewise.
>         (ix86_expand_vector_extract): Likewise.
>         (ix86_expand_vector_init_concat): Likewise.
>         (ix86_expand_sse_movcc): Handle vector HFmodes.
>         (ix86_expand_vector_set_var): Ditto.
>         * config/i386/i386-modes.def: Add HF vector modes in comment.
>         * config/i386/i386.c (classify_argument): Add HF vector modes.
>         (ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
>         (ix86_vector_mode_supported_p): Likewise.
>         (ix86_set_reg_reg_cost): Handle vector HFmode.
>         (ix86_get_ssemov): Handle vector HFmode.
>         (function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
>         by stack.
H.J. pointed out that 16/32/64-byte _Float16 vectors should be passed in
SSE registers in 32-bit mode, not on the stack. I will handle this in
function_arg_32 in the next version.
>         * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
>         (VALID_AVX256_REG_OR_OI_MODE): Rename to ..
>         (VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
>         (VALID_SSE2_REG_VHF_MODE): New.
>         (VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
>         (SSE_REG_MODE_P): Add vector HFmode.
>         * config/i386/i386.md (mode): Add HF vector modes.
>         (MODE_SIZE): Likewise.
>         (ssemodesuffix): Add ph suffix for HF vector modes.
>         * config/i386/sse.md (VFH_128): New mode iterator.
>         (VMOVE): Adjust for HF vector modes.
>         (V): Likewise.
>         (V_256_512): Likewise.
>         (avx512): Likewise.
>         (avx512fmaskmode): Likewise.
>         (shuffletype): Likewise.
>         (sseinsnmode): Likewise.
>         (ssedoublevecmode): Likewise.
>         (ssehalfvecmode): Likewise.
>         (ssehalfvecmodelower): Likewise.
>         (ssePScmode): Likewise.
>         (ssescalarmode): Likewise.
>         (ssescalarmodelower): Likewise.
>         (sseintprefix): Likewise.
>         (i128): Likewise.
>         (bcstscalarsuff): Likewise.
>         (xtg_mode): Likewise.
>         (VI12HF_AVX512VL): New mode_iterator.
>         (VF_AVX512FP16): Likewise.
>         (VIHF): Likewise.
>         (VIHF_256): Likewise.
>         (VIHF_AVX512BW): Likewise.
>         (V16_256): Likewise.
>         (V32_512): Likewise.
>         (sseintmodesuffix): New mode_attr.
>         (sse): Add scalar and vector HFmodes.
>         (ssescalarmode): Add vector HFmode mapping.
>         (ssescalarmodesuffix): Add sh suffix for HFmode.
>         (*<sse>_vm<insn><mode>3): Use VFH_128.
>         (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
>         (*ieee_<ieee_maxmin><mode>3): Likewise.
>         (<avx512>_blendm<mode>): New define_insn.
>         (vec_setv8hf): New define_expand.
>         (vec_set<mode>_0): New define_insn for HF vector set.
>         (*avx512fp16_movsh): Likewise.
>         (avx512fp16_movsh): Likewise.
>         (vec_extract_lo_v32hi): Rename to ...
>         (vec_extract_lo_<mode>): ... this, and adjust to allow HF
>         vector modes.
>         (vec_extract_hi_v32hi): Likewise.
>         (vec_extract_hi_<mode>): Likewise.
>         (vec_extract_lo_v16hi): Likewise.
>         (vec_extract_lo_<mode>): Likewise.
>         (vec_extract_hi_v16hi): Likewise.
>         (vec_extract_hi_<mode>): Likewise.
>         (vec_set_hi_v16hi): Likewise.
>         (vec_set_hi_<mode>): Likewise.
>         (vec_set_lo_v16hi): Likewise.
>         (vec_set_lo_<mode>): Likewise.
>         (*vec_extract<mode>_0): New define_insn_and_split for HF
>         vector extract.
>         (*vec_extracthf): New define_insn.
>         (VEC_EXTRACT_MODE): Add HF vector modes.
>         (PINSR_MODE): Add V8HF.
>         (sse2p4_1): Likewise.
>         (pinsr_evex_isa): Likewise.
>         (<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
>         insert for V8HFmode.
>         (pbroadcast_evex_isa): Add HF vector modes.
>         (AVX2_VEC_DUP_MODE): Likewise.
>         (VEC_INIT_MODE): Likewise.
>         (VEC_INIT_HALF_MODE): Likewise.
>         (avx2_pbroadcast<mode>): Adjust to support HF vector mode
>         broadcast.
>         (avx2_pbroadcast<mode>_1): Likewise.
>         (<avx512>_vec_dup<mode>_1): Likewise.
>         (<avx512>_vec_dup<mode><mask_name>): Likewise.
>         (<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
>         Likewise.
> ---
>  gcc/config/i386/avx512fp16intrin.h     | 172 +++++++++++
>  gcc/config/i386/i386-builtin-types.def |   6 +-
>  gcc/config/i386/i386-expand.c          | 124 +++++++-
>  gcc/config/i386/i386-modes.def         |  12 +-
>  gcc/config/i386/i386.c                 |  69 ++---
>  gcc/config/i386/i386.h                 |  15 +-
>  gcc/config/i386/i386.md                |  13 +-
>  gcc/config/i386/sse.md                 | 395 +++++++++++++++++++------
>  8 files changed, 652 insertions(+), 154 deletions(-)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index 38d63161ba6..3fc0770986e 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -45,6 +45,178 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
>  typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
>  typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
>
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
> +           _Float16 __A4, _Float16 __A3, _Float16 __A2,
> +           _Float16 __A1, _Float16 __A0)
> +{
> +  return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3,
> +                                         __A4, __A5, __A6, __A7 };
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
> +              _Float16 __A12, _Float16 __A11, _Float16 __A10,
> +              _Float16 __A9, _Float16 __A8, _Float16 __A7,
> +              _Float16 __A6, _Float16 __A5, _Float16 __A4,
> +              _Float16 __A3, _Float16 __A2, _Float16 __A1,
> +              _Float16 __A0)
> +{
> +  return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3,
> +                                          __A4, __A5, __A6, __A7,
> +                                          __A8, __A9, __A10, __A11,
> +                                          __A12, __A13, __A14, __A15 };
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
> +              _Float16 __A28, _Float16 __A27, _Float16 __A26,
> +              _Float16 __A25, _Float16 __A24, _Float16 __A23,
> +              _Float16 __A22, _Float16 __A21, _Float16 __A20,
> +              _Float16 __A19, _Float16 __A18, _Float16 __A17,
> +              _Float16 __A16, _Float16 __A15, _Float16 __A14,
> +              _Float16 __A13, _Float16 __A12, _Float16 __A11,
> +              _Float16 __A10, _Float16 __A9, _Float16 __A8,
> +              _Float16 __A7, _Float16 __A6, _Float16 __A5,
> +              _Float16 __A4, _Float16 __A3, _Float16 __A2,
> +              _Float16 __A1, _Float16 __A0)
> +{
> +  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
> +                                          __A4, __A5, __A6, __A7,
> +                                          __A8, __A9, __A10, __A11,
> +                                          __A12, __A13, __A14, __A15,
> +                                          __A16, __A17, __A18, __A19,
> +                                          __A20, __A21, __A22, __A23,
> +                                          __A24, __A25, __A26, __A27,
> +                                          __A28, __A29, __A30, __A31 };
> +}
> +
> +/* Create vectors of elements in the reversed order from _mm_set_ph,
> +   _mm256_set_ph and _mm512_set_ph functions.  */
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
> +            _Float16 __A3, _Float16 __A4, _Float16 __A5,
> +            _Float16 __A6, _Float16 __A7)
> +{
> +  return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
> +               _Float16 __A3, _Float16 __A4, _Float16 __A5,
> +               _Float16 __A6, _Float16 __A7, _Float16 __A8,
> +               _Float16 __A9, _Float16 __A10, _Float16 __A11,
> +               _Float16 __A12, _Float16 __A13, _Float16 __A14,
> +               _Float16 __A15)
> +{
> +  return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9,
> +                       __A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1,
> +                       __A0);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
> +               _Float16 __A3, _Float16 __A4, _Float16 __A5,
> +               _Float16 __A6, _Float16 __A7, _Float16 __A8,
> +               _Float16 __A9, _Float16 __A10, _Float16 __A11,
> +               _Float16 __A12, _Float16 __A13, _Float16 __A14,
> +               _Float16 __A15, _Float16 __A16, _Float16 __A17,
> +               _Float16 __A18, _Float16 __A19, _Float16 __A20,
> +               _Float16 __A21, _Float16 __A22, _Float16 __A23,
> +               _Float16 __A24, _Float16 __A25, _Float16 __A26,
> +               _Float16 __A27, _Float16 __A28, _Float16 __A29,
> +               _Float16 __A30, _Float16 __A31)
> +
> +{
> +  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
> +                       __A24, __A23, __A22, __A21, __A20, __A19, __A18,
> +                       __A17, __A16, __A15, __A14, __A13, __A12, __A11,
> +                       __A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
> +                       __A2, __A1, __A0);
> +}
> +
> +/* Broadcast _Float16 to vector.  */
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_set1_ph (_Float16 __A)
> +{
> +  return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_set1_ph (_Float16 __A)
> +{
> +  return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
> +                       __A, __A, __A, __A, __A, __A, __A, __A);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_set1_ph (_Float16 __A)
> +{
> +  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
> +                       __A, __A, __A, __A, __A, __A, __A, __A,
> +                       __A, __A, __A, __A, __A, __A, __A, __A,
> +                       __A, __A, __A, __A, __A, __A, __A, __A);
> +}
> +
> +/* Create a vector with all zeros.  */
> +
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_setzero_ph (void)
> +{
> +  return _mm_set1_ph (0.0f);
> +}
> +
> +extern __inline __m256h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm256_setzero_ph (void)
> +{
> +  return _mm256_set1_ph (0.0f);
> +}
> +
> +extern __inline __m512h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_setzero_ph (void)
> +{
> +  return _mm512_set1_ph (0.0f);
> +}
> +
> +/* Create a vector with element 0 as F and the rest zero.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_set_sh (_Float16 __F)
> +{
> +  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F);
> +}
> +
> +/* Create a vector with element 0 as *P and the rest zero.  */
> +extern __inline __m128h
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_load_sh (void const *__P)
> +{
> +  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
> +                    *(_Float16 const *) __P);
> +}
> +
> +/* Stores the lower _Float16 value.  */
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_store_sh (void *__P, __m128h __A)
> +{
> +  *(_Float16 *) __P = ((__v8hf)__A)[0];
> +}
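
The lane-order convention of the set/setr/set_sh helpers above can be
checked with a small stand-alone model. This is only a sketch under stated
assumptions: it uses eight `short` lanes in a 16-byte GNU vector as a
stand-in for __v8hf so that it builds without -mavx512fp16, and the names
v8s, set8, setr8 and set_low are hypothetical, not part of the patch.

```c
/* Stand-in for __v8hf: eight 16-bit lanes in a 16-byte GNU vector.
   (Assumption: short replaces _Float16 so no FP16 support is needed.)  */
typedef short v8s __attribute__ ((__vector_size__ (16)));

/* Mirrors the shape of _mm_set_ph: the first argument is the highest
   lane, the last argument is lane 0.  */
static v8s
set8 (short a7, short a6, short a5, short a4,
      short a3, short a2, short a1, short a0)
{
  return (v8s){ a0, a1, a2, a3, a4, a5, a6, a7 };
}

/* Mirrors the shape of _mm_setr_ph: arguments are given in lane order
   and forwarded to set8 reversed.  */
static v8s
setr8 (short a0, short a1, short a2, short a3,
       short a4, short a5, short a6, short a7)
{
  return set8 (a7, a6, a5, a4, a3, a2, a1, a0);
}

/* Mirrors the shape of _mm_set_sh: lane 0 holds the value, the
   remaining lanes are zero.  */
static v8s
set_low (short f)
{
  return set8 (0, 0, 0, 0, 0, 0, 0, f);
}
```

With this model, set8 (7, 6, 5, 4, 3, 2, 1, 0) and
setr8 (0, 1, 2, 3, 4, 5, 6, 7) describe the same vector, and lane 0 of
set_low (x) is x.
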
> +
>  #ifdef __DISABLE_AVX512FP16__
>  #undef __DISABLE_AVX512FP16__
>  #pragma GCC pop_options
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 1768b88d748..4df6ee1009d 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -85,6 +85,7 @@ DEF_VECTOR_TYPE (V8QI, QI)
>  # SSE vectors
>  DEF_VECTOR_TYPE (V2DF, DOUBLE)
>  DEF_VECTOR_TYPE (V4SF, FLOAT)
> +DEF_VECTOR_TYPE (V8HF, FLOAT16)
>  DEF_VECTOR_TYPE (V2DI, DI)
>  DEF_VECTOR_TYPE (V4SI, SI)
>  DEF_VECTOR_TYPE (V8HI, HI)
> @@ -1297,4 +1298,7 @@ DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID)
>  DEF_FUNCTION_TYPE (UINT, UINT, V2DI, PVOID)
>  DEF_FUNCTION_TYPE (VOID, V2DI, V2DI, V2DI, UINT)
>  DEF_FUNCTION_TYPE (UINT8, PV2DI, V2DI, PCVOID)
> -DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
> \ No newline at end of file
> +DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
> +
> +# FP16 builtins
> +DEF_FUNCTION_TYPE (V8HF, V8HI)
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index b7d050a1e42..bb965ca0e9b 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -3952,6 +3952,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>        break;
>      case E_V16QImode:
>      case E_V8HImode:
> +    case E_V8HFmode:
>      case E_V4SImode:
>      case E_V2DImode:
>        if (TARGET_SSE4_1)
> @@ -3974,6 +3975,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>        break;
>      case E_V32QImode:
>      case E_V16HImode:
> +    case E_V16HFmode:
>      case E_V8SImode:
>      case E_V4DImode:
>        if (TARGET_AVX2)
> @@ -3993,6 +3995,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
>      case E_V32HImode:
>        gen = gen_avx512bw_blendmv32hi;
>        break;
> +    case E_V32HFmode:
> +      gen = gen_avx512bw_blendmv32hf;
> +      break;
>      case E_V16SImode:
>        gen = gen_avx512f_blendmv16si;
>        break;
> @@ -14144,6 +14149,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
>         }
>        return true;
>
> +    case E_V8HFmode:
> +    case E_V16HFmode:
> +    case E_V32HFmode:
> +      return ix86_vector_duplicate_value (mode, target, val);
> +
>      default:
>        return false;
>      }
> @@ -14228,6 +14238,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
>        use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0;
>        gen_vec_set_0 = gen_vec_setv8di_0;
>        break;
> +    case E_V8HFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv8hf_0;
> +      break;
> +    case E_V16HFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv16hf_0;
> +      break;
> +    case E_V32HFmode:
> +      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> +      gen_vec_set_0 = gen_vec_setv32hf_0;
> +      break;
>      default:
>        break;
>      }
> @@ -14377,6 +14399,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
>        if (!TARGET_64BIT)
>         return false;
>        /* FALLTHRU */
> +    case E_V8HFmode:
> +    case E_V16HFmode:
>      case E_V4DFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
> @@ -14457,6 +14481,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
>      case 2:
>        switch (mode)
>         {
> +       case E_V32HFmode:
> +         half_mode = V16HFmode;
> +         break;
>         case E_V16SImode:
>           half_mode = V8SImode;
>           break;
> @@ -14469,6 +14496,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
>         case E_V8DFmode:
>           half_mode = V4DFmode;
>           break;
> +       case E_V16HFmode:
> +         half_mode = V8HFmode;
> +         break;
>         case E_V8SImode:
>           half_mode = V4SImode;
>           break;
> @@ -14611,13 +14641,22 @@ ix86_expand_vector_init_interleave (machine_mode mode,
>  {
>    machine_mode first_imode, second_imode, third_imode, inner_mode;
>    int i, j;
> -  rtx op0, op1;
> +  rtx op, op0, op1;
>    rtx (*gen_load_even) (rtx, rtx, rtx);
>    rtx (*gen_interleave_first_low) (rtx, rtx, rtx);
>    rtx (*gen_interleave_second_low) (rtx, rtx, rtx);
>
>    switch (mode)
>      {
> +    case E_V8HFmode:
> +      gen_load_even = gen_vec_setv8hf;
> +      gen_interleave_first_low = gen_vec_interleave_lowv4si;
> +      gen_interleave_second_low = gen_vec_interleave_lowv2di;
> +      inner_mode = HFmode;
> +      first_imode = V4SImode;
> +      second_imode = V2DImode;
> +      third_imode = VOIDmode;
> +      break;
>      case E_V8HImode:
>        gen_load_even = gen_vec_setv8hi;
>        gen_interleave_first_low = gen_vec_interleave_lowv4si;
> @@ -14642,9 +14681,19 @@ ix86_expand_vector_init_interleave (machine_mode mode,
>
>    for (i = 0; i < n; i++)
>      {
> +      op = ops [i + i];
> +      if (inner_mode == HFmode)
> +       {
> +         /* Convert HFmode to HImode.  */
> +         op1 = gen_reg_rtx (HImode);
> +         op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
> +         op = gen_reg_rtx (HImode);
> +         emit_move_insn (op, op1);
> +       }
> +
>        /* Extend the odd elment to SImode using a paradoxical SUBREG.  */
>        op0 = gen_reg_rtx (SImode);
> -      emit_move_insn (op0, gen_lowpart (SImode, ops [i + i]));
> +      emit_move_insn (op0, gen_lowpart (SImode, op));
>
>        /* Insert the SImode value as low element of V4SImode vector. */
>        op1 = gen_reg_rtx (V4SImode);
> @@ -14781,6 +14830,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
>        half_mode = V8HImode;
>        goto half;
>
> +    case E_V16HFmode:
> +      half_mode = V8HFmode;
> +      goto half;
> +
>  half:
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -14804,6 +14857,11 @@ half:
>        half_mode = V16HImode;
>        goto quarter;
>
> +    case E_V32HFmode:
> +      quarter_mode = V8HFmode;
> +      half_mode = V16HFmode;
> +      goto quarter;
> +
>  quarter:
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -14840,6 +14898,9 @@ quarter:
>          move from GPR to SSE register directly.  */
>        if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>         break;
> +      /* FALLTHRU */
> +
> +    case E_V8HFmode:
>
>        n = GET_MODE_NUNITS (mode);
>        for (i = 0; i < n; i++)
> @@ -15087,6 +15148,16 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
>         case E_V16SFmode:
>           cmp_mode = V16SImode;
>           break;
> +       /* TARGET_AVX512FP16 implies TARGET_AVX512BW.  */
> +       case E_V8HFmode:
> +         cmp_mode = V8HImode;
> +         break;
> +       case E_V16HFmode:
> +         cmp_mode = V16HImode;
> +         break;
> +       case E_V32HFmode:
> +         cmp_mode = V32HImode;
> +         break;
>         default:
>           gcc_unreachable ();
>         }
> @@ -15123,23 +15194,25 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>    machine_mode half_mode;
>    bool use_vec_merge = false;
>    rtx tmp;
> -  static rtx (*gen_extract[6][2]) (rtx, rtx)
> +  static rtx (*gen_extract[7][2]) (rtx, rtx)
>      = {
>         { gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi },
>         { gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi },
>         { gen_vec_extract_lo_v8si, gen_vec_extract_hi_v8si },
>         { gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di },
>         { gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf },
> -       { gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df }
> +       { gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df },
> +       { gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf }
>        };
> -  static rtx (*gen_insert[6][2]) (rtx, rtx, rtx)
> +  static rtx (*gen_insert[7][2]) (rtx, rtx, rtx)
>      = {
>         { gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi },
>         { gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi },
>         { gen_vec_set_lo_v8si, gen_vec_set_hi_v8si },
>         { gen_vec_set_lo_v4di, gen_vec_set_hi_v4di },
>         { gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf },
> -       { gen_vec_set_lo_v4df, gen_vec_set_hi_v4df }
> +       { gen_vec_set_lo_v4df, gen_vec_set_hi_v4df },
> +       { gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf },
>        };
>    int i, j, n;
>    machine_mode mmode = VOIDmode;
> @@ -15306,6 +15379,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>         }
>        return;
>
> +    case E_V8HFmode:
> +      use_vec_merge = true;
> +      break;
> +
>      case E_V8HImode:
>      case E_V2HImode:
>        use_vec_merge = TARGET_SSE2;
> @@ -15329,6 +15406,12 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
>        n = 16;
>        goto half;
>
> +    case E_V16HFmode:
> +      half_mode = V8HFmode;
> +      j = 6;
> +      n = 8;
> +      goto half;
> +
>      case E_V16HImode:
>        half_mode = V8HImode;
>        j = 1;
> @@ -15409,6 +15492,13 @@ half:
>         }
>        break;
>
> +    case E_V32HFmode:
> +      if (TARGET_AVX512BW)
> +       {
> +         mmode = SImode;
> +         gen_blendm = gen_avx512bw_blendmv32hf;
> +       }
> +      break;
>      case E_V32HImode:
>        if (TARGET_AVX512BW)
>         {
> @@ -15780,6 +15870,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
>        ix86_expand_vector_extract (false, target, tmp, elt & 3);
>        return;
>
> +    case E_V32HFmode:
> +      tmp = gen_reg_rtx (V16HFmode);
> +      if (elt < 16)
> +       emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
> +      else
> +       emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
> +      ix86_expand_vector_extract (false, target, tmp, elt & 15);
> +      return;
> +
> +    case E_V16HFmode:
> +      tmp = gen_reg_rtx (V8HFmode);
> +      if (elt < 8)
> +       emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
> +      else
> +       emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
> +      ix86_expand_vector_extract (false, target, tmp, elt & 7);
> +      return;
> +
> +    case E_V8HFmode:
> +      use_vec_extr = true;
> +      break;
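
The three new cases above narrow the vector by halves: V32HF picks the
low or high V16HF half and recurses with elt & 15, V16HF picks a V8HF half
and recurses with elt & 7, and V8HF extracts the element directly. A
minimal model of that index arithmetic on a plain array, assuming nothing
about the RTL side (extract_by_halving is a hypothetical name, and short
stands in for the HF element type):

```c
#include <stddef.h>

/* Return lane ELT of an N-lane vector (N = 32, 16 or 8) stored as an
   array, by repeatedly choosing the low or high half until only 8 lanes
   remain, then indexing directly.  */
static short
extract_by_halving (const short *vec, size_t n, size_t elt)
{
  while (n > 8)
    {
      size_t half = n / 2;
      if (elt >= half)
        vec += half;        /* Select the high half.  */
      elt &= half - 1;      /* Index within the chosen half.  */
      n = half;
    }
  return vec[elt];          /* 8-lane case: direct element extract.  */
}
```

For example, lane 21 of a 32-lane vector is lane 5 of the high 16-lane
half (21 & 15), and lane 5 is below 8, so it is lane 5 of that half's low
8 lanes.
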
> +
>      case E_V8QImode:
>        use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
>        /* ??? Could extract the appropriate HImode element and shift.  */
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 9232f59a925..fcadfcd4c94 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
>  VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
>  VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
>  VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
> -VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
> -VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
> -VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
> -VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
> -VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
> -VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
> +VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
> +VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
> +VECTOR_MODES (FLOAT, 32);     /*   V16HF V8SF V4DF V2TF */
> +VECTOR_MODES (FLOAT, 64);     /*  V32HF V16SF V8DF V4TF */
> +VECTOR_MODES (FLOAT, 128);    /* V64HF V32SF V16DF V8TF */
> +VECTOR_MODES (FLOAT, 256);    /* V128HF V64SF V32DF V16TF */
>  VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
>  VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
>  VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e826484a4f4..9fd36ff4c59 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2418,6 +2418,7 @@ classify_argument (machine_mode mode, const_tree type,
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
> +    case E_V16HFmode:
>      case E_V16HImode:
>      case E_V4DFmode:
>      case E_V4DImode:
> @@ -2428,6 +2429,7 @@ classify_argument (machine_mode mode, const_tree type,
>        return 4;
>      case E_V8DFmode:
>      case E_V16SFmode:
> +    case E_V32HFmode:
>      case E_V8DImode:
>      case E_V16SImode:
>      case E_V32HImode:
> @@ -2445,6 +2447,7 @@ classify_argument (machine_mode mode, const_tree type,
>      case E_V4SImode:
>      case E_V16QImode:
>      case E_V8HImode:
> +    case E_V8HFmode:
>      case E_V2DFmode:
>      case E_V2DImode:
>        classes[0] = X86_64_SSE_CLASS;
> @@ -2929,7 +2932,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode,
>
>    /* Unnamed 512 and 256bit vector mode parameters are passed on stack.  */
>    if (!named && (VALID_AVX512F_REG_MODE (mode)
> -                || VALID_AVX256_REG_MODE (mode)))
> +                || VALID_AVX256_REG_MODE (mode)
> +                || mode == V16HFmode
> +                || mode == V32HFmode))
>      return 0;
>
>    if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
> @@ -3176,12 +3181,14 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
>      default:
>        break;
>
> +    case E_V16HFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
>      case E_V16HImode:
>      case E_V4DFmode:
>      case E_V4DImode:
> +    case E_V32HFmode:
>      case E_V16SFmode:
>      case E_V16SImode:
>      case E_V64QImode:
> @@ -4676,12 +4683,14 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
>    nat_mode = type_natural_mode (type, NULL, false);
>    switch (nat_mode)
>      {
> +    case E_V16HFmode:
>      case E_V8SFmode:
>      case E_V8SImode:
>      case E_V32QImode:
>      case E_V16HImode:
>      case E_V4DFmode:
>      case E_V4DImode:
> +    case E_V32HFmode:
>      case E_V16SFmode:
>      case E_V16SImode:
>      case E_V64QImode:
> @@ -5348,7 +5357,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>        switch (type)
>         {
>         case opcode_int:
> -         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +         if (scalar_mode == E_HFmode)
> +           opcode = (misaligned_p
> +                     ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> +                     : "vmovdqa64");
> +         else
> +           opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>           break;
>         case opcode_float:
>           opcode = misaligned_p ? "vmovups" : "vmovaps";
> @@ -5362,6 +5376,11 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>      {
>        switch (scalar_mode)
>         {
> +       case E_HFmode:
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> +                   : "vmovdqa64");
> +         break;
>         case E_SFmode:
>           opcode = misaligned_p ? "%vmovups" : "%vmovaps";
>           break;
> @@ -19293,7 +19312,6 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>        int index;
>        switch (mode)
>         {
> -         case E_HFmode:
>           case E_SFmode:
>             index = 0;
>             break;
> @@ -19394,31 +19412,12 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>           }
>         break;
>        case 2:
> -       {
> -         int cost;
> -         if (in == 2)
> -           cost = MAX (ix86_cost->hard_register.int_load[1],
> -                       ix86_cost->hard_register.int_store[1]);
> -         else
> -           cost = in ? ix86_cost->hard_register.int_load[1]
> -                     : ix86_cost->hard_register.int_store[1];
> -         if (mode == E_HFmode)
> -           {
> -             /* Prefer SSE over GPR for HFmode.  */
> -             int sse_cost;
> -             int index = sse_store_index (mode);
> -             if (in == 2)
> -               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> -                               ix86_cost->hard_register.sse_store[index]);
> -             else
> -               sse_cost = (in
> -                           ? ix86_cost->hard_register.sse_load [index]
> -                           : ix86_cost->hard_register.sse_store [index]);
> -             if (sse_cost >= cost)
> -               cost = sse_cost + 1;
> -           }
> -         return cost;
> -       }
> +       if (in == 2)
> +         return MAX (ix86_cost->hard_register.int_load[1],
> +                     ix86_cost->hard_register.int_store[1]);
> +       else
> +         return in ? ix86_cost->hard_register.int_load[1]
> +                   : ix86_cost->hard_register.int_store[1];
>        default:
>         if (in == 2)
>           cost = MAX (ix86_cost->hard_register.int_load[2],
> @@ -19596,6 +19595,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>          between gpr and sse registser.  */
>        if (TARGET_AVX512F
>           && (mode == XImode
> +             || mode == V32HFmode
>               || VALID_AVX512F_REG_MODE (mode)
>               || VALID_AVX512F_SCALAR_MODE (mode)))
>         return true;
> @@ -19610,9 +19610,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>        /* TODO check for QI/HI scalars.  */
>        /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>        if (TARGET_AVX512VL
> -         && (mode == OImode
> -             || mode == TImode
> -             || VALID_AVX256_REG_MODE (mode)
> +         && (VALID_AVX256_REG_OR_OI_VHF_MODE (mode)
>               || VALID_AVX512VL_128_REG_MODE (mode)))
>         return true;
>
> @@ -19622,9 +19620,9 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>
>        /* OImode and AVX modes are available only when AVX is enabled.  */
>        return ((TARGET_AVX
> -              && VALID_AVX256_REG_OR_OI_MODE (mode))
> +              && VALID_AVX256_REG_OR_OI_VHF_MODE (mode))
>               || VALID_SSE_REG_MODE (mode)
> -             || VALID_SSE2_REG_MODE (mode)
> +             || VALID_SSE2_REG_VHF_MODE (mode)
>               || VALID_MMX_REG_MODE (mode)
>               || VALID_MMX_REG_MODE_3DNOW (mode));
>      }
> @@ -19837,7 +19835,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
>
>      case MODE_VECTOR_INT:
>      case MODE_VECTOR_FLOAT:
> -      if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
> +      if ((TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
> +         || (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
>           || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
>           || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
>           || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
> @@ -21703,6 +21702,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
>    if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
>        && VALID_MMX_REG_MODE (mode))
>      return true;
> +  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
> +    return true;
>    if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE)
>        && VALID_MMX_REG_MODE_3DNOW (mode))
>      return true;
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index dca2ad32ed4..086dbafbcee 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -995,8 +995,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode   \
>     || (MODE) == V4DFmode)
>
> -#define VALID_AVX256_REG_OR_OI_MODE(MODE)              \
> -  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> +#define VALID_AVX256_REG_OR_OI_VHF_MODE(MODE)          \
> +  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode || (MODE) == V16HFmode)
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
>    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> @@ -1014,13 +1014,20 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_AVX512VL_128_REG_MODE(MODE)                              \
>    ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode     \
>     || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode   \
> -   || (MODE) == TFmode || (MODE) == V1TImode)
> +   || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode     \
> +   || (MODE) == TImode)
> +
> +#define VALID_AVX512FP16_REG_MODE(MODE)                                        \
> +  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
>
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
>     || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
>
> +#define VALID_SSE2_REG_VHF_MODE(MODE)                  \
> +  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode)
> +
>  #define VALID_SSE_REG_MODE(MODE)                                       \
>    ((MODE) == V1TImode || (MODE) == TImode                              \
>     || (MODE) == V4SFmode || (MODE) == V4SImode                         \
> @@ -1064,7 +1071,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode   \
>     || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode  \
>     || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
> -   || (MODE) == V16SFmode)
> +   || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE))
>
>  #define X87_FLOAT_MODE_P(MODE) \
>    (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8f11cbcf28b..20945fabb2c 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -496,8 +496,8 @@ (define_attr "type"
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
> -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> -  V2DF,V2SF,V1DF,V8DF"
> +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
> +   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
>    (const_string "unknown"))
>
>  ;; The CPU unit operations uses.
> @@ -1098,7 +1098,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
>                              (V2DI "16") (V4DI "32") (V8DI "64")
>                              (V1TI "16") (V2TI "32") (V4TI "64")
>                              (V2DF "16") (V4DF "32") (V8DF "64")
> -                            (V4SF "16") (V8SF "32") (V16SF "64")])
> +                            (V4SF "16") (V8SF "32") (V16SF "64")
> +                            (V8HF "16") (V16HF "32") (V32HF "64")])
>
>  ;; Double word integer modes as mode attribute.
>  (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
> @@ -1236,9 +1237,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
>    [(HF "sh") (SF "ss") (DF "sd")
> -   (V16SF "ps") (V8DF "pd")
> -   (V8SF "ps") (V4DF "pd")
> -   (V4SF "ps") (V2DF "pd")
> +   (V32HF "ph") (V16SF "ps") (V8DF "pd")
> +   (V16HF "ph") (V8SF "ps") (V4DF "pd")
> +   (V8HF "ph") (V4SF "ps") (V2DF "pd")
>     (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
>     (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
>     (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index ab29999023d..b004b5eee74 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -225,6 +225,7 @@ (define_mode_iterator VMOVE
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
> +   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
>
> @@ -240,6 +241,13 @@ (define_mode_iterator VI12_AVX512VL
>    [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
>     V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
>
> +(define_mode_iterator VI12HF_AVX512VL
> +  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
> +   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
> +   (V32HF "TARGET_AVX512FP16")
> +   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> +   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
> +
>  ;; Same iterator, but without supposed TARGET_AVX512BW
>  (define_mode_iterator VI12_AVX512VLBW
>    [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
> @@ -255,6 +263,8 @@ (define_mode_iterator V
>     (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
> +   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> @@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF])
>  (define_mode_iterator V_256_512
>    [V32QI V16HI V8SI V4DI V8SF V4DF
>     (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")
> -   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
> +   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> +   (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")])
>
>  ;; All vector float modes
>  (define_mode_iterator VF
> @@ -321,6 +332,11 @@ (define_mode_iterator VF2_512_256VL
>  (define_mode_iterator VF_128
>    [V4SF (V2DF "TARGET_SSE2")])
>
> +;; All 128bit vector HF/SF/DF modes
> +(define_mode_iterator VFH_128
> +  [(V8HF "TARGET_AVX512FP16")
> +   V4SF (V2DF "TARGET_SSE2")])
> +
>  ;; All 256bit vector float modes
>  (define_mode_iterator VF_256
>    [V8SF V4DF])
> @@ -347,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL
>  (define_mode_iterator VF1_AVX512VL
>    [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
>
> +(define_mode_iterator VF_AVX512FP16
> +  [V32HF V16HF V8HF])
> +
>  ;; All vector integer modes
>  (define_mode_iterator VI
>    [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> @@ -355,6 +374,16 @@ (define_mode_iterator VI
>     (V8SI "TARGET_AVX") V4SI
>     (V4DI "TARGET_AVX") V2DI])
>
> +;; All vector integer and HF modes
> +(define_mode_iterator VIHF
> +  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> +   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
> +   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
> +   (V8SI "TARGET_AVX") V4SI
> +   (V4DI "TARGET_AVX") V2DI
> +   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")])
> +
>  (define_mode_iterator VI_AVX2
>    [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
>     (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
> @@ -557,6 +586,7 @@ (define_mode_attr avx512
>     (V8HI  "avx512vl") (V16HI  "avx512vl") (V32HI "avx512bw")
>     (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
>     (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
> +   (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
>     (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
>     (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
>
> @@ -617,12 +647,13 @@ (define_mode_attr avx2_avx512
>     (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")])
>
>  (define_mode_attr shuffletype
> -  [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
> -  (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
> -  (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
> -  (V32HI "i") (V16HI "i") (V8HI "i")
> -  (V64QI "i") (V32QI "i") (V16QI "i")
> -  (V4TI "i") (V2TI "i") (V1TI "i")])
> +  [(V32HF "f") (V16HF "f") (V8HF "f")
> +   (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
> +   (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
> +   (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
> +   (V32HI "i") (V16HI "i") (V8HI "i")
> +   (V64QI "i") (V32QI "i") (V16QI "i")
> +   (V4TI "i") (V2TI "i") (V1TI "i")])
>
>  (define_mode_attr ssequartermode
>    [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")])
> @@ -659,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
>
>  ;; All 128 and 256bit vector integer modes
>  (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
> +;; All 256bit vector integer and HF modes
> +(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
>
>  ;; Various 128bit vector integer mode combinations
>  (define_mode_iterator VI12_128 [V16QI V8HI])
> @@ -680,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI])
>  (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
>  (define_mode_iterator VI_AVX512BW
>    [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
> +(define_mode_iterator VIHF_AVX512BW
> +  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
> +   (V32HF "TARGET_AVX512FP16")])
>
>  ;; Int-float size matches
>  (define_mode_iterator VI4F_128 [V4SI V4SF])
> @@ -720,6 +756,9 @@ (define_mode_iterator VF_AVX512
>     (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
>     V16SF V8DF])
>
> +(define_mode_iterator V16_256 [V16HI V16HF])
> +(define_mode_iterator V32_512 [V32HI V32HF])
> +
>  (define_mode_attr avx512bcst
>    [(V4SI "%{1to4%}") (V2DI "%{1to2%}")
>     (V8SI "%{1to8%}") (V4DI "%{1to4%}")
> @@ -730,8 +769,10 @@ (define_mode_attr avx512bcst
>
>  ;; Mapping from float mode to required SSE level
>  (define_mode_attr sse
> -  [(SF "sse") (DF "sse2")
> +  [(SF "sse") (DF "sse2") (HF "avx512fp16")
>     (V4SF "sse") (V2DF "sse2")
> +   (V32HF "avx512fp16") (V16HF "avx512fp16")
> +   (V8HF "avx512fp16")
>     (V16SF "avx512f") (V8SF "avx")
>     (V8DF "avx512f") (V4DF "avx")])
>
> @@ -767,14 +808,23 @@ (define_mode_attr sseinsnmode
>     (V16SF "V16SF") (V8DF "V8DF")
>     (V8SF "V8SF") (V4DF "V4DF")
>     (V4SF "V4SF") (V2DF "V2DF")
> +   (V8HF "TI") (V16HF "OI") (V32HF "XI")
>     (TI "TI")])
>
> +;; SSE integer instruction suffix for various modes
> +(define_mode_attr sseintmodesuffix
> +  [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
> +   (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
> +   (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
> +   (V8HF "w") (V16HF "w") (V32HF "w")])
> +
>  ;; Mapping of vector modes to corresponding mask size
>  (define_mode_attr avx512fmaskmode
>    [(V64QI "DI") (V32QI "SI") (V16QI "HI")
>     (V32HI "SI") (V16HI "HI") (V8HI  "QI") (V4HI "QI")
>     (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
>     (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
> +   (V32HF "SI") (V16HF "HI") (V8HF  "QI")
>     (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
>     (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
>
> @@ -784,6 +834,7 @@ (define_mode_attr avx512fmaskmodelower
>     (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
>     (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
>     (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
> +   (V32HF "si") (V16HF "hi") (V8HF  "qi")
>     (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
>     (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
>
> @@ -828,7 +879,8 @@ (define_mode_attr ssedoublevecmode
>     (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI")
>     (V16SF "V32SF") (V8DF "V16DF")
>     (V8SF "V16SF") (V4DF "V8DF")
> -   (V4SF "V8SF") (V2DF "V4DF")])
> +   (V4SF "V8SF") (V2DF "V4DF")
> +   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
>
>  ;; Mapping of vector modes to a vector mode of half size
>  ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
> @@ -838,7 +890,8 @@ (define_mode_attr ssehalfvecmode
>     (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI") (V2DI "DI")
>     (V16SF "V8SF") (V8DF "V4DF")
>     (V8SF  "V4SF") (V4DF "V2DF")
> -   (V4SF  "V2SF") (V2DF "DF")])
> +   (V4SF  "V2SF") (V2DF "DF")
> +   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
>
>  (define_mode_attr ssehalfvecmodelower
>    [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
> @@ -846,9 +899,10 @@ (define_mode_attr ssehalfvecmodelower
>     (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
>     (V16SF "v8sf") (V8DF "v4df")
>     (V8SF  "v4sf") (V4DF "v2df")
> -   (V4SF  "v2sf")])
> +   (V4SF  "v2sf")
> +   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
>
> -;; Mapping of vector modes ti packed single mode of the same size
> +;; Mapping of vector modes to packed single mode of the same size
>  (define_mode_attr ssePSmode
>    [(V16SI "V16SF") (V8DF "V16SF")
>     (V16SF "V16SF") (V8DI "V16SF")
> @@ -858,7 +912,8 @@ (define_mode_attr ssePSmode
>     (V4DI "V8SF") (V2DI "V4SF")
>     (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF")
>     (V8SF "V8SF") (V4SF "V4SF")
> -   (V4DF "V8SF") (V2DF "V4SF")])
> +   (V4DF "V8SF") (V2DF "V4SF")
> +   (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")])
>
>  (define_mode_attr ssePSmode2
>    [(V8DI "V8SF") (V4DI "V4SF")])
> @@ -869,6 +924,7 @@ (define_mode_attr ssescalarmode
>     (V32HI "HI") (V16HI "HI") (V8HI "HI")
>     (V16SI "SI") (V8SI "SI")  (V4SI "SI")
>     (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
> +   (V32HF "HF") (V16HF "HF") (V8HF "HF")
>     (V16SF "SF") (V8SF "SF")  (V4SF "SF")
>     (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
>     (V4TI "TI")  (V2TI "TI")])
> @@ -879,6 +935,7 @@ (define_mode_attr ssescalarmodelower
>     (V32HI "hi") (V16HI "hi") (V8HI "hi")
>     (V16SI "si") (V8SI "si")  (V4SI "si")
>     (V8DI "di")  (V4DI "di")  (V2DI "di")
> +   (V32HF "hf") (V16HF "hf")  (V8HF "hf")
>     (V16SF "sf") (V8SF "sf")  (V4SF "sf")
>     (V8DF "df")  (V4DF "df")  (V2DF "df")
>     (V4TI "ti")  (V2TI "ti")])
> @@ -889,6 +946,7 @@ (define_mode_attr ssexmmmode
>     (V32HI "V8HI")  (V16HI "V8HI") (V8HI "V8HI")
>     (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
>     (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
> +   (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
>     (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
>     (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
>
> @@ -931,10 +989,11 @@ (define_mode_attr ssescalarsize
>     (V64QI "8") (V32QI "8") (V16QI "8")
>     (V32HI "16") (V16HI "16") (V8HI "16")
>     (V16SI "32") (V8SI "32") (V4SI "32")
> +   (V32HF "16") (V16HF "16") (V8HF "16")
>     (V16SF "32") (V8SF "32") (V4SF "32")
>     (V8DF "64") (V4DF "64") (V2DF "64")])
>
> -;; SSE prefix for integer vector modes
> +;; SSE prefix for integer and HF vector modes
>  (define_mode_attr sseintprefix
>    [(V2DI  "p") (V2DF  "")
>     (V4DI  "p") (V4DF  "")
> @@ -942,16 +1001,16 @@ (define_mode_attr sseintprefix
>     (V4SI  "p") (V4SF  "")
>     (V8SI  "p") (V8SF  "")
>     (V16SI "p") (V16SF "")
> -   (V16QI "p") (V8HI "p")
> -   (V32QI "p") (V16HI "p")
> -   (V64QI "p") (V32HI "p")])
> +   (V16QI "p") (V8HI "p") (V8HF "p")
> +   (V32QI "p") (V16HI "p") (V16HF "p")
> +   (V64QI "p") (V32HI "p") (V32HF "p")])
>
>  ;; SSE scalar suffix for vector modes
>  (define_mode_attr ssescalarmodesuffix
> -  [(SF "ss") (DF "sd")
> -   (V16SF "ss") (V8DF "sd")
> -   (V8SF "ss") (V4DF "sd")
> -   (V4SF "ss") (V2DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
> +   (V32HF "sh") (V16SF "ss") (V8DF "sd")
> +   (V16HF "sh") (V8SF "ss") (V4DF "sd")
> +   (V8HF "sh") (V4SF "ss") (V2DF "sd")
>     (V16SI "d") (V8DI "q")
>     (V8SI "d") (V4DI "q")
>     (V4SI "d") (V2DI "q")])
> @@ -979,7 +1038,8 @@ (define_mode_attr castmode
>  ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
>  ;; i64x4 or f64x4 for 512bit modes.
>  (define_mode_attr i128
> -  [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128")
> +  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
> +   (V8DF "f64x4") (V4DF "f128")
>     (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
>     (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
>
> @@ -1003,14 +1063,18 @@ (define_mode_attr bcstscalarsuff
>     (V32HI "w")  (V16HI "w") (V8HI "w")
>     (V16SI "d")  (V8SI "d")  (V4SI "d")
>     (V8DI "q")   (V4DI "q")  (V2DI "q")
> +   (V32HF "w")  (V16HF "w") (V8HF "w")
>     (V16SF "ss") (V8SF "ss") (V4SF "ss")
>     (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
>
>  ;; Tie mode of assembler operand to mode iterator
>  (define_mode_attr xtg_mode
> -  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x")
> -   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
> -   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
> +  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
> +   (V8HF "x") (V4SF "x") (V2DF "x")
> +   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
> +   (V16HF "t") (V8SF "t") (V4DF "t")
> +   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
> +   (V32HF "g") (V16SF "g") (V8DF "g")])
>
>  ;; Half mask mode for unpacks
>  (define_mode_attr HALFMASKMODE
> @@ -1306,6 +1370,20 @@ (define_insn "<avx512>_blendm<mode>"
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "<sseinsnmode>")])
>
> +(define_insn "<avx512>_blendm<mode>"
> +  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
> +       (vec_merge:VF_AVX512FP16
> +         (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm")
> +         (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v")
> +         (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
> +  "TARGET_AVX512BW"
> +  "@
> +    vmovdqu<ssescalarsize>\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2}
> +    vpblendmw\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<sseinsnmode>")])
> +
>  (define_insn "<avx512>_store<mode>_mask"
>    [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
>         (vec_merge:V48_AVX512VL
> @@ -1903,12 +1981,12 @@ (define_insn "*<insn><mode>3<mask_name><round_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*<sse>_vm<insn><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (plusminus:<ssescalarmode>
>               (vec_select:<ssescalarmode>
> -               (match_operand:VF_128 1 "register_operand" "0,v")
> +               (match_operand:VFH_128 1 "register_operand" "0,v")
>                 (parallel [(const_int 0)]))
>               (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
>           (match_dup 1)
> @@ -1919,7 +1997,16 @@ (define_insn "*<sse>_vm<insn><mode>3"
>     v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>    [(set_attr "isa" "noavx,avx")
>     (set_attr "type" "sseadd")
> -   (set_attr "prefix" "orig,vex")
> +   (set (attr "prefix")
> +     (cond [(eq_attr "alternative" "0")
> +             (const_string "orig")
> +           (eq_attr "alternative" "1")
> +             (if_then_else
> +               (match_test "<MODE>mode == V8HFmode")
> +               (const_string "evex")
> +               (const_string "vex"))
> +          ]
> +          (const_string "*")))
>     (set_attr "mode" "<ssescalarmode>")])
>
>  (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
> @@ -1966,12 +2053,12 @@ (define_insn "*mul<mode>3<mask_name><round_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (multdiv:<ssescalarmode>
>               (vec_select:<ssescalarmode>
> -               (match_operand:VF_128 1 "register_operand" "0,v")
> +               (match_operand:VFH_128 1 "register_operand" "0,v")
>                 (parallel [(const_int 0)]))
>               (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
>           (match_dup 1)
> @@ -1982,7 +2069,16 @@ (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
>     v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>    [(set_attr "isa" "noavx,avx")
>     (set_attr "type" "sse<multdiv_mnemonic>")
> -   (set_attr "prefix" "orig,vex")
> +   (set (attr "prefix")
> +     (cond [(eq_attr "alternative" "0")
> +             (const_string "orig")
> +           (eq_attr "alternative" "1")
> +             (if_then_else
> +               (match_test "<MODE>mode == V8HFmode")
> +               (const_string "evex")
> +               (const_string "vex"))
> +          ]
> +          (const_string "*")))
>     (set_attr "btver2_decode" "direct,double")
>     (set_attr "mode" "<ssescalarmode>")])
>
> @@ -2368,12 +2464,12 @@ (define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
>  ;; Standard scalar operation patterns which preserve the rest of the
>  ;; vector for combiner.
>  (define_insn "*ieee_<ieee_maxmin><mode>3"
> -  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> -       (vec_merge:VF_128
> -         (vec_duplicate:VF_128
> +  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
> +       (vec_merge:VFH_128
> +         (vec_duplicate:VFH_128
>             (unspec:<ssescalarmode>
>               [(vec_select:<ssescalarmode>
> -                (match_operand:VF_128 1 "register_operand" "0,v")
> +                (match_operand:VFH_128 1 "register_operand" "0,v")
>                  (parallel [(const_int 0)]))
>                (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")]
>                IEEE_MAXMIN))
> @@ -2386,7 +2482,16 @@ (define_insn "*ieee_<ieee_maxmin><mode>3"
>    [(set_attr "isa" "noavx,avx")
>     (set_attr "type" "sseadd")
>     (set_attr "btver2_sse_attr" "maxmin")
> -   (set_attr "prefix" "orig,vex")
> +   (set (attr "prefix")
> +     (cond [(eq_attr "alternative" "0")
> +             (const_string "orig")
> +           (eq_attr "alternative" "1")
> +             (if_then_else
> +               (match_test "<MODE>mode == V8HFmode")
> +               (const_string "evex")
> +               (const_string "vex"))
> +          ]
> +          (const_string "*")))
>     (set_attr "mode" "<ssescalarmode>")])
>
>  (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
> @@ -8364,6 +8469,45 @@ (define_insn "vec_set<mode>_0"
>            ]
>            (symbol_ref "true")))])
>
> +;; vmovw also clears the higher bits
> +(define_insn "vec_set<mode>_0"
> +  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v")
> +       (vec_merge:VF_AVX512FP16
> +         (vec_duplicate:VF_AVX512FP16
> +           (match_operand:HF 2 "nonimmediate_operand" "rm"))
> +         (match_operand:VF_AVX512FP16 1 "const0_operand" "C")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vmovw\t{%2, %x0|%x0, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
> +(define_insn "*avx512fp16_movsh"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (vec_duplicate:V8HF
> +           (match_operand:HF 2 "register_operand" "v"))
> +         (match_operand:V8HF 1 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
> +(define_insn "avx512fp16_movsh"
> +  [(set (match_operand:V8HF 0 "register_operand" "=v")
> +       (vec_merge:V8HF
> +         (match_operand:V8HF 2 "register_operand" "v")
> +         (match_operand:V8HF 1 "register_operand" "v")
> +         (const_int 1)))]
> +  "TARGET_AVX512FP16"
> +  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  ;; A subset is vec_setv4sf.
>  (define_insn "*vec_setv4sf_sse4_1"
>    [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
> @@ -8499,6 +8643,20 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>
> +(define_expand "vec_setv8hf"
> +  [(match_operand:V8HF 0 "register_operand")
> +   (match_operand:HF 1 "register_operand")
> +   (match_operand 2 "vec_setm_sse41_operand")]
> +  "TARGET_SSE"
> +{
> +  if (CONST_INT_P (operands[2]))
> +    ix86_expand_vector_set (false, operands[0], operands[1],
> +                           INTVAL (operands[2]));
> +  else
> +    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
> +  DONE;
> +})
> +
>  (define_expand "vec_set<mode>"
>    [(match_operand:V_256_512 0 "register_operand")
>     (match_operand:<ssescalarmode> 1 "register_operand")
> @@ -9214,10 +9372,10 @@ (define_insn "vec_extract_hi_<mode>"
>     (set_attr "length_immediate" "1")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_insn_and_split "vec_extract_lo_v32hi"
> -  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m")
> -       (vec_select:V16HI
> -         (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v")
> +(define_insn_and_split "vec_extract_lo_<mode>"
> +  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
> +       (vec_select:<ssehalfvecmode>
> +         (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
>           (parallel [(const_int 0) (const_int 1)
>                      (const_int 2) (const_int 3)
>                      (const_int 4) (const_int 5)
> @@ -9244,9 +9402,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
>    if (!TARGET_AVX512VL
>        && REG_P (operands[0])
>        && EXT_REX_SSE_REG_P (operands[1]))
> -    operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
> +    operands[0] = lowpart_subreg (<MODE>mode, operands[0],
> +                                 <ssehalfvecmode>mode);
>    else
> -    operands[1] = gen_lowpart (V16HImode, operands[1]);
> +    operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
>  }
>    [(set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> @@ -9255,10 +9414,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "XI")])
>
> -(define_insn "vec_extract_hi_v32hi"
> -  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> -       (vec_select:V16HI
> -         (match_operand:V32HI 1 "register_operand" "v")
> +(define_insn "vec_extract_hi_<mode>"
> +  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
> +       (vec_select:<ssehalfvecmode>
> +         (match_operand:V32_512 1 "register_operand" "v")
>           (parallel [(const_int 16) (const_int 17)
>                      (const_int 18) (const_int 19)
>                      (const_int 20) (const_int 21)
> @@ -9275,10 +9434,10 @@ (define_insn "vec_extract_hi_v32hi"
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "XI")])
>
> -(define_insn_and_split "vec_extract_lo_v16hi"
> -  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
> -       (vec_select:V8HI
> -         (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
> +(define_insn_and_split "vec_extract_lo_<mode>"
> +  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
> +       (vec_select:<ssehalfvecmode>
> +         (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
>           (parallel [(const_int 0) (const_int 1)
>                      (const_int 2) (const_int 3)
>                      (const_int 4) (const_int 5)
> @@ -9287,12 +9446,12 @@ (define_insn_and_split "vec_extract_lo_v16hi"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (match_dup 1))]
> -  "operands[1] = gen_lowpart (V8HImode, operands[1]);")
> +  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
>
> -(define_insn "vec_extract_hi_v16hi"
> -  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm")
> -       (vec_select:V8HI
> -         (match_operand:V16HI 1 "register_operand" "x,v,v")
> +(define_insn "vec_extract_hi_<mode>"
> +  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
> +       (vec_select:<ssehalfvecmode>
> +         (match_operand:V16_256 1 "register_operand" "x,v,v")
>           (parallel [(const_int 8) (const_int 9)
>                      (const_int 10) (const_int 11)
>                      (const_int 12) (const_int 13)
> @@ -9428,12 +9587,41 @@ (define_insn "vec_extract_hi_v32qi"
>     (set_attr "prefix" "vex,evex,evex")
>     (set_attr "mode" "OI")])
>
> +;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
> +;; Otherwise, it will be ignored.
> +(define_insn_and_split "*vec_extract<mode>_0"
> +  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
> +       (vec_select:HF
> +         (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
> +         (parallel [(const_int 0)])))]
> +  "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 0) (match_dup 1))]
> +  "operands[1] = gen_lowpart (HFmode, operands[1]);")
> +
> +(define_insn "*vec_extracthf"
> +  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m")
> +       (vec_select:HF
> +         (match_operand:V8HF 1 "register_operand" "v,v")
> +         (parallel
> +           [(match_operand:SI 2 "const_0_to_7_operand")])))]
> +  "TARGET_AVX512FP16"
> +  "@
> +   vpextrw\t{%2, %1, %k0|%k0, %1, %2}
> +   vpextrw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "sselog1")
> +   (set_attr "prefix" "maybe_evex")
> +   (set_attr "mode" "TI")])
> +
>  ;; Modes handled by vec_extract patterns.
>  (define_mode_iterator VEC_EXTRACT_MODE
>    [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
>     (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
> +   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
> @@ -14666,16 +14854,16 @@ (define_expand "vec_interleave_low<mode>"
>
>  ;; Modes handled by pinsr patterns.
>  (define_mode_iterator PINSR_MODE
> -  [(V16QI "TARGET_SSE4_1") V8HI
> +  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
>     (V4SI "TARGET_SSE4_1")
>     (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
>
>  (define_mode_attr sse2p4_1
> -  [(V16QI "sse4_1") (V8HI "sse2")
> +  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
>     (V4SI "sse4_1") (V2DI "sse4_1")])
>
>  (define_mode_attr pinsr_evex_isa
> -  [(V16QI "avx512bw") (V8HI "avx512bw")
> +  [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
>     (V4SI "avx512dq") (V2DI "avx512dq")])
>
>  ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
> @@ -14703,11 +14891,19 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
>      case 2:
>      case 4:
>        if (GET_MODE_SIZE (<ssescalarmode>mode) < GET_MODE_SIZE (SImode))
> -       return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
> +       {
> +         if (<MODE>mode == V8HFmode)
> +           return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
> +         else
> +           return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
> +       }
>        /* FALLTHRU */
>      case 3:
>      case 5:
> -      return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +      if (<MODE>mode == V8HFmode)
> +       return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +      else
> +       return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
>      default:
>        gcc_unreachable ();
>      }
> @@ -21122,16 +21318,17 @@ (define_mode_attr pbroadcast_evex_isa
>    [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
>     (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
>     (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
> -   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")])
> +   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
> +   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
>
>  (define_insn "avx2_pbroadcast<mode>"
> -  [(set (match_operand:VI 0 "register_operand" "=x,v")
> -       (vec_duplicate:VI
> +  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
> +       (vec_duplicate:VIHF
>           (vec_select:<ssescalarmode>
>             (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
>             (parallel [(const_int 0)]))))]
>    "TARGET_AVX2"
> -  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
> +  "vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}"
>    [(set_attr "isa" "*,<pbroadcast_evex_isa>")
>     (set_attr "type" "ssemov")
>     (set_attr "prefix_extra" "1")
> @@ -21139,17 +21336,17 @@ (define_insn "avx2_pbroadcast<mode>"
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "avx2_pbroadcast<mode>_1"
> -  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
> -       (vec_duplicate:VI_256
> +  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
> +       (vec_duplicate:VIHF_256
>           (vec_select:<ssescalarmode>
> -           (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
> +           (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
>             (parallel [(const_int 0)]))))]
>    "TARGET_AVX2"
>    "@
> -   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
> -   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
> -   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
> -   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}"
> +   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
> +   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
> +   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
> +   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}"
>    [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>")
>     (set_attr "type" "ssemov")
>     (set_attr "prefix_extra" "1")
> @@ -21503,15 +21700,15 @@ (define_insn "avx2_vec_dupv4df"
>     (set_attr "mode" "V4DF")])
>
>  (define_insn "<avx512>_vec_dup<mode>_1"
> -  [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
> -       (vec_duplicate:VI_AVX512BW
> +  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
> +       (vec_duplicate:VIHF_AVX512BW
>           (vec_select:<ssescalarmode>
> -           (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
> +           (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
>             (parallel [(const_int 0)]))))]
>    "TARGET_AVX512F"
>    "@
> -   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
> -   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}"
> +   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
> +   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %<iptr>1}"
>    [(set_attr "type" "ssemov")
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "<sseinsnmode>")])
> @@ -21536,8 +21733,8 @@ (define_insn "<avx512>_vec_dup<mode><mask_name>"
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<avx512>_vec_dup<mode><mask_name>"
> -  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
> -       (vec_duplicate:VI12_AVX512VL
> +  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
> +       (vec_duplicate:VI12HF_AVX512VL
>           (vec_select:<ssescalarmode>
>             (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
>             (parallel [(const_int 0)]))))]
> @@ -21572,8 +21769,8 @@ (define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
>     (set_attr "mode" "<sseinsnmode>")])
>
>  (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
> -  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
> -       (vec_duplicate:VI12_AVX512VL
> +  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
> +       (vec_duplicate:VI12HF_AVX512VL
>           (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
>    "TARGET_AVX512BW"
>    "@
> @@ -21668,7 +21865,7 @@ (define_mode_attr vecdupssescalarmodesuffix
>    [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
>  ;; Modes handled by AVX2 vec_dup patterns.
>  (define_mode_iterator AVX2_VEC_DUP_MODE
> -  [V32QI V16QI V16HI V8HI V8SI V4SI])
> +  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
>
>  (define_insn "*vec_dup<mode>"
>    [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
> @@ -22224,12 +22421,12 @@ (define_insn "vec_set_hi_<mode><mask_name>"
>     (set_attr "prefix" "vex")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_insn "vec_set_lo_v16hi"
> -  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
> -       (vec_concat:V16HI
> -         (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")
> -         (vec_select:V8HI
> -           (match_operand:V16HI 1 "register_operand" "x,v")
> +(define_insn "vec_set_lo_<mode>"
> +  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
> +       (vec_concat:V16_256
> +         (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")
> +         (vec_select:<ssehalfvecmode>
> +           (match_operand:V16_256 1 "register_operand" "x,v")
>             (parallel [(const_int 8) (const_int 9)
>                        (const_int 10) (const_int 11)
>                        (const_int 12) (const_int 13)
> @@ -22244,16 +22441,16 @@ (define_insn "vec_set_lo_v16hi"
>     (set_attr "prefix" "vex,evex")
>     (set_attr "mode" "OI")])
>
> -(define_insn "vec_set_hi_v16hi"
> -  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
> -       (vec_concat:V16HI
> -         (vec_select:V8HI
> -           (match_operand:V16HI 1 "register_operand" "x,v")
> +(define_insn "vec_set_hi_<mode>"
> +  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
> +       (vec_concat:V16_256
> +         (vec_select:<ssehalfvecmode>
> +           (match_operand:V16_256 1 "register_operand" "x,v")
>             (parallel [(const_int 0) (const_int 1)
>                        (const_int 2) (const_int 3)
>                        (const_int 4) (const_int 5)
>                        (const_int 6) (const_int 7)]))
> -         (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")))]
> +         (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")))]
>    "TARGET_AVX"
>    "@
>     vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
> @@ -22430,6 +22627,8 @@ (define_mode_iterator VEC_INIT_MODE
>     (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
> +   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
> @@ -22441,6 +22640,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE
>     (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
>     (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
>     (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
> +   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
> +   (V8HF "TARGET_AVX512FP16")
>     (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
>     (V4TI "TARGET_AVX512F")])
> --
> 2.18.1
>


-- 
BR,
Hongtao


* Re: [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-07-21  7:43         ` [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
@ 2021-07-22  8:49           ` Uros Bizjak
  2021-07-27  7:31             ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Uros Bizjak @ 2021-07-22  8:49 UTC (permalink / raw)
  To: liuhongt
  Cc: gcc-patches, Joseph S. Myers, H. J. Lu, Richard Biener,
	Hongtao Liu, Guo, Xuepeng

On Wed, Jul 21, 2021 at 9:44 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (get_available_features):
>         Detect FEATURE_AVX512FP16.
>         * common/config/i386/i386-common.c
>         (OPTION_MASK_ISA_AVX512FP16_SET,
>         OPTION_MASK_ISA_AVX512FP16_UNSET,
>         OPTION_MASK_ISA2_AVX512FP16_SET,
>         OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
>         (OPTION_MASK_ISA2_AVX512BW_UNSET,
>         OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
>         (ix86_handle_option): Handle -mavx512fp16.
>         * common/config/i386/i386-cpuinfo.h (enum processor_features):
>         Add FEATURE_AVX512FP16.
>         * common/config/i386/i386-isas.h: Add entry for AVX512FP16.
>         * config.gcc: Add avx512fp16intrin.h.
>         * config/i386/avx512fp16intrin.h: New intrinsic header.
>         * config/i386/cpuid.h: Add bit_AVX512FP16.
>         * config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
>         * config/i386/i386-builtins.c: Support _Float16 type for i386
>         backend.
>         (ix86_init_float16_builtins): New function.
>         (ix86_float16_type_node): New.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Define
>         __AVX512FP16__.
>         * config/i386/i386-expand.c (ix86_expand_branch): Support
>         HFmode.
>         (ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_expand_fp_movcc): Ditto.
>         * config/i386/i386-isa.def: Add PTA define for AVX512FP16.
>         * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
>         (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
>         * config/i386/i386.c (ix86_get_ssemov): Use
>         vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
>         (ix86_get_excess_precision): Use
>         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
>         is enabled.
>         (output_387_binary_op): Update instruction suffix for HFmode.
>         (sse_store_index): Use SFmode cost for HFmode cost.
>         (inline_memory_move_cost): Add HFmode, and prefer SSE cost over
>         GPR cost for HFmode.
>         (ix86_hard_regno_mode_ok): Allow HImode in sse register.
>         (ix86_mangle_type): Add mangling for _Float16 type.
>         (inline_secondary_memory_needed): No memory is needed for
>         16bit movement between gpr and sse reg under
>         TARGET_AVX512FP16.
>         (ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_division_cost): Ditto.
>         (ix86_rtx_costs): Ditto.
>         (ix86_add_stmt_cost): Ditto.
>         (ix86_optab_supported_p): Ditto.
>         * config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
>         (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
>         (SSE_FLOAT_MODE_P): Add HFmode.
>         (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
>         * config/i386/i386.md (mode): Add HFmode.
>         (MODE_SIZE): Add HFmode.
>         (MODEFH): Likewise.
>         (ssemodesuffix): Add sh suffix for HFmode.
>         (cbranch<mode>4): Use MODEFH.
>         (<insn><mode>3): Likewise.
>         (mul<mode>3): Likewise.
>         (div<mode>3): Likewise.
>         (*ieee_s<ieee_maxmin><mode>3): Likewise.
>         (*cmpi<unord>hf): New define_insn for HFmode.
>         (*movhf_internal): Adjust for avx512fp16 instruction.
>         (extendhf<mode>2): Likewise.
>         (trunc<mode>hf2): Likewise.
>         (*fop_hf_comm): Likewise.
>         (*fop_hf_1): Likewise.
>         (float<floatunssuffix><mode>hf2): Likewise.
>         (mov<mode>cc): Likewise.
>         * config/i386/i386.opt: Add mavx512fp16.
>         * config/i386/immintrin.h: Include avx512fp16intrin.h.
>         * doc/invoke.texi: Add mavx512fp16.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
>         * gcc.target/i386/avx-2.c: Ditto.
>         * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
>         * gcc.target/i386/funcspec-56.inc: Add new target attribute check.
>         * gcc.target/i386/sse-13.c: Add -mavx512fp16.
>         * gcc.target/i386/sse-14.c: Ditto.
>         * gcc.target/i386/sse-22.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * lib/target-supports.exp: (check_effective_target_avx512fp16): New.
>         * g++.target/i386/float16-1.C: New test.
>         * g++.target/i386/float16-2.C: Ditto.
>         * g++.target/i386/float16-3.C: Ditto.
>         * gcc.target/i386/avx512fp16-12a.c: Ditto.
>         * gcc.target/i386/avx512fp16-12b.c: Ditto.
>         * gcc.target/i386/float16-3a.c: Ditto.
>         * gcc.target/i386/float16-3b.c: Ditto.
>         * gcc.target/i386/float16-4a.c: Ditto.
>         * gcc.target/i386/float16-4b.c: Ditto.
>         * gcc.target/i386/pr54855-12.c: Ditto.
>         * g++.dg/other/i386-2.C: Ditto.
>         * g++.dg/other/i386-3.C: Ditto.
>
> Co-Authored-By: Guo, Xuepeng <xuepeng.guo@intel.com>
> Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
> Co-Authored-By: Liu, Hongtao <hongtao.liu@intel.com>
> Co-Authored-By: Wang, Hongyu <hongyu.wang@intel.com>
> Co-Authored-By: Xu, Dianhong <dianhong.xu@intel.com>
> ---
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 ++-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  53 +++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   1 +
>  gcc/config/i386/i386-builtins.c               |  23 +++
>  gcc/config/i386/i386-c.c                      |   2 +
>  gcc/config/i386/i386-expand.c                 |   5 +-
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-options.c                |   4 +-
>  gcc/config/i386/i386.c                        | 128 ++++++++----
>  gcc/config/i386/i386.h                        |  11 +-
>  gcc/config/i386/i386.md                       | 185 ++++++++++++++----
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   4 +
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 ++
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  40 files changed, 531 insertions(+), 103 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 458f41de776..1835ac64e67 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
>             set_feature (FEATURE_AVX5124FMAPS);
>           if (edx & bit_AVX512VP2INTERSECT)
>             set_feature (FEATURE_AVX512VP2INTERSECT);
> +         if (edx & bit_AVX512FP16)
> +           set_feature (FEATURE_AVX512FP16);
>         }
>
>        __cpuid_count (7, 1, eax, ebx, ecx, edx);
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index 76ab1a14e54..00c65ba15ab 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_SET \
>    (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
> +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
> +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_SET \
>    (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
>  #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
> @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
> +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
> +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
>  #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
>  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
> @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_AVX512BF16_UNSET \
>     | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
>     | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
> +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> +   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>    (OPTION_MASK_ISA2_AVX512F_UNSET)
>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
>  #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
>
> -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
> +#define OPTION_MASK_ISA2_AVX512BW_UNSET \
> +  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> +    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>
>  /* Set 1 << value as value of -malign-FLAG option.  */
>
> @@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mavx512fp16:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +       }
> +      return true;
> +
>      case OPT_mavx512vnni:
>        if (value)
>         {
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index e68dd656046..4e0659fc7b2 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -228,6 +228,7 @@ enum processor_features
>    FEATURE_AESKLE,
>    FEATURE_WIDEKL,
>    FEATURE_AVXVNNI,
> +  FEATURE_AVX512FP16,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 898c18f3dda..a6783660278 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
>    ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
>    ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
> +  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 3df9b52cf25..a354351408c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
>                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
>                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
>                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> -                      mwaitintrin.h"
> +                      mwaitintrin.h avx512fp16intrin.h"
>         ;;
>  ia64-*-*)
>         extra_headers=ia64intrin.h
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> new file mode 100644
> index 00000000000..38d63161ba6
> --- /dev/null
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -0,0 +1,53 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _IMMINTRIN_H_INCLUDED
> +#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef __AVX512FP16INTRIN_H_INCLUDED
> +#define __AVX512FP16INTRIN_H_INCLUDED
> +
> +#ifndef __AVX512FP16__
> +#pragma GCC push_options
> +#pragma GCC target("avx512fp16")
> +#define __DISABLE_AVX512FP16__
> +#endif /* __AVX512FP16__ */
> +
> +/* Internal data types for implementing the intrinsics.  */
> +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
> +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
> +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
> +
> +/* The Intel API is flexible enough that we must allow aliasing with other
> +   vector types, and their scalar components.  */
> +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
> +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
> +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +#ifdef __DISABLE_AVX512FP16__
> +#undef __DISABLE_AVX512FP16__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX512FP16__ */
> +
> +#endif /* __AVX512FP16INTRIN_H_INCLUDED */
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..82b8050028b 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -126,6 +126,7 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT (1 << 8)
> +#define bit_AVX512FP16   (1 << 23)
>  #define bit_IBT        (1 << 20)
>  #define bit_UINTR (1 << 5)
>  #define bit_PCONFIG    (1 << 18)
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 3ca313c19ec..1768b88d748 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
>  DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
>  DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>  DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> +DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 204e2903126..668f09f12a0 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>  /* Table for the ix86 builtin non-function types.  */
>  static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>
> +tree ix86_float16_type_node = NULL_TREE;
>  /* Retrieve an element from the above table, building some of
>     the types lazily.  */
>
> @@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
>                         BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
>  }
>
> +static void
> +ix86_init_float16_builtins (void)
> +{
> +  /* Provide the _Float16 type and float16_type_node if needed so that
> +     it can be used in AVX512FP16 intrinsics and builtins.  */
> +  if (!float16_type_node)
> +    {
> +      ix86_float16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (ix86_float16_type_node) = 16;
> +      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
> +      layout_type (ix86_float16_type_node);
> +    }
> +  else
> +    ix86_float16_type_node = float16_type_node;
> +
> +  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
> +    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
> +                                           "_Float16");
> +}
> +
>  static void
>  ix86_init_builtin_types (void)
>  {
> @@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
>       it.  */
>    lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
>
> +  ix86_init_float16_builtins ();
> +
>    const_string_type_node
>      = build_pointer_type (build_qualified_type
>                           (char_type_node, TYPE_QUAL_CONST));
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index 5ed0de006fb..cc64f855ecc 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>      def_or_undef (parse_in, "__PTWRITE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
>      def_or_undef (parse_in, "__AVX512BF16__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
> +    def_or_undef (parse_in, "__AVX512FP16__");
>    if (TARGET_MMX_WITH_SSE)
>      def_or_undef (parse_in, "__MMX_WITH_SSE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 69ea79e6123..b7d050a1e42 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
>
>    switch (mode)
>      {
> +    case E_HFmode:
>      case E_SFmode:
>      case E_DFmode:
>      case E_XFmode:
> @@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
>    bool unordered_compare = ix86_unordered_fp_compare (code);
>    rtx op0 = *pop0, op1 = *pop1;
>    machine_mode op_mode = GET_MODE (op0);
> -  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
> +  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
>
>    /* All of the unordered compare instructions only work on registers.
>       The same is true of the fcomi compare instructions.  The XFmode
> @@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
>    rtx op0 = XEXP (operands[1], 0);
>    rtx op1 = XEXP (operands[1], 1);
>
> -  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      {
>        machine_mode cmode;
>
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index a0d46cbc892..83d9302ea3d 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -108,3 +108,4 @@ DEF_PTA(HRESET)
>  DEF_PTA(KL)
>  DEF_PTA(WIDEKL)
>  DEF_PTA(AVXVNNI)
> +DEF_PTA(AVX512FP16)
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 3416a4f1752..df191763e4b 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
>    { "-mhreset",                OPTION_MASK_ISA2_HRESET },
>    { "-mkl",            OPTION_MASK_ISA2_KL },
>    { "-mwidekl",        OPTION_MASK_ISA2_WIDEKL },
> -  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI }
> +  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI },
> +  { "-mavx512fp16",    OPTION_MASK_ISA2_AVX512FP16 }
>  };
>  static struct ix86_target_opts isa_opts[] =
>  {
> @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
>      IX86_ATTR_ISA ("hreset", OPT_mhreset),
>      IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
> +    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 02628d838fc..e826484a4f4 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>      case MODE_SI:
>        return "%vmovd\t{%1, %0|%0, %1}";
>
> +    case MODE_HI:
> +      if (GENERAL_REG_P (operands[0]))
> +       return "vmovw\t{%1, %k0|%k0, %1}";
> +      else if (GENERAL_REG_P (operands[1]))
> +       return "vmovw\t{%k1, %0|%0, %k1}";
> +      else
> +       return "vmovw\t{%1, %0|%0, %1}";
> +
>      case MODE_DF:
>        if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>         return "vmovsd\t{%d1, %0|%0, %d1}";
> @@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>        else
>         return "%vmovss\t{%1, %0|%0, %1}";
>
> +    case MODE_HF:
> +      if (REG_P (operands[0]) && REG_P (operands[1]))
> +       return "vmovsh\t{%d1, %0|%0, %d1}";
> +      else
> +       return "vmovsh\t{%1, %0|%0, %1}";
> +
>      case MODE_V1DF:
>        gcc_assert (!TARGET_AVX);
>        return "movlpd\t{%1, %0|%0, %1}";
> @@ -13955,7 +13969,9 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
>
>    if (is_sse)
>     {
> -     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
> +     p = (GET_MODE (operands[0]) == HFmode
> +         ? "sh"
> +         : (GET_MODE (operands[0]) == SFmode ? "ss" : "sd"));
>       strcat (buf, p);
>
>       if (TARGET_AVX)
> @@ -19132,9 +19148,11 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
>        if (!TARGET_SSE2)
>         return true;
>
> -      /* Between SSE and general, we have moves no larger than word size.  */
> +      /* Between SSE and general, we have moves no larger than word size
> +        except for AVX512FP16, VMOVW enable 16bits movement.  */
>        if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> -         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> +         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (TARGET_AVX512FP16
> +                                                  ? HImode : SImode)
>           || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>         return true;

Please recode the above to something like:

if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
  return true;

int msize = GET_MODE_SIZE (mode);

/* Between SSE and general, we have moves no larger than word size.  */
if (msize > UNITS_PER_WORD)
  return true;

/* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);

if (msize < minsize)
  return true;
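
For illustration only, the size checks suggested above can be exercised
in isolation with a minimal sketch. The function name, its parameters and
the HI_SIZE/SI_SIZE constants are hypothetical stand-ins for
GET_MODE_SIZE, INTEGER_CLASS_P, TARGET_AVX512FP16 and UNITS_PER_WORD;
this is not the code in i386.c, and it leaves out the other checks that
inline_secondary_memory_needed performs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GET_MODE_SIZE (HImode) and
   GET_MODE_SIZE (SImode); assumed values for illustration.  */
enum { HI_SIZE = 2, SI_SIZE = 4 };

/* Return true when the size checks alone force a move between an SSE
   and a general register to go through memory.  MSIZE is the mode size
   in bytes, ONE_CLASS_IS_INTEGER models
   INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2),
   HAVE_AVX512FP16 models TARGET_AVX512FP16 and UNITS_PER_WORD is the
   word size in bytes.  */
static bool
size_check_forces_memory (int msize, bool one_class_is_integer,
                          bool have_avx512fp16, int units_per_word)
{
  if (!one_class_is_integer)
    return true;

  /* Between SSE and general, we have moves no larger than word size.  */
  if (msize > units_per_word)
    return true;

  /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
  int minsize = have_avx512fp16 ? HI_SIZE : SI_SIZE;

  if (msize < minsize)
    return true;

  return false;
}
```

With these assumptions, a 2-byte move is allowed directly only when
have_avx512fp16 is set, while 4-byte moves are allowed either way.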

> @@ -19229,21 +19247,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>  static inline int
>  sse_store_index (machine_mode mode)
>  {
> -      switch (GET_MODE_SIZE (mode))
> -       {
> -         case 4:
> -           return 0;
> -         case 8:
> -           return 1;
> -         case 16:
> -           return 2;
> -         case 32:
> -           return 3;
> -         case 64:
> -           return 4;
> -         default:
> -           return -1;
> -       }
> +  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
> +     costs to processor_costs, which requires changes to all entries in
> +     processor cost table.  */
> +  if (mode == E_HFmode)
> +    mode = E_SFmode;
> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 4:
> +      return 0;
> +    case 8:
> +      return 1;
> +    case 16:
> +      return 2;
> +    case 32:
> +      return 3;
> +    case 64:
> +      return 4;
> +    default:
> +      return -1;
> +    }
>  }
>
>  /* Return the cost of moving data of mode M between a
> @@ -19270,6 +19293,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>        int index;
>        switch (mode)
>         {
> +         case E_HFmode:
>           case E_SFmode:
>             index = 0;
>             break;
> @@ -19370,11 +19394,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>           }
>         break;
>        case 2:
> -       if (in == 2)
> -         return MAX (ix86_cost->hard_register.int_load[1],
> -                     ix86_cost->hard_register.int_store[1]);
> -       return in ? ix86_cost->hard_register.int_load[1]
> -                 : ix86_cost->hard_register.int_store[1];
> +       {
> +         int cost;
> +         if (in == 2)
> +           cost = MAX (ix86_cost->hard_register.int_load[1],
> +                       ix86_cost->hard_register.int_store[1]);
> +         else
> +           cost = in ? ix86_cost->hard_register.int_load[1]
> +                     : ix86_cost->hard_register.int_store[1];
> +         if (mode == E_HFmode)
> +           {
> +             /* Prefer SSE over GPR for HFmode.  */
> +             int sse_cost;
> +             int index = sse_store_index (mode);
> +             if (in == 2)
> +               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> +                               ix86_cost->hard_register.sse_store[index]);
> +             else
> +               sse_cost = (in
> +                           ? ix86_cost->hard_register.sse_load [index]
> +                           : ix86_cost->hard_register.sse_store [index]);
> +             if (sse_cost >= cost)
> +               cost = sse_cost + 1;
> +           }
> +         return cost;
> +       }
>        default:
>         if (in == 2)
>           cost = MAX (ix86_cost->hard_register.int_load[2],
> @@ -19548,6 +19592,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>           - XI mode
>           - any of 512-bit wide vector mode
>           - any scalar mode.  */
> +      /* For AVX512FP16, vmovw supports movement of HImode
> +        between gpr and sse register.  */
>        if (TARGET_AVX512F
>           && (mode == XImode
>               || VALID_AVX512F_REG_MODE (mode)
> @@ -19833,7 +19879,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->mulsd : cost->mulss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fmul;
> @@ -19885,7 +19931,7 @@ ix86_division_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->divsd : cost->divss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fdiv;
> @@ -20305,7 +20351,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>           return true;
>         }
>
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->addss;
>           return false;
> @@ -20338,7 +20384,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* FALLTHRU */
>
>      case NEG:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->sse_op;
>           return false;
> @@ -20420,14 +20466,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case FLOAT_EXTEND:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = 0;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
>        return false;
>
>      case FLOAT_TRUNCATE:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->fadd;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
> @@ -20437,7 +20483,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* SSE requires memory load for the constant operand. It may make
>          sense to account for this.  Of course the constant operand may or
>          may not be reused. */
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->sse_op;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fabs;
> @@ -20446,7 +20492,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case SQRT:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fsqrt;
> @@ -21930,6 +21976,10 @@ ix86_mangle_type (const_tree type)
>
>    switch (TYPE_MODE (type))
>      {
> +    case E_HFmode:
> +      /* _Float16 is "DF16_".
> +        Align with clang's decision in https://reviews.llvm.org/D33719. */
> +      return "DF16_";
>      case E_TFmode:
>        /* __float128 is "g".  */
>        return "g";
> @@ -22553,7 +22603,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case MINUS_EXPR:
>           if (kind == scalar_stmt)
>             {
> -             if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +             if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>                 stmt_cost = ix86_cost->addss;
>               else if (X87_FLOAT_MODE_P (mode))
>                 stmt_cost = ix86_cost->fadd;
> @@ -22571,7 +22621,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>           break;
>         case NEGATE_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (X87_FLOAT_MODE_P (mode))
>             stmt_cost = ix86_cost->fchs;
> @@ -22627,7 +22677,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case BIT_XOR_EXPR:
>         case BIT_AND_EXPR:
>         case BIT_NOT_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (VECTOR_MODE_P (mode))
>             stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
> @@ -23233,8 +23283,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
>        return opt_type == OPTIMIZE_FOR_SPEED;
>
>      case rint_optab:
> -      if (SSE_FLOAT_MODE_P (mode1)
> -         && TARGET_SSE_MATH
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
>           && !flag_trapping_math
>           && !TARGET_SSE4_1)

The above change is wrong. The condition is enabled for
!TARGET_SSE4_1, so it never triggers for TARGET_AVX512FP16.
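
To spell this out, here is a minimal C sketch, not GCC code. It assumes
that enabling AVX512FP16 also enables SSE4.1 (the option text in this
patch lists SSE4.1 among the enabled sets). The struct, the enum and
the function names are hypothetical stand-ins for the TARGET_* macros
and for the patched rint_optab condition:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's TARGET_* macros; not the real ones.  */
struct target
{
  bool sse_math;       /* TARGET_SSE_MATH */
  bool sse4_1;         /* TARGET_SSE4_1 */
  bool avx512fp16;     /* TARGET_AVX512FP16 */
  bool trapping_math;  /* flag_trapping_math */
};

enum fmode { HF, SF, DF };

/* Model of SSE_FLOAT_MODE_SSEMATH_OR_HF_P, assuming SSE/SSE2 are on.  */
static bool
ssemath_or_hf_p (const struct target *t, enum fmode m)
{
  return ((m == SF || m == DF) && t->sse_math)
         || (t->avx512fp16 && m == HF);
}

/* Model of the patched rint_optab condition.  */
static bool
rint_cond_patched (const struct target *t, enum fmode m)
{
  return ssemath_or_hf_p (t, m) && !t->trapping_math && !t->sse4_1;
}

/* Assumption: enabling AVX512FP16 also enables SSE4.1.  */
static struct target
make_target (bool sse4_1, bool avx512fp16)
{
  struct target t = { true, sse4_1 || avx512fp16, avx512fp16, false };
  return t;
}
```

Under that assumption rint_cond_patched is false for HF on every
target built by make_target, so the HF part of the new macro is dead
code in this condition.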

>         return opt_type == OPTIMIZE_FOR_SPEED;
> @@ -23243,8 +23292,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
>      case floor_optab:
>      case ceil_optab:
>      case btrunc_optab:
> -      if (SSE_FLOAT_MODE_P (mode1)
> -         && TARGET_SSE_MATH
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
>           && !flag_trapping_math
>           && TARGET_SSE4_1)
>         return true;
> @@ -23329,7 +23377,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
>         /* The fastest type to promote to will always be the native type,
>            whether that occurs with implicit excess precision or
>            otherwise.  */
> -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +       return TARGET_AVX512FP16
> +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>        case EXCESS_PRECISION_TYPE_STANDARD:
>        case EXCESS_PRECISION_TYPE_IMPLICIT:
>         /* Otherwise, the excess precision we want when we are
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index e21922e8782..dca2ad32ed4 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
>    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> -   || (MODE) == SFmode)
> +   || (MODE) == SFmode                                                 \
> +   || (((MODE) == HImode || (MODE) == HFmode) && TARGET_AVX512FP16))

Please put TARGET_... in front of the condition.
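
One possible shape of the reordered clause, as a standalone C sketch.
The enum and the target_avx512fp16 variable are hypothetical stand-ins
for GCC's machine modes and for TARGET_AVX512FP16; only the operand
order is the point:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine modes and target flag.  */
enum machine_mode_sketch
{
  DImode, DFmode, SImode, SFmode, HImode, HFmode, QImode
};
static bool target_avx512fp16;
#define TARGET_AVX512FP16 target_avx512fp16

/* Same logic as in the patch, with the target check leading its clause.  */
#define VALID_AVX512F_SCALAR_MODE(MODE)                                 \
  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode             \
   || (MODE) == SFmode                                                  \
   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
```

The result is unchanged; the target test is simply evaluated before
the mode comparisons, matching the neighbouring macros.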

>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> @@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_FP_MODE_P(MODE)                                          \
>    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
> -   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)                \
> +   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
>
>  #define VALID_INT_MODE_P(MODE)                                         \
>    ((MODE) == QImode || (MODE) == HImode                                        \
> @@ -1071,6 +1072,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define SSE_FLOAT_MODE_P(MODE) \
>    ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
>
> +#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)                           \
> +  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)                                \
> +   || (TARGET_AVX512FP16 && (MODE) == HFmode))
> +
>  #define FMA4_VEC_FLOAT_MODE_P(MODE) \
>    (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
>                   || (MODE) == V8SFmode || (MODE) == V4DFmode))
> @@ -2264,7 +2269,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
>  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
>    | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
>    | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
> -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
> +  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
>  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
>    | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
>  constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index dd991c3ffdf..8f11cbcf28b 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -496,7 +496,7 @@ (define_attr "type"
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
> -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
>    V2DF,V2SF,V1DF,V8DF"
>    (const_string "unknown"))
>
> @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
>                     avx512bw,noavx512bw,avx512dq,noavx512dq,
> -                   avx512vl,noavx512vl,
> -                   avxvnni,avx512vnnivl"
> +                   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
>    (const_string "base"))
>
>  ;; Define instruction set of MMX instructions
> @@ -885,7 +884,8 @@ (define_attr "enabled" ""
>          (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
>          (eq_attr "isa" "avx512vnnivl")
>            (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
> -
> +        (eq_attr "isa" "avx512fp16")
> +          (symbol_ref "TARGET_AVX512FP16")

Please keep the empty line here, between the "isa" and "mmx_isa"
attribute processing.

>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
>          (eq_attr "mmx_isa" "sse")
> @@ -1089,8 +1089,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
>  ;; compile time constant, it is faster to use <MODE_SIZE> than
>  ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
>  ;; command line options just use GET_MODE_SIZE macro.
> -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
> -                            (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
> +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
> +                            (TI "16") (HF "2") (SF "4") (DF "8")
> +                            (XF "GET_MODE_SIZE (XFmode)")
>                              (V16QI "16") (V32QI "32") (V64QI "64")
>                              (V8HI "16") (V16HI "32") (V32HI "64")
>                              (V4SI "16") (V8SI "32") (V16SI "64")
> @@ -1222,8 +1223,11 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> -;; All x87 floating point modes plus HF
> -(define_mode_iterator X87MODEFH [SF DF XF HF])
> +;; SSE and x87 SFmode and DFmode floating point modes plus HFmode
> +(define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
> +
> +;; All x87 floating point modes plus HFmode
> +(define_mode_iterator X87MODEFH [HF SF DF XF])

A general remark: Please avoid macroization of HFmode patterns for
now. The MODEF macro is used for cases where modes are shared between
x87 and SSE, so the patterns have:

TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH).

Looking at the macroization gain, it looks to me that we get nothing
but complications with the conditional MODEFH iterator. So, please
remove all HFmode macroization (including mode attributes) and simply
add a couple of expanders, protected with a TARGET_AVX512FP16 insn
condition. We can macroize the newly added patterns with the existing
ones in the future, but please not now.
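
To illustrate the complication with a small C sketch (hypothetical
stand-ins, not GCC code): once HF joins a shared iterator, the pattern
condition alone accepts HFmode whenever x87 is enabled, so the
iterator has to carry its own TARGET_AVX512FP16 condition. A dedicated
HF expander needs only the single target check.

```c
#include <stdbool.h>

enum fp_mode_sketch { MODE_HF, MODE_SF, MODE_DF };

struct flags_sketch
{
  bool x87;         /* TARGET_80387 */
  bool sse_math;    /* SSE_FLOAT_MODE_P && TARGET_SSE_MATH for SF/DF */
  bool avx512fp16;  /* TARGET_AVX512FP16 */
};

/* Shared pattern condition as written in the patch:
   TARGET_80387 || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode).  */
static bool
shared_pattern_cond (const struct flags_sketch *f, enum fp_mode_sketch m)
{
  bool sse_or_hf = (m != MODE_HF && f->sse_math)
                   || (f->avx512fp16 && m == MODE_HF);
  return f->x87 || sse_or_hf;
}

/* What the conditional MODEFH iterator adds: HF only with AVX512FP16.  */
static bool
modefh_enabled (const struct flags_sketch *f, enum fp_mode_sketch m)
{
  bool iterator_cond = (m != MODE_HF) || f->avx512fp16;
  return iterator_cond && shared_pattern_cond (f, m);
}

/* A separate HF expander is gated by one check.  */
static bool
hf_expander_enabled (const struct flags_sketch *f)
{
  return f->avx512fp16;
}
```

With x87 on and AVX512FP16 off, shared_pattern_cond accepts MODE_HF
and only the extra iterator condition rejects it; the standalone
expander has no such interaction.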

Uros.

>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
> @@ -1231,7 +1235,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
>
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
> -  [(SF "ss") (DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
>     (V16SF "ps") (V8DF "pd")
>     (V8SF "ps") (V4DF "pd")
>     (V4SF "ps") (V2DF "pd")
> @@ -1498,15 +1502,15 @@ (define_expand "cstorexf4"
>
>  (define_expand "cbranch<mode>4"
>    [(set (reg:CC FLAGS_REG)
> -       (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
> -                   (match_operand:MODEF 2 "cmp_fp_expander_operand")))
> +       (compare:CC (match_operand:MODEFH 1 "cmp_fp_expander_operand")
> +                   (match_operand:MODEFH 2 "cmp_fp_expander_operand")))
>     (set (pc) (if_then_else
>                (match_operator 0 "ix86_fp_comparison_operator"
>                 [(reg:CC FLAGS_REG)
>                  (const_int 0)])
>                (label_ref (match_operand 3))
>                (pc)))]
> -  "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> +  "TARGET_80387 || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
>  {
>    ix86_expand_branch (GET_CODE (operands[0]),
>                       operands[1], operands[2], operands[3]);
> @@ -1705,6 +1709,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
>          (eq_attr "alternative" "0")
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
> +
> +(define_insn "*cmpi<unord>hf"
> +  [(set (reg:CCFP FLAGS_REG)
> +       (compare:CCFP
> +         (match_operand:HF 0 "register_operand" "v")
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "v<unord>comish\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecomi")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Push/pop instructions.
>
> @@ -2436,8 +2451,8 @@ (define_insn "*movsi_internal"
>            (symbol_ref "true")))])
>
>  (define_insn "*movhi_internal"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
> -       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
> +       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
>     && ix86_hardreg_mov_ok (operands[0], operands[1])"
>
> @@ -2463,6 +2478,9 @@ (define_insn "*movhi_internal"
>           gcc_unreachable ();
>         }
>
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
>      case TYPE_MSKLOG:
>        if (operands[1] == const0_rtx)
>         return "kxorw\t%0, %0, %0";
> @@ -2478,7 +2496,9 @@ (define_insn "*movhi_internal"
>      }
>  }
>    [(set (attr "type")
> -     (cond [(eq_attr "alternative" "4,5,6,7")
> +     (cond [(eq_attr "alternative" "9,10,11,12,13")
> +             (const_string "ssemov")
> +           (eq_attr "alternative" "4,5,6,7")
>               (const_string "mskmov")
>             (eq_attr "alternative" "8")
>               (const_string "msklog")
> @@ -2503,6 +2523,8 @@ (define_insn "*movhi_internal"
>      (set (attr "mode")
>        (cond [(eq_attr "type" "imovx")
>                (const_string "SI")
> +            (eq_attr "alternative" "11")
> +              (const_string "HF")
>              (and (eq_attr "alternative" "1,2")
>                   (match_operand:HI 1 "aligned_operand"))
>                (const_string "SI")
> @@ -2511,7 +2533,12 @@ (define_insn "*movhi_internal"
>                        (not (match_test "TARGET_HIMODE_MATH"))))
>                (const_string "SI")
>             ]
> -           (const_string "HI")))])
> +           (const_string "HI")))
> +    (set (attr "isa")
> +        (cond [(eq_attr "alternative" "9,10,11,12,13")
> +               (const_string "avx512fp16")
> +              ]
> +              (const_string "*")))])

The "isa" attribute should come first in the attribute section; see
the many existing examples.

>  ;; Situation is quite tricky about when to choose full sized (SImode) move
>  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
> @@ -3727,7 +3754,10 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "sselog1")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "sselog")
> +                (if_then_else
> +                  (match_test ("TARGET_AVX512FP16"))
> +                  (const_string "ssemov")
> +                  (const_string "sselog"))
>               ]
>               (const_string "ssemov")))
>     (set (attr "memory")
> @@ -3750,9 +3780,15 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "V4SF")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "TI")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HI")
> +                  (const_string "TI"))
>                (eq_attr "alternative" "3")
> -                (const_string "SF")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HF")
> +                  (const_string "SF"))
>               ]
>               (const_string "*")))])
>
> @@ -4493,6 +4529,17 @@ (define_split
>    emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> +(define_insn "extendhf<mode>2"
> +  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +        (float_extend:MODEF
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +
>  (define_expand "extend<mode>xf2"
>    [(set (match_operand:XF 0 "nonimmediate_operand")
>          (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
> @@ -4670,6 +4717,18 @@ (define_insn "truncxf<mode>2"
>               (symbol_ref "flag_unsafe_math_optimizations")
>            ]
>            (symbol_ref "true")))])
> +
> +;; Conversion from {SF,DF}mode to HFmode.
> +
> +(define_insn "trunc<mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (float_truncate:HF
> +         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Signed conversion to DImode.
>
> @@ -5046,6 +5105,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
>            (symbol_ref "true")))])
>
> +(define_insn "float<floatunssuffix><mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (any_float:HF
> +         (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*floatdi<MODEF:mode>2_i387"
>    [(set (match_operand:MODEF 0 "register_operand" "=f")
>         (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
> @@ -7627,12 +7696,12 @@ (define_expand "<insn>xf3"
>    "TARGET_80387")
>
>  (define_expand "<insn><mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (plusminus:MODEF
> -         (match_operand:MODEF 1 "register_operand")
> -         (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (plusminus:MODEFH
> +         (match_operand:MODEFH 1 "register_operand")
> +         (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
> +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
>
>  ;; Multiply instructions
>
> @@ -8204,11 +8273,11 @@ (define_expand "mulxf3"
>    "TARGET_80387")
>
>  (define_expand "mul<mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (mult:MODEF (match_operand:MODEF 1 "register_operand")
> -                   (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (mult:MODEFH (match_operand:MODEFH 1 "register_operand")
> +                   (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
> +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
>
>  ;; Divide instructions
>
> @@ -8221,11 +8290,11 @@ (define_expand "divxf3"
>    "TARGET_80387")
>
>  (define_expand "div<mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand")
> -       (div:MODEF (match_operand:MODEF 1 "register_operand")
> -                  (match_operand:MODEF 2 "nonimmediate_operand")))]
> +  [(set (match_operand:MODEFH 0 "register_operand")
> +       (div:MODEFH (match_operand:MODEFH 1 "register_operand")
> +                  (match_operand:MODEFH 2 "nonimmediate_operand")))]
>    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
>  {
>    if (<MODE>mode == SFmode
>        && TARGET_SSE && TARGET_SSE_MATH
> @@ -16312,6 +16381,22 @@ (define_insn "*fop_<mode>_comm"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*fop_hf_comm"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (match_operator:HF 3 "binary_fp_operator"
> +         [(match_operand:HF 1 "nonimmediate_operand" "%v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> +  "TARGET_AVX512FP16
> +   && COMMUTATIVE_ARITH_P (operands[3])
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "* return output_387_binary_op (insn, operands);"
> +  [(set (attr "type")
> +       (if_then_else (match_operand:HF 3 "mult_operator")
> +         (const_string "ssemul")
> +         (const_string "sseadd")))
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*rcpsf2_sse"
>    [(set (match_operand:SF 0 "register_operand" "=x,x,x")
>         (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
> @@ -16385,6 +16470,22 @@ (define_insn "*fop_<mode>_1"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*fop_hf_1"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (match_operator:HF 3 "binary_fp_operator"
> +         [(match_operand:HF 1 "nonimmediate_operand" "v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> +  "TARGET_AVX512FP16
> +   && !COMMUTATIVE_ARITH_P (operands[3])
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "* return output_387_binary_op (insn, operands);"
> +  [(set (attr "type")
> +       (if_then_else (match_operand:MODEF 3 "div_operator")
> +         (const_string "ssediv")
> +         (const_string "sseadd")))
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
>  (define_insn "*fop_<X87MODEF:mode>_2_i387"
>    [(set (match_operand:X87MODEF 0 "register_operand" "=f")
>         (match_operator:X87MODEF 3 "binary_fp_operator"
> @@ -19179,13 +19280,13 @@ (define_peephole2
>  })
>
>  (define_expand "mov<mode>cc"
> -  [(set (match_operand:X87MODEF 0 "register_operand")
> -       (if_then_else:X87MODEF
> +  [(set (match_operand:X87MODEFH 0 "register_operand")
> +       (if_then_else:X87MODEFH
>           (match_operand 1 "comparison_operator")
> -         (match_operand:X87MODEF 2 "register_operand")
> -         (match_operand:X87MODEF 3 "register_operand")))]
> +         (match_operand:X87MODEFH 2 "register_operand")
> +         (match_operand:X87MODEFH 3 "register_operand")))]
>    "(TARGET_80387 && TARGET_CMOVE)
> -   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> +   || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
>    "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
>
>  (define_insn "*movxfcc_1"
> @@ -19347,12 +19448,12 @@ (define_insn "<code><mode>3"
>  ;; presence of -0.0 and NaN.
>
>  (define_insn "*ieee_s<ieee_maxmin><mode>3"
> -  [(set (match_operand:MODEF 0 "register_operand" "=x,v")
> -       (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "0,v")
> -          (match_operand:MODEF 2 "nonimmediate_operand" "xm,vm")]
> +  [(set (match_operand:MODEFH 0 "register_operand" "=x,v")
> +       (unspec:MODEFH
> +         [(match_operand:MODEFH 1 "register_operand" "0,v")
> +          (match_operand:MODEFH 2 "nonimmediate_operand" "xm,vm")]
>           IEEE_MAXMIN))]
> -  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> +  "SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
>    "@
>     <ieee_maxmin><ssemodesuffix>\t{%2, %0|%0, %2}
>     v<ieee_maxmin><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 7b8547bb1c3..ad366974b5b 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
>  mmwait
>  Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
>  Support MWAIT and MONITOR built-in functions and code generation.
> +
> +mavx512fp16
> +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
> diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> index f129de4bbe5..2421a78637b 100644
> --- a/gcc/config/i386/immintrin.h
> +++ b/gcc/config/i386/immintrin.h
> @@ -94,6 +94,10 @@
>
>  #include <avx512vp2intersectvlintrin.h>
>
> +#ifdef __SSE2__
> +#include <avx512fp16intrin.h>
> +#endif
> +
>  #include <shaintrin.h>
>
>  #include <fmaintrin.h>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 32697e6117c..bb9f7ca956e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
>  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
> +-mavx512fp16 @gol
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
>  -mkl -mwidekl @gol
> @@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @itemx -mavx512bf16
>  @opindex mavx512bf16
>  @need 200
> +@itemx -mavx512fp16
> +@opindex mavx512fp16
> +@need 200
>  @itemx -mgfni
>  @opindex mgfni
>  @need 200
> @@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
>  XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
>  GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
> -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
> -extended instruction sets. Each has a corresponding @option{-mno-} option to
> -disable use of these instructions.
> +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
> +or CLDEMOTE extended instruction sets. Each has a corresponding
> +@option{-mno-} option to disable use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
> index 62b2132957a..fba3d1ac684 100644
> --- a/gcc/testsuite/g++.dg/other/i386-2.C
> +++ b/gcc/testsuite/g++.dg/other/i386-2.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
> index 843aa2bdb2f..5cc0fa83457 100644
> --- a/gcc/testsuite/g++.dg/other/i386-3.C
> +++ b/gcc/testsuite/g++.dg/other/i386-3.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
> new file mode 100644
> index 00000000000..95d1ac27c4f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-1.C
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "does not name a type" } */
> +foo (_Float16 x)
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
> new file mode 100644
> index 00000000000..99eb797eff1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-2.C
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
> new file mode 100644
> index 00000000000..940878503f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-3.C
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O0 -mavx512fp16" } */
> +
> +template <typename> void a(char *) {}
> +char b, d;
> +void c()
> +{
> +  a<unsigned char>(&d);
> +  a<_Float16>(&b);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index 6178e38ce02..f3676077743 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> index 986fbd819e4..1751c52565c 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..0ad9064f637 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -87,6 +87,9 @@ main ()
>  #ifdef AVX512VNNI
>        && (ecx & bit_AVX512VNNI)
>  #endif
> +#ifdef AVX512FP16
> +      && (edx & bit_AVX512FP16)
> +#endif
>  #ifdef VAES
>        && (ecx & bit_VAES)
>  #endif
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> new file mode 100644
> index 00000000000..88887556d68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_max (_Float16 __A, _Float16 __B)
> +{
> +  return __A > __B ? __A : __B;
> +}
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_min (_Float16 __A, _Float16 __B)
> +{
> +  return __A < __B ? __A : __B;
> +}
> +
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> new file mode 100644
> index 00000000000..c9e23bf95c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512fp16-12a.c"
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x = 0.1f;
> +  _Float16 y = -3.2f;
> +  _Float16 z;
> +
> +  z = do_max (x, y);
> +  if (z != x)
> +    abort ();
> +
> +  z = do_min (x, y);
> +  if (z != y)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
> new file mode 100644
> index 00000000000..3846c8e9b6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
> new file mode 100644
> index 00000000000..247dd6e7e33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
> new file mode 100644
> index 00000000000..631082581f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
> new file mode 100644
> index 00000000000..828d8530769
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 79265c7c94f..8499fdf2db9 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -79,6 +79,7 @@ extern void test_hreset (void)                        __attribute__((__target__("hreset")));
>  extern void test_keylocker (void)              __attribute__((__target__("kl")));
>  extern void test_widekl (void)                 __attribute__((__target__("widekl")));
>  extern void test_avxvnni (void)                        __attribute__((__target__("avxvnni")));
> +extern void test_avx512fp16 (void)             __attribute__((__target__("avx512fp16")));
>
>  extern void test_no_sgx (void)                 __attribute__((__target__("no-sgx")));
>  extern void test_no_avx5124fmaps(void)         __attribute__((__target__("no-avx5124fmaps")));
> @@ -159,6 +160,7 @@ extern void test_no_hreset (void)           __attribute__((__target__("no-hreset")));
>  extern void test_no_keylocker (void)           __attribute__((__target__("no-kl")));
>  extern void test_no_widekl (void)              __attribute__((__target__("no-widekl")));
>  extern void test_no_avxvnni (void)             __attribute__((__target__("no-avxvnni")));
> +extern void test_no_avx512fp16 (void)          __attribute__((__target__("no-avx512fp16")));
>
>  extern void test_arch_nocona (void)            __attribute__((__target__("arch=nocona")));
>  extern void test_arch_core2 (void)             __attribute__((__target__("arch=core2")));
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> new file mode 100644
> index 00000000000..2f8af392c83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +
> +#include <immintrin.h>
> +
> +_Float16
> +foo (_Float16 x, _Float16 y)
> +{
> +  x = x > y ? x : y;
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 7029771334b..f5f5c113612 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 4ce0ffffaf3..747d504cedb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 6e8b6f3fa1b..33411969901 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -103,7 +103,7 @@
>
>
>  #ifndef DIFFERENT_PRAGMAS
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>
>  /* Following intrinsics require immediate arguments.  They
> @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
>
>  /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
>  #ifdef DIFFERENT_PRAGMAS
> -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>  #include <immintrin.h>
>  test_1 (_cvtss_sh, unsigned short, float, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 7faa053ace8..86590ca5ffb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -708,6 +708,6 @@
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
>
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>
>  #include <x86intrin.h>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 42ac9d0ac1a..10765365d7b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
>
>  proc check_effective_target_float16 {} {
>      return [check_no_compiler_messages_nocache float16 object {
> -        _Float16 x;
> +        _Float16 foo (_Float16 x) { return x; }
>      } [add_options_for_float16 ""]]
>  }
>
> @@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
>  }
>
>
> +# Return 1 if avx512fp16 instructions can be compiled.
> +
> +proc check_effective_target_avx512fp16 { } {
> +    return [check_no_compiler_messages avx512fp16 object {
> +       void foo (void)
> +       {
> +         asm volatile ("vmovw %edi, %xmm0");
> +       }
> +    } "-O2 -mavx512fp16" ]
> +}
> +
>  # Return 1 if avx512f instructions can be compiled.
>
>  proc check_effective_target_avx512f { } {
> --
> 2.18.1
>


* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
  2021-07-21 10:35           ` Uros Bizjak
@ 2021-07-22 11:56           ` Richard Biener
  2021-07-28 21:56           ` Joseph Myers
  2 siblings, 0 replies; 138+ messages in thread
From: Richard Biener @ 2021-07-22 11:56 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph S. Myers, H. J. Lu, Hongtao Liu

On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
>         * config/i386/i386.c (enum x86_64_reg_class): Add
>         X86_64_SSEHF_CLASS.
>         (merge_classes): Handle X86_64_SSEHF_CLASS.
>         (examine_argument): Ditto.
>         (construct_container): Ditto.
>         (classify_argument): Ditto, and set HFmode/HCmode to
>         X86_64_SSEHF_CLASS.
>         (function_value_32): Return _Float16/Complex Float16 by
>         %xmm0/%xmm1.
>         (function_value_64): Return _Float16/Complex Float16 by SSE
>         register.
>         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
>         (ix86_secondary_reload): Require gpr as intermediate register
>         to store _Float16 from sse register when sse4 is not
>         available.
>         (ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
>         (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
>         sse2.
>         (ix86_scalar_mode_supported_p): Ditto.
>         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
>         (ix86_get_excess_precision): Return
>         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
>         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
>         * config/i386/i386.md (*pushhf_rex64): New define_insn.
>         (*pushhf): Ditto.
>         (*movhf_internal): Ditto.
>         * doc/extend.texi (Half-Precision Floating Point): Document
>         _Float16 for x86.
>
> gcc/lto/ChangeLog:
>
>         * lto-lang.c (lto_type_for_mode): Return float16_type_node
>         when mode == TYPE_MODE (float16_type_node).

This lto-lang.c part is OK.
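
For readers following the ix86_print_operand entry in the ChangeLog above:
the patch prints an HFmode constant as the 16-bit IEEE half encoding,
formatted with "0x%04x". Below is a minimal standalone sketch of that
encoding, not code from the patch. float_to_half_bits and check_half_bits
are hypothetical names, and the sketch assumes a host where float is IEEE
binary32 and the conversion rounds to nearest, ties to even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return the IEEE binary16 bit
   pattern of F, rounding to nearest with ties to even.  Assumes float
   is IEEE binary32.  */
uint16_t
float_to_half_bits (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);

  uint16_t sign = (uint16_t) ((x >> 16) & 0x8000u);
  uint32_t exp = (x >> 23) & 0xffu;
  uint32_t man = x & 0x7fffffu;

  /* Infinity or NaN: keep the class, force a quiet NaN payload.  */
  if (exp == 0xff)
    return (uint16_t) (sign | 0x7c00u | (man ? 0x200u : 0u));

  /* Rebias the exponent from binary32 (127) to binary16 (15).  */
  int e = (int) exp - 127 + 15;

  /* Too large for binary16: overflow to infinity.  */
  if (e >= 31)
    return (uint16_t) (sign | 0x7c00u);

  if (e <= 0)
    {
      /* Below half of the smallest subnormal: rounds to signed zero.  */
      if (e < -10)
        return sign;

      /* Subnormal result: shift the full 24-bit significand right.  */
      man |= 0x800000u;
      unsigned int shift = (unsigned int) (14 - e);
      uint32_t half = man >> shift;
      uint32_t rem = man & ((1u << shift) - 1u);
      uint32_t halfway = 1u << (shift - 1u);
      if (rem > halfway || (rem == halfway && (half & 1u)))
        half++;
      return (uint16_t) (sign | half);
    }

  /* Normal result: drop 13 mantissa bits and round.  A carry out of
     the mantissa bumps the exponent, which is the correct result.  */
  uint32_t half = ((uint32_t) e << 10) | (man >> 13);
  uint32_t rem = man & 0x1fffu;
  if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
    half++;
  return (uint16_t) (sign | half);
}

/* Spot checks using the constants from avx512fp16-12b.c.  */
void
check_half_bits (void)
{
  assert (float_to_half_bits (0.1f) == 0x2e66);
  assert (float_to_half_bits (-3.2f) == 0xc266);
}
```

Under these assumptions the 0.1f used in avx512fp16-12b.c encodes as
0x2e66, so an AT&T-syntax immediate for it would read $0x2e66.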

> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/sse2-float16-1.c: New test.
>         * gcc.target/i386/sse2-float16-2.c: Ditto.
>         * gcc.target/i386/sse2-float16-3.c: Ditto.
> ---
>  gcc/config/i386/i386-modes.def                |   1 +
>  gcc/config/i386/i386.c                        |  99 ++++++++++++++-
>  gcc/config/i386/i386.h                        |   2 +-
>  gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
>  gcc/doc/extend.texi                           |  16 +++
>  gcc/lto/lto-lang.c                            |   3 +
>  .../gcc.target/i386/sse2-float16-1.c          |   8 ++
>  .../gcc.target/i386/sse2-float16-2.c          |  16 +++
>  .../gcc.target/i386/sse2-float16-3.c          |  12 ++
>  9 files changed, 265 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 4e7014be034..9232f59a925 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> +FLOAT_MODE (HF, 2, ieee_half_format);
>
>  /* In ILP32 mode, XFmode has size 12 and alignment 4.
>     In LP64 mode, XFmode has size and alignment 16.  */
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ff96134fb37..02628d838fc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -387,6 +387,7 @@ enum x86_64_reg_class
>      X86_64_INTEGER_CLASS,
>      X86_64_INTEGERSI_CLASS,
>      X86_64_SSE_CLASS,
> +    X86_64_SSEHF_CLASS,
>      X86_64_SSESF_CLASS,
>      X86_64_SSEDF_CLASS,
>      X86_64_SSEUP_CLASS,
> @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
>      return X86_64_MEMORY_CLASS;
>
>    /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> -      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> +  if ((class1 == X86_64_INTEGERSI_CLASS
> +       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> +      || (class2 == X86_64_INTEGERSI_CLASS
> +         && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
>      return X86_64_INTEGERSI_CLASS;
>    if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
>        || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
>             /* The partial classes are now full classes.  */
>             if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
>               subclasses[0] = X86_64_SSE_CLASS;
> +           if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
> +             subclasses[0] = X86_64_SSE_CLASS;
>             if (subclasses[0] == X86_64_INTEGERSI_CLASS
>                 && !((bit_offset % 64) == 0 && bytes == 4))
>               subclasses[0] = X86_64_INTEGER_CLASS;
> @@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
>        gcc_unreachable ();
>      case E_CTImode:
>        return 0;
> +    case E_HFmode:
> +      if (!(bit_offset % 64))
> +       classes[0] = X86_64_SSEHF_CLASS;
> +      else
> +       classes[0] = X86_64_SSE_CLASS;
> +      return 1;
>      case E_SFmode:
>        if (!(bit_offset % 64))
>         classes[0] = X86_64_SSESF_CLASS;
> @@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
>        classes[0] = X86_64_SSE_CLASS;
>        classes[1] = X86_64_SSEUP_CLASS;
>        return 2;
> +    case E_HCmode:
> +      classes[0] = X86_64_SSE_CLASS;
> +      if (!(bit_offset % 64))
> +       return 1;
> +      else
> +       {
> +         classes[1] = X86_64_SSEHF_CLASS;
> +         return 2;
> +       }
>      case E_SCmode:
>        classes[0] = X86_64_SSE_CLASS;
>        if (!(bit_offset % 64))
> @@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
>         (*int_nregs)++;
>         break;
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         (*sse_nregs)++;
> @@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>
>    /* First construct simple cases.  Avoid SCmode, since we want to use
>       single register to pass this type.  */
> -  if (n == 1 && mode != SCmode)
> +  if (n == 1 && mode != SCmode && mode != HCmode)
>      switch (regclass[0])
>        {
>        case X86_64_INTEGER_CLASS:
>        case X86_64_INTEGERSI_CLASS:
>         return gen_rtx_REG (mode, intreg[0]);
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         if (mode != BLKmode)
> @@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>                                    GEN_INT (i*8));
>             intreg++;
>             break;
> +         case X86_64_SSEHF_CLASS:
> +           exp [nexps++]
> +             = gen_rtx_EXPR_LIST (VOIDmode,
> +                                  gen_rtx_REG (HFmode,
> +                                               GET_SSE_REGNO (sse_regno)),
> +                                  GEN_INT (i*8));
> +           sse_regno++;
> +           break;
>           case X86_64_SSESF_CLASS:
>             exp [nexps++]
>               = gen_rtx_EXPR_LIST (VOIDmode,
> @@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
>      /* Most things go in %eax.  */
>      regno = AX_REG;
>
> +  /* Return _Float16/_Complex _Float16 by sse register.  */
> +  if (mode == HFmode)
> +    regno = FIRST_SSE_REG;
> +  if (mode == HCmode)
> +    {
> +      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
> +      XVECEXP (ret, 0, 0)
> +       = gen_rtx_EXPR_LIST (VOIDmode,
> +                            gen_rtx_REG (SImode, FIRST_SSE_REG),
> +                            GEN_INT (0));
> +      return ret;
> +    }
> +
>    /* Override FP return register with %xmm0 for local functions when
>       SSE math is enabled or for functions with sseregparm attribute.  */
>    if ((fn || fntype) && (mode == SFmode || mode == DFmode))
> @@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
>
>        switch (mode)
>         {
> +       case E_HFmode:
> +       case E_HCmode:
>         case E_SFmode:
>         case E_SCmode:
>         case E_DFmode:
> @@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
>           (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
>      }
>
> +  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
> +    {
> +      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
> +                              REAL_MODE_FORMAT (HFmode));
> +      if (ASSEMBLER_DIALECT == ASM_ATT)
> +       putc ('$', file);
> +      fprintf (file, "0x%04x", (unsigned int) l);
> +    }
> +
>    else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
>      {
>        long l;
> @@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
>        return NO_REGS;
>      }
>
> +  /* Require movement to gpr, and then store to memory.  */
> +  if (mode == HFmode
> +      && !TARGET_SSE4_1
> +      && SSE_CLASS_P (rclass)
> +      && !in_p && MEM_P (x))
> +    {
> +      sri->extra_cost = 1;
> +      return GENERAL_REGS;
> +    }
> +
>    /* This condition handles corner case where an expression involving
>       pointers gets vectorized.  We're trying to use the address of a
>       stack slot as a vector initializer.
> @@ -19546,6 +19610,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>    else if (VALID_INT_MODE_P (mode)
>            || VALID_FP_MODE_P (mode))
>      return true;
> +  else if (mode == HFmode || mode == HCmode)
> +    return true;
>    /* Lots of MMX code casts 8 byte vector modes to DImode.  If we then go
>       on to use that value in smaller contexts, this can easily force a
>       pseudo to be allocated to GENERAL_REGS.  Since this is no worse than
> @@ -21555,10 +21621,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
>      return default_decimal_float_supported_p ();
>    else if (mode == TFmode)
>      return true;
> +  else if (mode == HFmode && TARGET_SSE2)
> +    return true;
>    else
>      return default_scalar_mode_supported_p (mode);
>  }
>
> +/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE if MODE
> +   is HFmode under SSE2, and punt to the generic implementation otherwise.  */
> +
> +static bool
> +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
> +{
> +  /* NB: Return TRUE for HFmode whenever SSE2 is enabled so that the
> +     _Float16 type will be defined by the C front-end even without
> +     AVX512FP16, matching the TARGET_SSE2 check in
> +     ix86_scalar_mode_supported_p.  */
> +  return ((mode == HFmode && TARGET_SSE2)
> +         ? true
> +         : default_libgcc_floating_mode_supported_p (mode));
> +}
> +
>  /* Implements target hook vector_mode_supported_p.  */
>  static bool
>  ix86_vector_mode_supported_p (machine_mode mode)
> @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
>            provide would be identical were it not for the unpredictable
>            cases.  */
>         if (!TARGET_80387)
> -         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +         return TARGET_SSE2
> +                ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +                : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>         else if (!TARGET_MIX_SSE_I387)
>           {
>             if (!(TARGET_SSE && TARGET_SSE_MATH))
>               return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
>             else if (TARGET_SSE2)
> -             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +             return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>           }
>
>         /* If we are in standards compliant mode, but we know we will
> @@ -23820,6 +23905,10 @@ ix86_run_selftests (void)
>  #undef TARGET_SCALAR_MODE_SUPPORTED_P
>  #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
>
> +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
> +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P        \
> +ix86_libgcc_floating_mode_supported_p
> +
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 0c2c93daf32..e21922e8782 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> -   || (MODE) == V2DImode || (MODE) == DFmode)
> +   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
>
>  #define VALID_SSE_REG_MODE(MODE)                                       \
>    ((MODE) == V1TImode || (MODE) == TImode                              \
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8b809c49fe0..dd991c3ffdf 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> +;; All x87 floating point modes plus HF
> +(define_mode_iterator X87MODEFH [SF DF XF HF])
> +
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
>  (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> @@ -3130,6 +3133,32 @@ (define_split
>    operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
>  })
>
> +(define_insn "*pushhf_rex64"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
> +  "TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{q}\t%q1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "DI,TI")
> +   (set_attr "isa"  "*,sse4")])
> +
> +(define_insn "*pushhf"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
> +  "!TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{l}\t%k1";
> +}
> +  [(set_attr "type" "push,multi")
> +   (set_attr "mode" "SI,TI")
> +   (set_attr "isa"  "*,sse4")])
> +
>  (define_insn "*pushsf_rex64"
>    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
>         (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
> @@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
>     (set_attr "unit" "i387,*,*")
>     (set_attr "mode" "SF,SI,SF")])
>
> +(define_mode_iterator MODESH [SF HF])
>  ;; %%% Kill this when call knows how to work this out.
>  (define_split
> -  [(set (match_operand:SF 0 "push_operand")
> -       (match_operand:SF 1 "any_fp_register_operand"))]
> +  [(set (match_operand:MODESH 0 "push_operand")
> +       (match_operand:MODESH 1 "any_fp_register_operand"))]
>    "reload_completed"
>    [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
>     (set (match_dup 0) (match_dup 1))]
> @@ -3209,8 +3239,8 @@ (define_expand "movtf"
>    "ix86_expand_move (TFmode, operands); DONE;")
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
> -       (match_operand:X87MODEF 1 "general_operand"))]
> +  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
> +       (match_operand:X87MODEFH 1 "general_operand"))]
>    ""
>    "ix86_expand_move (<MODE>mode, operands); DONE;")
>
> @@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
>            ]
>            (const_string "*")))])
>
> +(define_insn "*movhf_internal"
> + [(set (match_operand:HF 0 "nonimmediate_operand"
> +        "=?r,?m,v,v,?r,m,?v,v")
> +       (match_operand:HF 1 "general_operand"
> +        "rmF,rF,C,v, v,v, r,m"))]
> + "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> +  && (lra_in_progress
> +      || reload_completed
> +      || !CONST_DOUBLE_P (operands[1])
> +      || (TARGET_SSE && TARGET_SSE_MATH
> +         && standard_sse_constant_p (operands[1], HFmode) == 1)
> +      || memory_operand (operands[0], HFmode))"
> +{
> +  switch (get_attr_type (insn))
> +    {
> +    case TYPE_IMOV:
> +      return "mov{w}\t{%1, %0|%0, %1}";
> +
> +    case TYPE_SSELOG1:
> +      return standard_sse_constant_opcode (insn, operands);
> +
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
> +    case TYPE_SSELOG:
> +      if (SSE_REG_P (operands[0]))
> +       return MEM_P (operands[1])
> +              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +      else
> +       return MEM_P (operands[1])
> +              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set (attr "isa")
> +       (cond [(eq_attr "alternative" "2,3,4,6,7")
> +                (const_string "sse2")
> +              (eq_attr "alternative" "5")
> +                (const_string "sse4")
> +             ]
> +             (const_string "*")))
> +   (set (attr "type")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "imov")
> +              (eq_attr "alternative" "2")
> +                (const_string "sselog1")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "sselog")
> +             ]
> +             (const_string "ssemov")))
> +   (set (attr "memory")
> +       (cond [(eq_attr "alternative" "4,6")
> +                (const_string "none")
> +              (eq_attr "alternative" "5")
> +                (const_string "store")
> +              (eq_attr "alternative" "7")
> +                (const_string "load")
> +             ]
> +             (const_string "*")))
> +   (set (attr "prefix")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "orig")
> +             ]
> +             (const_string "maybe_vex")))
> +   (set (attr "mode")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "HI")
> +              (eq_attr "alternative" "2")
> +                (const_string "V4SF")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "TI")
> +              (eq_attr "alternative" "3")
> +                (const_string "SF")
> +             ]
> +             (const_string "*")))])
> +
>  (define_split
>    [(set (match_operand 0 "any_fp_register_operand")
>         (match_operand 1 "memory_operand"))]
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b83cd4919bb..2cd0b38fe5b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
>  @section Half-Precision Floating Point
>  @cindex half-precision floating point
>  @cindex @code{__fp16} data type
> +@cindex @code{_Float16} data type
>
>  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> @@ -1150,6 +1151,21 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
>
> +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> +(16-bit) floating point via the @code{_Float16} type, which is defined by
> +ISO/IEC TS 18661-3:2015.  For C++, x86 provides a built-in type named @code{_Float16}
> +with the same data format as in C.
> +
> +Without @code{target("avx512fp16")}, the @code{_Float16} type is storage only, and all
> +operations are emulated by soft-fp and @code{float} instructions.
> +
> +Soft-fp keeps the intermediate result of an operation at 32-bit precision by default,
> +which may lead to inconsistent behavior between soft-fp and avx512fp16 instructions;
> +using @option{-fexcess-precision=standard} forces rounding back after every operation.
> +
> +With @option{-mavx512fp16}, instead of calling soft-fp, GCC automatically generates
> +hardware instructions.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types
> diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> index c13c7e45ac1..92f499643b5 100644
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>      return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
>  #endif
>
> +  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
> +    return float16_type_node;
> +
>    if (mode == TYPE_MODE (float_type_node))
>      return float_type_node;
>
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> new file mode 100644
> index 00000000000..1b645eb499d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "is not supported on this target" } */
> +foo (_Float16 x) /* { dg-error "is not supported on this target" } */
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> new file mode 100644
> index 00000000000..3da7683fc31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> new file mode 100644
> index 00000000000..60ff9d4ab80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +#include<complex.h>
> +
> +_Complex _Float16
> +foo (_Complex _Float16 x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-21  7:43         ` [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
  2021-07-21 10:51           ` Uros Bizjak
@ 2021-07-22 12:14           ` Richard Biener
  2021-07-27  5:32             ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-07-22 12:14 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph S. Myers, H. J. Lu, Hongtao Liu

On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * optabs-query.c (get_best_extraction_insn): Use word_mode for
>         HF field.
>
> libgcc/ChangeLog:
>
>         * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
>         * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
>         * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
>         * config/i386/t-softfp: Add hf soft-fp.
>         * config.host: Add i386/64/t-softfp.
>         * config/i386/64/t-softfp: New file.
> ---
>  gcc/optabs-query.c                  | 10 +++++++++-
>  libgcc/config.host                  |  5 +----
>  libgcc/config/i386/32/sfp-machine.h |  1 +
>  libgcc/config/i386/64/sfp-machine.h |  1 +
>  libgcc/config/i386/64/t-softfp      |  1 +
>  libgcc/config/i386/sfp-machine.h    |  1 +
>  libgcc/config/i386/t-softfp         |  5 +++++
>  7 files changed, 19 insertions(+), 5 deletions(-)
>  create mode 100644 libgcc/config/i386/64/t-softfp
>
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 05ee5f517da..0438e451474 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
>                           machine_mode field_mode)
>  {
>    opt_scalar_int_mode mode_iter;
> -  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
> +  scalar_int_mode smallest_int_mode;
> +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */

I think that needs "fixing" then, or alternatively the caller should care.

> +  if (FLOAT_MODE_P (field_mode)
> +      && known_eq (GET_MODE_SIZE (field_mode), 2))
> +    smallest_int_mode = word_mode;
> +  else
> +    smallest_int_mode = smallest_int_mode_for_size (struct_bits);
> +
> +  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
>      {
>        scalar_int_mode mode = mode_iter.require ();
>        if (get_extraction_insn (insn, pattern, type, mode))
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 50f00062232..96da9ef1cce 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
>         ;;
>  i[34567]86-*-* | x86_64-*-*)
>         tmake_file="${tmake_file} t-softfp-tf"
> -       if test "${host_address}" = 32; then
> -               tmake_file="${tmake_file} i386/${host_address}/t-softfp"
> -       fi
> -       tmake_file="${tmake_file} i386/t-softfp t-softfp"
> +       tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
>         ;;
>  esac
>
> diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
> index 1fa282d7afe..e24cbc8d180 100644
> --- a/libgcc/config/i386/32/sfp-machine.h
> +++ b/libgcc/config/i386/32/sfp-machine.h
> @@ -86,6 +86,7 @@
>  #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D, 0
>  /* Even if XFmode is 12byte,  we have to pad it to
> diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
> index 1ff94c23ea4..e1c616699bb 100644
> --- a/libgcc/config/i386/64/sfp-machine.h
> +++ b/libgcc/config/i386/64/sfp-machine.h
> @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
>
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H          _FP_QNANBIT_H
>  #define _FP_NANFRAC_S          _FP_QNANBIT_S
>  #define _FP_NANFRAC_D          _FP_QNANBIT_D
>  #define _FP_NANFRAC_E          _FP_QNANBIT_E, 0
> diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
> new file mode 100644
> index 00000000000..d812bb120bd
> --- /dev/null
> +++ b/libgcc/config/i386/64/t-softfp
> @@ -0,0 +1 @@
> +softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> \ No newline at end of file
> diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> index 8319f0550bc..f15d29d3755 100644
> --- a/libgcc/config/i386/sfp-machine.h
> +++ b/libgcc/config/i386/sfp-machine.h
> @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
>  #define _FP_KEEPNANFRACP       1
>  #define _FP_QNANNEGATEDP 0
>
> +#define _FP_NANSIGN_H          1
>  #define _FP_NANSIGN_S          1
>  #define _FP_NANSIGN_D          1
>  #define _FP_NANSIGN_E          1
> diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
> index 685d9cf8502..4ac214eb0ce 100644
> --- a/libgcc/config/i386/t-softfp
> +++ b/libgcc/config/i386/t-softfp
> @@ -1 +1,6 @@
>  LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
> +
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +
> +softfp_extras += eqhf2
> \ No newline at end of file
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-22 12:14           ` Richard Biener
@ 2021-07-27  5:32             ` Hongtao Liu
  2021-07-29 20:57               ` Joseph Myers
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-07-27  5:32 UTC (permalink / raw)
  To: Richard Biener
  Cc: liuhongt, GCC Patches, Uros Bizjak, Joseph S. Myers, H. J. Lu

On Thu, Jul 22, 2021 at 8:14 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 9:43 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > gcc/ChangeLog:
> >
> >         * optabs-query.c (get_best_extraction_insn): Use word_mode for
> >         HF field.
> >
> > libgcc/ChangeLog:
> >
> >         * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
> >         * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
> >         * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
> >         * config/i386/t-softfp: Add hf soft-fp.
> >         * config.host: Add i386/64/t-softfp.
> >         * config/i386/64/t-softfp: New file.
> > ---
> >  gcc/optabs-query.c                  | 10 +++++++++-
> >  libgcc/config.host                  |  5 +----
> >  libgcc/config/i386/32/sfp-machine.h |  1 +
> >  libgcc/config/i386/64/sfp-machine.h |  1 +
> >  libgcc/config/i386/64/t-softfp      |  1 +
> >  libgcc/config/i386/sfp-machine.h    |  1 +
> >  libgcc/config/i386/t-softfp         |  5 +++++
> >  7 files changed, 19 insertions(+), 5 deletions(-)
> >  create mode 100644 libgcc/config/i386/64/t-softfp
> >
> > diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> > index 05ee5f517da..0438e451474 100644
> > --- a/gcc/optabs-query.c
> > +++ b/gcc/optabs-query.c
> > @@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
> >                           machine_mode field_mode)
> >  {
> >    opt_scalar_int_mode mode_iter;
> > -  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
> > +  scalar_int_mode smallest_int_mode;
> > +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
>
> I think that needs "fixing" then, or alternatively the caller should care.
>
How about this:

modified   gcc/emit-rtl.c
@@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
      fix them all.  */
   if (omode == word_mode)
     ;
+  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI (reg:HF))
+     here.  Though extract_bit_field is the culprit here, not the backends.  */
+  else if (imode == HFmode && omode == SImode)
+    ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
      is the culprit here, and not the backends.  */
   else if (known_ge (osize, regsize) && known_ge (isize, osize))
new file   gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}

If it's OK, I'll merge the change above into the earlier commit:
"[PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above."
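
For reference, a rough sketch of what the test above exercises (not part of
the patch): on little-endian x86, reading the _Float16 member of the union
yields the low 16 bits of the int, which is presumably the access that leads
extract_bit_field to ask for (subreg:SI (reg:HF)). The helper name
low_half_bits is made up for illustration, and uint16_t stands in for
_Float16 (same 2-byte size assumed) so the snippet builds with compilers
that lack the type:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  Returns the 16 bits that a
   2-byte union member sharing storage with A would hold.  uint16_t is a
   stand-in for _Float16 here; only the bit pattern is modelled.  */
uint16_t
low_half_bits (int32_t a)
{
  uint16_t b;
  /* Copy the first two bytes of the int, which is what reading the other
     union member at offset 0 does.  On little-endian targets such as x86
     these are the low 16 bits of A.  */
  memcpy (&b, &a, sizeof b);
  return b;
}
```

For example, on x86 low_half_bits (0x12343C00) gives 0x3C00, which is the
binary16 encoding of 1.0; the upper 16 bits of the int are ignored.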

> > +  if (FLOAT_MODE_P (field_mode)
> > +      && known_eq (GET_MODE_SIZE (field_mode), 2))
> > +    smallest_int_mode = word_mode;
> > +  else
> > +    smallest_int_mode = smallest_int_mode_for_size (struct_bits);
> > +
> > +  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
> >      {
> >        scalar_int_mode mode = mode_iter.require ();
> >        if (get_extraction_insn (insn, pattern, type, mode))
> > diff --git a/libgcc/config.host b/libgcc/config.host
> > index 50f00062232..96da9ef1cce 100644
> > --- a/libgcc/config.host
> > +++ b/libgcc/config.host
> > @@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
> >         ;;
> >  i[34567]86-*-* | x86_64-*-*)
> >         tmake_file="${tmake_file} t-softfp-tf"
> > -       if test "${host_address}" = 32; then
> > -               tmake_file="${tmake_file} i386/${host_address}/t-softfp"
> > -       fi
> > -       tmake_file="${tmake_file} i386/t-softfp t-softfp"
> > +       tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
> >         ;;
> >  esac
> >
> > diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
> > index 1fa282d7afe..e24cbc8d180 100644
> > --- a/libgcc/config/i386/32/sfp-machine.h
> > +++ b/libgcc/config/i386/32/sfp-machine.h
> > @@ -86,6 +86,7 @@
> >  #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
> >  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
> >
> > +#define _FP_NANFRAC_H          _FP_QNANBIT_H
> >  #define _FP_NANFRAC_S          _FP_QNANBIT_S
> >  #define _FP_NANFRAC_D          _FP_QNANBIT_D, 0
> >  /* Even if XFmode is 12byte,  we have to pad it to
> > diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
> > index 1ff94c23ea4..e1c616699bb 100644
> > --- a/libgcc/config/i386/64/sfp-machine.h
> > +++ b/libgcc/config/i386/64/sfp-machine.h
> > @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
> >
> >  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
> >
> > +#define _FP_NANFRAC_H          _FP_QNANBIT_H
> >  #define _FP_NANFRAC_S          _FP_QNANBIT_S
> >  #define _FP_NANFRAC_D          _FP_QNANBIT_D
> >  #define _FP_NANFRAC_E          _FP_QNANBIT_E, 0
> > diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
> > new file mode 100644
> > index 00000000000..d812bb120bd
> > --- /dev/null
> > +++ b/libgcc/config/i386/64/t-softfp
> > @@ -0,0 +1 @@
> > +softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> > \ No newline at end of file
> > diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> > index 8319f0550bc..f15d29d3755 100644
> > --- a/libgcc/config/i386/sfp-machine.h
> > +++ b/libgcc/config/i386/sfp-machine.h
> > @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
> >  #define _FP_KEEPNANFRACP       1
> >  #define _FP_QNANNEGATEDP 0
> >
> > +#define _FP_NANSIGN_H          1
> >  #define _FP_NANSIGN_S          1
> >  #define _FP_NANSIGN_D          1
> >  #define _FP_NANSIGN_E          1
> > diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
> > index 685d9cf8502..4ac214eb0ce 100644
> > --- a/libgcc/config/i386/t-softfp
> > +++ b/libgcc/config/i386/t-softfp
> > @@ -1 +1,6 @@
> >  LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
> > +
> > +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> > +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> > +
> > +softfp_extras += eqhf2
> > \ No newline at end of file
> > --
> > 2.18.1
> >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-07-22  8:49           ` Uros Bizjak
@ 2021-07-27  7:31             ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-27  7:31 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: liuhongt, gcc-patches, Joseph S. Myers, H. J. Lu, Richard Biener,
	Guo, Xuepeng

[-- Attachment #1: Type: text/plain, Size: 82180 bytes --]

On Thu, Jul 22, 2021 at 4:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 9:44 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
> >
> > gcc/ChangeLog:
> >
> >         * common/config/i386/cpuinfo.h (get_available_features):
> >         Detect FEATURE_AVX512FP16.
> >         * common/config/i386/i386-common.c
> >         (OPTION_MASK_ISA_AVX512FP16_SET,
> >         OPTION_MASK_ISA_AVX512FP16_UNSET,
> >         OPTION_MASK_ISA2_AVX512FP16_SET,
> >         OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
> >         (OPTION_MASK_ISA2_AVX512BW_UNSET,
> >         OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
> >         (ix86_handle_option): Handle -mavx512fp16.
> >         * common/config/i386/i386-cpuinfo.h (enum processor_features):
> >         Add FEATURE_AVX512FP16.
> >         * common/config/i386/i386-isas.h: Add entry for AVX512FP16.
> >         * config.gcc: Add avx512fp16intrin.h.
> >         * config/i386/avx512fp16intrin.h: New intrinsic header.
> >         * config/i386/cpuid.h: Add bit_AVX512FP16.
> >         * config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
> >         * config/i386/i386-builtins.c: Support _Float16 type for i386
> >         backend.
> >         (ix86_init_float16_builtins): New function.
> >         (ix86_float16_type_node): New.
> >         * config/i386/i386-c.c (ix86_target_macros_internal): Define
> >         __AVX512FP16__.
> >         * config/i386/i386-expand.c (ix86_expand_branch): Support
> >         HFmode.
> >         (ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
> >         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
> >         (ix86_expand_fp_movcc): Ditto.
> >         * config/i386/i386-isa.def: Add PTA define for AVX512FP16.
> >         * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
> >         (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
> >         * config/i386/i386.c (ix86_get_ssemov): Use
> >         vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
> >         (ix86_get_excess_precision): Use
> >         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
> >         existed.
> >         (output_387_binary_op): Update instruction suffix for HFmode.
> >         (sse_store_index): Use SFmode cost for HFmode cost.
> >         (inline_memory_move_cost): Add HFmode, and perfer SSE cost over
> >         GPR cost for HFmode.
> >         (ix86_hard_regno_mode_ok): Allow HImode in sse register.
> >         (ix86_mangle_type): Add manlging for _Float16 type.
> >         (inline_secondary_memory_needed): No memory is needed for
> >         16bit movement between gpr and sse reg under
> >         TARGET_AVX512FP16.
> >         (ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
> >         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
> >         (ix86_division_cost): Ditto.
> >         (ix86_rtx_costs): Ditto.
> >         (ix86_add_stmt_cost): Ditto.
> >         (ix86_optab_supported_p): Ditto.
> >         * config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
> >         (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
> >         (SSE_FLOAT_MODE_P): Add HFmode.
> >         (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
> >         * config/i386/i386.md (mode): Add HFmode.
> >         (MODE_SIZE): Add HFmode.
> >         (MODEFH): Likewise.
> >         (ssemodesuffix): Add sh suffix for HFmode.
> >         (cbranch<mode>4): Use MODEFH.
> >         (<insn><mode>3): Likewise.
> >         (mul<mode>3): Likewise.
> >         (div<mode>3): Likewise.
> >         (*ieee_s<ieee_maxmin><mode>3): Likewise.
> >         (*cmpi<unord>hf): New define_insn for HFmode.
> >         (*movhf_internal): Adjust for avx512fp16 instruction.
> >         (extendhf<mode>2): Likewise.
> >         (trunc<mode>hf2): Likewise.
> >         (*fop_hf_comm): Likewise.
> >         (*fop_hf_1): Likewise.
> >         (float<floatunssuffix><mode>hf2): Likewise.
> >         (mov<mode>cc): Likewise.
> >         * config/i386/i386.opt: Add mavx512fp16.
> >         * config/i386/immintrin.h: Include avx512fp16intrin.h.
> >         * doc/invoke.texi: Add mavx512fp16.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
> >         * gcc.target/i386/avx-2.c: Ditto.
> >         * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
> >         * gcc.target/i386/funcspec-56.inc: Add new target attribute check.
> >         * gcc.target/i386/sse-13.c: Add -mavx512fp16.
> >         * gcc.target/i386/sse-14.c: Ditto.
> >         * gcc.target/i386/sse-22.c: Ditto.
> >         * gcc.target/i386/sse-23.c: Ditto.
> >         * lib/target-supports.exp: (check_effective_target_avx512fp16): New.
> >         * g++.target/i386/float16-1.C: New test.
> >         * g++.target/i386/float16-2.C: Ditto.
> >         * g++.target/i386/float16-3.C: Ditto.
> >         * gcc.target/i386/avx512fp16-12a.c: Ditto.
> >         * gcc.target/i386/avx512fp16-12b.c: Ditto.
> >         * gcc.target/i386/float16-3a.c: Ditto.
> >         * gcc.target/i386/float16-3b.c: Ditto.
> >         * gcc.target/i386/float16-4a.c: Ditto.
> >         * gcc.target/i386/float16-4b.c: Ditto.
> >         * gcc.target/i386/pr54855-12.c: Ditto.
> >         * g++.dg/other/i386-2.C: Ditto.
> >         * g++.dg/other/i386-3.C: Ditto.
> >
> > Co-Authored-By: Guo, Xuepeng <xuepeng.guo@intel.com>
> > Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
> > Co-Authored-By: Liu, Hongtao <hongtao.liu@intel.com>
> > Co-Authored-By: Wang, Hongyu <hongyu.wang@intel.com>
> > Co-Authored-By: Xu, Dianhong <dianhong.xu@intel.com>
> > ---
> >  gcc/common/config/i386/cpuinfo.h              |   2 +
> >  gcc/common/config/i386/i386-common.c          |  26 ++-
> >  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
> >  gcc/common/config/i386/i386-isas.h            |   1 +
> >  gcc/config.gcc                                |   2 +-
> >  gcc/config/i386/avx512fp16intrin.h            |  53 +++++
> >  gcc/config/i386/cpuid.h                       |   1 +
> >  gcc/config/i386/i386-builtin-types.def        |   1 +
> >  gcc/config/i386/i386-builtins.c               |  23 +++
> >  gcc/config/i386/i386-c.c                      |   2 +
> >  gcc/config/i386/i386-expand.c                 |   5 +-
> >  gcc/config/i386/i386-isa.def                  |   1 +
> >  gcc/config/i386/i386-options.c                |   4 +-
> >  gcc/config/i386/i386.c                        | 128 ++++++++----
> >  gcc/config/i386/i386.h                        |  11 +-
> >  gcc/config/i386/i386.md                       | 185 ++++++++++++++----
> >  gcc/config/i386/i386.opt                      |   4 +
> >  gcc/config/i386/immintrin.h                   |   4 +
> >  gcc/doc/invoke.texi                           |  10 +-
> >  gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
> >  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
> >  gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
> >  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
> >  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
> >  .../gcc.target/i386/avx512fp16-12a.c          |  21 ++
> >  .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
> >  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
> >  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
> >  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
> >  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
> >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> >  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
> >  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
> >  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
> >  gcc/testsuite/lib/target-supports.exp         |  13 +-
> >  40 files changed, 531 insertions(+), 103 deletions(-)
> >  create mode 100644 gcc/config/i386/avx512fp16intrin.h
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
> >
> > diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> > index 458f41de776..1835ac64e67 100644
> > --- a/gcc/common/config/i386/cpuinfo.h
> > +++ b/gcc/common/config/i386/cpuinfo.h
> > @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
> >             set_feature (FEATURE_AVX5124FMAPS);
> >           if (edx & bit_AVX512VP2INTERSECT)
> >             set_feature (FEATURE_AVX512VP2INTERSECT);
> > +         if (edx & bit_AVX512FP16)
> > +           set_feature (FEATURE_AVX512FP16);
> >         }
> >
> >        __cpuid_count (7, 1, eax, ebx, ecx, edx);
> > diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> > index 76ab1a14e54..00c65ba15ab 100644
> > --- a/gcc/common/config/i386/i386-common.c
> > +++ b/gcc/common/config/i386/i386-common.c
> > @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
> >  #define OPTION_MASK_ISA_AVX512VBMI2_SET \
> >    (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
> > +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
> > +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
> >  #define OPTION_MASK_ISA_AVX512VNNI_SET \
> >    (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
> >  #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
> > @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
> >  #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
> >  #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
> > +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
> > +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
> >  #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
> >  #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
> >  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
> > @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
> >    (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> >     | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
> >     | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> > -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
> > +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> > +   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> >  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
> >    (OPTION_MASK_ISA2_AVX512F_UNSET)
> >  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> > @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
> >    (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
> >  #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
> >
> > -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
> > +#define OPTION_MASK_ISA2_AVX512BW_UNSET \
> > +  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> > +    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
> >
> >  /* Set 1 << value as value of -malign-FLAG option.  */
> >
> > @@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
> >         }
> >        return true;
> >
> > +    case OPT_mavx512fp16:
> > +      if (value)
> > +       {
> > +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
> > +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
> > +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
> > +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
> > +       }
> > +      else
> > +       {
> > +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
> > +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> > +       }
> > +      return true;
> > +
> >      case OPT_mavx512vnni:
> >        if (value)
> >         {
> > diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> > index e68dd656046..4e0659fc7b2 100644
> > --- a/gcc/common/config/i386/i386-cpuinfo.h
> > +++ b/gcc/common/config/i386/i386-cpuinfo.h
> > @@ -228,6 +228,7 @@ enum processor_features
> >    FEATURE_AESKLE,
> >    FEATURE_WIDEKL,
> >    FEATURE_AVXVNNI,
> > +  FEATURE_AVX512FP16,
> >    CPU_FEATURE_MAX
> >  };
> >
> > diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> > index 898c18f3dda..a6783660278 100644
> > --- a/gcc/common/config/i386/i386-isas.h
> > +++ b/gcc/common/config/i386/i386-isas.h
> > @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
> >    ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
> >    ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
> >    ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
> > +  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
> >  ISA_NAMES_TABLE_END
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 3df9b52cf25..a354351408c 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
> >                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> >                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
> >                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> > -                      mwaitintrin.h"
> > +                      mwaitintrin.h avx512fp16intrin.h"
> >         ;;
> >  ia64-*-*)
> >         extra_headers=ia64intrin.h
> > diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> > new file mode 100644
> > index 00000000000..38d63161ba6
> > --- /dev/null
> > +++ b/gcc/config/i386/avx512fp16intrin.h
> > @@ -0,0 +1,53 @@
> > +/* Copyright (C) 2019 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify
> > +   it under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef _IMMINTRIN_H_INCLUDED
> > +#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
> > +#endif
> > +
> > +#ifndef __AVX512FP16INTRIN_H_INCLUDED
> > +#define __AVX512FP16INTRIN_H_INCLUDED
> > +
> > +#ifndef __AVX512FP16__
> > +#pragma GCC push_options
> > +#pragma GCC target("avx512fp16")
> > +#define __DISABLE_AVX512FP16__
> > +#endif /* __AVX512FP16__ */
> > +
> > +/* Internal data types for implementing the intrinsics.  */
> > +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
> > +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
> > +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
> > +
> > +/* The Intel API is flexible enough that we must allow aliasing with other
> > +   vector types, and their scalar components.  */
> > +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
> > +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
> > +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
> > +
> > +#ifdef __DISABLE_AVX512FP16__
> > +#undef __DISABLE_AVX512FP16__
> > +#pragma GCC pop_options
> > +#endif /* __DISABLE_AVX512FP16__ */
> > +
> > +#endif /* __AVX512FP16INTRIN_H_INCLUDED */
> > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> > index aebc17c6827..82b8050028b 100644
> > --- a/gcc/config/i386/cpuid.h
> > +++ b/gcc/config/i386/cpuid.h
> > @@ -126,6 +126,7 @@
> >  #define bit_AVX5124VNNIW (1 << 2)
> >  #define bit_AVX5124FMAPS (1 << 3)
> >  #define bit_AVX512VP2INTERSECT (1 << 8)
> > +#define bit_AVX512FP16   (1 << 23)
> >  #define bit_IBT        (1 << 20)
> >  #define bit_UINTR (1 << 5)
> >  #define bit_PCONFIG    (1 << 18)
> > diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> > index 3ca313c19ec..1768b88d748 100644
> > --- a/gcc/config/i386/i386-builtin-types.def
> > +++ b/gcc/config/i386/i386-builtin-types.def
> > @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
> >  DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
> >  DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
> >  DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> > +DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
> >  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
> >  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
> >  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> > diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> > index 204e2903126..668f09f12a0 100644
> > --- a/gcc/config/i386/i386-builtins.c
> > +++ b/gcc/config/i386/i386-builtins.c
> > @@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
> >  /* Table for the ix86 builtin non-function types.  */
> >  static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
> >
> > +tree ix86_float16_type_node = NULL_TREE;
> >  /* Retrieve an element from the above table, building some of
> >     the types lazily.  */
> >
> > @@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
> >                         BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
> >  }
> >
> > +static void
> > +ix86_init_float16_builtins (void)
> > +{
> > +  /* Provide the _Float16 type and float16_type_node if needed so that
> > +     it can be used in AVX512FP16 intrinsics and builtins.  */
> > +  if (!float16_type_node)
> > +    {
> > +      ix86_float16_type_node = make_node (REAL_TYPE);
> > +      TYPE_PRECISION (ix86_float16_type_node) = 16;
> > +      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
> > +      layout_type (ix86_float16_type_node);
> > +    }
> > +  else
> > +    ix86_float16_type_node = float16_type_node;
> > +
> > +  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
> > +    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
> > +                                           "_Float16");
> > +}
> > +
> >  static void
> >  ix86_init_builtin_types (void)
> >  {
> > @@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
> >       it.  */
> >    lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
> >
> > +  ix86_init_float16_builtins ();
> > +
> >    const_string_type_node
> >      = build_pointer_type (build_qualified_type
> >                           (char_type_node, TYPE_QUAL_CONST));
> > diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> > index 5ed0de006fb..cc64f855ecc 100644
> > --- a/gcc/config/i386/i386-c.c
> > +++ b/gcc/config/i386/i386-c.c
> > @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> >      def_or_undef (parse_in, "__PTWRITE__");
> >    if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
> >      def_or_undef (parse_in, "__AVX512BF16__");
> > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
> > +    def_or_undef (parse_in, "__AVX512FP16__");
> >    if (TARGET_MMX_WITH_SSE)
> >      def_or_undef (parse_in, "__MMX_WITH_SSE__");
> >    if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 69ea79e6123..b7d050a1e42 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
> >
> >    switch (mode)
> >      {
> > +    case E_HFmode:
> >      case E_SFmode:
> >      case E_DFmode:
> >      case E_XFmode:
> > @@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
> >    bool unordered_compare = ix86_unordered_fp_compare (code);
> >    rtx op0 = *pop0, op1 = *pop1;
> >    machine_mode op_mode = GET_MODE (op0);
> > -  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
> > +  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
> >
> >    /* All of the unordered compare instructions only work on registers.
> >       The same is true of the fcomi compare instructions.  The XFmode
> > @@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
> >    rtx op0 = XEXP (operands[1], 0);
> >    rtx op1 = XEXP (operands[1], 1);
> >
> > -  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
> > +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >      {
> >        machine_mode cmode;
> >
> > diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> > index a0d46cbc892..83d9302ea3d 100644
> > --- a/gcc/config/i386/i386-isa.def
> > +++ b/gcc/config/i386/i386-isa.def
> > @@ -108,3 +108,4 @@ DEF_PTA(HRESET)
> >  DEF_PTA(KL)
> >  DEF_PTA(WIDEKL)
> >  DEF_PTA(AVXVNNI)
> > +DEF_PTA(AVX512FP16)
> > diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> > index 3416a4f1752..df191763e4b 100644
> > --- a/gcc/config/i386/i386-options.c
> > +++ b/gcc/config/i386/i386-options.c
> > @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
> >    { "-mhreset",                OPTION_MASK_ISA2_HRESET },
> >    { "-mkl",            OPTION_MASK_ISA2_KL },
> >    { "-mwidekl",        OPTION_MASK_ISA2_WIDEKL },
> > -  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI }
> > +  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI },
> > +  { "-mavx512fp16",    OPTION_MASK_ISA2_AVX512FP16 }
> >  };
> >  static struct ix86_target_opts isa_opts[] =
> >  {
> > @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
> >      IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
> >      IX86_ATTR_ISA ("hreset", OPT_mhreset),
> >      IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
> > +    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
> >
> >      /* enum options */
> >      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 02628d838fc..e826484a4f4 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
> >      case MODE_SI:
> >        return "%vmovd\t{%1, %0|%0, %1}";
> >
> > +    case MODE_HI:
> > +      if (GENERAL_REG_P (operands[0]))
> > +       return "vmovw\t{%1, %k0|%k0, %1}";
> > +      else if (GENERAL_REG_P (operands[1]))
> > +       return "vmovw\t{%k1, %0|%0, %k1}";
> > +      else
> > +       return "vmovw\t{%1, %0|%0, %1}";
> > +
> >      case MODE_DF:
> >        if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
> >         return "vmovsd\t{%d1, %0|%0, %d1}";
> > @@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
> >        else
> >         return "%vmovss\t{%1, %0|%0, %1}";
> >
> > +    case MODE_HF:
> > +      if (REG_P (operands[0]) && REG_P (operands[1]))
> > +       return "vmovsh\t{%d1, %0|%0, %d1}";
> > +      else
> > +       return "vmovsh\t{%1, %0|%0, %1}";
> > +
> >      case MODE_V1DF:
> >        gcc_assert (!TARGET_AVX);
> >        return "movlpd\t{%1, %0|%0, %1}";
> > @@ -13955,7 +13969,9 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
> >
> >    if (is_sse)
> >     {
> > -     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
> > +     p = (GET_MODE (operands[0]) == HFmode
> > +         ? "sh"
> > +         : (GET_MODE (operands[0]) == SFmode ? "ss" : "sd"));
> >       strcat (buf, p);
> >
> >       if (TARGET_AVX)
> > @@ -19132,9 +19148,11 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
> >        if (!TARGET_SSE2)
> >         return true;
> >
> > -      /* Between SSE and general, we have moves no larger than word size.  */
> > +      /* Between SSE and general, we have moves no larger than word size,
> > +        except that with AVX512FP16, VMOVW enables 16-bit moves.  */
> >        if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> > -         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> > +         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (TARGET_AVX512FP16
> > +                                                  ? HImode : SImode)
> >           || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> >         return true;
>
> Please recode the above to something like:
>
> if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
>   return true;
>
> int msize = GET_MODE_SIZE (mode);
>
> /* Between SSE and general, we have moves no larger than word size.  */
> if (msize > UNITS_PER_WORD)
>   return true;
>
> /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
> int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
>
> if (msize < minsize)
>   return true;
>
Changed.
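For reference, a minimal standalone sketch of the recoded size check. Nothing
here is GCC code: needs_secondary_memory, have_avx512fp16, HI_SIZE and SI_SIZE
are hypothetical stand-ins for the real function, TARGET_AVX512FP16 and
GET_MODE_SIZE, and the sketch only models the size test for a move between an
SSE register and a general register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GET_MODE_SIZE (HImode) and
   GET_MODE_SIZE (SImode), in bytes.  */
enum { HI_SIZE = 2, SI_SIZE = 4 };

/* Return true if a move of MSIZE bytes between an SSE register and a
   general register has to go through memory.  UNITS_PER_WORD is the
   word size in bytes; HAVE_AVX512FP16 models TARGET_AVX512FP16.  */
static bool
needs_secondary_memory (int msize, int units_per_word, bool have_avx512fp16)
{
  assert (msize > 0);

  /* Between SSE and general, we have moves no larger than word size.  */
  if (msize > units_per_word)
    return true;

  /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
  int minsize = have_avx512fp16 ? HI_SIZE : SI_SIZE;

  return msize < minsize;
}
```

In this model a 2-byte move needs secondary memory without AVX512FP16 and
does not need it with AVX512FP16, while a 4-byte move is unaffected.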
> > @@ -19229,21 +19247,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> >  static inline int
> >  sse_store_index (machine_mode mode)
> >  {
> > -      switch (GET_MODE_SIZE (mode))
> > -       {
> > -         case 4:
> > -           return 0;
> > -         case 8:
> > -           return 1;
> > -         case 16:
> > -           return 2;
> > -         case 32:
> > -           return 3;
> > -         case 64:
> > -           return 4;
> > -         default:
> > -           return -1;
> > -       }
> > +  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
> > +     costs to processor_costs, which requires changes to all entries in
> > +     processor cost table.  */
> > +  if (mode == E_HFmode)
> > +    mode = E_SFmode;
> > +  switch (GET_MODE_SIZE (mode))
> > +    {
> > +    case 4:
> > +      return 0;
> > +    case 8:
> > +      return 1;
> > +    case 16:
> > +      return 2;
> > +    case 32:
> > +      return 3;
> > +    case 64:
> > +      return 4;
> > +    default:
> > +      return -1;
> > +    }
> >  }
> >
> >  /* Return the cost of moving data of mode M between a
> > @@ -19270,6 +19293,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
> >        int index;
> >        switch (mode)
> >         {
> > +         case E_HFmode:
> >           case E_SFmode:
> >             index = 0;
> >             break;
> > @@ -19370,11 +19394,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
> >           }
> >         break;
> >        case 2:
> > -       if (in == 2)
> > -         return MAX (ix86_cost->hard_register.int_load[1],
> > -                     ix86_cost->hard_register.int_store[1]);
> > -       return in ? ix86_cost->hard_register.int_load[1]
> > -                 : ix86_cost->hard_register.int_store[1];
> > +       {
> > +         int cost;
> > +         if (in == 2)
> > +           cost = MAX (ix86_cost->hard_register.int_load[1],
> > +                       ix86_cost->hard_register.int_store[1]);
> > +         else
> > +           cost = in ? ix86_cost->hard_register.int_load[1]
> > +                     : ix86_cost->hard_register.int_store[1];
> > +         if (mode == E_HFmode)
> > +           {
> > +             /* Prefer SSE over GPR for HFmode.  */
> > +             int sse_cost;
> > +             int index = sse_store_index (mode);
> > +             if (in == 2)
> > +               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> > +                               ix86_cost->hard_register.sse_store[index]);
> > +             else
> > +               sse_cost = (in
> > +                           ? ix86_cost->hard_register.sse_load [index]
> > +                           : ix86_cost->hard_register.sse_store [index]);
> > +             if (sse_cost >= cost)
> > +               cost = sse_cost + 1;
> > +           }
> > +         return cost;
> > +       }
> >        default:
> >         if (in == 2)
> >           cost = MAX (ix86_cost->hard_register.int_load[2],
> > @@ -19548,6 +19592,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >           - XI mode
> >           - any of 512-bit wide vector mode
> >           - any scalar mode.  */
> > +      /* For AVX512FP16, vmovw supports movement of HImode
> > +        between GPR and SSE registers.  */
> >        if (TARGET_AVX512F
> >           && (mode == XImode
> >               || VALID_AVX512F_REG_MODE (mode)
> > @@ -19833,7 +19879,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
> >    if (VECTOR_MODE_P (mode))
> >      inner_mode = GET_MODE_INNER (mode);
> >
> > -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >      return inner_mode == DFmode ? cost->mulsd : cost->mulss;
> >    else if (X87_FLOAT_MODE_P (mode))
> >      return cost->fmul;
> > @@ -19885,7 +19931,7 @@ ix86_division_cost (const struct processor_costs *cost,
> >    if (VECTOR_MODE_P (mode))
> >      inner_mode = GET_MODE_INNER (mode);
> >
> > -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >      return inner_mode == DFmode ? cost->divsd : cost->divss;
> >    else if (X87_FLOAT_MODE_P (mode))
> >      return cost->fdiv;
> > @@ -20305,7 +20351,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> >           return true;
> >         }
> >
> > -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         {
> >           *total = cost->addss;
> >           return false;
> > @@ -20338,7 +20384,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> >        /* FALLTHRU */
> >
> >      case NEG:
> > -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         {
> >           *total = cost->sse_op;
> >           return false;
> > @@ -20420,14 +20466,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> >        return false;
> >
> >      case FLOAT_EXTEND:
> > -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> > +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         *total = 0;
> >        else
> >          *total = ix86_vec_cost (mode, cost->addss);
> >        return false;
> >
> >      case FLOAT_TRUNCATE:
> > -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> > +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         *total = cost->fadd;
> >        else
> >          *total = ix86_vec_cost (mode, cost->addss);
> > @@ -20437,7 +20483,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> >        /* SSE requires memory load for the constant operand. It may make
> >          sense to account for this.  Of course the constant operand may or
> >          may not be reused. */
> > -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         *total = cost->sse_op;
> >        else if (X87_FLOAT_MODE_P (mode))
> >         *total = cost->fabs;
> > @@ -20446,7 +20492,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> >        return false;
> >
> >      case SQRT:
> > -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >         *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
> >        else if (X87_FLOAT_MODE_P (mode))
> >         *total = cost->fsqrt;
> > @@ -21930,6 +21976,10 @@ ix86_mangle_type (const_tree type)
> >
> >    switch (TYPE_MODE (type))
> >      {
> > +    case E_HFmode:
> > +      /* _Float16 is "DF16_".
> > +        Align with clang's decision in https://reviews.llvm.org/D33719. */
> > +      return "DF16_";
> >      case E_TFmode:
> >        /* __float128 is "g".  */
> >        return "g";
> > @@ -22553,7 +22603,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
> >         case MINUS_EXPR:
> >           if (kind == scalar_stmt)
> >             {
> > -             if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +             if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >                 stmt_cost = ix86_cost->addss;
> >               else if (X87_FLOAT_MODE_P (mode))
> >                 stmt_cost = ix86_cost->fadd;
> > @@ -22571,7 +22621,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
> >           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
> >           break;
> >         case NEGATE_EXPR:
> > -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >             stmt_cost = ix86_cost->sse_op;
> >           else if (X87_FLOAT_MODE_P (mode))
> >             stmt_cost = ix86_cost->fchs;
> > @@ -22627,7 +22677,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
> >         case BIT_XOR_EXPR:
> >         case BIT_AND_EXPR:
> >         case BIT_NOT_EXPR:
> > -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> > +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
> >             stmt_cost = ix86_cost->sse_op;
> >           else if (VECTOR_MODE_P (mode))
> >             stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
> > @@ -23233,8 +23283,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
> >        return opt_type == OPTIMIZE_FOR_SPEED;
> >
> >      case rint_optab:
> > -      if (SSE_FLOAT_MODE_P (mode1)
> > -         && TARGET_SSE_MATH
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
> >           && !flag_trapping_math
> >           && !TARGET_SSE4_1)
>
> The above change is wrong. The condition is enabled for
> !TARGET_SSE4_1, so it never triggers for TARGET_AVX512FP16.
>
Changed.
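A small sketch of why the HFmode clause was dead in the rint_optab case. It
assumes, as the review comment implies, that AVX512FP16 always comes with
SSE4.1; rint_cond_hf and rint_cond_hf_reachable are hypothetical helper names,
and the booleans model TARGET_AVX512FP16, TARGET_SSE4_1 and
flag_trapping_math.

```c
#include <stdbool.h>

/* Model of the patched rint_optab condition for an HFmode operand.
   For HFmode, SSE_FLOAT_MODE_SSEMATH_OR_HF_P reduces to
   TARGET_AVX512FP16.  */
static bool
rint_cond_hf (bool avx512fp16, bool sse4_1, bool trapping_math)
{
  return avx512fp16 && !trapping_math && !sse4_1;
}

/* Return true if some valid flag combination makes the condition hold.
   A combination is valid when AVX512FP16 implies SSE4.1 (assumed).  */
static bool
rint_cond_hf_reachable (void)
{
  for (int bits = 0; bits < 8; bits++)
    {
      bool avx512fp16 = bits & 1;
      bool sse4_1 = bits & 2;
      bool trapping_math = bits & 4;

      if (avx512fp16 && !sse4_1)
        continue;  /* Excluded by the assumed implication.  */
      if (rint_cond_hf (avx512fp16, sse4_1, trapping_math))
        return true;
    }
  return false;
}
```

Under that assumption no valid combination satisfies the condition, so the
original SSE_FLOAT_MODE_P && TARGET_SSE_MATH test is sufficient there.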
> >         return opt_type == OPTIMIZE_FOR_SPEED;
> > @@ -23243,8 +23292,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
> >      case floor_optab:
> >      case ceil_optab:
> >      case btrunc_optab:
> > -      if (SSE_FLOAT_MODE_P (mode1)
> > -         && TARGET_SSE_MATH
> > +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode1)
> >           && !flag_trapping_math
> >           && TARGET_SSE4_1)
> >         return true;
> > @@ -23329,7 +23377,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >         /* The fastest type to promote to will always be the native type,
> >            whether that occurs with implicit excess precision or
> >            otherwise.  */
> > -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > +       return TARGET_AVX512FP16
> > +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> >        case EXCESS_PRECISION_TYPE_STANDARD:
> >        case EXCESS_PRECISION_TYPE_IMPLICIT:
> >         /* Otherwise, the excess precision we want when we are
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index e21922e8782..dca2ad32ed4 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >
> >  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
> >    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> > -   || (MODE) == SFmode)
> > +   || (MODE) == SFmode                                                 \
> > +   || (((MODE) == HImode || (MODE) == HFmode) && TARGET_AVX512FP16))
>
> Please put TARGET_... in front of the condition.
>
Changed.
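A sketch of the reordered macro, with the target test placed first. This is
an illustration only and not necessarily the exact text of the updated patch:
the mode enum and the target_avx512fp16 variable are hypothetical stand-ins
so that the macro can be compiled outside of GCC.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC machine modes and target flag.  */
enum mode { QImode, HImode, SImode, DImode, HFmode, SFmode, DFmode };
static bool target_avx512fp16;
#define TARGET_AVX512FP16 target_avx512fp16

/* HImode and HFmode are valid scalar modes only with AVX512FP16; the
   target test comes first so the mode comparisons are skipped when the
   ISA is disabled.  */
#define VALID_AVX512F_SCALAR_MODE(MODE)					\
  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
   || (MODE) == SFmode							\
   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
```

The result is the same as in the original form; only the evaluation order of
the last clause changes.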
> >  #define VALID_AVX512F_REG_MODE(MODE)                                   \
> >    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> > @@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >
> >  #define VALID_FP_MODE_P(MODE)                                          \
> >    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
> > -   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)                \
> > +   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
> >
> >  #define VALID_INT_MODE_P(MODE)                                         \
> >    ((MODE) == QImode || (MODE) == HImode                                        \
> > @@ -1071,6 +1072,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >  #define SSE_FLOAT_MODE_P(MODE) \
> >    ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
> >
> > +#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)                           \
> > +  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)                                \
> > +   || (TARGET_AVX512FP16 && (MODE) == HFmode))
> > +
> >  #define FMA4_VEC_FLOAT_MODE_P(MODE) \
> >    (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
> >                   || (MODE) == V8SFmode || (MODE) == V4DFmode))
> > @@ -2264,7 +2269,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
> >  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
> >    | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
> >    | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
> > -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
> > +  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
> >  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
> >    | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
> >  constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index dd991c3ffdf..8f11cbcf28b 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -496,7 +496,7 @@ (define_attr "type"
> >
> >  ;; Main data type used by the insn
> >  (define_attr "mode"
> > -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> > +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> >    V2DF,V2SF,V1DF,V8DF"
> >    (const_string "unknown"))
> >
> > @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
> >                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
> >                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
> >                     avx512bw,noavx512bw,avx512dq,noavx512dq,
> > -                   avx512vl,noavx512vl,
> > -                   avxvnni,avx512vnnivl"
> > +                   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
> >    (const_string "base"))
> >
> >  ;; Define instruction set of MMX instructions
> > @@ -885,7 +884,8 @@ (define_attr "enabled" ""
> >          (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
> >          (eq_attr "isa" "avx512vnnivl")
> >            (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
> > -
> > +        (eq_attr "isa" "avx512fp16")
> > +          (symbol_ref "TARGET_AVX512FP16")
>
> Please keep the empty line here between "isa" and "mmx_isa" attribute processing.
>
Changed.
> >          (eq_attr "mmx_isa" "native")
> >            (symbol_ref "!TARGET_MMX_WITH_SSE")
> >          (eq_attr "mmx_isa" "sse")
> > @@ -1089,8 +1089,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
> >  ;; compile time constant, it is faster to use <MODE_SIZE> than
> >  ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
> >  ;; command line options just use GET_MODE_SIZE macro.
> > -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
> > -                            (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
> > +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
> > +                            (TI "16") (HF "2") (SF "4") (DF "8")
> > +                            (XF "GET_MODE_SIZE (XFmode)")
> >                              (V16QI "16") (V32QI "32") (V64QI "64")
> >                              (V8HI "16") (V16HI "32") (V32HI "64")
> >                              (V4SI "16") (V8SI "32") (V16SI "64")
> > @@ -1222,8 +1223,11 @@ (define_mode_iterator MODEF [SF DF])
> >  ;; All x87 floating point modes
> >  (define_mode_iterator X87MODEF [SF DF XF])
> >
> > -;; All x87 floating point modes plus HF
> > -(define_mode_iterator X87MODEFH [SF DF XF HF])
> > +;; SSE and x87 SFmode and DFmode floating point modes plus HFmode
> > +(define_mode_iterator MODEFH [(HF "TARGET_AVX512FP16") SF DF])
> > +
> > +;; All x87 floating point modes plus HFmode
> > +(define_mode_iterator X87MODEFH [HF SF DF XF])
>
> A general remark: Please avoid macroization of HFmode patterns for
> now. The MODEF macro is used for cases where modes are shared between x87
> and SSE, so the patterns have:
>
> TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH).
>
> Looking at the macroization gain, it looks to me that we get nothing
> but complications with the conditional MODEFH iterator. So, please remove
> all HFmode macroization (including mode attributes) and simply add a
> couple of expanders, protected with TARGET_AVX512FP16 insn constraint.
> We can macroize the newly added patterns with existing ones in the future,
> but please not now.
Changed.
>
> Uros.
>
> >  ;; All SSE floating point modes
> >  (define_mode_iterator SSEMODEF [SF DF TF])
> > @@ -1231,7 +1235,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> >
> >  ;; SSE instruction suffix for various modes
> >  (define_mode_attr ssemodesuffix
> > -  [(SF "ss") (DF "sd")
> > +  [(HF "sh") (SF "ss") (DF "sd")
> >     (V16SF "ps") (V8DF "pd")
> >     (V8SF "ps") (V4DF "pd")
> >     (V4SF "ps") (V2DF "pd")
> > @@ -1498,15 +1502,15 @@ (define_expand "cstorexf4"
> >
> >  (define_expand "cbranch<mode>4"
> >    [(set (reg:CC FLAGS_REG)
> > -       (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
> > -                   (match_operand:MODEF 2 "cmp_fp_expander_operand")))
> > +       (compare:CC (match_operand:MODEFH 1 "cmp_fp_expander_operand")
> > +                   (match_operand:MODEFH 2 "cmp_fp_expander_operand")))
> >     (set (pc) (if_then_else
> >                (match_operator 0 "ix86_fp_comparison_operator"
> >                 [(reg:CC FLAGS_REG)
> >                  (const_int 0)])
> >                (label_ref (match_operand 3))
> >                (pc)))]
> > -  "TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> > +  "TARGET_80387 || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
> >  {
> >    ix86_expand_branch (GET_CODE (operands[0]),
> >                       operands[1], operands[2], operands[3]);
> > @@ -1705,6 +1709,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
> >          (eq_attr "alternative" "0")
> >          (symbol_ref "true")
> >          (symbol_ref "false"))))])
> > +
> > +(define_insn "*cmpi<unord>hf"
> > +  [(set (reg:CCFP FLAGS_REG)
> > +       (compare:CCFP
> > +         (match_operand:HF 0 "register_operand" "v")
> > +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> > +  "TARGET_AVX512FP16"
> > +  "v<unord>comish\t{%1, %0|%0, %1}"
> > +  [(set_attr "type" "ssecomi")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> >
> >  ;; Push/pop instructions.
> >
> > @@ -2436,8 +2451,8 @@ (define_insn "*movsi_internal"
> >            (symbol_ref "true")))])
> >
> >  (define_insn "*movhi_internal"
> > -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
> > -       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> > +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
> > +       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
> >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> >     && ix86_hardreg_mov_ok (operands[0], operands[1])"
> >
> > @@ -2463,6 +2478,9 @@ (define_insn "*movhi_internal"
> >           gcc_unreachable ();
> >         }
> >
> > +    case TYPE_SSEMOV:
> > +      return ix86_output_ssemov (insn, operands);
> > +
> >      case TYPE_MSKLOG:
> >        if (operands[1] == const0_rtx)
> >         return "kxorw\t%0, %0, %0";
> > @@ -2478,7 +2496,9 @@ (define_insn "*movhi_internal"
> >      }
> >  }
> >    [(set (attr "type")
> > -     (cond [(eq_attr "alternative" "4,5,6,7")
> > +     (cond [(eq_attr "alternative" "9,10,11,12,13")
> > +             (const_string "ssemov")
> > +           (eq_attr "alternative" "4,5,6,7")
> >               (const_string "mskmov")
> >             (eq_attr "alternative" "8")
> >               (const_string "msklog")
> > @@ -2503,6 +2523,8 @@ (define_insn "*movhi_internal"
> >      (set (attr "mode")
> >        (cond [(eq_attr "type" "imovx")
> >                (const_string "SI")
> > +            (eq_attr "alternative" "11")
> > +              (const_string "HF")
> >              (and (eq_attr "alternative" "1,2")
> >                   (match_operand:HI 1 "aligned_operand"))
> >                (const_string "SI")
> > @@ -2511,7 +2533,12 @@ (define_insn "*movhi_internal"
> >                        (not (match_test "TARGET_HIMODE_MATH"))))
> >                (const_string "SI")
> >             ]
> > -           (const_string "HI")))])
> > +           (const_string "HI")))
> > +    (set (attr "isa")
> > +        (cond [(eq_attr "alternative" "9,10,11,12,13")
> > +               (const_string "avx512fp16")
> > +              ]
> > +              (const_string "*")))])
>
> The ISA attribute should come first in the attribute section; see the many existing examples.
>
Changed.
> >  ;; Situation is quite tricky about when to choose full sized (SImode) move
> >  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
> > @@ -3727,7 +3754,10 @@ (define_insn "*movhf_internal"
> >                (eq_attr "alternative" "2")
> >                  (const_string "sselog1")
> >                (eq_attr "alternative" "4,5,6,7")
> > -                (const_string "sselog")
> > +                (if_then_else
> > +                  (match_test ("TARGET_AVX512FP16"))
> > +                  (const_string "ssemov")
> > +                  (const_string "sselog"))
> >               ]
> >               (const_string "ssemov")))
> >     (set (attr "memory")
> > @@ -3750,9 +3780,15 @@ (define_insn "*movhf_internal"
> >                (eq_attr "alternative" "2")
> >                  (const_string "V4SF")
> >                (eq_attr "alternative" "4,5,6,7")
> > -                (const_string "TI")
> > +                (if_then_else
> > +                  (match_test "TARGET_AVX512FP16")
> > +                  (const_string "HI")
> > +                  (const_string "TI"))
> >                (eq_attr "alternative" "3")
> > -                (const_string "SF")
> > +                (if_then_else
> > +                  (match_test "TARGET_AVX512FP16")
> > +                  (const_string "HF")
> > +                  (const_string "SF"))
> >               ]
> >               (const_string "*")))])
> >
> > @@ -4493,6 +4529,17 @@ (define_split
> >    emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
> >  })
> >
> > +(define_insn "extendhf<mode>2"
> > +  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> > +        (float_extend:MODEF
> > +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> > +  "TARGET_AVX512FP16"
> > +  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
> > +  [(set_attr "type" "ssecvt")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "<MODE>")])
> > +
> > +
> >  (define_expand "extend<mode>xf2"
> >    [(set (match_operand:XF 0 "nonimmediate_operand")
> >          (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
> > @@ -4670,6 +4717,18 @@ (define_insn "truncxf<mode>2"
> >               (symbol_ref "flag_unsafe_math_optimizations")
> >            ]
> >            (symbol_ref "true")))])
> > +
> > +;; Conversion from {SF,DF}mode to HFmode.
> > +
> > +(define_insn "trunc<mode>hf2"
> > +  [(set (match_operand:HF 0 "register_operand" "=v")
> > +       (float_truncate:HF
> > +         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
> > +  "TARGET_AVX512FP16"
> > +  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
> > +  [(set_attr "type" "ssecvt")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> >
> >  ;; Signed conversion to DImode.
> >
> > @@ -5046,6 +5105,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
> >               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
> >            (symbol_ref "true")))])
> >
> > +(define_insn "float<floatunssuffix><mode>hf2"
> > +  [(set (match_operand:HF 0 "register_operand" "=v")
> > +       (any_float:HF
> > +         (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
> > +  "TARGET_AVX512FP16"
> > +  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
> > +  [(set_attr "type" "sseicvt")
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> > +
> >  (define_insn "*floatdi<MODEF:mode>2_i387"
> >    [(set (match_operand:MODEF 0 "register_operand" "=f")
> >         (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
> > @@ -7627,12 +7696,12 @@ (define_expand "<insn>xf3"
> >    "TARGET_80387")
> >
> >  (define_expand "<insn><mode>3"
> > -  [(set (match_operand:MODEF 0 "register_operand")
> > -       (plusminus:MODEF
> > -         (match_operand:MODEF 1 "register_operand")
> > -         (match_operand:MODEF 2 "nonimmediate_operand")))]
> > +  [(set (match_operand:MODEFH 0 "register_operand")
> > +       (plusminus:MODEFH
> > +         (match_operand:MODEFH 1 "register_operand")
> > +         (match_operand:MODEFH 2 "nonimmediate_operand")))]
> >    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> > -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
> > +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
> >
> >  ;; Multiply instructions
> >
> > @@ -8204,11 +8273,11 @@ (define_expand "mulxf3"
> >    "TARGET_80387")
> >
> >  (define_expand "mul<mode>3"
> > -  [(set (match_operand:MODEF 0 "register_operand")
> > -       (mult:MODEF (match_operand:MODEF 1 "register_operand")
> > -                   (match_operand:MODEF 2 "nonimmediate_operand")))]
> > +  [(set (match_operand:MODEFH 0 "register_operand")
> > +       (mult:MODEFH (match_operand:MODEFH 1 "register_operand")
> > +                   (match_operand:MODEFH 2 "nonimmediate_operand")))]
> >    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> > -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)")
> > +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)")
> >
> >  ;; Divide instructions
> >
> > @@ -8221,11 +8290,11 @@ (define_expand "divxf3"
> >    "TARGET_80387")
> >
> >  (define_expand "div<mode>3"
> > -  [(set (match_operand:MODEF 0 "register_operand")
> > -       (div:MODEF (match_operand:MODEF 1 "register_operand")
> > -                  (match_operand:MODEF 2 "nonimmediate_operand")))]
> > +  [(set (match_operand:MODEFH 0 "register_operand")
> > +       (div:MODEFH (match_operand:MODEFH 1 "register_operand")
> > +                  (match_operand:MODEFH 2 "nonimmediate_operand")))]
> >    "(TARGET_80387 && X87_ENABLE_ARITH (<MODE>mode))
> > -    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> > +    || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
> >  {
> >    if (<MODE>mode == SFmode
> >        && TARGET_SSE && TARGET_SSE_MATH
> > @@ -16312,6 +16381,22 @@ (define_insn "*fop_<mode>_comm"
> >          (symbol_ref "true")
> >          (symbol_ref "false"))))])
> >
> > +(define_insn "*fop_hf_comm"
> > +  [(set (match_operand:HF 0 "register_operand" "=v")
> > +       (match_operator:HF 3 "binary_fp_operator"
> > +         [(match_operand:HF 1 "nonimmediate_operand" "%v")
> > +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> > +  "TARGET_AVX512FP16
> > +   && COMMUTATIVE_ARITH_P (operands[3])
> > +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> > +  "* return output_387_binary_op (insn, operands);"
> > +  [(set (attr "type")
> > +       (if_then_else (match_operand:HF 3 "mult_operator")
> > +         (const_string "ssemul")
> > +         (const_string "sseadd")))
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> > +
> >  (define_insn "*rcpsf2_sse"
> >    [(set (match_operand:SF 0 "register_operand" "=x,x,x")
> >         (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
> > @@ -16385,6 +16470,22 @@ (define_insn "*fop_<mode>_1"
> >          (symbol_ref "true")
> >          (symbol_ref "false"))))])
> >
> > +(define_insn "*fop_hf_1"
> > +  [(set (match_operand:HF 0 "register_operand" "=v")
> > +       (match_operator:HF 3 "binary_fp_operator"
> > +         [(match_operand:HF 1 "nonimmediate_operand" "v")
> > +          (match_operand:HF 2 "nonimmediate_operand" "vm")]))]
> > +  "TARGET_AVX512FP16
> > +   && !COMMUTATIVE_ARITH_P (operands[3])
> > +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> > +  "* return output_387_binary_op (insn, operands);"
> > +  [(set (attr "type")
> > +       (if_then_else (match_operand:HF 3 "div_operator")
> > +         (const_string "ssediv")
> > +         (const_string "sseadd")))
> > +   (set_attr "prefix" "evex")
> > +   (set_attr "mode" "HF")])
> > +
> >  (define_insn "*fop_<X87MODEF:mode>_2_i387"
> >    [(set (match_operand:X87MODEF 0 "register_operand" "=f")
> >         (match_operator:X87MODEF 3 "binary_fp_operator"
> > @@ -19179,13 +19280,13 @@ (define_peephole2
> >  })
> >
> >  (define_expand "mov<mode>cc"
> > -  [(set (match_operand:X87MODEF 0 "register_operand")
> > -       (if_then_else:X87MODEF
> > +  [(set (match_operand:X87MODEFH 0 "register_operand")
> > +       (if_then_else:X87MODEFH
> >           (match_operand 1 "comparison_operator")
> > -         (match_operand:X87MODEF 2 "register_operand")
> > -         (match_operand:X87MODEF 3 "register_operand")))]
> > +         (match_operand:X87MODEFH 2 "register_operand")
> > +         (match_operand:X87MODEFH 3 "register_operand")))]
> >    "(TARGET_80387 && TARGET_CMOVE)
> > -   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> > +   || SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
> >    "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
> >
> >  (define_insn "*movxfcc_1"
> > @@ -19347,12 +19448,12 @@ (define_insn "<code><mode>3"
> >  ;; presence of -0.0 and NaN.
> >
> >  (define_insn "*ieee_s<ieee_maxmin><mode>3"
> > -  [(set (match_operand:MODEF 0 "register_operand" "=x,v")
> > -       (unspec:MODEF
> > -         [(match_operand:MODEF 1 "register_operand" "0,v")
> > -          (match_operand:MODEF 2 "nonimmediate_operand" "xm,vm")]
> > +  [(set (match_operand:MODEFH 0 "register_operand" "=x,v")
> > +       (unspec:MODEFH
> > +         [(match_operand:MODEFH 1 "register_operand" "0,v")
> > +          (match_operand:MODEFH 2 "nonimmediate_operand" "xm,vm")]
> >           IEEE_MAXMIN))]
> > -  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> > +  "SSE_FLOAT_MODE_SSEMATH_OR_HF_P (<MODE>mode)"
> >    "@
> >     <ieee_maxmin><ssemodesuffix>\t{%2, %0|%0, %2}
> >     v<ieee_maxmin><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index 7b8547bb1c3..ad366974b5b 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
> >  mmwait
> >  Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
> >  Support MWAIT and MONITOR built-in functions and code generation.
> > +
> > +mavx512fp16
> > +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
> > +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
> > diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> > index f129de4bbe5..2421a78637b 100644
> > --- a/gcc/config/i386/immintrin.h
> > +++ b/gcc/config/i386/immintrin.h
> > @@ -94,6 +94,10 @@
> >
> >  #include <avx512vp2intersectvlintrin.h>
> >
> > +#ifdef __SSE2__
> > +#include <avx512fp16intrin.h>
> > +#endif
> > +
> >  #include <shaintrin.h>
> >
> >  #include <fmaintrin.h>
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 32697e6117c..bb9f7ca956e 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
> >  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
> >  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> >  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
> > +-mavx512fp16 @gol
> >  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
> >  -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
> >  -mkl -mwidekl @gol
> > @@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
> >  @itemx -mavx512bf16
> >  @opindex mavx512bf16
> >  @need 200
> > +@itemx -mavx512fp16
> > +@opindex mavx512fp16
> > +@need 200
> >  @itemx -mgfni
> >  @opindex mgfni
> >  @need 200
> > @@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
> >  XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
> >  GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
> >  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
> > -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
> > -extended instruction sets. Each has a corresponding @option{-mno-} option to
> > -disable use of these instructions.
> > +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
> > +or CLDEMOTE extended instruction sets. Each has a corresponding
> > +@option{-mno-} option to disable use of these instructions.
> >
> >  These extensions are also available as built-in functions: see
> >  @ref{x86 Built-in Functions}, for details of the functions enabled and
> > diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
> > index 62b2132957a..fba3d1ac684 100644
> > --- a/gcc/testsuite/g++.dg/other/i386-2.C
> > +++ b/gcc/testsuite/g++.dg/other/i386-2.C
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> > -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> > +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
> >
> >  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
> >     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> > diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
> > index 843aa2bdb2f..5cc0fa83457 100644
> > --- a/gcc/testsuite/g++.dg/other/i386-3.C
> > +++ b/gcc/testsuite/g++.dg/other/i386-3.C
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> > -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> > +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
> >
> >  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
> >     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> > diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
> > new file mode 100644
> > index 00000000000..95d1ac27c4f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/float16-1.C
> > @@ -0,0 +1,8 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mno-sse2" } */
> > +
> > +_Float16/* { dg-error "does not name a type" } */
> > +foo (_Float16 x)
> > +{
> > +  return x;
> > +}
> > diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
> > new file mode 100644
> > index 00000000000..99eb797eff1
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/float16-2.C
> > @@ -0,0 +1,14 @@
> > +/* { dg-do assemble { target avx512fp16 } } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +union flt
> > +{
> > +  _Float16 flt;
> > +  short s;
> > +};
> > +
> > +_Float16
> > +foo (union flt x)
> > +{
> > +  return x.flt;
> > +}
> > diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
> > new file mode 100644
> > index 00000000000..940878503f1
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/float16-3.C
> > @@ -0,0 +1,10 @@
> > +/* { dg-do assemble { target avx512fp16 } } */
> > +/* { dg-options "-O0 -mavx512fp16" } */
> > +
> > +template <typename> void a(char *) {}
> > +char b, d;
> > +void c()
> > +{
> > +  a<unsigned char>(&d);
> > +  a<_Float16>(&b);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> > index 6178e38ce02..f3676077743 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
> > +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
> >  /* { dg-add-options bind_pic_locally } */
> >
> >  #include <mm_malloc.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> > index 986fbd819e4..1751c52565c 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
> > +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
> >  /* { dg-add-options bind_pic_locally } */
> >
> >  #include <mm_malloc.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > index 0a377dba1d5..0ad9064f637 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > @@ -87,6 +87,9 @@ main ()
> >  #ifdef AVX512VNNI
> >        && (ecx & bit_AVX512VNNI)
> >  #endif
> > +#ifdef AVX512FP16
> > +      && (edx & bit_AVX512FP16)
> > +#endif
> >  #ifdef VAES
> >        && (ecx & bit_VAES)
> >  #endif
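
For anyone reading the avx512-check.h hunk in isolation, here is a minimal standalone sketch of the same CPUID test. It is not the testsuite code. It assumes AVX512FP16 is reported in CPUID.(EAX=7,ECX=0):EDX bit 23 and that <cpuid.h> provides __get_cpuid_count; the names HYPOTHETICAL_BIT_AVX512FP16, cpu_reports_avx512fp16 and check_cpu_query are made up for illustration. A real run test would also need to confirm that the OS saves AVX-512 state, which this sketch leaves out.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Assumed bit position: CPUID.(EAX=7,ECX=0):EDX[23].  Hypothetical name.  */
#define HYPOTHETICAL_BIT_AVX512FP16 (1u << 23)

/* Return 1 if CPUID reports AVX512FP16, 0 otherwise (also 0 on non-x86).  */
static int
cpu_reports_avx512fp16 (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* __get_cpuid_count returns 0 when leaf 7 is not supported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (edx & HYPOTHETICAL_BIT_AVX512FP16) != 0;
#else
  return 0;
#endif
}

/* Sanity check: the query always yields a plain boolean.  */
static void
check_cpu_query (void)
{
  int r = cpu_reports_avx512fp16 ();
  assert (r == 0 || r == 1);
}
```

Calling check_cpu_query () from main is enough to exercise it; the result only says what CPUID reports, not whether the instructions are usable.
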
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> > new file mode 100644
> > index 00000000000..88887556d68
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +_Float16
> > +__attribute__ ((noinline, noclone))
> > +do_max (_Float16 __A, _Float16 __B)
> > +{
> > +  return __A > __B ? __A : __B;
> > +}
> > +
> > +_Float16
> > +__attribute__ ((noinline, noclone))
> > +do_min (_Float16 __A, _Float16 __B)
> > +{
> > +  return __A < __B ? __A : __B;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> > new file mode 100644
> > index 00000000000..c9e23bf95c2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do run { target avx512fp16 } } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +#include <string.h>
> > +
> > +static void do_test (void);
> > +
> > +#define DO_TEST do_test
> > +#define AVX512FP16
> > +#include "avx512-check.h"
> > +#include "avx512fp16-12a.c"
> > +
> > +static void
> > +do_test (void)
> > +{
> > +  _Float16 x = 0.1f;
> > +  _Float16 y = -3.2f;
> > +  _Float16 z;
> > +
> > +  z = do_max (x, y);
> > +  if (z != x)
> > +    abort ();
> > +
> > +  z = do_min (x, y);
> > +  if (z != y)
> > +    abort ();
> > +}
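
The two tests above rely on the plain ternary idiom being recognized as a min/max. A minimal sketch of that idiom outside the testsuite harness follows, assuming the compiler defines __FLT16_MANT_DIG__ whenever _Float16 is usable and falling back to float otherwise. The names half_t, sketch_max, sketch_min and check_minmax are hypothetical. Whether vmaxsh/vminsh are actually emitted depends on building with -mavx512fp16, which the sketch does not require.

```c
#include <assert.h>

/* Use _Float16 when the compiler advertises it; otherwise fall back to
   float so the sketch still builds.  half_t is a hypothetical name.  */
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_t;
#else
typedef float half_t;
#endif

/* Same shape as do_max in avx512fp16-12a.c.  */
static half_t
sketch_max (half_t a, half_t b)
{
  return a > b ? a : b;
}

/* Same shape as do_min in avx512fp16-12a.c.  */
static half_t
sketch_min (half_t a, half_t b)
{
  return a < b ? a : b;
}

/* Mirrors the values used by avx512fp16-12b.c.  */
static void
check_minmax (void)
{
  half_t x = 0.1f;
  half_t y = -3.2f;

  assert (sketch_max (x, y) == x);
  assert (sketch_min (x, y) == y);
}
```
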
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
> > new file mode 100644
> > index 00000000000..3846c8e9b6e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +_Float16
> > +foo (int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
> > new file mode 100644
> > index 00000000000..247dd6e7e33
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +_Float16
> > +foo (unsigned int x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
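
As a companion to float16-3a.c and float16-3b.c, here is a small sketch of what the int to _Float16 conversion does numerically, independent of which instruction is selected. It assumes __FLT16_MANT_DIG__ signals _Float16 availability and falls back to float otherwise; half_t, HALF_IS_FLOAT16, from_int, from_uint and check_conversions are hypothetical names.

```c
#include <assert.h>

#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_t;
#define HALF_IS_FLOAT16 1
#else
typedef float half_t;        /* fallback so the sketch still builds */
#define HALF_IS_FLOAT16 0
#endif

/* Same shape as foo in float16-3a.c (signed source).  */
static half_t
from_int (int x)
{
  return x;
}

/* Same shape as foo in float16-3b.c (unsigned source).  */
static half_t
from_uint (unsigned int x)
{
  return x;
}

static void
check_conversions (void)
{
  /* Small integers are exactly representable.  */
  assert ((float) from_int (-3) == -3.0f);
  assert ((float) from_uint (1024u) == 1024.0f);
#if HALF_IS_FLOAT16
  /* binary16 has an 11-bit significand, so 2049 is not representable;
     round-to-nearest-even yields 2048.  */
  assert ((float) from_int (2049) == 2048.0f);
#endif
}
```

The 64-bit variants in float16-4a.c and float16-4b.c have the same shape with long long and unsigned long long sources.
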
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
> > new file mode 100644
> > index 00000000000..631082581f3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +_Float16
> > +foo (long long x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
> > new file mode 100644
> > index 00000000000..828d8530769
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +
> > +_Float16
> > +foo (unsigned long long x)
> > +{
> > +  return x;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> > index 79265c7c94f..8499fdf2db9 100644
> > --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> > +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> > @@ -79,6 +79,7 @@ extern void test_hreset (void)                        __attribute__((__target__("hreset")));
> >  extern void test_keylocker (void)              __attribute__((__target__("kl")));
> >  extern void test_widekl (void)                 __attribute__((__target__("widekl")));
> >  extern void test_avxvnni (void)                        __attribute__((__target__("avxvnni")));
> > +extern void test_avx512fp16 (void)             __attribute__((__target__("avx512fp16")));
> >
> >  extern void test_no_sgx (void)                 __attribute__((__target__("no-sgx")));
> >  extern void test_no_avx5124fmaps(void)         __attribute__((__target__("no-avx5124fmaps")));
> > @@ -159,6 +160,7 @@ extern void test_no_hreset (void)           __attribute__((__target__("no-hreset")));
> >  extern void test_no_keylocker (void)           __attribute__((__target__("no-kl")));
> >  extern void test_no_widekl (void)              __attribute__((__target__("no-widekl")));
> >  extern void test_no_avxvnni (void)             __attribute__((__target__("no-avxvnni")));
> > +extern void test_no_avx512fp16 (void)          __attribute__((__target__("no-avx512fp16")));
> >
> >  extern void test_arch_nocona (void)            __attribute__((__target__("arch=nocona")));
> >  extern void test_arch_core2 (void)             __attribute__((__target__("arch=core2")));
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > new file mode 100644
> > index 00000000000..2f8af392c83
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16" } */
> > +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> > +
> > +#include <immintrin.h>
> > +
> > +_Float16
> > +foo (_Float16 x, _Float16 y)
> > +{
> > +  x = x > y ? x : y;
> > +  return x;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> > index 7029771334b..f5f5c113612 100644
> > --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> > +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> > +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
> >  /* { dg-add-options bind_pic_locally } */
> >
> >  #include <mm_malloc.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> > index 4ce0ffffaf3..747d504cedb 100644
> > --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> > +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> > +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
> >  /* { dg-add-options bind_pic_locally } */
> >
> >  #include <mm_malloc.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> > index 6e8b6f3fa1b..33411969901 100644
> > --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> > +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> > @@ -103,7 +103,7 @@
> >
> >
> >  #ifndef DIFFERENT_PRAGMAS
> > -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> > +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
> >  #endif
> >
> >  /* Following intrinsics require immediate arguments.  They
> > @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
> >
> >  /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
> >  #ifdef DIFFERENT_PRAGMAS
> > -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> > +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
> >  #endif
> >  #include <immintrin.h>
> >  test_1 (_cvtss_sh, unsigned short, float, 1)
> > diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> > index 7faa053ace8..86590ca5ffb 100644
> > --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> > +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> > @@ -708,6 +708,6 @@
> >  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
> >  #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
> >
> > -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> > +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
> >
> >  #include <x86intrin.h>
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index 42ac9d0ac1a..10765365d7b 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
> >
> >  proc check_effective_target_float16 {} {
> >      return [check_no_compiler_messages_nocache float16 object {
> > -        _Float16 x;
> > +        _Float16 foo (_Float16 x) { return x; }
> >      } [add_options_for_float16 ""]]
> >  }
> >
> > @@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
> >  }
> >
> >
> > +# Return 1 if avx512fp16 instructions can be compiled.
> > +
> > +proc check_effective_target_avx512fp16 { } {
> > +    return [check_no_compiler_messages avx512fp16 object {
> > +       void foo (void)
> > +       {
> > +         asm volatile ("vmovw %edi, %xmm0");
> > +       }
> > +    } "-O2 -mavx512fp16" ]
> > +}
> > +
> >  # Return 1 if avx512f instructions can be compiled.
> >
> >  proc check_effective_target_avx512f { } {
> > --
> > 2.18.1
> >

Here is the updated patch.
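
For anyone skimming the attachment, the reworked size check in
inline_secondary_memory_needed can be read roughly as the standalone
sketch below.  This is only an illustration, not code from the patch:
the function name, its parameters and the literal sizes (2 bytes for
HImode, 4 bytes for SImode) are assumptions made so that it compiles
outside of GCC, and the later inter-unit move checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the size checks done on a move between an
   SSE register class and another register class.  Returns true when
   the move would have to go through memory.  */
bool
size_check_needs_memory (bool either_is_integer_class, int msize,
			 int units_per_word, bool have_avx512fp16)
{
  /* Neither side is a general register class: go through memory.  */
  if (!either_is_integer_class)
    return true;

  /* Between SSE and general, moves are no larger than word size.  */
  if (msize > units_per_word)
    return true;

  /* Assumed sizes: HImode is 2 bytes, SImode is 4 bytes.  With
     AVX512FP16 the smallest direct move drops from SImode to HImode.  */
  int minsize = have_avx512fp16 ? 2 : 4;

  return msize < minsize;
}

/* Small self-check exercising the cases discussed above.  */
void
size_check_self_test (void)
{
  assert (!size_check_needs_memory (true, 2, 8, true));
  assert (size_check_needs_memory (true, 2, 8, false));
  assert (size_check_needs_memory (true, 16, 8, true));
}
```

The only behavioural difference from the old condition is in the
minimum size: a 2-byte move is direct only when AVX512FP16 is
available, everything else is as before.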

-- 
BR,
Hongtao

[-- Attachment #2: V3-0004-AVX512FP16-Initial-support-for-AVX512FP16-feature-an.patch --]
[-- Type: application/octet-stream, Size: 67141 bytes --]

From dbb22e9fa3f5e3df082f7113eb6dfe78bbc4eeaf Mon Sep 17 00:00:00 2001
From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
Date: Mon, 24 Dec 2018 19:39:26 -0800
Subject: [PATCH 4/9] AVX512FP16: Initial support for AVX512FP16 feature and
 scalar _Float16 instructions.

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect FEATURE_AVX512FP16.
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_AVX512FP16_SET,
	OPTION_MASK_ISA_AVX512FP16_UNSET,
	OPTION_MASK_ISA2_AVX512FP16_SET,
	OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
	(OPTION_MASK_ISA2_AVX512BW_UNSET,
	OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
	(ix86_handle_option): Handle -mavx512fp16.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVX512FP16.
	* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
	* config.gcc: Add avx512fp16intrin.h.
	* config/i386/avx512fp16intrin.h: New intrinsic header.
	* config/i386/cpuid.h: Add bit_AVX512FP16.
	* config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
	* config/i386/i386-builtins.c: Support _Float16 type for i386
	backend.
	(ix86_init_float16_builtins): New function.
	(ix86_float16_type_node): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AVX512FP16__.
	* config/i386/i386-expand.c (ix86_expand_branch): Support
	HFmode.
	(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_expand_fp_movcc): Ditto.
	* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
	* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
	(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
	* config/i386/i386.c (ix86_get_ssemov): Use
	vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
	(ix86_get_excess_precision): Use
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
	is enabled.
	(sse_store_index): Use SFmode cost for HFmode cost.
	(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
	GPR cost for HFmode.
	(ix86_hard_regno_mode_ok): Allow HImode in sse register.
	(ix86_mangle_type): Add mangling for _Float16 type.
	(inline_secondary_memory_needed): No memory is needed for
	16bit movement between gpr and sse reg under
	TARGET_AVX512FP16.
	(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_division_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_add_stmt_cost): Ditto.
	(ix86_optab_supported_p): Ditto.
	* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
	(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
	(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
	* config/i386/i386.md (mode): Add HFmode.
	(MODE_SIZE): Add HFmode.
	(isa): Add avx512fp16.
	(enabled): Handle avx512fp16.
	(ssemodesuffix): Add sh suffix for HFmode.
	(comm): Add mult, div.
	(plusminusmultdiv): New code iterator.
	(insn): Add mult, div.
	(*movhf_internal): Adjust for avx512fp16 instruction.
	(*movhi_internal): Ditto.
	(*cmpi<unord>hf): New define_insn for HFmode.
	(*ieee_s<ieee_maxmin>hf3): Likewise.
	(extendhf<mode>2): Likewise.
	(trunc<mode>hf2): Likewise.
	(float<floatunssuffix><mode>hf2): Likewise.
	(*<insn>hf): Likewise.
	(cbranchhf4): New expander.
	(movhfcc): Likewise.
	(<insn>hf3): Likewise.
	(mulhf3): Likewise.
	(divhf3): Likewise.
	* config/i386/i386.opt: Add mavx512fp16.
	* config/i386/immintrin.h: Include avx512fp16intrin.h.
	* doc/invoke.texi: Add mavx512fp16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
	* gcc.target/i386/avx-2.c: Ditto.
	* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
	* gcc.target/i386/sse-13.c: Add -mavx512fp16.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* lib/target-supports.exp (check_effective_target_avx512fp16): New.
	* g++.target/i386/float16-1.C: New test.
	* g++.target/i386/float16-2.C: Ditto.
	* g++.target/i386/float16-3.C: Ditto.
	* gcc.target/i386/avx512fp16-12a.c: Ditto.
	* gcc.target/i386/avx512fp16-12b.c: Ditto.
	* gcc.target/i386/float16-3a.c: Ditto.
	* gcc.target/i386/float16-3b.c: Ditto.
	* gcc.target/i386/float16-4a.c: Ditto.
	* gcc.target/i386/float16-4b.c: Ditto.
	* gcc.target/i386/pr54855-12.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: Guo, Xuepeng <xuepeng.guo@intel.com>
Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu, Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang, Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu, Dianhong <dianhong.xu@intel.com>

Fix 1.
---
 gcc/common/config/i386/cpuinfo.h              |   2 +
 gcc/common/config/i386/i386-common.c          |  26 ++-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx512fp16intrin.h            |  53 ++++++
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-builtin-types.def        |   1 +
 gcc/config/i386/i386-builtins.c               |  23 +++
 gcc/config/i386/i386-c.c                      |   2 +
 gcc/config/i386/i386-expand.c                 |   5 +-
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-options.c                |   4 +-
 gcc/config/i386/i386.c                        | 129 +++++++++----
 gcc/config/i386/i386.h                        |  11 +-
 gcc/config/i386/i386.md                       | 172 ++++++++++++++++--
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/doc/invoke.texi                           |  10 +-
 gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
 gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
 gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
 gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
 gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
 .../gcc.target/i386/avx512fp16-12a.c          |  21 +++
 .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
 gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
 gcc/testsuite/lib/target-supports.exp         |  13 +-
 40 files changed, 547 insertions(+), 75 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16intrin.h
 create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 458f41de776..1835ac64e67 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
 	    set_feature (FEATURE_AVX5124FMAPS);
 	  if (edx & bit_AVX512VP2INTERSECT)
 	    set_feature (FEATURE_AVX512VP2INTERSECT);
+	  if (edx & bit_AVX512FP16)
+	    set_feature (FEATURE_AVX512FP16);
 	}
 
       __cpuid_count (7, 1, eax, ebx, ecx, edx);
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 76ab1a14e54..00c65ba15ab 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_SET \
   (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
+#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
+#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_SET \
   (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
@@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
 #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
+#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
+#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
 #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
 #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
@@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AVX512BF16_UNSET \
    | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
    | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
-   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
+   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
+   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA2_AVX512F_UNSET)
 #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
@@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
 #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
 
-#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
+#define OPTION_MASK_ISA2_AVX512BW_UNSET \
+  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
+    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 
 /* Set 1 << value as value of -malign-FLAG option.  */
 
@@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx512fp16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	}
+      return true;
+
     case OPT_mavx512vnni:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index e68dd656046..4e0659fc7b2 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -228,6 +228,7 @@ enum processor_features
   FEATURE_AESKLE,
   FEATURE_WIDEKL,
   FEATURE_AVXVNNI,
+  FEATURE_AVX512FP16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 898c18f3dda..a6783660278 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
   ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
   ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
+  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3df9b52cf25..a354351408c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
 		       amxbf16intrin.h x86gprintrin.h uintrintrin.h
 		       hresetintrin.h keylockerintrin.h avxvnniintrin.h
-		       mwaitintrin.h"
+		       mwaitintrin.h avx512fp16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
new file mode 100644
index 00000000000..38d63161ba6
--- /dev/null
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -0,0 +1,53 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef __AVX512FP16INTRIN_H_INCLUDED
+#define __AVX512FP16INTRIN_H_INCLUDED
+
+#ifndef __AVX512FP16__
+#pragma GCC push_options
+#pragma GCC target("avx512fp16")
+#define __DISABLE_AVX512FP16__
+#endif /* __AVX512FP16__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#endif /* __AVX512FP16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index aebc17c6827..82b8050028b 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -126,6 +126,7 @@
 #define bit_AVX5124VNNIW (1 << 2)
 #define bit_AVX5124FMAPS (1 << 3)
 #define bit_AVX512VP2INTERSECT	(1 << 8)
+#define bit_AVX512FP16   (1 << 23)
 #define bit_IBT	(1 << 20)
 #define bit_UINTR (1 << 5)
 #define bit_PCONFIG	(1 << 18)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 3ca313c19ec..1768b88d748 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
 DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
+DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 204e2903126..668f09f12a0 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 /* Table for the ix86 builtin non-function types.  */
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
+tree ix86_float16_type_node = NULL_TREE;
 /* Retrieve an element from the above table, building some of
    the types lazily.  */
 
@@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
 			BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
 }
 
+static void
+ix86_init_float16_builtins (void)
+{
+  /* Provide the _Float16 type and float16_type_node if needed so that
+     it can be used in AVX512FP16 intrinsics and builtins.  */
+  if (!float16_type_node)
+    {
+      ix86_float16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (ix86_float16_type_node) = 16;
+      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
+      layout_type (ix86_float16_type_node);
+    }
+  else
+    ix86_float16_type_node = float16_type_node;
+
+  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
+    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
+					    "_Float16");
+}
+
 static void
 ix86_init_builtin_types (void)
 {
@@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
      it.  */
   lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
 
+  ix86_init_float16_builtins ();
+
   const_string_type_node
     = build_pointer_type (build_qualified_type
 			  (char_type_node, TYPE_QUAL_CONST));
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5ed0de006fb..cc64f855ecc 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__PTWRITE__");
   if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
     def_or_undef (parse_in, "__AVX512BF16__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
+    def_or_undef (parse_in, "__AVX512FP16__");
   if (TARGET_MMX_WITH_SSE)
     def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 69ea79e6123..b7d050a1e42 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
   switch (mode)
     {
+    case E_HFmode:
     case E_SFmode:
     case E_DFmode:
     case E_XFmode:
@@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
   bool unordered_compare = ix86_unordered_fp_compare (code);
   rtx op0 = *pop0, op1 = *pop1;
   machine_mode op_mode = GET_MODE (op0);
-  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
+  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
 
   /* All of the unordered compare instructions only work on registers.
      The same is true of the fcomi compare instructions.  The XFmode
@@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
 
-  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     {
       machine_mode cmode;
 
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index a0d46cbc892..83d9302ea3d 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -108,3 +108,4 @@ DEF_PTA(HRESET)
 DEF_PTA(KL)
 DEF_PTA(WIDEKL)
 DEF_PTA(AVXVNNI)
+DEF_PTA(AVX512FP16)
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 3416a4f1752..df191763e4b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mhreset",		OPTION_MASK_ISA2_HRESET },
   { "-mkl",		OPTION_MASK_ISA2_KL },
   { "-mwidekl", 	OPTION_MASK_ISA2_WIDEKL },
-  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI }
+  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI },
+  { "-mavx512fp16",	OPTION_MASK_ISA2_AVX512FP16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
     IX86_ATTR_ISA ("hreset", OPT_mhreset),
     IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
+    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 597e4d68247..485d591275c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
     case MODE_SI:
       return "%vmovd\t{%1, %0|%0, %1}";
 
+    case MODE_HI:
+      if (GENERAL_REG_P (operands[0]))
+	return "vmovw\t{%1, %k0|%k0, %1}";
+      else if (GENERAL_REG_P (operands[1]))
+	return "vmovw\t{%k1, %0|%0, %k1}";
+      else
+	return "vmovw\t{%1, %0|%0, %1}";
+
     case MODE_DF:
       if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
 	return "vmovsd\t{%d1, %0|%0, %d1}";
@@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
       else
 	return "%vmovss\t{%1, %0|%0, %1}";
 
+    case MODE_HF:
+      if (REG_P (operands[0]) && REG_P (operands[1]))
+	return "vmovsh\t{%d1, %0|%0, %d1}";
+      else
+	return "vmovsh\t{%1, %0|%0, %1}";
+
     case MODE_V1DF:
       gcc_assert (!TARGET_AVX);
       return "movlpd\t{%1, %0|%0, %1}";
@@ -13955,7 +13969,7 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
 
   if (is_sse)
    {
-     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
+     p = (GET_MODE (operands[0]) == SFmode ? "ss" : "sd");
      strcat (buf, p);
 
      if (TARGET_AVX)
@@ -19132,10 +19146,19 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
       if (!TARGET_SSE2)
 	return true;
 
+      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
+	return true;
+
+      int msize = GET_MODE_SIZE (mode);
+
       /* Between SSE and general, we have moves no larger than word size.  */
-      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
-	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
-	  || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+      if (msize > UNITS_PER_WORD)
+	return true;
+
+      /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
+      int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
+
+      if (msize < minsize)
 	return true;
 
       /* If the target says that inter-unit moves are more expensive
@@ -19229,21 +19252,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 static inline int
 sse_store_index (machine_mode mode)
 {
-      switch (GET_MODE_SIZE (mode))
-	{
-	  case 4:
-	    return 0;
-	  case 8:
-	    return 1;
-	  case 16:
-	    return 2;
-	  case 32:
-	    return 3;
-	  case 64:
-	    return 4;
-	  default:
-	    return -1;
-	}
+  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
+     costs to processor_costs, which requires changes to all entries in
+     processor cost table.  */
+  if (mode == E_HFmode)
+    mode = E_SFmode;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 4:
+      return 0;
+    case 8:
+      return 1;
+    case 16:
+      return 2;
+    case 32:
+      return 3;
+    case 64:
+      return 4;
+    default:
+      return -1;
+    }
 }
 
 /* Return the cost of moving data of mode M between a
@@ -19270,6 +19298,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
       int index;
       switch (mode)
 	{
+	  case E_HFmode:
 	  case E_SFmode:
 	    index = 0;
 	    break;
@@ -19370,11 +19399,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
 	  }
 	break;
       case 2:
-	if (in == 2)
-	  return MAX (ix86_cost->hard_register.int_load[1],
-		      ix86_cost->hard_register.int_store[1]);
-	return in ? ix86_cost->hard_register.int_load[1]
-		  : ix86_cost->hard_register.int_store[1];
+	{
+	  int cost;
+	  if (in == 2)
+	    cost = MAX (ix86_cost->hard_register.int_load[1],
+			ix86_cost->hard_register.int_store[1]);
+	  else
+	    cost = in ? ix86_cost->hard_register.int_load[1]
+		      : ix86_cost->hard_register.int_store[1];
+	  if (mode == E_HFmode)
+	    {
+	      /* Prefer SSE over GPR for HFmode.  */
+	      int sse_cost;
+	      int index = sse_store_index (mode);
+	      if (in == 2)
+		sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
+				ix86_cost->hard_register.sse_store[index]);
+	      else
+		sse_cost = (in
+			    ? ix86_cost->hard_register.sse_load [index]
+			    : ix86_cost->hard_register.sse_store [index]);
+	      if (sse_cost >= cost)
+		cost = sse_cost + 1;
+	    }
+	  return cost;
+	}
       default:
 	if (in == 2)
 	  cost = MAX (ix86_cost->hard_register.int_load[2],
@@ -19548,6 +19597,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	  - XI mode
 	  - any of 512-bit wide vector mode
 	  - any scalar mode.  */
+      /* For AVX512FP16, vmovw supports movement of HImode
+	 between gpr and sse register.  */
       if (TARGET_AVX512F
 	  && (mode == XImode
 	      || VALID_AVX512F_REG_MODE (mode)
@@ -19831,7 +19882,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->mulsd : cost->mulss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fmul;
@@ -19883,7 +19934,7 @@ ix86_division_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->divsd : cost->divss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fdiv;
@@ -20303,7 +20354,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	  return true;
 	}
 
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->addss;
 	  return false;
@@ -20336,7 +20387,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* FALLTHRU */
 
     case NEG:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->sse_op;
 	  return false;
@@ -20418,14 +20469,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case FLOAT_EXTEND:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = 0;
       else
         *total = ix86_vec_cost (mode, cost->addss);
       return false;
 
     case FLOAT_TRUNCATE:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->fadd;
       else
         *total = ix86_vec_cost (mode, cost->addss);
@@ -20435,7 +20486,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* SSE requires memory load for the constant operand. It may make
 	 sense to account for this.  Of course the constant operand may or
 	 may not be reused. */
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->sse_op;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fabs;
@@ -20444,7 +20495,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case SQRT:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fsqrt;
@@ -21928,6 +21979,10 @@ ix86_mangle_type (const_tree type)
 
   switch (TYPE_MODE (type))
     {
+    case E_HFmode:
+      /* _Float16 is "DF16_".
+	 Align with clang's decision in https://reviews.llvm.org/D33719.  */
+      return "DF16_";
     case E_TFmode:
       /* __float128 is "g".  */
       return "g";
@@ -22551,7 +22606,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case MINUS_EXPR:
 	  if (kind == scalar_stmt)
 	    {
-	      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 		stmt_cost = ix86_cost->addss;
 	      else if (X87_FLOAT_MODE_P (mode))
 		stmt_cost = ix86_cost->fadd;
@@ -22569,7 +22624,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
 	  break;
 	case NEGATE_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (X87_FLOAT_MODE_P (mode))
 	    stmt_cost = ix86_cost->fchs;
@@ -22625,7 +22680,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case BIT_XOR_EXPR:
 	case BIT_AND_EXPR:
 	case BIT_NOT_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (VECTOR_MODE_P (mode))
 	    stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
@@ -23327,7 +23382,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	/* The fastest type to promote to will always be the native type,
 	   whether that occurs with implicit excess precision or
 	   otherwise.  */
-	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	return TARGET_AVX512FP16
+	       ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+	       : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
       case EXCESS_PRECISION_TYPE_STANDARD:
       case EXCESS_PRECISION_TYPE_IMPLICIT:
 	/* Otherwise, the excess precision we want when we are
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index b1e66ee192e..8fcd5693624 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
   ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
-   || (MODE) == SFmode)
+   || (MODE) == SFmode							\
+   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
 
 #define VALID_AVX512F_REG_MODE(MODE)					\
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode	\
@@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_FP_MODE_P(MODE)						\
   ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode		\
-   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)		\
+   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
 
 #define VALID_INT_MODE_P(MODE)						\
   ((MODE) == QImode || (MODE) == HImode					\
@@ -1072,6 +1073,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
 
+#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)				\
+  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)				\
+   || (TARGET_AVX512FP16 && (MODE) == HFmode))
+
 #define FMA4_VEC_FLOAT_MODE_P(MODE) \
   (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
 		  || (MODE) == V8SFmode || (MODE) == V4DFmode))
@@ -2265,7 +2270,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
 constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
   | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
-  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
+  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
 constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
   | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
 constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d475347172d..777d11261ac 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,7 +496,7 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
   V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
@@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,
-		    avx512vl,noavx512vl,
-		    avxvnni,avx512vnnivl"
+		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
   (const_string "base"))
 
 ;; Define instruction set of MMX instructions
@@ -885,6 +884,8 @@ (define_attr "enabled" ""
 	 (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
 	 (eq_attr "isa" "avx512vnnivl")
 	   (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
+	 (eq_attr "isa" "avx512fp16")
+	   (symbol_ref "TARGET_AVX512FP16")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
@@ -906,6 +907,7 @@ (define_asm_attributes
    (set_attr "type" "multi")])
 
 (define_code_iterator plusminus [plus minus])
+(define_code_iterator plusminusmultdiv [plus minus mult div])
 
 (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus])
 
@@ -921,7 +923,8 @@ (define_code_attr multdiv_mnemonic
 
 ;; Mark commutative operators as such in constraints.
 (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%")
-			(minus "") (ss_minus "") (us_minus "")])
+			(minus "") (ss_minus "") (us_minus "")
+			(mult "%") (div "")])
 
 ;; Mapping of max and min
 (define_code_iterator maxmin [smax smin umax umin])
@@ -1021,7 +1024,8 @@ (define_code_attr insn
    (minus "sub") (ss_minus "sssub") (us_minus "ussub")
    (sign_extend "extend") (zero_extend "zero_extend")
    (ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")
-   (rotate "rotl") (rotatert "rotr")])
+   (rotate "rotl") (rotatert "rotr")
+   (mult "mul") (div "div")])
 
 ;; All integer modes.
 (define_mode_iterator SWI1248x [QI HI SI DI])
@@ -1089,8 +1093,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
 ;; compile time constant, it is faster to use <MODE_SIZE> than
 ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
 ;; command line options just use GET_MODE_SIZE macro.
-(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
-			     (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
+(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
+			     (TI "16") (HF "2") (SF "4") (DF "8")
+			     (XF "GET_MODE_SIZE (XFmode)")
 			     (V16QI "16") (V32QI "32") (V64QI "64")
 			     (V8HI "16") (V16HI "32") (V32HI "64")
 			     (V4SI "16") (V8SI "32") (V16SI "64")
@@ -1222,8 +1227,8 @@ (define_mode_iterator MODEF [SF DF])
 ;; All x87 floating point modes
 (define_mode_iterator X87MODEF [SF DF XF])
 
-;; All x87 floating point modes plus HF
-(define_mode_iterator X87MODEFH [SF DF XF HF])
+;; All x87 floating point modes plus HFmode
+(define_mode_iterator X87MODEFH [HF SF DF XF])
 
 ;; All SSE floating point modes
 (define_mode_iterator SSEMODEF [SF DF TF])
@@ -1231,7 +1236,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
-  [(SF "ss") (DF "sd")
+  [(HF "sh") (SF "ss") (DF "sd")
    (V16SF "ps") (V8DF "pd")
    (V8SF "ps") (V4DF "pd")
    (V4SF "ps") (V2DF "pd")
@@ -1496,6 +1501,23 @@ (define_expand "cstorexf4"
   DONE;
 })
 
+(define_expand "cbranchhf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:HF 1 "cmp_fp_expander_operand")
+		    (match_operand:HF 2 "cmp_fp_expander_operand")))
+   (set (pc) (if_then_else
+              (match_operator 0 "ix86_fp_comparison_operator"
+               [(reg:CC FLAGS_REG)
+                (const_int 0)])
+              (label_ref (match_operand 3))
+              (pc)))]
+  "TARGET_AVX512FP16"
+{
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
@@ -1705,6 +1727,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
 	 (eq_attr "alternative" "0")
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
+
+(define_insn "*cmpi<unord>hf"
+  [(set (reg:CCFP FLAGS_REG)
+	(compare:CCFP
+	  (match_operand:HF 0 "register_operand" "v")
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "v<unord>comish\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecomi")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Push/pop instructions.
 
@@ -2436,8 +2469,8 @@ (define_insn "*movsi_internal"
 	   (symbol_ref "true")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
-	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 
@@ -2463,6 +2496,9 @@ (define_insn "*movhi_internal"
 	  gcc_unreachable ();
 	}
 
+    case TYPE_SSEMOV:
+      return ix86_output_ssemov (insn, operands);
+
     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
 	return "kxorw\t%0, %0, %0";
@@ -2477,8 +2513,15 @@ (define_insn "*movhi_internal"
 	return "mov{w}\t{%1, %0|%0, %1}";
     }
 }
-  [(set (attr "type")
-     (cond [(eq_attr "alternative" "4,5,6,7")
+  [(set (attr "isa")
+	(cond [(eq_attr "alternative" "9,10,11,12,13")
+		  (const_string "avx512fp16")
+	       ]
+	       (const_string "*")))
+   (set (attr "type")
+     (cond [(eq_attr "alternative" "9,10,11,12,13")
+	      (const_string "ssemov")
+	    (eq_attr "alternative" "4,5,6,7")
 	      (const_string "mskmov")
 	    (eq_attr "alternative" "8")
 	      (const_string "msklog")
@@ -2503,6 +2546,8 @@ (define_insn "*movhi_internal"
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
+	     (eq_attr "alternative" "11")
+	       (const_string "HF")
 	     (and (eq_attr "alternative" "1,2")
 		  (match_operand:HI 1 "aligned_operand"))
 	       (const_string "SI")
@@ -3727,7 +3772,10 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "sselog1")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "sselog")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "ssemov")
+		   (const_string "sselog"))
 	      ]
 	      (const_string "ssemov")))
    (set (attr "memory")
@@ -3750,9 +3798,15 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "V4SF")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "TI")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HI")
+		   (const_string "TI"))
 	       (eq_attr "alternative" "3")
-		 (const_string "SF")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HF")
+		   (const_string "SF"))
 	      ]
 	      (const_string "*")))])
 
@@ -4493,6 +4547,17 @@ (define_split
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+        (float_extend:MODEF
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+
 (define_expand "extend<mode>xf2"
   [(set (match_operand:XF 0 "nonimmediate_operand")
         (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
@@ -4670,6 +4735,18 @@ (define_insn "truncxf<mode>2"
 	      (symbol_ref "flag_unsafe_math_optimizations")
 	   ]
 	   (symbol_ref "true")))])
+
+;; Conversion from {SF,DF}mode to HFmode.
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+       (float_truncate:HF
+         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Signed conversion to DImode.
 
@@ -5046,6 +5123,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
 	      (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
 	   (symbol_ref "true")))])
 
+(define_insn "float<floatunssuffix><mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(any_float:HF
+	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*floatdi<MODEF:mode>2_i387"
   [(set (match_operand:MODEF 0 "register_operand" "=f")
 	(float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
@@ -7626,6 +7713,13 @@ (define_expand "<insn>xf3"
 	  (match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "<insn>hf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(plusminus:HF
+	  (match_operand:HF 1 "register_operand")
+	  (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "<insn><mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(plusminus:MODEF
@@ -8203,6 +8297,12 @@ (define_expand "mulxf3"
 		 (match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "mulhf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(mult:HF (match_operand:HF 1 "register_operand")
+		    (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "mul<mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(mult:MODEF (match_operand:MODEF 1 "register_operand")
@@ -8220,6 +8320,12 @@ (define_expand "divxf3"
 		(match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "divhf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(div:HF (match_operand:HF 1 "register_operand")
+		   (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "div<mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(div:MODEF (match_operand:MODEF 1 "register_operand")
@@ -16312,6 +16418,17 @@ (define_insn "*fop_<mode>_comm"
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
 
+(define_insn "*<insn>hf"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(plusminusmultdiv:HF
+	  (match_operand:HF 1 "nonimmediate_operand" "<comm>v")
+	  (match_operand:HF 2 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "v<insn>sh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*rcpsf2_sse"
   [(set (match_operand:SF 0 "register_operand" "=x,x,x")
 	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
@@ -19178,6 +19295,15 @@ (define_peephole2
     gcc_unreachable ();
 })
 
+(define_expand "movhfcc"
+  [(set (match_operand:HF 0 "register_operand")
+	(if_then_else:HF
+	  (match_operand 1 "comparison_operator")
+	  (match_operand:HF 2 "register_operand")
+	  (match_operand:HF 3 "register_operand")))]
+  "TARGET_AVX512FP16"
+  "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
+
 (define_expand "mov<mode>cc"
   [(set (match_operand:X87MODEF 0 "register_operand")
 	(if_then_else:X87MODEF
@@ -19346,6 +19472,18 @@ (define_insn "<code><mode>3"
 ;; Their operands are not commutative, and thus they may be used in the
 ;; presence of -0.0 and NaN.
 
+(define_insn "*ieee_s<ieee_maxmin>hf3"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(unspec:HF
+	  [(match_operand:HF 1 "register_operand" "v")
+	   (match_operand:HF 2 "nonimmediate_operand" "vm")]
+	  IEEE_MAXMIN))]
+  "TARGET_AVX512FP16"
+  "v<ieee_maxmin>sh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "sseadd")
+   (set_attr "mode" "HF")])
+
 (define_insn "*ieee_s<ieee_maxmin><mode>3"
   [(set (match_operand:MODEF 0 "register_operand" "=x,v")
 	(unspec:MODEF
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7b8547bb1c3..ad366974b5b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
 mmwait
 Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
 Support MWAIT and MONITOR built-in functions and code generation.
+
+mavx512fp16
+Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index f129de4bbe5..2421a78637b 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -94,6 +94,10 @@
 
 #include <avx512vp2intersectvlintrin.h>
 
+#ifdef __SSE2__
+#include <avx512fp16intrin.h>
+#endif
+
 #include <shaintrin.h>
 
 #include <fmaintrin.h>
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c..bb9f7ca956e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
+-mavx512fp16 @gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mkl -mwidekl @gol
@@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @itemx -mavx512bf16
 @opindex mavx512bf16
 @need 200
+@itemx -mavx512fp16
+@opindex mavx512fp16
+@need 200
 @itemx -mgfni
 @opindex mgfni
 @need 200
@@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
 XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
 GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
-UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
-extended instruction sets. Each has a corresponding @option{-mno-} option to
-disable use of these instructions.
+UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
+or CLDEMOTE extended instruction sets. Each has a corresponding
+@option{-mno-} option to disable use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 62b2132957a..fba3d1ac684 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 843aa2bdb2f..5cc0fa83457 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
new file mode 100644
index 00000000000..95d1ac27c4f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-1.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "does not name a type" } */
+foo (_Float16 x)
+{
+  return x;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
new file mode 100644
index 00000000000..99eb797eff1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-2.C
@@ -0,0 +1,14 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+union flt
+{
+  _Float16 flt;
+  short s;
+};
+
+_Float16
+foo (union flt x)
+{
+  return x.flt;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
new file mode 100644
index 00000000000..940878503f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-3.C
@@ -0,0 +1,10 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O0 -mavx512fp16" } */
+
+template <typename> void a(char *) {}
+char b, d;
+void c()
+{
+  a<unsigned char>(&d);
+  a<_Float16>(&b);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 6178e38ce02..f3676077743 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index 986fbd819e4..1751c52565c 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..0ad9064f637 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -87,6 +87,9 @@ main ()
 #ifdef AVX512VNNI
       && (ecx & bit_AVX512VNNI)
 #endif
+#ifdef AVX512FP16
+      && (edx & bit_AVX512FP16)
+#endif
 #ifdef VAES
       && (ecx & bit_VAES)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
new file mode 100644
index 00000000000..88887556d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_max (_Float16 __A, _Float16 __B)
+{
+  return __A > __B ? __A : __B;
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_min (_Float16 __A, _Float16 __B)
+{
+  return __A < __B ? __A : __B;
+}
+
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
new file mode 100644
index 00000000000..c9e23bf95c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-12a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 0.1f;
+  _Float16 y = -3.2f;
+  _Float16 z;
+
+  z = do_max (x, y);
+  if (z != x)
+    abort ();
+
+  z = do_min (x, y);
+  if (z != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
new file mode 100644
index 00000000000..3846c8e9b6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
new file mode 100644
index 00000000000..247dd6e7e33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
new file mode 100644
index 00000000000..631082581f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
new file mode 100644
index 00000000000..828d8530769
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 79265c7c94f..8499fdf2db9 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -79,6 +79,7 @@ extern void test_hreset (void)			__attribute__((__target__("hreset")));
 extern void test_keylocker (void)		__attribute__((__target__("kl")));
 extern void test_widekl (void)			__attribute__((__target__("widekl")));
 extern void test_avxvnni (void)			__attribute__((__target__("avxvnni")));
+extern void test_avx512fp16 (void)		__attribute__((__target__("avx512fp16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -159,6 +160,7 @@ extern void test_no_hreset (void)		__attribute__((__target__("no-hreset")));
 extern void test_no_keylocker (void)		__attribute__((__target__("no-kl")));
 extern void test_no_widekl (void)		__attribute__((__target__("no-widekl")));
 extern void test_no_avxvnni (void)		__attribute__((__target__("no-avxvnni")));
+extern void test_no_avx512fp16 (void)		__attribute__((__target__("no-avx512fp16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
new file mode 100644
index 00000000000..2f8af392c83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+
+#include <immintrin.h>
+
+_Float16
+foo (_Float16 x, _Float16 y)
+{
+  x = x > y ? x : y;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 7029771334b..f5f5c113612 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 4ce0ffffaf3..747d504cedb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 6e8b6f3fa1b..33411969901 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 7faa053ace8..86590ca5ffb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -708,6 +708,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 42ac9d0ac1a..10765365d7b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
 
 proc check_effective_target_float16 {} {
     return [check_no_compiler_messages_nocache float16 object {
-        _Float16 x;
+        _Float16 foo (_Float16 x) { return x; }
     } [add_options_for_float16 ""]]
 }
 
@@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
 }
 
 
+# Return 1 if avx512fp16 instructions can be compiled.
+
+proc check_effective_target_avx512fp16 { } {
+    return [check_no_compiler_messages avx512fp16 object {
+	void foo (void)
+	{
+	  asm volatile ("vmovw %edi, %xmm0");
+	}
+    } "-O2 -mavx512fp16" ]
+}
+
 # Return 1 if avx512f instructions can be compiled.
 
 proc check_effective_target_avx512f { } {
-- 
2.18.1


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
  2021-07-21 10:35           ` Uros Bizjak
  2021-07-22 11:56           ` Richard Biener
@ 2021-07-28 21:56           ` Joseph Myers
  2021-07-29  4:53             ` Hongtao Liu
  2 siblings, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-07-28 21:56 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, ubizjak

On Wed, 21 Jul 2021, liuhongt via Gcc-patches wrote:

> @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
>  	   provide would be identical were it not for the unpredictable
>  	   cases.  */
>  	if (!TARGET_80387)
> -	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +	  return TARGET_SSE2
> +		 ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +		 : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>  	else if (!TARGET_MIX_SSE_I387)
>  	  {
>  	    if (!(TARGET_SSE && TARGET_SSE_MATH))
>  	      return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
>  	    else if (TARGET_SSE2)
> -	      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +	      return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>  	  }
>  
>  	/* If we are in standards compliant mode, but we know we will

This patch is not changing the default "fast" mode at all; that's 
promoting to float, unconditionally.  But you have a subsequent change 
there in patch 4 to make the promotions in the default "fast" mode depend 
on hardware support for the new instructions; it's unhelpful for the 
documentation not to correspond exactly to the code changes in the same
patch.

Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2 
(i.e. whenever the type is available), it might make more sense to follow 
AArch64 and use it only when the hardware instructions are available.  In 
any case, it seems peculiar to use a different threshold in the "fast" 
case from the "standard" case.  -fexcess-precision=standard is not "avoid 
excess precision", it's "implement excess precision in the front end".  
Whenever "fast" is implementing excess precision in the front end, 
"standard" should be doing the same thing as "fast".

> +Soft-fp keeps the intermediate result of the operation at 32-bit precision by defaults,
> +which may lead to inconsistent behavior between soft-fp and avx512fp16 instructions,
> +using @option{-fexcess-precision=standard} will force round back after every operation.

"soft-fp" is, as the name of some code within GCC, an internal 
implementation detail, which should not be referenced in the user manual.  
What results in intermediate results being in a wider precision is not 
soft-fp; it's promotions inserted by the front end as a result of how the 
above hook is defined (promotions inserted by the optabs/expand code are 
an implementation detail that should always be followed automatically by a 
truncation of the result and so not be user-visible).
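
The difference between the two kinds of promotion can be sketched in plain C.  This is only an illustration under stated assumptions: float stands in for _Float16 and double for the wider evaluation type, so that it builds without _Float16 support, and the function names sum_promoted and sum_truncated are hypothetical, not GCC code.

```c
#include <assert.h>
#include <float.h>

/* Stand-in types (an assumption for illustration): float plays the role
   of _Float16, double the role of the wider evaluation type.  */
static_assert (DBL_MANT_DIG > FLT_MANT_DIG,
               "double must be wider than float for this sketch");

/* Front-end style promotion: promote the operands, evaluate the whole
   expression in the wide type, and round to the narrow type once.  */
float
sum_promoted (float a, float b, float c)
{
  double wide = (double) a + (double) b + (double) c;
  return (float) wide;
}

/* Expand style promotion: each operation may be carried out in the wide
   type, but its result is truncated to the narrow type before the next
   operation uses it.  */
float
sum_truncated (float a, float b, float c)
{
  volatile float t = (float) ((double) a + (double) b);
  return (float) ((double) t + (double) c);
}
```

With round-to-nearest arithmetic, sum_promoted (16777216.0f, 1.0f, 1.0f) gives 16777218, while sum_truncated gives 16777216, because the intermediate value 16777217 is not representable in float.  The first result is what a user observes as excess precision; the second matches arithmetic done directly in the narrow type.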

As far as I know, the official name of "avx512fp16" is "AVX512-FP16" and 
text in the manual should use the official capitalization, hyphenation 
etc. in such names unless literally referring to command-line options 
inside @option or similar.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-28 21:56           ` Joseph Myers
@ 2021-07-29  4:53             ` Hongtao Liu
  2021-07-29  5:34               ` Hongtao Liu
  2021-07-29 21:30               ` Joseph Myers
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-29  4:53 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, gcc-patches

On Thu, Jul 29, 2021 at 5:57 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 21 Jul 2021, liuhongt via Gcc-patches wrote:
>
> > @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >          provide would be identical were it not for the unpredictable
> >          cases.  */
> >       if (!TARGET_80387)
> > -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > +       return TARGET_SSE2
> > +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> >       else if (!TARGET_MIX_SSE_I387)
> >         {
> >           if (!(TARGET_SSE && TARGET_SSE_MATH))
> >             return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
> >           else if (TARGET_SSE2)
> > -           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > +           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> >         }
> >
> >       /* If we are in standards compliant mode, but we know we will
>
> This patch is not changing the default "fast" mode at all; that's
> promoting to float, unconditionally.  But you have a subsequent change
> there in patch 4 to make the promotions in the default "fast" mode depend
> on hardware support for the new instructions; it's unhelpful for the
> documentation not to corresponding exactly to the code changes in the same
> patch.
Yes, will change.
>
> Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> (i.e. whenever the type is available), it might make more sense to follow
> AArch64 and use it only when the hardware instructions are available.  In
> any case, it seems peculiar to use a different threshold in the "fast"
We want to keep the software emulation debuggable.  When software
emulation and hardware instructions give inconsistent results, users
can still debug on a non-avx512fp16 processor using software emulation
plus the extra option -fexcess-precision=standard.
Also, TARGET_C_EXCESS_PRECISION does not depend on the type, so if GCC
is built with --with-arch=sapphirerapids, testcases that do not use
_Float16 and are supposed to run on the x87 FPU would regress, i.e.
gcc.target/i386/excess-precision-*.c.  That's why we can't follow
AArch64.
> case from the "standard" case.  -fexcess-precision=standard is not "avoid
> excess precision", it's "implement excess precision in the front end".
> Whenever "fast" is implementing excess precision in the front end,
> "standard" should be doing the same thing as "fast".
>
> > +Soft-fp keeps the intermediate result of the operation at 32-bit precision by defaults,
> > +which may lead to inconsistent behavior between soft-fp and avx512fp16 instructions,
> > +using @option{-fexcess-precision=standard} will force round back after every operation.
>
> "soft-fp" is, as the name of some code within GCC, an internal
> implementation detail, which should not be referenced in the user manual.
> What results in intermediate results being in a wider precision is not
> soft-fp; it's promotions inserted by the front end as a result of how the
> above hook is defined (promotions inserted by the optabs/expand code are
> an implementation detail that should always be followed automatically by a
> truncation of the result and so not be user-visible).
Yes, will reorganize the words.
>
> As far as I know, the official name of "avx512fp16" is "AVX512-FP16" and
> text in the manual should use the official capitalization, hyphenation
> etc. in such names unless literally referring to command-line options
> inside @option or similar.
Yes, will change.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-29  4:53             ` Hongtao Liu
@ 2021-07-29  5:34               ` Hongtao Liu
  2021-07-29 21:30               ` Joseph Myers
  1 sibling, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-07-29  5:34 UTC (permalink / raw)
  To: Joseph Myers, richard.guenther; +Cc: liuhongt, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5201 bytes --]

On Thu, Jul 29, 2021 at 12:53 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Jul 29, 2021 at 5:57 AM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Wed, 21 Jul 2021, liuhongt via Gcc-patches wrote:
> >
> > > @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > >          provide would be identical were it not for the unpredictable
> > >          cases.  */
> > >       if (!TARGET_80387)
> > > -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > > +       return TARGET_SSE2
> > > +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > > +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > >       else if (!TARGET_MIX_SSE_I387)
> > >         {
> > >           if (!(TARGET_SSE && TARGET_SSE_MATH))
> > >             return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
> > >           else if (TARGET_SSE2)
> > > -           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > > +           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > >         }
> > >
> > >       /* If we are in standards compliant mode, but we know we will
> >
> > This patch is not changing the default "fast" mode at all; that's
> > promoting to float, unconditionally.  But you have a subsequent change
> > there in patch 4 to make the promotions in the default "fast" mode depend
> > on hardware support for the new instructions; it's unhelpful for the
> > documentation not to corresponding exactly to the code changes in the same
> > patch.
> Yes, will change.
> >
> > Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> > (i.e. whenever the type is available), it might make more sense to follow
> > AArch64 and use it only when the hardware instructions are available.  In
> > any case, it seems peculiar to use a different threshold in the "fast"
>   We want to provide some debuggability to the software emulation.
> When there's inconsistency between software emulation and hardware
> instructions, users can still debug on non-avx512fp16 processor w/
> software emulation and extra option -fexcess-precision=standard,
> Also since TARGET_C_EXCESS_PRECISION is not related to type, for
> testcase w/o _Float16 and is supposed to be runned on x86 fpu, if gcc
> is built w/ --with-arch=sapphirerapid, it will regress those
> testcases. .i.e. gcc.target/i386/excess-precision-*.c, that's why we
> can't follow AArch64.
> > case from the "standard" case.  -fexcess-precision=standard is not "avoid
> > excess precision", it's "implement excess precision in the front end".
> > Whenever "fast" is implementing excess precision in the front end,
> > "standard" should be doing the same thing as "fast".
> >
> > > +Soft-fp keeps the intermediate result of the operation at 32-bit precision by defaults,
> > > +which may lead to inconsistent behavior between soft-fp and avx512fp16 instructions,
> > > +using @option{-fexcess-precision=standard} will force round back after every operation.
> >
> > "soft-fp" is, as the name of some code within GCC, an internal
> > implementation detail, which should not be referenced in the user manual.
> > What results in intermediate results being in a wider precision is not
> > soft-fp; it's promotions inserted by the front end as a result of how the
> > above hook is defined (promotions inserted by the optabs/expand code are
> > an implementation detail that should always be followed automatically by a
> > truncation of the result and so not be user-visible).
> Yes, will reorganize the words.
> >
> > As far as I know, the official name of "avx512fp16" is "AVX512-FP16" and
> > text in the manual should use the official capitalization, hyphenation
> > etc. in such names unless literally referring to command-line options
> > inside @option or similar.
> Yes, will change.
> >
Updated the patch for the documentation.
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>
>
>
> --
> BR,
> Hongtao

Also, as a follow-up to [1], I merged the change below into the updated patch.
Richard, please comment in this thread.
> > > +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
> >
> > I think that needs "fixing" then, or alternatively the caller should care.
> >
> How about this
>
> modified   gcc/emit-rtl.c
> @@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
>       fix them all.  */
>    if (omode == word_mode)
>      ;
> +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> +     here. Though extract_bit_field is the culprit here, not the backends.  */
> +  else if (imode == HFmode && omode == SImode)
> +    ;
>    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>       is the culprit here, and not the backends.  */
>    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> new file   gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
>
> If it's ok, I'll merge the upper change to the former commit:

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576074.html


--
BR,
Hongtao

[-- Attachment #2: V3 0002-i386-Enable-_Float16-type-for-TARGET_SSE2-and-above.patch --]
[-- Type: application/x-patch, Size: 21734 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-27  5:32             ` Hongtao Liu
@ 2021-07-29 20:57               ` Joseph Myers
  2021-08-02  5:10                 ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-07-29 20:57 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener, liuhongt, GCC Patches

On Tue, 27 Jul 2021, Hongtao Liu via Gcc-patches wrote:

> modified   gcc/emit-rtl.c
> @@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
>       fix them all.  */
>    if (omode == word_mode)
>      ;
> +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> +     here. Though extract_bit_field is the culprit here, not the backends.  */
> +  else if (imode == HFmode && omode == SImode)
> +    ;

You can't reference HFmode by name at all in any target-independent file, 
outside of a #ifdef HAVE_HFmode conditional.  It's only defined in 
architecture-specific <arch>-modes.def for those architectures supporting 
that mode, so you'll have an undefined identifier building for other 
targets if you reference it in a generic source file.  You have to 
condition things on the logical properties of the mode that are relevant, 
not on the target-specific name (or use a HAVE_HFmode conditional, but 
basing things on logical properties is clearly better where possible).
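
A minimal sketch of the HAVE_HFmode pattern, using a mock mode enumeration rather than GCC's real machine_mode.  The MOCK_* names and mock_is_hfmode are hypothetical; the only point is that the mode name is mentioned solely inside the conditional, so the file still compiles for a target that does not define the mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the generated mode enumeration.  Define
   HAVE_HFmode before this point to model a target that provides the
   half-precision mode.  */
enum mock_mode
{
  MOCK_SImode,
  MOCK_SFmode,
#ifdef HAVE_HFmode
  MOCK_HFmode,
#endif
  MOCK_MAX_MODE
};

/* Return true if MODE is the half-precision float mode.  On a target
   without the mode, MOCK_HFmode is never referenced, so there is no
   undefined identifier.  */
bool
mock_is_hfmode (enum mock_mode mode)
{
  assert (mode < MOCK_MAX_MODE);
#ifdef HAVE_HFmode
  return mode == MOCK_HFmode;
#else
  return false;
#endif
}
```

A check written in terms of mode size and class needs no such conditional at all, which is why it is the better option where it is possible.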

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-29  4:53             ` Hongtao Liu
  2021-07-29  5:34               ` Hongtao Liu
@ 2021-07-29 21:30               ` Joseph Myers
  2021-08-02  5:23                 ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-07-29 21:30 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, gcc-patches

On Thu, 29 Jul 2021, Hongtao Liu via Gcc-patches wrote:

> > Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> > (i.e. whenever the type is available), it might make more sense to follow
> > AArch64 and use it only when the hardware instructions are available.  In
> > any case, it seems peculiar to use a different threshold in the "fast"
>   We want to provide some debuggability to the software emulation.
> When there's inconsistency between software emulation and hardware
> instructions, users can still debug on non-avx512fp16 processor w/
> software emulation and extra option -fexcess-precision=standard,

But that's not the purpose of -fexcess-precision=standard.  The purpose is 
only: when the default case is non-conforming, make it conforming instead.  
The default case is non-conforming only when the back end has insn 
patterns pretending to be able to do arithmetic on formats it can't 
actually do arithmetic on - that is, x87 arithmetic where the insn 
patterns pretend to support SFmode and DFmode arithmetic but actually use 
XFmode (and the similar issue for older m68k, but that back end doesn't 
actually have the required support for -fexcess-precision=standard).

So -fexcess-precision=standard should not do anything different from 
-fexcess-precision=fast regarding _Float16.

If you want to be able to enable or disable excess precision for _Float16 
separately from the underlying hardware support, that might provide a case 
for supporting extra options, say -fexcess-precision=16 that means follow 
the semantics of FLT_EVAL_METHOD == 16 (and with an error for that option 
on architectures where the given FLT_EVAL_METHOD value isn't supported).  
But that shouldn't be done by making -fexcess-precision=standard do 
something outside its scope.
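
As a rough sketch of what following the semantics of FLT_EVAL_METHOD == 16 would mean for _Float16, the mapping below gives the evaluation format of a _Float16 operation for a few FLT_EVAL_METHOD values.  It assumes the ISO/IEC TS 18661-3 reading of the macro; the enum and the function float16_eval_format are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical names for the format in which an operation whose
   semantic type is _Float16 gets evaluated.  */
enum eval_format
{
  EVAL_INDETERMINABLE,
  EVAL_FLOAT16,
  EVAL_FLOAT,
  EVAL_DOUBLE,
  EVAL_LONG_DOUBLE
};

/* Map a FLT_EVAL_METHOD value to the evaluation format of a _Float16
   operation.  Only the values -1, 0, 1, 2 and 16 are modelled.  */
enum eval_format
float16_eval_format (int flt_eval_method)
{
  assert (flt_eval_method == -1 || flt_eval_method == 0
          || flt_eval_method == 1 || flt_eval_method == 2
          || flt_eval_method == 16);

  switch (flt_eval_method)
    {
    case 16:
      return EVAL_FLOAT16;        /* No excess precision for _Float16.  */
    case 0:
      return EVAL_FLOAT;          /* _Float16 is promoted to float.  */
    case 1:
      return EVAL_DOUBLE;         /* Promoted to double.  */
    case 2:
      return EVAL_LONG_DOUBLE;    /* Promoted to long double.  */
    default:
      return EVAL_INDETERMINABLE; /* -1.  */
    }
}
```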

> Also since TARGET_C_EXCESS_PRECISION is not related to type, for
> testcase w/o _Float16 and is supposed to be runned on x86 fpu, if gcc
> is built w/ --with-arch=sapphirerapid, it will regress those
> testcases. .i.e. gcc.target/i386/excess-precision-*.c, that's why we
> can't follow AArch64.

Those tests use -mfpmath=387.

In the -mfpmath=387 case, it seems reasonable to keep the rule of 
promoting to long double, regardless of hardware _Float16 support (-msse2 
must also be in effect for the type to be supported at all by the back 
end).  It's the -mfpmath=sse case for which I think following AArch64 is 
appropriate.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-07-29 20:57               ` Joseph Myers
@ 2021-08-02  5:10                 ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-02  5:10 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Richard Biener, liuhongt, GCC Patches

On Fri, Jul 30, 2021 at 4:58 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Tue, 27 Jul 2021, Hongtao Liu via Gcc-patches wrote:
>
> > modified   gcc/emit-rtl.c
> > @@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
> >       fix them all.  */
> >    if (omode == word_mode)
> >      ;
> > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > +  else if (imode == HFmode && omode == SImode)
> > +    ;
>
> You can't reference HFmode by name at all in any target-independent file,
> outside of a #ifdef HAVE_HFmode conditional.  It's only defined in
> architecture-specific <arch>-modes.def for those architectures supporting
> that mode, so you'll have an undefined identifier building for other
> targets if you reference it in a generic source file.  You have to
> condition things on the logical properties of the mode that are relevant,
> not on the target-specific name (or use a HAVE_HFmode conditional, but
> basing things on logical properties is clearly better where possible).
>
Yes, I didn't notice that; thanks for the explanation.
I guess the same logic would also apply to BFmode (or any other 16-bit
float mode) if it existed, so I'll add something like

modified   gcc/emit-rtl.c
@@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
      fix them all.  */
   if (omode == word_mode)
     ;
+  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI (reg:HF))
+     here.  Though extract_bit_field is the culprit here, not the backends.  */
+  else if (known_gt (regsize, osize) && known_gt (osize, isize)
+    && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
+    ;
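
The proposed condition can be modelled outside GCC as a small predicate.  This is a simplified sketch: struct mode_info and allow_widening_int_subreg are hypothetical, plain unsigned sizes stand in for the poly-int sizes compared with known_gt, and a single flag stands in for FLOAT_MODE_P and INTEGRAL_MODE_P.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of a machine mode: its size in
   bytes and whether it is a floating-point mode (otherwise it is
   treated as an integer mode).  */
struct mode_info
{
  unsigned size;
  bool is_float;
};

/* Model of the proposed extra case: allow an integer outer mode that is
   wider than a floating-point inner mode but narrower than the natural
   register size.  */
bool
allow_widening_int_subreg (struct mode_info outer, struct mode_info inner,
                           unsigned regsize)
{
  assert (outer.size > 0 && inner.size > 0);
  return regsize > outer.size
         && outer.size > inner.size
         && inner.is_float
         && !outer.is_float;
}
```

With an 8-byte register, a 2-byte float inner mode and a 4-byte integer outer mode satisfy the predicate, which corresponds to (subreg:SI (reg:HF)); a 4-byte float inner mode with a 4-byte outer mode does not.
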
> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-07-29 21:30               ` Joseph Myers
@ 2021-08-02  5:23                 ` Hongtao Liu
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-02  5:23 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches

On Fri, Jul 30, 2021 at 5:30 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 29 Jul 2021, Hongtao Liu via Gcc-patches wrote:
>
> > > Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> > > (i.e. whenever the type is available), it might make more sense to follow
> > > AArch64 and use it only when the hardware instructions are available.  In
> > > any case, it seems peculiar to use a different threshold in the "fast"
> >   We want to provide some debuggability to the software emulation.
> > When there's inconsistency between software emulation and hardware
> > instructions, users can still debug on non-avx512fp16 processor w/
> > software emulation and extra option -fexcess-precision=standard,
>
> But that's not the purpose of -fexcess-precision=standard.  The purpose is
> only: when the default case is non-conforming, make it conforming instead.
> The default case is non-conforming only when the back end has insn
> patterns pretending to be able to do arithmetic on formats it can't
> actually do arithmetic on - that is, x87 arithmetic where the insn
> patterns pretend to support SFmode and DFmode arithmetic but actually use
> XFmode (and the similar issue for older m68k, but that back end doesn't
> actually have the required support for -fexcess-precision=standard).
>
> So -fexcess-precision=standard should not do anything different from
> -fexcess-precision=fast regarding _Float16.
>
It makes perfect sense.
> If you want to be able to enable or disable excess precision for _Float16
> separately from the underlying hardware support, that might provide a case
> for supporting extra options, say -fexcess-precision=16 that means follow
> the semantics of FLT_EVAL_METHOD == 16 (and with an error for that option
> on architectures where the given FLT_EVAL_METHOD value isn't supported).
> But that shouldn't be done by making -fexcess-precision=standard do
> something outside its scope.
>
> > Also since TARGET_C_EXCESS_PRECISION is not related to type, for
> > testcase w/o _Float16 and is supposed to be runned on x86 fpu, if gcc
> > is built w/ --with-arch=sapphirerapid, it will regress those
> > testcases. .i.e. gcc.target/i386/excess-precision-*.c, that's why we
> > can't follow AArch64.
>
> Those tests use -mfpmath=387.
>
> In the -mfpmath=387 case, it seems reasonable to keep the rule of
> promoting to long double, regardless of hardware _Float16 support (-msse2
> must also be in effect for the type to be supported at all by the back
> end).  It's the -mfpmath=sse case for which I think following AArch64 is
> appropriate.
So does this.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

I'll add an extra option, -fexcess-precision=16, to set
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when the back end supports _Float16,
and also refine ix86_get_excess_precision as

@@ -23327,14 +23382,18 @@ ix86_get_excess_precision (enum excess_precision_type type)
  /* The fastest type to promote to will always be the native type,
     whether that occurs with implicit excess precision or
     otherwise.  */
- return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+ return TARGET_AVX512FP16
+        ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+        : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
       case EXCESS_PRECISION_TYPE_STANDARD:
       case EXCESS_PRECISION_TYPE_IMPLICIT:
  /* Otherwise, the excess precision we want when we are
     in a standards compliant mode, and the implicit precision we
     provide would be identical were it not for the unpredictable
     cases.  */
- if (!TARGET_80387)
+ if (TARGET_AVX512FP16 && TARGET_SSE_MATH)
+   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
+ else if (!TARGET_80387)
    return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
  else if (!TARGET_MIX_SSE_I387)
    {

Will update in my next version.
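
To make the proposed selection easier to follow, here is a minimal
standalone sketch that models only the branches visible in the hunk
above. It is not GCC code: the enums, the function name and the boolean
parameters (standing in for TARGET_AVX512FP16, TARGET_SSE_MATH and
TARGET_80387) are hypothetical, and everything past the end of the
excerpt is deliberately left unmodeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the FLT_EVAL_METHOD_* results.  */
enum eval_method_sketch
{
  EVAL_FLOAT16,     /* models FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 */
  EVAL_FLOAT,       /* models FLT_EVAL_METHOD_PROMOTE_TO_FLOAT */
  EVAL_NOT_MODELED  /* branches not shown in the quoted hunk */
};

/* Hypothetical stand-ins for EXCESS_PRECISION_TYPE_*.  */
enum precision_request_sketch
{
  REQUEST_FAST,
  REQUEST_STANDARD,
  REQUEST_IMPLICIT
};

/* Mirror the order of tests in the quoted hunk.  The three flags are
   assumptions standing in for TARGET_AVX512FP16, TARGET_SSE_MATH and
   TARGET_80387.  */
enum eval_method_sketch
excess_precision_sketch (enum precision_request_sketch request,
                         bool avx512fp16, bool sse_math, bool x87)
{
  if (request == REQUEST_FAST)
    return avx512fp16 ? EVAL_FLOAT16 : EVAL_FLOAT;

  /* STANDARD and IMPLICIT share the remaining tests.  */
  if (avx512fp16 && sse_math)
    return EVAL_FLOAT16;
  if (!x87)
    return EVAL_FLOAT;

  /* The x87 cases continue beyond the excerpt.  */
  return EVAL_NOT_MODELED;
}
```

With avx512fp16 and sse_math both true, the sketch gives the same
answer for REQUEST_FAST and REQUEST_STANDARD, which is the property
asked for in the review.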

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH V3 0/6] Initial support for AVX512FP16
  2021-08-02  5:23                 ` Hongtao Liu
@ 2021-08-02  6:31                   ` liuhongt
  2021-08-02  6:31                     ` [PATCH 1/6] Update hf soft-fp from glibc liuhongt
                                       ` (6 more replies)
  0 siblings, 7 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

Update from v2:

1. Support -fexcess-precision=16, which enables
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when the backend supports _Float16.
2. Update ix86_get_excess_precision so that -fexcess-precision=standard
does not behave differently from -fexcess-precision=fast
regarding _Float16.
3. Avoid macroization of HFmode patterns.
4. Allow (subreg:SI (reg:HF)).
5. Update the documentation in the same patch as the corresponding
code changes.
6. Following the 32-bit ABI, pass vector _Float16 in SSE registers
in 32-bit mode, not on the stack.
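
As background for items 1 and 2, the effect that the excess-precision
setting controls can be illustrated one width up, with float and double,
so that the example does not depend on _Float16 support in the compiler.
This is only a sketch under that substitution: the function names are
hypothetical, and the default round-to-nearest mode is assumed.

```c
/* Round after every operation, as when each float operation is
   evaluated in float itself.  The volatile stores force each
   intermediate to be rounded to float even on targets that keep
   values in wider registers.  */
float
sum_round_each_step (float a, float b, float c)
{
  volatile float t = a + b;
  volatile float r = t + c;
  return r;
}

/* Evaluate in a wider type and round once at the end, as when
   intermediates carry excess precision.  */
float
sum_round_once (float a, float b, float c)
{
  double wide = (double) a + (double) b + (double) c;
  return (float) wide;
}
```

For a = 16777216.0f (2^24) and b = c = 1.0f, the first function returns
16777216.0f, because each addition of 1 is lost to rounding, while the
second returns 16777218.0f. The same kind of difference arises between
evaluating _Float16 operations in _Float16 and evaluating them in float.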

Guo, Xuepeng (1):
  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
    instructions.

liuhongt (5):
  Update hf soft-fp from glibc.
  [i386] Enable _Float16 type for TARGET_SSE2 and above.
  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
    truncations.
  Support -fexcess-precision=16 which will enable
    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  AVX512FP16: Support vector init/broadcast/set/extract for FP16.

 gcc/ada/gcc-interface/misc.c                  |   3 +
 gcc/c-family/c-common.c                       |   6 +-
 gcc/c-family/c-cppbuiltin.c                   |   6 +-
 gcc/common.opt                                |   5 +-
 gcc/common/config/i386/cpuinfo.h              |   2 +
 gcc/common/config/i386/i386-common.c          |  26 +-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config.gcc                                |   2 +-
 gcc/config/aarch64/aarch64.c                  |   1 +
 gcc/config/arm/arm.c                          |   1 +
 gcc/config/i386/avx512fp16intrin.h            | 225 ++++++++++
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-builtin-types.def        |   7 +-
 gcc/config/i386/i386-builtins.c               |  23 +
 gcc/config/i386/i386-c.c                      |   2 +
 gcc/config/i386/i386-expand.c                 | 129 +++++-
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-modes.def                |  13 +-
 gcc/config/i386/i386-options.c                |   4 +-
 gcc/config/i386/i386.c                        | 243 +++++++++--
 gcc/config/i386/i386.h                        |  29 +-
 gcc/config/i386/i386.md                       | 291 ++++++++++++-
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/config/i386/sse.md                        | 397 +++++++++++++-----
 gcc/config/m68k/m68k.c                        |   2 +
 gcc/config/s390/s390.c                        |   2 +
 gcc/coretypes.h                               |   3 +-
 gcc/doc/extend.texi                           |  22 +
 gcc/doc/invoke.texi                           |  10 +-
 gcc/doc/tm.texi                               |  14 +-
 gcc/doc/tm.texi.in                            |   3 +
 gcc/emit-rtl.c                                |   5 +
 gcc/flag-types.h                              |   3 +-
 gcc/fortran/options.c                         |   3 +
 gcc/lto/lto-lang.c                            |   3 +
 gcc/target.def                                |  11 +-
 gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
 gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
 gcc/testsuite/g++.target/i386/float16-2.C     |  14 +
 gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
 gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
 .../gcc.target/i386/avx512fp16-12a.c          |  21 +
 .../gcc.target/i386/avx512fp16-12b.c          |  27 ++
 gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-5.c     |  12 +
 gcc/testsuite/gcc.target/i386/float16-6.c     |   8 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 +
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
 .../gcc.target/i386/sse2-float16-1.c          |   8 +
 .../gcc.target/i386/sse2-float16-2.c          |  16 +
 .../gcc.target/i386/sse2-float16-3.c          |  12 +
 gcc/testsuite/lib/target-supports.exp         |  13 +-
 gcc/tree.c                                    |   3 +-
 libgcc/config.host                            |   5 +-
 libgcc/config/i386/32/sfp-machine.h           |   1 +
 libgcc/config/i386/32/t-softfp                |   1 +
 libgcc/config/i386/64/sfp-machine.h           |   1 +
 libgcc/config/i386/64/t-softfp                |   1 +
 libgcc/config/i386/sfp-machine.h              |   1 +
 libgcc/config/i386/t-softfp                   |   5 +
 libgcc/soft-fp/eqhf2.c                        |  49 +++
 libgcc/soft-fp/extendhfdf2.c                  |  53 +++
 libgcc/soft-fp/extendhfsf2.c                  |  49 +++
 libgcc/soft-fp/half.h                         |   1 +
 libgcc/soft-fp/truncdfhf2.c                   |  52 +++
 libgcc/soft-fp/truncsfhf2.c                   |  48 +++
 78 files changed, 1781 insertions(+), 223 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16intrin.h
 create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
 create mode 100644 libgcc/config/i386/64/t-softfp
 create mode 100644 libgcc/soft-fp/eqhf2.c
 create mode 100644 libgcc/soft-fp/extendhfdf2.c
 create mode 100644 libgcc/soft-fp/extendhfsf2.c
 create mode 100644 libgcc/soft-fp/truncdfhf2.c
 create mode 100644 libgcc/soft-fp/truncsfhf2.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 1/6] Update hf soft-fp from glibc.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
@ 2021-08-02  6:31                     ` liuhongt
  2021-08-02  6:31                     ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
                                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

libgcc/ChangeLog

	* soft-fp/eqhf2.c: New file.
	* soft-fp/extendhfdf2.c: New file.
	* soft-fp/extendhfsf2.c: New file.
	* soft-fp/extendhfxf2.c: New file.
	* soft-fp/half.h (FP_CMP_EQ_H): New macro.
	* soft-fp/truncdfhf2.c: New file.
	* soft-fp/truncsfhf2.c: New file.
	* soft-fp/truncxfhf2.c: New file.
---
 libgcc/soft-fp/eqhf2.c       | 49 +++++++++++++++++++++++++++++++++
 libgcc/soft-fp/extendhfdf2.c | 53 ++++++++++++++++++++++++++++++++++++
 libgcc/soft-fp/extendhfsf2.c | 49 +++++++++++++++++++++++++++++++++
 libgcc/soft-fp/half.h        |  1 +
 libgcc/soft-fp/truncdfhf2.c  | 52 +++++++++++++++++++++++++++++++++++
 libgcc/soft-fp/truncsfhf2.c  | 48 ++++++++++++++++++++++++++++++++
 6 files changed, 252 insertions(+)
 create mode 100644 libgcc/soft-fp/eqhf2.c
 create mode 100644 libgcc/soft-fp/extendhfdf2.c
 create mode 100644 libgcc/soft-fp/extendhfsf2.c
 create mode 100644 libgcc/soft-fp/truncdfhf2.c
 create mode 100644 libgcc/soft-fp/truncsfhf2.c

diff --git a/libgcc/soft-fp/eqhf2.c b/libgcc/soft-fp/eqhf2.c
new file mode 100644
index 00000000000..6d6634e5c54
--- /dev/null
+++ b/libgcc/soft-fp/eqhf2.c
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return 0 iff a == b, 1 otherwise
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+
+CMPtype
+__eqhf2 (HFtype a, HFtype b)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_H (B);
+  CMPtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+  FP_UNPACK_RAW_H (B, b);
+  FP_CMP_EQ_H (r, A, B, 1);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
+
+strong_alias (__eqhf2, __nehf2);
diff --git a/libgcc/soft-fp/extendhfdf2.c b/libgcc/soft-fp/extendhfdf2.c
new file mode 100644
index 00000000000..337ba791d48
--- /dev/null
+++ b/libgcc/soft-fp/extendhfdf2.c
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE double
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "double.h"
+
+DFtype
+__extendhfdf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_D (R);
+  DFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_EXTEND (D, H, 2, 1, R, A);
+#else
+  FP_EXTEND (D, H, 1, 1, R, A);
+#endif
+  FP_PACK_RAW_D (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/extendhfsf2.c b/libgcc/soft-fp/extendhfsf2.c
new file mode 100644
index 00000000000..a02f46d9a99
--- /dev/null
+++ b/libgcc/soft-fp/extendhfsf2.c
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE single
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "single.h"
+
+SFtype
+__extendhfsf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/half.h b/libgcc/soft-fp/half.h
index c7823ac61d3..4108f5cb3c2 100644
--- a/libgcc/soft-fp/half.h
+++ b/libgcc/soft-fp/half.h
@@ -167,4 +167,5 @@ union _FP_UNION_H
 #define _FP_FRAC_HIGH_RAW_H(X)	_FP_FRAC_HIGH_1 (X)
 #define _FP_FRAC_HIGH_DW_H(X)	_FP_FRAC_HIGH_1 (X)
 
+#define FP_CMP_EQ_H(r, X, Y, ex)	_FP_CMP_EQ (H, 1, (r), X, Y, (ex))
 #endif /* !SOFT_FP_HALF_H */
diff --git a/libgcc/soft-fp/truncdfhf2.c b/libgcc/soft-fp/truncdfhf2.c
new file mode 100644
index 00000000000..8bcb2787692
--- /dev/null
+++ b/libgcc/soft-fp/truncdfhf2.c
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "double.h"
+
+HFtype
+__truncdfhf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (H, D, 1, 2, R, A);
+#else
+  FP_TRUNC (H, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/truncsfhf2.c b/libgcc/soft-fp/truncsfhf2.c
new file mode 100644
index 00000000000..25bee29f7f5
--- /dev/null
+++ b/libgcc/soft-fp/truncsfhf2.c
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "single.h"
+
+HFtype
+__truncsfhf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (H, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
-- 
2.27.0
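
For readers unfamiliar with the binary16 layout, the value computed by
__extendhfsf2 can be sketched in portable C. This is an illustration
under assumptions, not the soft-fp implementation: the name
half_bits_to_float is hypothetical, the half is passed as raw uint16_t
bits rather than as HFtype, and no floating-point exceptions are raised
(the soft-fp version reports them through FP_HANDLE_EXCEPTIONS).

```c
#include <stdint.h>
#include <string.h>

/* Convert IEEE binary16 bits (1 sign, 5 exponent, 10 fraction bits)
   to a binary32 float.  Every binary16 value is exactly representable
   in binary32, so no rounding is needed.  */
float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t frac = h & 0x3FFu;
  uint32_t bits;

  if (exp == 0)
    {
      if (frac == 0)
        bits = sign;                      /* signed zero */
      else
        {
          /* Subnormal half: shift the fraction up until the implicit
             bit appears, adjusting the exponent to match.  */
          int shift = 0;
          while ((frac & 0x400u) == 0)
            {
              frac <<= 1;
              shift++;
            }
          frac &= 0x3FFu;
          bits = sign | ((uint32_t) (127 - 15 + 1 - shift) << 23)
                 | (frac << 13);
        }
    }
  else if (exp == 0x1Fu)
    bits = sign | 0x7F800000u | (frac << 13);   /* infinity or NaN */
  else
    /* Normal number: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + (127 - 15)) << 23) | (frac << 13);

  float f;
  memcpy (&f, &bits, sizeof f);           /* reinterpret the bits */
  return f;
}
```

For example, 0x3C00 maps to 1.0f, 0x7BFF to 65504.0f (the largest
finite half), and 0x0001 to 2^-24 (the smallest subnormal half).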


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
  2021-08-02  6:31                     ` [PATCH 1/6] Update hf soft-fp from glibc liuhongt
@ 2021-08-02  6:31                     ` liuhongt
  2021-08-04  2:45                       ` Hongtao Liu
  2021-09-03 12:42                       ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above Jakub Jelinek
  2021-08-02  6:31                     ` [PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
                                       ` (4 subsequent siblings)
  6 siblings, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

gcc/ChangeLog:

	* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
	* config/i386/i386.c (enum x86_64_reg_class): Add
	X86_64_SSEHF_CLASS.
	(merge_classes): Handle X86_64_SSEHF_CLASS.
	(examine_argument): Ditto.
	(construct_container): Ditto.
	(classify_argument): Ditto, and set HFmode/HCmode to
	X86_64_SSEHF_CLASS.
	(function_value_32): Return _Float16/_Complex _Float16 in
	%xmm0.
	(function_value_64): Return _Float16/_Complex _Float16 in an SSE
	register.
	(ix86_print_operand): Handle CONST_DOUBLE HFmode.
	(ix86_secondary_reload): Require a gpr as intermediate register
	to store _Float16 from an sse register when sse4 is not
	available.
	(ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
	sse2.
	(ix86_scalar_mode_supported_p): Ditto.
	(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
	* config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
	(VALID_INT_MODE_P): Add HFmode and HCmode.
	* config/i386/i386.md (*pushhf_rex64): New define_insn.
	(*pushhf): Ditto.
	(*movhf_internal): Ditto.
	* doc/extend.texi (Half-Precision Floating Point): Document
	_Float16 for x86.
	* emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
	which is used by extract_bit_field but not backends.

gcc/lto/ChangeLog:

	* lto-lang.c (lto_type_for_mode): Return float16_type_node
	when mode == TYPE_MODE (float16_type_node).

gcc/testsuite/ChangeLog

	* gcc.target/i386/sse2-float16-1.c: New test.
	* gcc.target/i386/sse2-float16-2.c: Ditto.
	* gcc.target/i386/sse2-float16-3.c: Ditto.
	* gcc.target/i386/float16-5.c: Ditto.
---
 gcc/config/i386/i386-modes.def                |   1 +
 gcc/config/i386/i386.c                        |  91 +++++++++++++-
 gcc/config/i386/i386.h                        |   3 +-
 gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
 gcc/doc/extend.texi                           |  13 ++
 gcc/emit-rtl.c                                |   5 +
 gcc/lto/lto-lang.c                            |   3 +
 gcc/testsuite/gcc.target/i386/float16-5.c     |  12 ++
 .../gcc.target/i386/sse2-float16-1.c          |   8 ++
 .../gcc.target/i386/sse2-float16-2.c          |  16 +++
 .../gcc.target/i386/sse2-float16-3.c          |  12 ++
 11 files changed, 274 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 4e7014be034..9232f59a925 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
+FLOAT_MODE (HF, 2, ieee_half_format);
 
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff96134fb37..7979e240426 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -387,6 +387,7 @@ enum x86_64_reg_class
     X86_64_INTEGER_CLASS,
     X86_64_INTEGERSI_CLASS,
     X86_64_SSE_CLASS,
+    X86_64_SSEHF_CLASS,
     X86_64_SSESF_CLASS,
     X86_64_SSEDF_CLASS,
     X86_64_SSEUP_CLASS,
@@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
     return X86_64_MEMORY_CLASS;
 
   /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
-  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
-      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
+  if ((class1 == X86_64_INTEGERSI_CLASS
+       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
+      || (class2 == X86_64_INTEGERSI_CLASS
+	  && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
     return X86_64_INTEGERSI_CLASS;
   if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
       || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
@@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
 	    /* The partial classes are now full classes.  */
 	    if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
 	      subclasses[0] = X86_64_SSE_CLASS;
+	    if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
+	      subclasses[0] = X86_64_SSE_CLASS;
 	    if (subclasses[0] == X86_64_INTEGERSI_CLASS
 		&& !((bit_offset % 64) == 0 && bytes == 4))
 	      subclasses[0] = X86_64_INTEGER_CLASS;
@@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
       gcc_unreachable ();
     case E_CTImode:
       return 0;
+    case E_HFmode:
+      if (!(bit_offset % 64))
+	classes[0] = X86_64_SSEHF_CLASS;
+      else
+	classes[0] = X86_64_SSE_CLASS;
+      return 1;
     case E_SFmode:
       if (!(bit_offset % 64))
 	classes[0] = X86_64_SSESF_CLASS;
@@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
       classes[0] = X86_64_SSE_CLASS;
       classes[1] = X86_64_SSEUP_CLASS;
       return 2;
+    case E_HCmode:
+      classes[0] = X86_64_SSE_CLASS;
+      if (!(bit_offset % 64))
+	return 1;
+      else
+	{
+	  classes[1] = X86_64_SSEHF_CLASS;
+	  return 2;
+	}
     case E_SCmode:
       classes[0] = X86_64_SSE_CLASS;
       if (!(bit_offset % 64))
@@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
 	(*int_nregs)++;
 	break;
       case X86_64_SSE_CLASS:
+      case X86_64_SSEHF_CLASS:
       case X86_64_SSESF_CLASS:
       case X86_64_SSEDF_CLASS:
 	(*sse_nregs)++;
@@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
 
   /* First construct simple cases.  Avoid SCmode, since we want to use
      single register to pass this type.  */
-  if (n == 1 && mode != SCmode)
+  if (n == 1 && mode != SCmode && mode != HCmode)
     switch (regclass[0])
       {
       case X86_64_INTEGER_CLASS:
       case X86_64_INTEGERSI_CLASS:
 	return gen_rtx_REG (mode, intreg[0]);
       case X86_64_SSE_CLASS:
+      case X86_64_SSEHF_CLASS:
       case X86_64_SSESF_CLASS:
       case X86_64_SSEDF_CLASS:
 	if (mode != BLKmode)
@@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
 				   GEN_INT (i*8));
 	    intreg++;
 	    break;
+	  case X86_64_SSEHF_CLASS:
+	    exp [nexps++]
+	      = gen_rtx_EXPR_LIST (VOIDmode,
+				   gen_rtx_REG (HFmode,
+						GET_SSE_REGNO (sse_regno)),
+				   GEN_INT (i*8));
+	    sse_regno++;
+	    break;
 	  case X86_64_SSESF_CLASS:
 	    exp [nexps++]
 	      = gen_rtx_EXPR_LIST (VOIDmode,
@@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
     /* Most things go in %eax.  */
     regno = AX_REG;
 
+  /* Return _Float16/_Complex _Float16 by sse register.  */
+  if (mode == HFmode)
+    regno = FIRST_SSE_REG;
+  if (mode == HCmode)
+    {
+      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
+      XVECEXP (ret, 0, 0)
+	= gen_rtx_EXPR_LIST (VOIDmode,
+			     gen_rtx_REG (SImode, FIRST_SSE_REG),
+			     GEN_INT (0));
+      return ret;
+    }
+
   /* Override FP return register with %xmm0 for local functions when
      SSE math is enabled or for functions with sseregparm attribute.  */
   if ((fn || fntype) && (mode == SFmode || mode == DFmode))
@@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
 
       switch (mode)
 	{
+	case E_HFmode:
+	case E_HCmode:
 	case E_SFmode:
 	case E_SCmode:
 	case E_DFmode:
@@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	  (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
     }
 
+  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
+    {
+      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
+			       REAL_MODE_FORMAT (HFmode));
+      if (ASSEMBLER_DIALECT == ASM_ATT)
+	putc ('$', file);
+      fprintf (file, "0x%04x", (unsigned int) l);
+    }
+
   else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
     {
       long l;
@@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
       return NO_REGS;
     }
 
+  /* Require movement to gpr, and then store to memory.  */
+  if (mode == HFmode
+      && !TARGET_SSE4_1
+      && SSE_CLASS_P (rclass)
+      && !in_p && MEM_P (x))
+    {
+      sri->extra_cost = 1;
+      return GENERAL_REGS;
+    }
+
   /* This condition handles corner case where an expression involving
      pointers gets vectorized.  We're trying to use the address of a
      stack slot as a vector initializer.
@@ -21555,10 +21619,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
     return default_decimal_float_supported_p ();
   else if (mode == TFmode)
     return true;
+  else if (mode == HFmode && TARGET_SSE2)
+    return true;
   else
     return default_scalar_mode_supported_p (mode);
 }
 
+/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE if MODE
+   is HFmode under SSE2, and punt to the generic implementation otherwise.  */
+
+static bool
+ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
+{
+  /* NB: Always return TRUE for HFmode so that the _Float16 type will
+     be defined by the C front-end for AVX512FP16 intrinsics.  We will
+     issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
+     enabled.  */
+  return ((mode == HFmode && TARGET_SSE2)
+	  ? true
+	  : default_libgcc_floating_mode_supported_p (mode));
+}
+
 /* Implements target hook vector_mode_supported_p.  */
 static bool
 ix86_vector_mode_supported_p (machine_mode mode)
@@ -23820,6 +23901,10 @@ ix86_run_selftests (void)
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
 
+#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
+#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P	\
+ix86_libgcc_floating_mode_supported_p
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 0c2c93daf32..b1e66ee192e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
-   || (MODE) == V2DImode || (MODE) == DFmode)
+   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
@@ -1047,6 +1047,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == CQImode || (MODE) == CHImode				\
    || (MODE) == CSImode || (MODE) == CDImode				\
    || (MODE) == SDmode || (MODE) == DDmode				\
+   || (MODE) == HFmode || (MODE) == HCmode				\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (TARGET_64BIT							\
        && ((MODE) == TImode || (MODE) == CTImode			\
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..d475347172d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
 ;; All x87 floating point modes
 (define_mode_iterator X87MODEF [SF DF XF])
 
+;; All x87 floating point modes plus HF
+(define_mode_iterator X87MODEFH [SF DF XF HF])
+
 ;; All SSE floating point modes
 (define_mode_iterator SSEMODEF [SF DF TF])
 (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
@@ -3130,6 +3133,32 @@ (define_split
   operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
 })
 
+(define_insn "*pushhf_rex64"
+  [(set (match_operand:HF 0 "push_operand" "=X,X")
+	(match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
+  "TARGET_64BIT"
+{
+  /* Anything else should be already split before reg-stack.  */
+  gcc_assert (which_alternative == 0);
+  return "push{q}\t%q1";
+}
+  [(set_attr "isa"  "*,sse4")
+   (set_attr "type" "push,multi")
+   (set_attr "mode" "DI,TI")])
+
+(define_insn "*pushhf"
+  [(set (match_operand:HF 0 "push_operand" "=X,X")
+	(match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
+  "!TARGET_64BIT"
+{
+  /* Anything else should be already split before reg-stack.  */
+  gcc_assert (which_alternative == 0);
+  return "push{l}\t%k1";
+}
+  [(set_attr "isa"  "*,sse4")
+   (set_attr "type" "push,multi")
+   (set_attr "mode" "SI,TI")])
+
 (define_insn "*pushsf_rex64"
   [(set (match_operand:SF 0 "push_operand" "=X,X,X")
 	(match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
@@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
    (set_attr "unit" "i387,*,*")
    (set_attr "mode" "SF,SI,SF")])
 
+(define_mode_iterator MODESH [SF HF])
 ;; %%% Kill this when call knows how to work this out.
 (define_split
-  [(set (match_operand:SF 0 "push_operand")
-	(match_operand:SF 1 "any_fp_register_operand"))]
+  [(set (match_operand:MODESH 0 "push_operand")
+	(match_operand:MODESH 1 "any_fp_register_operand"))]
   "reload_completed"
   [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
    (set (match_dup 0) (match_dup 1))]
@@ -3209,8 +3239,8 @@ (define_expand "movtf"
   "ix86_expand_move (TFmode, operands); DONE;")
 
 (define_expand "mov<mode>"
-  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
-	(match_operand:X87MODEF 1 "general_operand"))]
+  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
+	(match_operand:X87MODEFH 1 "general_operand"))]
   ""
   "ix86_expand_move (<MODE>mode, operands); DONE;")
 
@@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
 	   ]
 	   (const_string "*")))])
 
+(define_insn "*movhf_internal"
+ [(set (match_operand:HF 0 "nonimmediate_operand"
+	 "=?r,?m,v,v,?r,m,?v,v")
+       (match_operand:HF 1 "general_operand"
+	 "rmF,rF,C,v, v,v, r,m"))]
+ "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+  && (lra_in_progress
+      || reload_completed
+      || !CONST_DOUBLE_P (operands[1])
+      || (TARGET_SSE && TARGET_SSE_MATH
+	  && standard_sse_constant_p (operands[1], HFmode) == 1)
+      || memory_operand (operands[0], HFmode))"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_IMOV:
+      return "mov{w}\t{%1, %0|%0, %1}";
+
+    case TYPE_SSELOG1:
+      return standard_sse_constant_opcode (insn, operands);
+
+    case TYPE_SSEMOV:
+      return ix86_output_ssemov (insn, operands);
+
+    case TYPE_SSELOG:
+      if (SSE_REG_P (operands[0]))
+	return MEM_P (operands[1])
+	       ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+	       : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+      else
+	return MEM_P (operands[1])
+	       ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+	       : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set (attr "isa")
+	(cond [(eq_attr "alternative" "2,3,4,6,7")
+		 (const_string "sse2")
+	       (eq_attr "alternative" "5")
+		 (const_string "sse4")
+	      ]
+	      (const_string "*")))
+   (set (attr "type")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "imov")
+	       (eq_attr "alternative" "2")
+		 (const_string "sselog1")
+	       (eq_attr "alternative" "4,5,6,7")
+		 (const_string "sselog")
+	      ]
+	      (const_string "ssemov")))
+   (set (attr "memory")
+	(cond [(eq_attr "alternative" "4,6")
+		 (const_string "none")
+	       (eq_attr "alternative" "5")
+		 (const_string "store")
+	       (eq_attr "alternative" "7")
+		 (const_string "load")
+	      ]
+	      (const_string "*")))
+   (set (attr "prefix")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "orig")
+	      ]
+	      (const_string "maybe_vex")))
+   (set (attr "mode")
+	(cond [(eq_attr "alternative" "0,1")
+		 (const_string "HI")
+	       (eq_attr "alternative" "2")
+		 (const_string "V4SF")
+	       (eq_attr "alternative" "4,5,6,7")
+		 (const_string "TI")
+	       (eq_attr "alternative" "3")
+		 (const_string "SF")
+	      ]
+	      (const_string "*")))])
+
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
 	(match_operand 1 "memory_operand"))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..f42fd633725 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
 @section Half-Precision Floating Point
 @cindex half-precision floating point
 @cindex @code{__fp16} data type
+@cindex @code{_Float16} data type
 
 On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
 point via the @code{__fp16} type defined in the ARM C Language Extensions.
@@ -1150,6 +1151,18 @@ calls.
 It is recommended that portable code use the @code{_Float16} type defined
 by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
 
+On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
+(16-bit) floating point via the @code{_Float16} type defined by ISO/IEC TS
+18661-3:2015.  For C++, x86 provides a built-in type named @code{_Float16}
+with the same data format as in C.
+
+Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
+operations will be emulated by software emulation and the @code{float}
+instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
+the intermediate result of the operation as 32-bit precision. This may lead
+to inconsistent behavior between software emulation and AVX512-FP16
+instructions.
+
 @node Decimal Float
 @section Decimal Floating Types
 @cindex decimal floating types
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index ff3b4449b37..775ee397836 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
      fix them all.  */
   if (omode == word_mode)
     ;
+  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI (reg:HF))
+     here.  Though extract_bit_field is the culprit here, not the backends.  */
+  else if (known_gt (regsize, osize) && known_gt (osize, isize)
+	   && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
+    ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
      is the culprit here, and not the backends.  */
   else if (known_ge (osize, regsize) && known_ge (isize, osize))
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index c13c7e45ac1..92f499643b5 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
     return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
 #endif
 
+  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
+    return float16_type_node;
+
   if (mode == TYPE_MODE (float_type_node))
     return float_type_node;
 
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
new file mode 100644
index 00000000000..1b645eb499d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "is not supported on this target" } */
+foo (_Float16 x) /* { dg-error "is not supported on this target" } */
+{
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
new file mode 100644
index 00000000000..3da7683fc31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+union flt
+{
+  _Float16 flt;
+  short s;
+};
+
+_Float16
+foo (union flt x)
+{
+  return x.flt;
+}
+
+/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
new file mode 100644
index 00000000000..60ff9d4ab80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+#include<complex.h>
+
+_Complex _Float16
+foo (_Complex _Float16 x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
-- 
2.27.0



* [PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
  2021-08-02  6:31                     ` [PATCH 1/6] Update hf soft-fp from glibc liuhongt
  2021-08-02  6:31                     ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
@ 2021-08-02  6:31                     ` liuhongt
  2021-08-02  6:31                     ` [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16 liuhongt
                                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

libgcc/ChangeLog:

	* config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
	* config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
	* config/i386/t-softfp: Add hf soft-fp.
	* config.host: Add i386/64/t-softfp.
	* config/i386/64/t-softfp: New file.
---
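For reviewers less familiar with the HFmode routines this enables: an
extension such as hfsf widens an IEEE binary16 value to binary32, and that
widening is always exact.  The sketch below is illustrative only and is not
the soft-fp code that libgcc builds.  The name half_bits_to_float is
hypothetical, and the function takes the raw 16-bit pattern so that it
compiles on hosts without _Float16 support.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to a binary32 float.
   Hypothetical helper for illustration, not part of libgcc.  */
static float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent.  */
  uint32_t frac = h & 0x3FFu;         /* 10-bit fraction.  */
  uint32_t bits;
  float result;

  if (exp == 0x1Fu)
    /* Infinity or NaN: maximum exponent, fraction carried over.  */
    bits = sign | 0x7F800000u | (frac << 13);
  else if (exp != 0)
    /* Normal number: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  else if (frac == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Subnormal: normalize so the leading one becomes implicit.  */
      uint32_t shift = 0;
      while ((frac & 0x400u) == 0)
        {
          frac <<= 1;
          shift++;
        }
      frac &= 0x3FFu;
      bits = sign | ((113u - shift) << 23) | (frac << 13);
    }

  /* memcpy avoids strict-aliasing problems when reinterpreting bits.  */
  memcpy (&result, &bits, sizeof result);
  return result;
}
```

For example, the pattern 0x3C00 widens to 1.0f, 0x7BFF to 65504.0f (the
largest finite binary16 value), and 0x0001 to 2^-24 (the smallest
subnormal).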
 libgcc/config.host                  | 5 +----
 libgcc/config/i386/32/sfp-machine.h | 1 +
 libgcc/config/i386/32/t-softfp      | 1 +
 libgcc/config/i386/64/sfp-machine.h | 1 +
 libgcc/config/i386/64/t-softfp      | 1 +
 libgcc/config/i386/sfp-machine.h    | 1 +
 libgcc/config/i386/t-softfp         | 5 +++++
 7 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 libgcc/config/i386/64/t-softfp

diff --git a/libgcc/config.host b/libgcc/config.host
index 50f00062232..96da9ef1cce 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
 	;;
 i[34567]86-*-* | x86_64-*-*)
   	tmake_file="${tmake_file} t-softfp-tf"
-	if test "${host_address}" = 32; then
-		tmake_file="${tmake_file} i386/${host_address}/t-softfp"
-	fi
-	tmake_file="${tmake_file} i386/t-softfp t-softfp"
+	tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
 	;;
 esac
 
diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
index 1fa282d7afe..e24cbc8d180 100644
--- a/libgcc/config/i386/32/sfp-machine.h
+++ b/libgcc/config/i386/32/sfp-machine.h
@@ -86,6 +86,7 @@
 #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
diff --git a/libgcc/config/i386/32/t-softfp b/libgcc/config/i386/32/t-softfp
index a48a5b3b116..86478cf5f20 100644
--- a/libgcc/config/i386/32/t-softfp
+++ b/libgcc/config/i386/32/t-softfp
@@ -3,3 +3,4 @@ softfp_int_modes := si di
 
 # Provide fallbacks for __builtin_copysignq and __builtin_fabsq.
 LIB2ADD += $(srcdir)/config/i386/32/tf-signs.c
+
diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
index 1ff94c23ea4..e1c616699bb 100644
--- a/libgcc/config/i386/64/sfp-machine.h
+++ b/libgcc/config/i386/64/sfp-machine.h
@@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H		_FP_QNANBIT_H
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
new file mode 100644
index 00000000000..f9d8b3a945c
--- /dev/null
+++ b/libgcc/config/i386/64/t-softfp
@@ -0,0 +1 @@
+softfp_extras := fixhfti fixunshfti floattihf floatuntihf
diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index 8319f0550bc..f15d29d3755 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 #define _FP_KEEPNANFRACP	1
 #define _FP_QNANNEGATEDP 0
 
+#define _FP_NANSIGN_H		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
index 685d9cf8502..4ac214eb0ce 100644
--- a/libgcc/config/i386/t-softfp
+++ b/libgcc/config/i386/t-softfp
@@ -1 +1,6 @@
 LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
+
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+
+softfp_extras += eqhf2
\ No newline at end of file
-- 
2.27.0



* [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
                                       ` (2 preceding siblings ...)
  2021-08-02  6:31                     ` [PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
@ 2021-08-02  6:31                     ` liuhongt
  2021-08-02 19:34                       ` Joseph Myers
  2021-08-02  6:39                     ` [PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
                                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

gcc/ada/ChangeLog:

	* gcc-interface/misc.c (gnat_post_options): Issue an error for
	-fexcess-precision=16.

gcc/c-family/ChangeLog:

	* c-common.c (excess_precision_mode_join): Update below comments.
	(c_ts18661_flt_eval_method): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
	* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
	(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

	* common.opt: Support -fexcess-precision=16.
	* config/aarch64/aarch64.c (aarch64_excess_precision): Return
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
	EXCESS_PRECISION_TYPE_FLOAT16.
	* config/arm/arm.c (arm_excess_precision): Ditto.
	* config/i386/i386.c (ix86_get_excess_precision): Ditto.
	* config/m68k/m68k.c (m68k_excess_precision): Issue an error
	when EXCESS_PRECISION_TYPE_FLOAT16.
	* config/s390/s390.c (s390_excess_precision): Ditto.
	* coretypes.h (enum excess_precision_type): Add
	EXCESS_PRECISION_TYPE_FLOAT16.
	* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
	* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
	* doc/extend.texi (Half-Precision): Document
	-fexcess-precision=16.
	* flag-types.h (enum excess_precision): Add
	EXCESS_PRECISION_FLOAT16.
	* target.def (excess_precision): Update document.
	* tree.c (excess_precision_type): Set excess_precision_type to
	EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

	* options.c (gfc_post_options): Issue an error for
	-fexcess-precision=16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/float16-6.c: New test.
---
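To illustrate why the option matters: when _Float16 arithmetic is emulated
with float instructions, the default evaluation keeps intermediates in
32-bit precision and rounds once at the end, whereas -fexcess-precision=16
rounds after every operation, as AVX512FP16 instructions would.  The sketch
below models both on plain floats so that it compiles without _Float16
support.  All function names are hypothetical, and round_to_half handles
only the significand (it assumes finite values inside the normal binary16
range and ignores overflow, subnormals and NaN).

```c
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest value with an 11-bit significand
   (binary16 precision), ties to even.  Assumes X is finite and within
   the normal binary16 exponent range.  Hypothetical helper.  */
static float
round_to_half (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* binary32 stores 23 fraction bits, binary16 stores 10: drop 13.  */
  uint32_t lsb = (bits >> 13) & 1u;
  bits += 0x0FFFu + lsb;   /* Round to nearest, ties to even.  */
  bits &= ~0x1FFFu;        /* Clear the 13 discarded bits.  */
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Models -fexcess-precision=16: round after every operation.  */
static float
sum3_round_each_op (float a, float b, float c)
{
  return round_to_half (round_to_half (a + b) + c);
}

/* Models the default emulation: float intermediates, one final round.  */
static float
sum3_float_intermediate (float a, float b, float c)
{
  return round_to_half ((a + b) + c);
}
```

With a = 2048, b = 1 and c = 1, the first function yields 2048 and the
second 2050: 2049 is not representable in binary16 and rounds to even, so
rounding after each addition loses both increments.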
 gcc/ada/gcc-interface/misc.c              |  3 +++
 gcc/c-family/c-common.c                   |  6 ++++--
 gcc/c-family/c-cppbuiltin.c               |  6 ++++--
 gcc/common.opt                            |  5 ++++-
 gcc/config/aarch64/aarch64.c              |  1 +
 gcc/config/arm/arm.c                      |  1 +
 gcc/config/i386/i386.c                    |  2 ++
 gcc/config/m68k/m68k.c                    |  2 ++
 gcc/config/s390/s390.c                    |  2 ++
 gcc/coretypes.h                           |  3 ++-
 gcc/doc/extend.texi                       |  3 ++-
 gcc/doc/tm.texi                           | 14 ++++++++++----
 gcc/doc/tm.texi.in                        |  3 +++
 gcc/flag-types.h                          |  3 ++-
 gcc/fortran/options.c                     |  3 +++
 gcc/target.def                            | 11 +++++++----
 gcc/testsuite/gcc.target/i386/float16-6.c |  8 ++++++++
 gcc/tree.c                                |  3 ++-
 18 files changed, 62 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 186367ac6d1..96199bd4b63 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -256,6 +256,9 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   /* Excess precision other than "fast" requires front-end support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Ada");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Ada");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* No psABI change warnings for Ada.  */
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index aacdfb46a02..7e72062c77c 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8772,7 +8772,7 @@ excess_precision_mode_join (enum flt_eval_method x,
 
    This relates to the effective excess precision seen by the user,
    which is the join point of the precision the target requests for
-   -fexcess-precision={standard,fast} and the implicit excess precision
+   -fexcess-precision={standard,fast,16} and the implicit excess precision
    the target uses.  */
 
 static enum flt_eval_method
@@ -8784,7 +8784,9 @@ c_ts18661_flt_eval_method (void)
   enum excess_precision_type flag_type
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method requested
     = targetm.c.excess_precision (flag_type);
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index f79f939bd10..5f30354a33c 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -753,7 +753,7 @@ cpp_atomic_builtins (cpp_reader *pfile)
 /* Return TRUE if the implicit excess precision in which the back-end will
    compute floating-point calculations is not more than the explicit
    excess precision that the front-end will apply under
-   -fexcess-precision=[standard|fast].
+   -fexcess-precision=[standard|fast|16].
 
    More intuitively, return TRUE if the excess precision proposed by the
    front-end is the excess precision that will actually be used.  */
@@ -764,7 +764,9 @@ c_cpp_flt_eval_method_iec_559 (void)
   enum excess_precision_type front_end_ept
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method back_end
     = targetm.c.excess_precision (EXCESS_PRECISION_TYPE_IMPLICIT);
diff --git a/gcc/common.opt b/gcc/common.opt
index d9da1131eda..3dd74766400 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1518,7 +1518,7 @@ Perform a number of minor, expensive optimizations.
 
 fexcess-precision=
 Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization SetByCombined
--fexcess-precision=[fast|standard]	Specify handling of excess floating-point precision.
+-fexcess-precision=[fast|standard|16]	Specify handling of excess floating-point precision.
 
 Enum
 Name(excess_precision) Type(enum excess_precision) UnknownError(unknown excess precision style %qs)
@@ -1529,6 +1529,9 @@ Enum(excess_precision) String(fast) Value(EXCESS_PRECISION_FAST)
 EnumValue
 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
 
+EnumValue
+Enum(excess_precision) String(16) Value(EXCESS_PRECISION_FLOAT16)
+
 ; Whether we permit the extended set of values for FLT_EVAL_METHOD
 ; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11.
 fpermitted-flt-eval-methods=
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3bdf19d71b5..c986a93a243 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -24797,6 +24797,7 @@ aarch64_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6d781e23ee9..e2a18615860 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25599,6 +25599,7 @@ arm_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7979e240426..dc673c89bc8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	return (type == EXCESS_PRECISION_TYPE_STANDARD
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
 		: FLT_EVAL_METHOD_UNPREDICTABLE);
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 3f63c60fa92..2fef457c09e 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -7115,6 +7115,8 @@ m68k_excess_precision (enum excess_precision_type type)
 	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 
 	return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index b1d3b99784d..234ee4ac9c4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16515,6 +16515,8 @@ s390_excess_precision (enum excess_precision_type type)
 	   ensure consistency with the implementation in glibc, report that
 	   float is evaluated to the range and precision of double.  */
 	return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 406572e947d..07b9aa656c5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -424,7 +424,8 @@ enum excess_precision_type
 {
   EXCESS_PRECISION_TYPE_IMPLICIT,
   EXCESS_PRECISION_TYPE_STANDARD,
-  EXCESS_PRECISION_TYPE_FAST
+  EXCESS_PRECISION_TYPE_FAST,
+  EXCESS_PRECISION_TYPE_FLOAT16
 };
 
 /* Level of size optimization.  */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f42fd633725..3a1978efc97 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1161,7 +1161,8 @@ operations will be emulated by software emulation and the @code{float}
 instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
 the intermediate result of the operation as 32-bit precision. This may lead
 to inconsistent behavior between software emulation and AVX512-FP16
-instructions.
+instructions.  Using @option{-fexcess-precision=16} forces rounding back to
+@code{_Float16} after each operation.
 
 @node Decimal Float
 @section Decimal Floating Types
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8f4abe3e41..9fac173a217 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -982,20 +982,26 @@ Do not define this macro if it would never modify @var{m}.
 Return a value, with the same meaning as the C99 macro
 @code{FLT_EVAL_METHOD} that describes which excess precision should be
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},
-@code{EXCESS_PRECISION_TYPE_FAST}, or
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For
+@code{EXCESS_PRECISION_TYPE_FAST},
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which
 precision and range operations will be implictly evaluated in regardless
 of the excess precision explicitly added.  For
-@code{EXCESS_PRECISION_TYPE_STANDARD} and
+@code{EXCESS_PRECISION_TYPE_STANDARD}, 
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the
 explicit excess precision that should be added depending on the
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.
 Note that unpredictable explicit excess precision does not make sense,
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or
 @code{EXCESS_PRECISION_TYPE_FAST}.
 @end deftypefn
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @deftypefn {Target Hook} machine_mode TARGET_PROMOTE_FUNCTION_MODE (const_tree @var{type}, machine_mode @var{mode}, int *@var{punsignedp}, const_tree @var{funtype}, int @var{for_return})
 Like @code{PROMOTE_MODE}, but it is applied to outgoing function arguments or
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9c4b5016053..90a8d790758 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -929,6 +929,9 @@ Do not define this macro if it would never modify @var{m}.
 @end defmac
 
 @hook TARGET_C_EXCESS_PRECISION
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @hook TARGET_PROMOTE_FUNCTION_MODE
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e43d1de490d..5eeb5046222 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -198,7 +198,8 @@ enum excess_precision
 {
   EXCESS_PRECISION_DEFAULT,
   EXCESS_PRECISION_FAST,
-  EXCESS_PRECISION_STANDARD
+  EXCESS_PRECISION_STANDARD,
+  EXCESS_PRECISION_FLOAT16
 };
 
 /* The options for which values of FLT_EVAL_METHOD are permissible.  */
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 1723f689a57..847e20e8829 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -267,6 +267,9 @@ gfc_post_options (const char **pfilename)
      support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Fortran");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Fortran");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* Fortran allows associative math - but we cannot reassociate if
diff --git a/gcc/target.def b/gcc/target.def
index 2e40448e6c5..b0bd79a0671 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6192,18 +6192,21 @@ DEFHOOK
  "Return a value, with the same meaning as the C99 macro\n\
 @code{FLT_EVAL_METHOD} that describes which excess precision should be\n\
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},\n\
-@code{EXCESS_PRECISION_TYPE_FAST}, or\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For\n\
+@code{EXCESS_PRECISION_TYPE_FAST},\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For\n\
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which\n\
 precision and range operations will be implictly evaluated in regardless\n\
 of the excess precision explicitly added.  For\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD} and\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, \n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and\n\
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the\n\
 explicit excess precision that should be added depending on the\n\
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.\n\
 Note that unpredictable explicit excess precision does not make sense,\n\
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}\n\
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or\n\
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or\n\
 @code{EXCESS_PRECISION_TYPE_FAST}.",
  enum flt_eval_method, (enum excess_precision_type type),
  default_excess_precision)
diff --git a/gcc/testsuite/gcc.target/i386/float16-6.c b/gcc/testsuite/gcc.target/i386/float16-6.c
new file mode 100644
index 00000000000..599f4495086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-6.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2 -fdump-tree-gimple -fexcess-precision=16" } */
+/* { dg-final { scan-tree-dump-not "\\(float\\)" "gimple" } } */
+_Float16
+foo (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a + b + c;
+}
diff --git a/gcc/tree.c b/gcc/tree.c
index bead1ac134c..20dfbe00b88 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7633,7 +7633,8 @@ excess_precision_type (tree type)
   enum excess_precision_type requested_type
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
-       : EXCESS_PRECISION_TYPE_STANDARD);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
-- 
2.27.0



* [PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
                                       ` (3 preceding siblings ...)
  2021-08-02  6:31                     ` [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16 liuhongt
@ 2021-08-02  6:39                     ` liuhongt
  2021-08-02  6:44                     ` [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
  2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
  6 siblings, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
	(_mm256_set_ph): Likewise.
	(_mm512_set_ph): Likewise.
	(_mm_setr_ph): Likewise.
	(_mm256_setr_ph): Likewise.
	(_mm512_setr_ph): Likewise.
	(_mm_set1_ph): Likewise.
	(_mm256_set1_ph): Likewise.
	(_mm512_set1_ph): Likewise.
	(_mm_setzero_ph): Likewise.
	(_mm256_setzero_ph): Likewise.
	(_mm512_setzero_ph): Likewise.
	(_mm_set_sh): Likewise.
	(_mm_load_sh): Likewise.
	(_mm_store_sh): Likewise.
	* config/i386/i386-builtin-types.def (V8HF): New type.
	(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type.
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	Support vector HFmodes.
	(ix86_expand_vector_init_one_nonzero): Likewise.
	(ix86_expand_vector_init_one_var): Likewise.
	(ix86_expand_vector_init_interleave): Likewise.
	(ix86_expand_vector_init_general): Likewise.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_vector_extract): Likewise.
	(ix86_expand_vector_init_concat): Likewise.
	(ix86_expand_sse_movcc): Handle vector HFmodes.
	(ix86_expand_vector_set_var): Ditto.
	* config/i386/i386-modes.def: Add HF vector modes in comment.
	* config/i386/i386.c (classify_argument): Add HF vector modes.
	(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
	(ix86_vector_mode_supported_p): Likewise.
	(ix86_set_reg_reg_cost): Handle vector HFmode.
	(ix86_get_ssemov): Handle vector HFmode.
	(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
	by stack.
	(function_arg_32): Pass V8HF/V16HF/V32HF in SSE registers in
	32-bit mode.
	(function_arg_advance_32): Ditto.
	* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
	(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
	(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
	(VALID_SSE2_REG_VHF_MODE): New.
	(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
	(SSE_REG_MODE_P): Add vector HFmode.
	* config/i386/i386.md (mode): Add HF vector modes.
	(MODE_SIZE): Likewise.
	(ssemodesuffix): Add ph suffix for HF vector modes.
	* config/i386/sse.md (VFH_128): New mode iterator.
	(VMOVE): Adjust for HF vector modes.
	(V): Likewise.
	(V_256_512): Likewise.
	(avx512): Likewise.
	(avx512fmaskmode): Likewise.
	(shuffletype): Likewise.
	(sseinsnmode): Likewise.
	(ssedoublevecmode): Likewise.
	(ssehalfvecmode): Likewise.
	(ssehalfvecmodelower): Likewise.
	(ssePScmode): Likewise.
	(ssescalarmode): Likewise.
	(ssescalarmodelower): Likewise.
	(sseintprefix): Likewise.
	(i128): Likewise.
	(bcstscalarsuff): Likewise.
	(xtg_mode): Likewise.
	(VI12HF_AVX512VL): New mode_iterator.
	(VF_AVX512FP16): Likewise.
	(VIHF): Likewise.
	(VIHF_256): Likewise.
	(VIHF_AVX512BW): Likewise.
	(V16_256): Likewise.
	(V32_512): Likewise.
	(sseintmodesuffix): New mode_attr.
	(sse): Add scalar and vector HFmodes.
	(ssescalarmode): Add vector HFmode mapping.
	(ssescalarmodesuffix): Add sh suffix for HFmode.
	(*<sse>_vm<insn><mode>3): Use VFH_128.
	(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
	(*ieee_<ieee_maxmin><mode>3): Likewise.
	(<avx512>_blendm<mode>): New define_insn.
	(vec_setv8hf): New define_expand.
	(vec_set<mode>_0): New define_insn for HF vector set.
	(*avx512fp16_movsh): Likewise.
	(avx512fp16_movsh): Likewise.
	(vec_extract_lo_v32hi): Rename to ...
	(vec_extract_lo_<mode>): ... this, and adjust to allow HF
	vector modes.
	(vec_extract_hi_v32hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_<mode>): Likewise.
	(vec_extract_hi_v16hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_set_hi_v16hi): Likewise.
	(vec_set_hi_<mode>): Likewise.
	(vec_set_lo_v16hi): Likewise.
	(vec_set_lo_<mode>): Likewise.
	(*vec_extract<mode>_0): New define_insn_and_split for HF
	vector extract.
	(*vec_extracthf): New define_insn.
	(VEC_EXTRACT_MODE): Add HF vector modes.
	(PINSR_MODE): Add V8HF.
	(sse2p4_1): Likewise.
	(pinsr_evex_isa): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
	insert for V8HFmode.
	(pbroadcast_evex_isa): Add HF vector modes.
	(AVX2_VEC_DUP_MODE): Likewise.
	(VEC_INIT_MODE): Likewise.
	(VEC_INIT_HALF_MODE): Likewise.
	(avx2_pbroadcast<mode>): Adjust to support HF vector mode
	broadcast.
	(avx2_pbroadcast<mode>_1): Likewise.
	(<avx512>_vec_dup<mode>_1): Likewise.
	(<avx512>_vec_dup<mode><mask_name>): Likewise.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
	Likewise.
---
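Note on argument order for the new set/setr intrinsics: _mm_set_ph takes
its arguments from the highest lane down to lane 0, while _mm_setr_ph
takes them in lane order and forwards to _mm_set_ph.  The sketch below
only models that ordering in portable C.  The names model_v8, model_set,
model_setr and model_set_sh are hypothetical and not part of the patch,
and float is used in place of _Float16 (an assumption made so the sketch
builds without FP16 support).

```c
/* Hypothetical stand-in for __m128h: eight lanes, lane 0 is the lowest.
   float replaces _Float16 so this builds on any C99 compiler.  */
typedef struct { float lane[8]; } model_v8;

/* Models the argument order of _mm_set_ph: the first argument lands in
   the highest lane and the last argument lands in lane 0.  */
static model_v8
model_set (float a7, float a6, float a5, float a4,
	   float a3, float a2, float a1, float a0)
{
  model_v8 v = { { a0, a1, a2, a3, a4, a5, a6, a7 } };
  return v;
}

/* Models _mm_setr_ph: arguments are given in lane order and simply
   forwarded, reversed, to the set variant.  */
static model_v8
model_setr (float a0, float a1, float a2, float a3,
	    float a4, float a5, float a6, float a7)
{
  return model_set (a7, a6, a5, a4, a3, a2, a1, a0);
}

/* Models _mm_set_sh: lane 0 holds the scalar, all other lanes are zero.  */
static model_v8
model_set_sh (float f)
{
  return model_set (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, f);
}
```

With this model, model_set (7, 6, 5, 4, 3, 2, 1, 0) and
model_setr (0, 1, 2, 3, 4, 5, 6, 7) describe the same vector, which is
the relationship the header relies on.
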
 gcc/config/i386/avx512fp16intrin.h     | 172 +++++++++++
 gcc/config/i386/i386-builtin-types.def |   6 +-
 gcc/config/i386/i386-expand.c          | 124 +++++++-
 gcc/config/i386/i386-modes.def         |  12 +-
 gcc/config/i386/i386.c                 |  75 ++---
 gcc/config/i386/i386.h                 |  15 +-
 gcc/config/i386/i386.md                |  13 +-
 gcc/config/i386/sse.md                 | 397 +++++++++++++++++++------
 8 files changed, 660 insertions(+), 154 deletions(-)
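
Note on the extract path: for V32HFmode and V16HFmode,
ix86_expand_vector_extract first pulls out the low or high half of the
vector, then recurses on that half with the index masked (elt & 15, then
elt & 7) until it reaches V8HFmode.  A rough model of that index
arithmetic follows.  It is a sketch in plain C and not the RTL expander
itself: model_extract is a hypothetical name, and uint16_t stands in for
the raw bits of one _Float16 lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the halving recursion: NUNITS is 32, 16 or 8,
   ELT is the lane to read.  Each step picks the low or high half and
   masks the index down to that half, as the expander does.  */
static uint16_t
model_extract (const uint16_t *vec, int nunits, int elt)
{
  assert (nunits == 8 || nunits == 16 || nunits == 32);
  assert (elt >= 0 && elt < nunits);

  /* Base case: an 8-lane vector is read directly.  */
  if (nunits == 8)
    return vec[elt];

  int half = nunits / 2;
  /* elt < half selects the low half, otherwise the high half.  */
  const uint16_t *tmp = elt < half ? vec : vec + half;
  /* half is a power of two, so the mask is 15 for 32 lanes and 7 for
     16 lanes.  */
  return model_extract (tmp, half, elt & (half - 1));
}
```

For a 32-lane input, lane 21 is reached by taking the high half
(21 >= 16), masking to 5, then taking the low half of that (5 < 8) and
reading lane 5.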

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 38d63161ba6..3fc0770986e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -45,6 +45,178 @@ typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
 typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
 typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
 
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5,
+	    _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	    _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m128h)(__v8hf){ __A0, __A1, __A2, __A3,
+					  __A4, __A5, __A6, __A7 };
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set_ph (_Float16 __A15, _Float16 __A14, _Float16 __A13,
+	       _Float16 __A12, _Float16 __A11, _Float16 __A10,
+	       _Float16 __A9, _Float16 __A8, _Float16 __A7,
+	       _Float16 __A6, _Float16 __A5, _Float16 __A4,
+	       _Float16 __A3, _Float16 __A2, _Float16 __A1,
+	       _Float16 __A0)
+{
+  return __extension__ (__m256h)(__v16hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15 };
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set_ph (_Float16 __A31, _Float16 __A30, _Float16 __A29,
+	       _Float16 __A28, _Float16 __A27, _Float16 __A26,
+	       _Float16 __A25, _Float16 __A24, _Float16 __A23,
+	       _Float16 __A22, _Float16 __A21, _Float16 __A20,
+	       _Float16 __A19, _Float16 __A18, _Float16 __A17,
+	       _Float16 __A16, _Float16 __A15, _Float16 __A14,
+	       _Float16 __A13, _Float16 __A12, _Float16 __A11,
+	       _Float16 __A10, _Float16 __A9, _Float16 __A8,
+	       _Float16 __A7, _Float16 __A6, _Float16 __A5,
+	       _Float16 __A4, _Float16 __A3, _Float16 __A2,
+	       _Float16 __A1, _Float16 __A0)
+{
+  return __extension__ (__m512h)(__v32hf){ __A0, __A1, __A2, __A3,
+					   __A4, __A5, __A6, __A7,
+					   __A8, __A9, __A10, __A11,
+					   __A12, __A13, __A14, __A15,
+					   __A16, __A17, __A18, __A19,
+					   __A20, __A21, __A22, __A23,
+					   __A24, __A25, __A26, __A27,
+					   __A28, __A29, __A30, __A31 };
+}
+
+/* Create vectors of elements in the reversed order from _mm_set_ph,
+   _mm256_set_ph and _mm512_set_ph functions.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+	     _Float16 __A3, _Float16 __A4, _Float16 __A5,
+	     _Float16 __A6, _Float16 __A7)
+{
+  return _mm_set_ph (__A7, __A6, __A5, __A4, __A3, __A2, __A1, __A0);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15)
+{
+  return _mm256_set_ph (__A15, __A14, __A13, __A12, __A11, __A10, __A9,
+			__A8, __A7, __A6, __A5, __A4, __A3, __A2, __A1,
+			__A0);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setr_ph (_Float16 __A0, _Float16 __A1, _Float16 __A2,
+		_Float16 __A3, _Float16 __A4, _Float16 __A5,
+		_Float16 __A6, _Float16 __A7, _Float16 __A8,
+		_Float16 __A9, _Float16 __A10, _Float16 __A11,
+		_Float16 __A12, _Float16 __A13, _Float16 __A14,
+		_Float16 __A15, _Float16 __A16, _Float16 __A17,
+		_Float16 __A18, _Float16 __A19, _Float16 __A20,
+		_Float16 __A21, _Float16 __A22, _Float16 __A23,
+		_Float16 __A24, _Float16 __A25, _Float16 __A26,
+		_Float16 __A27, _Float16 __A28, _Float16 __A29,
+		_Float16 __A30, _Float16 __A31)
+
+{
+  return _mm512_set_ph (__A31, __A30, __A29, __A28, __A27, __A26, __A25,
+			__A24, __A23, __A22, __A21, __A20, __A19, __A18,
+			__A17, __A16, __A15, __A14, __A13, __A12, __A11,
+			__A10, __A9, __A8, __A7, __A6, __A5, __A4, __A3,
+			__A2, __A1, __A0);
+}
+
+/* Broadcast _Float16 to vector.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set1_ph (_Float16 __A)
+{
+  return _mm_set_ph (__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set1_ph (_Float16 __A)
+{
+  return _mm256_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ph (_Float16 __A)
+{
+  return _mm512_set_ph (__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A,
+			__A, __A, __A, __A, __A, __A, __A, __A);
+}
+
+/* Create a vector with all zeros.  */
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_setzero_ph (void)
+{
+  return _mm_set1_ph (0.0f);
+}
+
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_setzero_ph (void)
+{
+  return _mm256_set1_ph (0.0f);
+}
+
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_ph (void)
+{
+  return _mm512_set1_ph (0.0f);
+}
+
+/* Create a vector with element 0 as F and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set_sh (_Float16 __F)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, __F);
+}
+
+/* Create a vector with element 0 as *P and the rest zero.  */
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_load_sh (void const *__P)
+{
+  return _mm_set_ph (0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f,
+		     *(_Float16 const *) __P);
+}
+
+/* Stores the lower _Float16 value.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_store_sh (void *__P, __m128h __A)
+{
+  *(_Float16 *) __P = ((__v8hf)__A)[0];
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 1768b88d748..4df6ee1009d 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -85,6 +85,7 @@ DEF_VECTOR_TYPE (V8QI, QI)
 # SSE vectors
 DEF_VECTOR_TYPE (V2DF, DOUBLE)
 DEF_VECTOR_TYPE (V4SF, FLOAT)
+DEF_VECTOR_TYPE (V8HF, FLOAT16)
 DEF_VECTOR_TYPE (V2DI, DI)
 DEF_VECTOR_TYPE (V4SI, SI)
 DEF_VECTOR_TYPE (V8HI, HI)
@@ -1297,4 +1298,7 @@ DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID)
 DEF_FUNCTION_TYPE (UINT, UINT, V2DI, PVOID)
 DEF_FUNCTION_TYPE (VOID, V2DI, V2DI, V2DI, UINT)
 DEF_FUNCTION_TYPE (UINT8, PV2DI, V2DI, PCVOID)
-DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
\ No newline at end of file
+DEF_FUNCTION_TYPE (UINT8, PV2DI, PCV2DI, PCVOID)
+
+# FP16 builtins
+DEF_FUNCTION_TYPE (V8HF, V8HI)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index b7d050a1e42..bb965ca0e9b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3952,6 +3952,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
       break;
     case E_V16QImode:
     case E_V8HImode:
+    case E_V8HFmode:
     case E_V4SImode:
     case E_V2DImode:
       if (TARGET_SSE4_1)
@@ -3974,6 +3975,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
       break;
     case E_V32QImode:
     case E_V16HImode:
+    case E_V16HFmode:
     case E_V8SImode:
     case E_V4DImode:
       if (TARGET_AVX2)
@@ -3993,6 +3995,9 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
     case E_V32HImode:
       gen = gen_avx512bw_blendmv32hi;
       break;
+    case E_V32HFmode:
+      gen = gen_avx512bw_blendmv32hf;
+      break;
     case E_V16SImode:
       gen = gen_avx512f_blendmv16si;
       break;
@@ -14144,6 +14149,11 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 	}
       return true;
 
+    case E_V8HFmode:
+    case E_V16HFmode:
+    case E_V32HFmode:
+      return ix86_vector_duplicate_value (mode, target, val);
+
     default:
       return false;
     }
@@ -14228,6 +14238,18 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
       use_vector_set = TARGET_AVX512F && TARGET_64BIT && one_var == 0;
       gen_vec_set_0 = gen_vec_setv8di_0;
       break;
+    case E_V8HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv8hf_0;
+      break;
+    case E_V16HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv16hf_0;
+      break;
+    case E_V32HFmode:
+      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
+      gen_vec_set_0 = gen_vec_setv32hf_0;
+      break;
     default:
       break;
     }
@@ -14377,6 +14399,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
       if (!TARGET_64BIT)
 	return false;
       /* FALLTHRU */
+    case E_V8HFmode:
+    case E_V16HFmode:
     case E_V4DFmode:
     case E_V8SFmode:
     case E_V8SImode:
@@ -14457,6 +14481,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
     case 2:
       switch (mode)
 	{
+	case E_V32HFmode:
+	  half_mode = V16HFmode;
+	  break;
 	case E_V16SImode:
 	  half_mode = V8SImode;
 	  break;
@@ -14469,6 +14496,9 @@ ix86_expand_vector_init_concat (machine_mode mode,
 	case E_V8DFmode:
 	  half_mode = V4DFmode;
 	  break;
+	case E_V16HFmode:
+	  half_mode = V8HFmode;
+	  break;
 	case E_V8SImode:
 	  half_mode = V4SImode;
 	  break;
@@ -14611,13 +14641,22 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 {
   machine_mode first_imode, second_imode, third_imode, inner_mode;
   int i, j;
-  rtx op0, op1;
+  rtx op, op0, op1;
   rtx (*gen_load_even) (rtx, rtx, rtx);
   rtx (*gen_interleave_first_low) (rtx, rtx, rtx);
   rtx (*gen_interleave_second_low) (rtx, rtx, rtx);
 
   switch (mode)
     {
+    case E_V8HFmode:
+      gen_load_even = gen_vec_setv8hf;
+      gen_interleave_first_low = gen_vec_interleave_lowv4si;
+      gen_interleave_second_low = gen_vec_interleave_lowv2di;
+      inner_mode = HFmode;
+      first_imode = V4SImode;
+      second_imode = V2DImode;
+      third_imode = VOIDmode;
+      break;
     case E_V8HImode:
       gen_load_even = gen_vec_setv8hi;
       gen_interleave_first_low = gen_vec_interleave_lowv4si;
@@ -14642,9 +14681,19 @@ ix86_expand_vector_init_interleave (machine_mode mode,
 
   for (i = 0; i < n; i++)
     {
+      op = ops [i + i];
+      if (inner_mode == HFmode)
+	{
+	  /* Convert HFmode to HImode.  */
+	  op1 = gen_reg_rtx (HImode);
+	  op1 = gen_rtx_SUBREG (HImode, force_reg (HFmode, op), 0);
+	  op = gen_reg_rtx (HImode);
+	  emit_move_insn (op, op1);
+	}
+
       /* Extend the odd elment to SImode using a paradoxical SUBREG.  */
       op0 = gen_reg_rtx (SImode);
-      emit_move_insn (op0, gen_lowpart (SImode, ops [i + i]));
+      emit_move_insn (op0, gen_lowpart (SImode, op));
 
       /* Insert the SImode value as low element of V4SImode vector. */
       op1 = gen_reg_rtx (V4SImode);
@@ -14781,6 +14830,10 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
       half_mode = V8HImode;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      goto half;
+
 half:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14804,6 +14857,11 @@ half:
       half_mode = V16HImode;
       goto quarter;
 
+    case E_V32HFmode:
+      quarter_mode = V8HFmode;
+      half_mode = V16HFmode;
+      goto quarter;
+
 quarter:
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -14840,6 +14898,9 @@ quarter:
 	 move from GPR to SSE register directly.  */
       if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
 	break;
+      /* FALLTHRU */
+
+    case E_V8HFmode:
 
       n = GET_MODE_NUNITS (mode);
       for (i = 0; i < n; i++)
@@ -15087,6 +15148,16 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
 	case E_V16SFmode:
 	  cmp_mode = V16SImode;
 	  break;
+	/* TARGET_AVX512FP16 implies TARGET_AVX512BW.  */
+	case E_V8HFmode:
+	  cmp_mode = V8HImode;
+	  break;
+	case E_V16HFmode:
+	  cmp_mode = V16HImode;
+	  break;
+	case E_V32HFmode:
+	  cmp_mode = V32HImode;
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -15123,23 +15194,25 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
   machine_mode half_mode;
   bool use_vec_merge = false;
   rtx tmp;
-  static rtx (*gen_extract[6][2]) (rtx, rtx)
+  static rtx (*gen_extract[7][2]) (rtx, rtx)
     = {
 	{ gen_vec_extract_lo_v32qi, gen_vec_extract_hi_v32qi },
 	{ gen_vec_extract_lo_v16hi, gen_vec_extract_hi_v16hi },
 	{ gen_vec_extract_lo_v8si, gen_vec_extract_hi_v8si },
 	{ gen_vec_extract_lo_v4di, gen_vec_extract_hi_v4di },
 	{ gen_vec_extract_lo_v8sf, gen_vec_extract_hi_v8sf },
-	{ gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df }
+	{ gen_vec_extract_lo_v4df, gen_vec_extract_hi_v4df },
+	{ gen_vec_extract_lo_v16hf, gen_vec_extract_hi_v16hf }
       };
-  static rtx (*gen_insert[6][2]) (rtx, rtx, rtx)
+  static rtx (*gen_insert[7][2]) (rtx, rtx, rtx)
     = {
 	{ gen_vec_set_lo_v32qi, gen_vec_set_hi_v32qi },
 	{ gen_vec_set_lo_v16hi, gen_vec_set_hi_v16hi },
 	{ gen_vec_set_lo_v8si, gen_vec_set_hi_v8si },
 	{ gen_vec_set_lo_v4di, gen_vec_set_hi_v4di },
 	{ gen_vec_set_lo_v8sf, gen_vec_set_hi_v8sf },
-	{ gen_vec_set_lo_v4df, gen_vec_set_hi_v4df }
+	{ gen_vec_set_lo_v4df, gen_vec_set_hi_v4df },
+	{ gen_vec_set_lo_v16hf, gen_vec_set_hi_v16hf },
       };
   int i, j, n;
   machine_mode mmode = VOIDmode;
@@ -15306,6 +15379,10 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 	}
       return;
 
+    case E_V8HFmode:
+      use_vec_merge = true;
+      break;
+
     case E_V8HImode:
     case E_V2HImode:
       use_vec_merge = TARGET_SSE2;
@@ -15329,6 +15406,12 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
       n = 16;
       goto half;
 
+    case E_V16HFmode:
+      half_mode = V8HFmode;
+      j = 6;
+      n = 8;
+      goto half;
+
     case E_V16HImode:
       half_mode = V8HImode;
       j = 1;
@@ -15409,6 +15492,13 @@ half:
 	}
       break;
 
+    case E_V32HFmode:
+      if (TARGET_AVX512BW)
+	{
+	  mmode = SImode;
+	  gen_blendm = gen_avx512bw_blendmv32hf;
+	}
+      break;
     case E_V32HImode:
       if (TARGET_AVX512BW)
 	{
@@ -15780,6 +15870,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       ix86_expand_vector_extract (false, target, tmp, elt & 3);
       return;
 
+    case E_V32HFmode:
+      tmp = gen_reg_rtx (V16HFmode);
+      if (elt < 16)
+	emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 15);
+      return;
+
+    case E_V16HFmode:
+      tmp = gen_reg_rtx (V8HFmode);
+      if (elt < 8)
+	emit_insn (gen_vec_extract_lo_v16hf (tmp, vec));
+      else
+	emit_insn (gen_vec_extract_hi_v16hf (tmp, vec));
+      ix86_expand_vector_extract (false, target, tmp, elt & 7);
+      return;
+
+    case E_V8HFmode:
+      use_vec_extr = true;
+      break;
+
     case E_V8QImode:
       use_vec_extr = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
       /* ??? Could extract the appropriate HImode element and shift.  */
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 9232f59a925..fcadfcd4c94 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -84,12 +84,12 @@ VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
 VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
 VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
 VECTOR_MODES (INT, 128);      /* V128QI V64HI V32SI V16DI */
-VECTOR_MODES (FLOAT, 8);      /*                   V2SF */
-VECTOR_MODES (FLOAT, 16);     /*              V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*         V8SF V4DF V2TF */
-VECTOR_MODES (FLOAT, 64);     /*        V16SF V8DF V4TF */
-VECTOR_MODES (FLOAT, 128);    /*       V32SF V16DF V8TF */
-VECTOR_MODES (FLOAT, 256);    /*      V64SF V32DF V16TF */
+VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
+VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*   V16HF V8SF V4DF V2TF */
+VECTOR_MODES (FLOAT, 64);     /*  V32HF V16SF V8DF V4TF */
+VECTOR_MODES (FLOAT, 128);    /* V64HF V32SF V16DF V8TF */
+VECTOR_MODES (FLOAT, 256);    /* V128HF V64SF V32DF V16TF */
 VECTOR_MODE (INT, TI, 1);     /*                   V1TI */
 VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
 VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 71bbcf968c5..889256e0298 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2418,6 +2418,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
+    case E_V16HFmode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
@@ -2428,6 +2429,7 @@ classify_argument (machine_mode mode, const_tree type,
       return 4;
     case E_V8DFmode:
     case E_V16SFmode:
+    case E_V32HFmode:
     case E_V8DImode:
     case E_V16SImode:
     case E_V32HImode:
@@ -2445,6 +2447,7 @@ classify_argument (machine_mode mode, const_tree type,
     case E_V4SImode:
     case E_V16QImode:
     case E_V8HImode:
+    case E_V8HFmode:
     case E_V2DFmode:
     case E_V2DImode:
       classes[0] = X86_64_SSE_CLASS;
@@ -2858,12 +2861,14 @@ pass_in_reg:
 	break;
       /* FALLTHRU */
 
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V64QImode:
     case E_V32HImode:
     case E_V16SImode:
     case E_V8DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V8DFmode:
     case E_V32QImode:
@@ -2875,6 +2880,7 @@ pass_in_reg:
     case E_V8HImode:
     case E_V4SImode:
     case E_V2DImode:
+    case E_V8HFmode:
     case E_V4SFmode:
     case E_V2DFmode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -2929,7 +2935,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, machine_mode mode,
 
   /* Unnamed 512 and 256bit vector mode parameters are passed on stack.  */
   if (!named && (VALID_AVX512F_REG_MODE (mode)
-		 || VALID_AVX256_REG_MODE (mode)))
+		 || VALID_AVX256_REG_MODE (mode)
+		 || mode == V16HFmode
+		 || mode == V32HFmode))
     return 0;
 
   if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
@@ -3097,6 +3105,7 @@ pass_in_reg:
     case E_V8HImode:
     case E_V4SImode:
     case E_V2DImode:
+    case E_V8HFmode:
     case E_V4SFmode:
     case E_V2DFmode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -3116,8 +3125,10 @@ pass_in_reg:
     case E_V32HImode:
     case E_V16SImode:
     case E_V8DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V8DFmode:
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
@@ -3176,12 +3187,14 @@ function_arg_64 (const CUMULATIVE_ARGS *cum, machine_mode mode,
     default:
       break;
 
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -4676,12 +4689,14 @@ ix86_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   nat_mode = type_natural_mode (type, NULL, false);
   switch (nat_mode)
     {
+    case E_V16HFmode:
     case E_V8SFmode:
     case E_V8SImode:
     case E_V32QImode:
     case E_V16HImode:
     case E_V4DFmode:
     case E_V4DImode:
+    case E_V32HFmode:
     case E_V16SFmode:
     case E_V16SImode:
     case E_V64QImode:
@@ -5348,7 +5363,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
       switch (type)
 	{
 	case opcode_int:
-	  opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  if (scalar_mode == E_HFmode)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
+		      : "vmovdqa64");
+	  else
+	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
 	  break;
 	case opcode_float:
 	  opcode = misaligned_p ? "vmovups" : "vmovaps";
@@ -5362,6 +5382,11 @@ ix86_get_ssemov (rtx *operands, unsigned size,
     {
       switch (scalar_mode)
 	{
+	case E_HFmode:
+	  opcode = (misaligned_p
+		    ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
+		    : "vmovdqa64");
+	  break;
 	case E_SFmode:
 	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  break;
@@ -19298,7 +19323,6 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
       int index;
       switch (mode)
 	{
-	  case E_HFmode:
 	  case E_SFmode:
 	    index = 0;
 	    break;
@@ -19399,31 +19423,12 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
 	  }
 	break;
       case 2:
-	{
-	  int cost;
-	  if (in == 2)
-	    cost = MAX (ix86_cost->hard_register.int_load[1],
-			ix86_cost->hard_register.int_store[1]);
-	  else
-	    cost = in ? ix86_cost->hard_register.int_load[1]
-		      : ix86_cost->hard_register.int_store[1];
-	  if (mode == E_HFmode)
-	    {
-	      /* Prefer SSE over GPR for HFmode.  */
-	      int sse_cost;
-	      int index = sse_store_index (mode);
-	      if (in == 2)
-		sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
-				ix86_cost->hard_register.sse_store[index]);
-	      else
-		sse_cost = (in
-			    ? ix86_cost->hard_register.sse_load [index]
-			    : ix86_cost->hard_register.sse_store [index]);
-	      if (sse_cost >= cost)
-		cost = sse_cost + 1;
-	    }
-	  return cost;
-	}
+	if (in == 2)
+	  return MAX (ix86_cost->hard_register.int_load[1],
+		      ix86_cost->hard_register.int_store[1]);
+	else
+	  return in ? ix86_cost->hard_register.int_load[1]
+		    : ix86_cost->hard_register.int_store[1];
       default:
 	if (in == 2)
 	  cost = MAX (ix86_cost->hard_register.int_load[2],
@@ -19601,6 +19606,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	 between gpr and sse registser.  */
       if (TARGET_AVX512F
 	  && (mode == XImode
+	      || mode == V32HFmode
 	      || VALID_AVX512F_REG_MODE (mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
@@ -19615,9 +19621,7 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
       /* TODO check for QI/HI scalars.  */
       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
       if (TARGET_AVX512VL
-	  && (mode == OImode
-	      || mode == TImode
-	      || VALID_AVX256_REG_MODE (mode)
+	  && (VALID_AVX256_REG_OR_OI_VHF_MODE (mode)
 	      || VALID_AVX512VL_128_REG_MODE (mode)))
 	return true;
 
@@ -19627,9 +19631,9 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 
       /* OImode and AVX modes are available only when AVX is enabled.  */
       return ((TARGET_AVX
-	       && VALID_AVX256_REG_OR_OI_MODE (mode))
+	       && VALID_AVX256_REG_OR_OI_VHF_MODE (mode))
 	      || VALID_SSE_REG_MODE (mode)
-	      || VALID_SSE2_REG_MODE (mode)
+	      || VALID_SSE2_REG_VHF_MODE (mode)
 	      || VALID_MMX_REG_MODE (mode)
 	      || VALID_MMX_REG_MODE_3DNOW (mode));
     }
@@ -19840,7 +19844,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
 
     case MODE_VECTOR_INT:
     case MODE_VECTOR_FLOAT:
-      if ((TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
+      if ((TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+	  || (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 	  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
 	  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
 	  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
@@ -21706,6 +21711,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
   if ((TARGET_MMX || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE (mode))
     return true;
+  if (TARGET_AVX512FP16 && VALID_AVX512FP16_REG_MODE (mode))
+    return true;
   if ((TARGET_3DNOW || TARGET_MMX_WITH_SSE)
       && VALID_MMX_REG_MODE_3DNOW (mode))
     return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8fcd5693624..64327dc90df 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -995,8 +995,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4DImode || (MODE) == V2TImode || (MODE) == V8SFmode	\
    || (MODE) == V4DFmode)
 
-#define VALID_AVX256_REG_OR_OI_MODE(MODE)		\
-  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
+#define VALID_AVX256_REG_OR_OI_VHF_MODE(MODE)		\
+  (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode || (MODE) == V16HFmode)
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
   ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
@@ -1014,13 +1014,20 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define VALID_AVX512VL_128_REG_MODE(MODE)				\
   ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode	\
    || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode	\
-   || (MODE) == TFmode || (MODE) == V1TImode)
+   || (MODE) == TFmode || (MODE) == V1TImode || (MODE) == V8HFmode	\
+   || (MODE) == TImode)
+
+#define VALID_AVX512FP16_REG_MODE(MODE)					\
+  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
 
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
 
+#define VALID_SSE2_REG_VHF_MODE(MODE)			\
+  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode)
+
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
    || (MODE) == V4SFmode || (MODE) == V4SImode				\
@@ -1065,7 +1072,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode	\
    || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode	\
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
-   || (MODE) == V16SFmode)
+   || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE))
 
 #define X87_FLOAT_MODE_P(MODE)	\
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 777d11261ac..f25166695f1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,8 +496,8 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
-  V2DF,V2SF,V1DF,V8DF"
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
+   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
@@ -1102,7 +1102,8 @@ (define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
 			     (V2DI "16") (V4DI "32") (V8DI "64")
 			     (V1TI "16") (V2TI "32") (V4TI "64")
 			     (V2DF "16") (V4DF "32") (V8DF "64")
-			     (V4SF "16") (V8SF "32") (V16SF "64")])
+			     (V4SF "16") (V8SF "32") (V16SF "64")
+			     (V8HF "16") (V16HF "32") (V32HF "64")])
 
 ;; Double word integer modes as mode attribute.
 (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
@@ -1237,9 +1238,9 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
   [(HF "sh") (SF "ss") (DF "sd")
-   (V16SF "ps") (V8DF "pd")
-   (V8SF "ps") (V4DF "pd")
-   (V4SF "ps") (V2DF "pd")
+   (V32HF "ph") (V16SF "ps") (V8DF "pd")
+   (V16HF "ph") (V8SF "ps") (V4DF "pd")
+   (V8HF "ph") (V4SF "ps") (V2DF "pd")
    (V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
    (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
    (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab29999023d..e331ef477d3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -225,6 +225,7 @@ (define_mode_iterator VMOVE
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX") V1TI
+   (V32HF "TARGET_AVX512F") (V16HF "TARGET_AVX") V8HF
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") V2DF])
 
@@ -240,6 +241,13 @@ (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
    V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
 
+(define_mode_iterator VI12HF_AVX512VL
+  [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
+   V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
+
 ;; Same iterator, but without supposed TARGET_AVX512BW
 (define_mode_iterator VI12_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
@@ -255,6 +263,8 @@ (define_mode_iterator V
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F")  (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F")  (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
 
@@ -277,7 +287,8 @@ (define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF])
 (define_mode_iterator V_256_512
   [V32QI V16HI V8SI V4DI V8SF V4DF
    (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")
-   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+   (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+   (V16HF "TARGET_AVX512FP16") (V32HF "TARGET_AVX512FP16")])
 
 ;; All vector float modes
 (define_mode_iterator VF
@@ -321,6 +332,11 @@ (define_mode_iterator VF2_512_256VL
 (define_mode_iterator VF_128
   [V4SF (V2DF "TARGET_SSE2")])
 
+;; All 128bit vector HF/SF/DF modes
+(define_mode_iterator VFH_128
+  [(V8HF "TARGET_AVX512FP16")
+   V4SF (V2DF "TARGET_SSE2")])
+
 ;; All 256bit vector float modes
 (define_mode_iterator VF_256
   [V8SF V4DF])
@@ -347,6 +363,9 @@ (define_mode_iterator VF2_AVX512VL
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF_AVX512FP16
+  [V32HF V16HF V8HF])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
@@ -355,6 +374,16 @@ (define_mode_iterator VI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
+;; All vector integer and HF modes
+(define_mode_iterator VIHF
+  [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
+   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
+   (V8SI "TARGET_AVX") V4SI
+   (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")])
+
 (define_mode_iterator VI_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
@@ -557,6 +586,7 @@ (define_mode_attr avx512
    (V8HI  "avx512vl") (V16HI  "avx512vl") (V32HI "avx512bw")
    (V4SI  "avx512vl") (V8SI  "avx512vl") (V16SI "avx512f")
    (V2DI  "avx512vl") (V4DI  "avx512vl") (V8DI "avx512f")
+   (V8HF "avx512fp16") (V16HF "avx512vl") (V32HF "avx512bw")
    (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
    (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
 
@@ -617,12 +647,13 @@ (define_mode_attr avx2_avx512
    (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")])
 
 (define_mode_attr shuffletype
-  [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
-  (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
-  (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
-  (V32HI "i") (V16HI "i") (V8HI "i")
-  (V64QI "i") (V32QI "i") (V16QI "i")
-  (V4TI "i") (V2TI "i") (V1TI "i")])
+  [(V32HF "f") (V16HF "f") (V8HF "f")
+   (V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i")
+   (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i")
+   (V4SF "f") (V4SI "i") (V2DF "f") (V2DI "i")
+   (V32HI "i") (V16HI "i") (V8HI "i")
+   (V64QI "i") (V32QI "i") (V16QI "i")
+   (V4TI "i") (V2TI "i") (V1TI "i")])
 
 (define_mode_attr ssequartermode
   [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")])
@@ -659,6 +690,8 @@ (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
 ;; All 128 and 256bit vector integer modes
 (define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI])
+;; All 256bit vector integer and HF modes
+(define_mode_iterator VIHF_256 [V32QI V16HI V8SI V4DI V16HF])
 
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
@@ -680,6 +713,9 @@ (define_mode_iterator VI48_512 [V16SI V8DI])
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
 (define_mode_iterator VI_AVX512BW
   [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
+(define_mode_iterator VIHF_AVX512BW
+  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")
+   (V32HF "TARGET_AVX512FP16")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -720,6 +756,9 @@ (define_mode_iterator VF_AVX512
    (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
    V16SF V8DF])
 
+(define_mode_iterator V16_256 [V16HI V16HF])
+(define_mode_iterator V32_512 [V32HI V32HF])
+
 (define_mode_attr avx512bcst
   [(V4SI "%{1to4%}") (V2DI "%{1to2%}")
    (V8SI "%{1to8%}") (V4DI "%{1to4%}")
@@ -730,8 +769,10 @@ (define_mode_attr avx512bcst
 
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
-  [(SF "sse") (DF "sse2")
+  [(SF "sse") (DF "sse2") (HF "avx512fp16")
    (V4SF "sse") (V2DF "sse2")
+   (V32HF "avx512fp16") (V16HF "avx512fp16")
+   (V8HF "avx512fp16")
    (V16SF "avx512f") (V8SF "avx")
    (V8DF "avx512f") (V4DF "avx")])
 
@@ -767,14 +808,23 @@ (define_mode_attr sseinsnmode
    (V16SF "V16SF") (V8DF "V8DF")
    (V8SF "V8SF") (V4DF "V4DF")
    (V4SF "V4SF") (V2DF "V2DF")
+   (V8HF "TI") (V16HF "OI") (V32HF "XI")
    (TI "TI")])
 
+;; SSE integer instruction suffix for various modes
+(define_mode_attr sseintmodesuffix
+  [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")
+   (V32QI "b") (V16HI "w") (V8SI "d") (V4DI "q")
+   (V64QI "b") (V32HI "w") (V16SI "d") (V8DI "q")
+   (V8HF "w") (V16HF "w") (V32HF "w")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
   [(V64QI "DI") (V32QI "SI") (V16QI "HI")
    (V32HI "SI") (V16HI "HI") (V8HI  "QI") (V4HI "QI")
    (V16SI "HI") (V8SI  "QI") (V4SI  "QI")
    (V8DI  "QI") (V4DI  "QI") (V2DI  "QI")
+   (V32HF "SI") (V16HF "HI") (V8HF  "QI")
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
@@ -784,6 +834,7 @@ (define_mode_attr avx512fmaskmodelower
    (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
    (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
    (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V32HF "si") (V16HF "hi") (V8HF  "qi")
    (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
    (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
 
@@ -828,7 +879,8 @@ (define_mode_attr ssedoublevecmode
    (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI")
    (V16SF "V32SF") (V8DF "V16DF")
    (V8SF "V16SF") (V4DF "V8DF")
-   (V4SF "V8SF") (V2DF "V4DF")])
+   (V4SF "V8SF") (V2DF "V4DF")
+   (V32HF "V64HF") (V16HF "V32HF") (V8HF "V16HF")])
 
 ;; Mapping of vector modes to a vector mode of half size
 ;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar.
@@ -838,7 +890,8 @@ (define_mode_attr ssehalfvecmode
    (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI") (V2DI "DI")
    (V16SF "V8SF") (V8DF "V4DF")
    (V8SF  "V4SF") (V4DF "V2DF")
-   (V4SF  "V2SF") (V2DF "DF")])
+   (V4SF  "V2SF") (V2DF "DF")
+   (V32HF "V16HF") (V16HF "V8HF") (V8HF "V4HF")])
 
 (define_mode_attr ssehalfvecmodelower
   [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
@@ -846,9 +899,10 @@ (define_mode_attr ssehalfvecmodelower
    (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
    (V16SF "v8sf") (V8DF "v4df")
    (V8SF  "v4sf") (V4DF "v2df")
-   (V4SF  "v2sf")])
+   (V4SF  "v2sf")
+   (V32HF "v16hf") (V16HF "v8hf") (V8HF "v4hf")])
 
-;; Mapping of vector modes ti packed single mode of the same size
+;; Mapping of vector modes to packed single mode of the same size
 (define_mode_attr ssePSmode
   [(V16SI "V16SF") (V8DF "V16SF")
    (V16SF "V16SF") (V8DI "V16SF")
@@ -858,7 +912,8 @@ (define_mode_attr ssePSmode
    (V4DI "V8SF") (V2DI "V4SF")
    (V4TI "V16SF") (V2TI "V8SF") (V1TI "V4SF")
    (V8SF "V8SF") (V4SF "V4SF")
-   (V4DF "V8SF") (V2DF "V4SF")])
+   (V4DF "V8SF") (V2DF "V4SF")
+   (V32HF "V16SF") (V16HF "V8SF") (V8HF "V4SF")])
 
 (define_mode_attr ssePSmode2
   [(V8DI "V8SF") (V4DI "V4SF")])
@@ -869,6 +924,7 @@ (define_mode_attr ssescalarmode
    (V32HI "HI") (V16HI "HI") (V8HI "HI")
    (V16SI "SI") (V8SI "SI")  (V4SI "SI")
    (V8DI "DI")  (V4DI "DI")  (V2DI "DI")
+   (V32HF "HF") (V16HF "HF") (V8HF "HF")
    (V16SF "SF") (V8SF "SF")  (V4SF "SF")
    (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
    (V4TI "TI")  (V2TI "TI")])
@@ -879,6 +935,7 @@ (define_mode_attr ssescalarmodelower
    (V32HI "hi") (V16HI "hi") (V8HI "hi")
    (V16SI "si") (V8SI "si")  (V4SI "si")
    (V8DI "di")  (V4DI "di")  (V2DI "di")
+   (V32HF "hf") (V16HF "hf")  (V8HF "hf")
    (V16SF "sf") (V8SF "sf")  (V4SF "sf")
    (V8DF "df")  (V4DF "df")  (V2DF "df")
    (V4TI "ti")  (V2TI "ti")])
@@ -889,6 +946,7 @@ (define_mode_attr ssexmmmode
    (V32HI "V8HI")  (V16HI "V8HI") (V8HI "V8HI")
    (V16SI "V4SI")  (V8SI "V4SI")  (V4SI "V4SI")
    (V8DI "V2DI")   (V4DI "V2DI")  (V2DI "V2DI")
+   (V32HF "V8HF")  (V16HF "V8HF") (V8HF "V8HF")
    (V16SF "V4SF")  (V8SF "V4SF")  (V4SF "V4SF")
    (V8DF "V2DF")   (V4DF "V2DF")  (V2DF "V2DF")])
 
@@ -931,10 +989,11 @@ (define_mode_attr ssescalarsize
    (V64QI "8") (V32QI "8") (V16QI "8")
    (V32HI "16") (V16HI "16") (V8HI "16")
    (V16SI "32") (V8SI "32") (V4SI "32")
+   (V32HF "16") (V16HF "16") (V8HF "16")
    (V16SF "32") (V8SF "32") (V4SF "32")
    (V8DF "64") (V4DF "64") (V2DF "64")])
 
-;; SSE prefix for integer vector modes
+;; SSE prefix for integer and HF vector modes
 (define_mode_attr sseintprefix
   [(V2DI  "p") (V2DF  "")
    (V4DI  "p") (V4DF  "")
@@ -942,16 +1001,16 @@ (define_mode_attr sseintprefix
    (V4SI  "p") (V4SF  "")
    (V8SI  "p") (V8SF  "")
    (V16SI "p") (V16SF "")
-   (V16QI "p") (V8HI "p")
-   (V32QI "p") (V16HI "p")
-   (V64QI "p") (V32HI "p")])
+   (V16QI "p") (V8HI "p") (V8HF "p")
+   (V32QI "p") (V16HI "p") (V16HF "p")
+   (V64QI "p") (V32HI "p") (V32HF "p")])
 
 ;; SSE scalar suffix for vector modes
 (define_mode_attr ssescalarmodesuffix
-  [(SF "ss") (DF "sd")
-   (V16SF "ss") (V8DF "sd")
-   (V8SF "ss") (V4DF "sd")
-   (V4SF "ss") (V2DF "sd")
+  [(HF "sh") (SF "ss") (DF "sd")
+   (V32HF "sh") (V16SF "ss") (V8DF "sd")
+   (V16HF "sh") (V8SF "ss") (V4DF "sd")
+   (V8HF "sh") (V4SF "ss") (V2DF "sd")
    (V16SI "d") (V8DI "q")
    (V8SI "d") (V4DI "q")
    (V4SI "d") (V2DI "q")])
@@ -979,7 +1038,8 @@ (define_mode_attr castmode
 ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise.
 ;; i64x4 or f64x4 for 512bit modes.
 (define_mode_attr i128
-  [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128")
+  [(V16HF "%~128") (V32HF "i64x4") (V16SF "f64x4") (V8SF "f128")
+   (V8DF "f64x4") (V4DF "f128")
    (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128")
    (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")])
 
@@ -1003,14 +1063,18 @@ (define_mode_attr bcstscalarsuff
    (V32HI "w")  (V16HI "w") (V8HI "w")
    (V16SI "d")  (V8SI "d")  (V4SI "d")
    (V8DI "q")   (V4DI "q")  (V2DI "q")
+   (V32HF "w")  (V16HF "w") (V8HF "w")
    (V16SF "ss") (V8SF "ss") (V4SF "ss")
    (V8DF "sd")  (V4DF "sd") (V2DF "sd")])
 
 ;; Tie mode of assembler operand to mode iterator
 (define_mode_attr xtg_mode
-  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x") (V4SF "x") (V2DF "x")
-   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
-   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
+  [(V16QI "x") (V8HI "x") (V4SI "x") (V2DI "x")
+   (V8HF "x") (V4SF "x") (V2DF "x")
+   (V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t")
+   (V16HF "t") (V8SF "t") (V4DF "t")
+   (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g")
+   (V32HF "g") (V16SF "g") (V8DF "g")])
 
 ;; Half mask mode for unpacks
 (define_mode_attr HALFMASKMODE
@@ -1306,6 +1370,20 @@ (define_insn "<avx512>_blendm<mode>"
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "<avx512>_blendm<mode>"
+  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
+	(vec_merge:VF_AVX512FP16
+	  (match_operand:VF_AVX512FP16 2 "nonimmediate_operand" "vm,vm")
+	  (match_operand:VF_AVX512FP16 1 "nonimm_or_0_operand" "0C,v")
+	  (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512BW"
+  "@
+    vmovdqu<ssescalarsize>\t{%2, %0%{%3%}%N1|%0%{%3%}%N1, %2}
+    vpblendmw\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_insn "<avx512>_store<mode>_mask"
   [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
 	(vec_merge:V48_AVX512VL
@@ -1903,12 +1981,12 @@ (define_insn "*<insn><mode>3<mask_name><round_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*<sse>_vm<insn><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (plusminus:<ssescalarmode>
 	      (vec_select:<ssescalarmode>
-	        (match_operand:VF_128 1 "register_operand" "0,v")
+		(match_operand:VFH_128 1 "register_operand" "0,v")
 		(parallel [(const_int 0)]))
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
 	  (match_dup 1)
@@ -1919,7 +1997,16 @@ (define_insn "*<sse>_vm<insn><mode>3"
    v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<insn><mode>3<mask_scalar_name><round_scalar_name>"
@@ -1966,12 +2053,12 @@ (define_insn "*mul<mode>3<mask_name><round_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (multdiv:<ssescalarmode>
 	      (vec_select:<ssescalarmode>
-	        (match_operand:VF_128 1 "register_operand" "0,v")
+		(match_operand:VFH_128 1 "register_operand" "0,v")
 		(parallel [(const_int 0)]))
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")))
 	  (match_dup 1)
@@ -1982,7 +2069,16 @@ (define_insn "*<sse>_vm<multdiv_mnemonic><mode>3"
    v<multdiv_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse<multdiv_mnemonic>")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "btver2_decode" "direct,double")
    (set_attr "mode" "<ssescalarmode>")])
 
@@ -2368,12 +2464,12 @@ (define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
 ;; Standard scalar operation patterns which preserve the rest of the
 ;; vector for combiner.
 (define_insn "*ieee_<ieee_maxmin><mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
-	(vec_merge:VF_128
-	  (vec_duplicate:VF_128
+  [(set (match_operand:VFH_128 0 "register_operand" "=x,v")
+	(vec_merge:VFH_128
+	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
 	      [(vec_select:<ssescalarmode>
-	         (match_operand:VF_128 1 "register_operand" "0,v")
+		 (match_operand:VFH_128 1 "register_operand" "0,v")
 		 (parallel [(const_int 0)]))
 	       (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,vm")]
 	       IEEE_MAXMIN))
@@ -2386,7 +2482,16 @@ (define_insn "*ieee_<ieee_maxmin><mode>3"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "btver2_sse_attr" "maxmin")
-   (set_attr "prefix" "orig,vex")
+   (set (attr "prefix")
+     (cond [(eq_attr "alternative" "0")
+	      (const_string "orig")
+	    (eq_attr "alternative" "1")
+	      (if_then_else
+		(match_test "<MODE>mode == V8HFmode")
+		(const_string "evex")
+		(const_string "vex"))
+	   ]
+	   (const_string "*")))
    (set_attr "mode" "<ssescalarmode>")])
 
 (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
@@ -8364,6 +8469,47 @@ (define_insn "vec_set<mode>_0"
 	   ]
 	   (symbol_ref "true")))])
 
+;; vmovw also clears the higher bits
+(define_insn "vec_set<mode>_0"
+  [(set (match_operand:VF_AVX512FP16 0 "register_operand" "=v,v")
+	(vec_merge:VF_AVX512FP16
+	  (vec_duplicate:VF_AVX512FP16
+	    (match_operand:HF 2 "nonimmediate_operand" "r,m"))
+	  (match_operand:VF_AVX512FP16 1 "const0_operand" "C,C")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "@
+   vmovw\t{%k2, %x0|%x0, %k2}
+   vmovw\t{%2, %x0|%x0, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "*avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (vec_duplicate:V8HF
+	    (match_operand:HF 2 "register_operand" "v"))
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "avx512fp16_movsh"
+  [(set (match_operand:V8HF 0 "register_operand" "=v")
+	(vec_merge:V8HF
+	  (match_operand:V8HF 2 "register_operand" "v")
+	  (match_operand:V8HF 1 "register_operand" "v")
+	  (const_int 1)))]
+  "TARGET_AVX512FP16"
+  "vmovsh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
   [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
@@ -8499,6 +8645,20 @@ (define_expand "vec_set<mode>"
   DONE;
 })
 
+(define_expand "vec_setv8hf"
+  [(match_operand:V8HF 0 "register_operand")
+   (match_operand:HF 1 "register_operand")
+   (match_operand 2 "vec_setm_sse41_operand")]
+  "TARGET_SSE"
+{
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+			    INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
+  DONE;
+})
+
 (define_expand "vec_set<mode>"
   [(match_operand:V_256_512 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
@@ -9214,10 +9374,10 @@ (define_insn "vec_extract_hi_<mode>"
    (set_attr "length_immediate" "1")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn_and_split "vec_extract_lo_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,v,m")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "nonimmediate_operand" "v,m,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "nonimmediate_operand" "v,m,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9244,9 +9404,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
   if (!TARGET_AVX512VL
       && REG_P (operands[0])
       && EXT_REX_SSE_REG_P (operands[1]))
-    operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
+    operands[0] = lowpart_subreg (<MODE>mode, operands[0],
+				  <ssehalfvecmode>mode);
   else
-    operands[1] = gen_lowpart (V16HImode, operands[1]);
+    operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -9255,10 +9416,10 @@ (define_insn_and_split "vec_extract_lo_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "vec_extract_hi_v32hi"
-  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
-	(vec_select:V16HI
-	  (match_operand:V32HI 1 "register_operand" "v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V32_512 1 "register_operand" "v")
 	  (parallel [(const_int 16) (const_int 17)
 		     (const_int 18) (const_int 19)
 		     (const_int 20) (const_int 21)
@@ -9275,10 +9436,10 @@ (define_insn "vec_extract_hi_v32hi"
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn_and_split "vec_extract_lo_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
+(define_insn_and_split "vec_extract_lo_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "nonimmediate_operand" "vm,v")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -9287,12 +9448,12 @@ (define_insn_and_split "vec_extract_lo_v16hi"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V8HImode, operands[1]);")
+  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
 
-(define_insn "vec_extract_hi_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm,vm,vm")
-	(vec_select:V8HI
-	  (match_operand:V16HI 1 "register_operand" "x,v,v")
+(define_insn "vec_extract_hi_<mode>"
+  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=xm,vm,vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16_256 1 "register_operand" "x,v,v")
 	  (parallel [(const_int 8) (const_int 9)
 		     (const_int 10) (const_int 11)
 		     (const_int 12) (const_int 13)
@@ -9428,12 +9589,41 @@ (define_insn "vec_extract_hi_v32qi"
    (set_attr "prefix" "vex,evex,evex")
    (set_attr "mode" "OI")])
 
+;; NB: *vec_extract<mode>_0 must be placed before *vec_extracthf.
+;; Otherwise, it will be ignored.
+(define_insn_and_split "*vec_extract<mode>_0"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=v,m,r")
+	(vec_select:HF
+	  (match_operand:VF_AVX512FP16 1 "nonimmediate_operand" "vm,v,m")
+	  (parallel [(const_int 0)])))]
+  "TARGET_AVX512FP16 && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+  "operands[1] = gen_lowpart (HFmode, operands[1]);")
+
+(define_insn "*vec_extracthf"
+  [(set (match_operand:HF 0 "register_sse4nonimm_operand" "=r,m")
+	(vec_select:HF
+	  (match_operand:V8HF 1 "register_operand" "v,v")
+	  (parallel
+	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
+  "TARGET_AVX512FP16"
+  "@
+   vpextrw\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; Modes handled by vec_extract patterns.
 (define_mode_iterator VEC_EXTRACT_MODE
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -14666,16 +14856,16 @@ (define_expand "vec_interleave_low<mode>"
 
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
-  [(V16QI "TARGET_SSE4_1") V8HI
+  [(V16QI "TARGET_SSE4_1") V8HI (V8HF "TARGET_AVX512FP16")
    (V4SI "TARGET_SSE4_1")
    (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
 (define_mode_attr sse2p4_1
-  [(V16QI "sse4_1") (V8HI "sse2")
+  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
    (V4SI "sse4_1") (V2DI "sse4_1")])
 
 (define_mode_attr pinsr_evex_isa
-  [(V16QI "avx512bw") (V8HI "avx512bw")
+  [(V16QI "avx512bw") (V8HI "avx512bw") (V8HF "avx512bw")
    (V4SI "avx512dq") (V2DI "avx512dq")])
 
 ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
@@ -14703,11 +14893,19 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
     case 2:
     case 4:
       if (GET_MODE_SIZE (<ssescalarmode>mode) < GET_MODE_SIZE (SImode))
-	return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	{
+	  if (<MODE>mode == V8HFmode)
+	    return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	  else
+	    return "vpinsr<ssemodesuffix>\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+	}
       /* FALLTHRU */
     case 3:
     case 5:
-      return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      if (<MODE>mode == V8HFmode)
+	return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      else
+	return "vpinsr<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
     default:
       gcc_unreachable ();
     }
@@ -21122,16 +21320,17 @@ (define_mode_attr pbroadcast_evex_isa
   [(V64QI "avx512bw") (V32QI "avx512bw") (V16QI "avx512bw")
    (V32HI "avx512bw") (V16HI "avx512bw") (V8HI "avx512bw")
    (V16SI "avx512f") (V8SI "avx512f") (V4SI "avx512f")
-   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")])
+   (V8DI "avx512f") (V4DI "avx512f") (V2DI "avx512f")
+   (V32HF "avx512bw") (V16HF "avx512bw") (V8HF "avx512bw")])
 
 (define_insn "avx2_pbroadcast<mode>"
-  [(set (match_operand:VI 0 "register_operand" "=x,v")
-	(vec_duplicate:VI
+  [(set (match_operand:VIHF 0 "register_operand" "=x,v")
+	(vec_duplicate:VIHF
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "xm,vm")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
-  "vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}"
+  "vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}"
   [(set_attr "isa" "*,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21139,17 +21338,17 @@ (define_insn "avx2_pbroadcast<mode>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "avx2_pbroadcast<mode>_1"
-  [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
-	(vec_duplicate:VI_256
+  [(set (match_operand:VIHF_256 0 "register_operand" "=x,x,v,v")
+	(vec_duplicate:VIHF_256
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
+	    (match_operand:VIHF_256 1 "nonimmediate_operand" "m,x,m,v")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX2"
   "@
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%1, %0|%0, %<iptr>1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}"
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%1, %0|%0, %<iptr>1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}"
   [(set_attr "isa" "*,*,<pbroadcast_evex_isa>,<pbroadcast_evex_isa>")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -21503,15 +21702,15 @@ (define_insn "avx2_vec_dupv4df"
    (set_attr "mode" "V4DF")])
 
 (define_insn "<avx512>_vec_dup<mode>_1"
-  [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
-	(vec_duplicate:VI_AVX512BW
+  [(set (match_operand:VIHF_AVX512BW 0 "register_operand" "=v,v")
+	(vec_duplicate:VIHF_AVX512BW
 	  (vec_select:<ssescalarmode>
-	    (match_operand:VI_AVX512BW 1 "nonimmediate_operand" "v,m")
+	    (match_operand:VIHF_AVX512BW 1 "nonimmediate_operand" "v,m")
 	    (parallel [(const_int 0)]))))]
   "TARGET_AVX512F"
   "@
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %x1}
-   vpbroadcast<ssemodesuffix>\t{%x1, %0|%0, %<iptr>1}"
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %x1}
+   vpbroadcast<sseintmodesuffix>\t{%x1, %0|%0, %<iptr>1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -21536,8 +21735,8 @@ (define_insn "<avx512>_vec_dup<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_vec_dup<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (vec_select:<ssescalarmode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0)]))))]
@@ -21572,8 +21771,8 @@ (define_insn "<mask_codefor>avx512f_broadcast<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>"
-  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:VI12_AVX512VL
+  [(set (match_operand:VI12HF_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI12HF_AVX512VL
 	  (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512BW"
   "@
@@ -21668,7 +21867,7 @@ (define_mode_attr vecdupssescalarmodesuffix
   [(V8SF "ss") (V4DF "sd") (V8SI "ss") (V4DI "sd")])
 ;; Modes handled by AVX2 vec_dup patterns.
 (define_mode_iterator AVX2_VEC_DUP_MODE
-  [V32QI V16QI V16HI V8HI V8SI V4SI])
+  [V32QI V16QI V16HI V8HI V8SI V4SI V16HF V8HF])
 
 (define_insn "*vec_dup<mode>"
   [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,v")
@@ -22224,12 +22423,12 @@ (define_insn "vec_set_hi_<mode><mask_name>"
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "vec_set_lo_v16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-	(vec_concat:V16HI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")
-	  (vec_select:V8HI
-	    (match_operand:V16HI 1 "register_operand" "x,v")
+(define_insn "vec_set_lo_<mode>"
+  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
+	(vec_concat:V16_256
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 8) (const_int 9)
 		       (const_int 10) (const_int 11)
 		       (const_int 12) (const_int 13)
@@ -22244,16 +22443,16 @@ (define_insn "vec_set_lo_v16hi"
    (set_attr "prefix" "vex,evex")
    (set_attr "mode" "OI")])
 
-(define_insn "vec_set_hi_v16hi"
-  [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-	(vec_concat:V16HI
-	  (vec_select:V8HI
-	    (match_operand:V16HI 1 "register_operand" "x,v")
+(define_insn "vec_set_hi_<mode>"
+  [(set (match_operand:V16_256 0 "register_operand" "=x,v")
+	(vec_concat:V16_256
+	  (vec_select:<ssehalfvecmode>
+	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
@@ -22430,6 +22629,8 @@ (define_mode_iterator VEC_INIT_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
@@ -22441,6 +22642,8 @@ (define_mode_iterator VEC_INIT_HALF_MODE
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
+   (V32HF "TARGET_AVX512FP16") (V16HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16")
    (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
    (V4TI "TARGET_AVX512F")])
-- 
2.27.0



* [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
                                       ` (4 preceding siblings ...)
  2021-08-02  6:39                     ` [PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
@ 2021-08-02  6:44                     ` liuhongt
  2021-08-04  2:40                       ` Hongtao Liu
  2021-08-04  9:55                       ` Uros Bizjak
  2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
  6 siblings, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-08-02  6:44 UTC (permalink / raw)
  To: gcc-patches
  Cc: ubizjak, crazylht, joseph, richard.guenther, hjl.tools, Guo,
	Xuepeng, H . J . Lu, Liu Hongtao, Wang Hongyu, Xu Dianhong

From: "Guo, Xuepeng" <xuepeng.guo@intel.com>

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect FEATURE_AVX512FP16.
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_AVX512FP16_SET,
	OPTION_MASK_ISA_AVX512FP16_UNSET,
	OPTION_MASK_ISA2_AVX512FP16_SET,
	OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
	(OPTION_MASK_ISA2_AVX512BW_UNSET,
	OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
	(ix86_handle_option): Handle -mavx512fp16.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVX512FP16.
	* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
	* config.gcc: Add avx512fp16intrin.h.
	* config/i386/avx512fp16intrin.h: New intrinsic header.
	* config/i386/cpuid.h: Add bit_AVX512FP16.
	* config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
	* config/i386/i386-builtins.c: Support _Float16 type for i386
	backend.
	(ix86_init_float16_builtins): New function.
	(ix86_float16_type_node): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AVX512FP16__.
	* config/i386/i386-expand.c (ix86_expand_branch): Support
	HFmode.
	(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_expand_fp_movcc): Ditto.
	* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
	* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
	(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
	* config/i386/i386.c (ix86_get_ssemov): Use
	vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
	(ix86_get_excess_precision): Use
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
	is enabled.
	(sse_store_index): Use SFmode cost for HFmode cost.
	(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
	GPR cost for HFmode.
	(ix86_hard_regno_mode_ok): Allow HImode in sse register.
	(ix86_mangle_type): Add mangling for _Float16 type.
	(inline_secondary_memory_needed): No memory is needed for
	16bit movement between gpr and sse reg under
	TARGET_AVX512FP16.
	(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_division_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_add_stmt_cost): Ditto.
	(ix86_optab_supported_p): Ditto.
	* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
	(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
	(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
	* config/i386/i386.md (mode): Add HFmode.
	(MODE_SIZE): Add HFmode.
	(isa): Add avx512fp16.
	(enabled): Handle avx512fp16.
	(ssemodesuffix): Add sh suffix for HFmode.
	(comm): Add mult, div.
	(plusminusmultdiv): New code iterator.
	(insn): Add mult, div.
	(*movhf_internal): Adjust for avx512fp16 instruction.
	(*movhi_internal): Ditto.
	(*cmpi<unord>hf): New define_insn for HFmode.
	(*ieee_s<ieee_maxmin>hf3): Likewise.
	(extendhf<mode>2): Likewise.
	(trunc<mode>hf2): Likewise.
	(float<floatunssuffix><mode>hf2): Likewise.
	(*<insn>hf): Likewise.
	(cbranchhf4): New expander.
	(movhfcc): Likewise.
	(<insn>hf3): Likewise.
	(mulhf3): Likewise.
	(divhf3): Likewise.
	* config/i386/i386.opt: Add mavx512fp16.
	* config/i386/immintrin.h: Include avx512fp16intrin.h.
	* doc/invoke.texi: Add mavx512fp16.
	* doc/extend.texi: Add avx512fp16 Usage Notes.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
	* gcc.target/i386/avx-2.c: Ditto.
	* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
	* gcc.target/i386/sse-13.c: Add -mavx512fp16.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* lib/target-supports.exp (check_effective_target_avx512fp16): New.
	* g++.target/i386/float16-1.C: New test.
	* g++.target/i386/float16-2.C: Ditto.
	* g++.target/i386/float16-3.C: Ditto.
	* gcc.target/i386/avx512fp16-12a.c: Ditto.
	* gcc.target/i386/avx512fp16-12b.c: Ditto.
	* gcc.target/i386/float16-3a.c: Ditto.
	* gcc.target/i386/float16-3b.c: Ditto.
	* gcc.target/i386/float16-4a.c: Ditto.
	* gcc.target/i386/float16-4b.c: Ditto.
	* gcc.target/i386/pr54855-12.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>
---
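Note (illustration only, not part of the patch): the reworked size check
in inline_secondary_memory_needed can be read as the standalone predicate
sketched below. The function and parameter names are hypothetical; sizes
are in bytes, units_per_word stands in for UNITS_PER_WORD, and the
TARGET_SSE2 / INTEGER_CLASS_P tests before it and the inter-unit move
checks after it are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the size test only: return true when a move
   of MSIZE bytes between a general register and an SSE register would
   still have to go through memory.  */
bool
size_needs_secondary_memory (int msize, int units_per_word, bool avx512fp16)
{
  assert (msize > 0 && units_per_word > 0);

  /* Between SSE and general registers there are no moves larger than
     the word size.  */
  if (msize > units_per_word)
    return true;

  /* Without AVX512FP16 the smallest direct move is SImode (4 bytes);
     with it, vmovw also covers HImode (2 bytes).  */
  int minsize = avx512fp16 ? 2 : 4;

  return msize < minsize;
}
```

Under these assumptions a 2-byte (HImode/HFmode) move goes through memory
without AVX512FP16 and stays in registers with it; 1-byte moves need
memory in both cases.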
 gcc/common/config/i386/cpuinfo.h              |   2 +
 gcc/common/config/i386/i386-common.c          |  26 ++-
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config.gcc                                |   2 +-
 gcc/config/i386/avx512fp16intrin.h            |  53 ++++++
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-builtin-types.def        |   1 +
 gcc/config/i386/i386-builtins.c               |  23 +++
 gcc/config/i386/i386-c.c                      |   2 +
 gcc/config/i386/i386-expand.c                 |   5 +-
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-options.c                |   4 +-
 gcc/config/i386/i386.c                        | 133 ++++++++++----
 gcc/config/i386/i386.h                        |  11 +-
 gcc/config/i386/i386.md                       | 172 ++++++++++++++++--
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/immintrin.h                   |   4 +
 gcc/doc/extend.texi                           |   8 +
 gcc/doc/invoke.texi                           |  10 +-
 gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
 gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
 gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
 gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
 gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
 .../gcc.target/i386/avx512fp16-12a.c          |  21 +++
 .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
 gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
 gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
 gcc/testsuite/lib/target-supports.exp         |  13 +-
 41 files changed, 558 insertions(+), 76 deletions(-)
 create mode 100644 gcc/config/i386/avx512fp16intrin.h
 create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
 create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 458f41de776..1835ac64e67 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
 	    set_feature (FEATURE_AVX5124FMAPS);
 	  if (edx & bit_AVX512VP2INTERSECT)
 	    set_feature (FEATURE_AVX512VP2INTERSECT);
+	  if (edx & bit_AVX512FP16)
+	    set_feature (FEATURE_AVX512FP16);
 	}
 
       __cpuid_count (7, 1, eax, ebx, ecx, edx);
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 76ab1a14e54..00c65ba15ab 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_SET \
   (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
+#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
+#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_SET \
   (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
 #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
@@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
 #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
 #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
+#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
+#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
 #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
 #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
 #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
@@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AVX512BF16_UNSET \
    | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
    | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
-   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
+   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
+   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA2_AVX512F_UNSET)
 #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
@@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
 #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
 
-#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
+#define OPTION_MASK_ISA2_AVX512BW_UNSET \
+  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
+    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
 
 /* Set 1 << value as value of -malign-FLAG option.  */
 
@@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx512fp16:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
+	}
+      return true;
+
     case OPT_mavx512vnni:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index e68dd656046..4e0659fc7b2 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -228,6 +228,7 @@ enum processor_features
   FEATURE_AESKLE,
   FEATURE_WIDEKL,
   FEATURE_AVXVNNI,
+  FEATURE_AVX512FP16,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 898c18f3dda..a6783660278 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
   ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
   ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
+  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3df9b52cf25..a354351408c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
 		       tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
 		       amxbf16intrin.h x86gprintrin.h uintrintrin.h
 		       hresetintrin.h keylockerintrin.h avxvnniintrin.h
-		       mwaitintrin.h"
+		       mwaitintrin.h avx512fp16intrin.h"
 	;;
 ia64-*-*)
 	extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
new file mode 100644
index 00000000000..38d63161ba6
--- /dev/null
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -0,0 +1,53 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef __AVX512FP16INTRIN_H_INCLUDED
+#define __AVX512FP16INTRIN_H_INCLUDED
+
+#ifndef __AVX512FP16__
+#pragma GCC push_options
+#pragma GCC target("avx512fp16")
+#define __DISABLE_AVX512FP16__
+#endif /* __AVX512FP16__ */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
+
+#ifdef __DISABLE_AVX512FP16__
+#undef __DISABLE_AVX512FP16__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX512FP16__ */
+
+#endif /* __AVX512FP16INTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index aebc17c6827..82b8050028b 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -126,6 +126,7 @@
 #define bit_AVX5124VNNIW (1 << 2)
 #define bit_AVX5124FMAPS (1 << 3)
 #define bit_AVX512VP2INTERSECT	(1 << 8)
+#define bit_AVX512FP16   (1 << 23)
 #define bit_IBT	(1 << 20)
 #define bit_UINTR (1 << 5)
 #define bit_PCONFIG	(1 << 18)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 3ca313c19ec..1768b88d748 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
 DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
+DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 204e2903126..668f09f12a0 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 /* Table for the ix86 builtin non-function types.  */
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
+tree ix86_float16_type_node = NULL_TREE;
 /* Retrieve an element from the above table, building some of
    the types lazily.  */
 
@@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
 			BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
 }
 
+static void
+ix86_init_float16_builtins (void)
+{
+  /* Provide the _Float16 type and float16_type_node if needed so that
+     it can be used in AVX512FP16 intrinsics and builtins.  */
+  if (!float16_type_node)
+    {
+      ix86_float16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (ix86_float16_type_node) = 16;
+      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
+      layout_type (ix86_float16_type_node);
+    }
+  else
+    ix86_float16_type_node = float16_type_node;
+
+  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
+    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
+					    "_Float16");
+}
+
 static void
 ix86_init_builtin_types (void)
 {
@@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
      it.  */
   lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
 
+  ix86_init_float16_builtins ();
+
   const_string_type_node
     = build_pointer_type (build_qualified_type
 			  (char_type_node, TYPE_QUAL_CONST));
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5ed0de006fb..cc64f855ecc 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__PTWRITE__");
   if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
     def_or_undef (parse_in, "__AVX512BF16__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
+    def_or_undef (parse_in, "__AVX512FP16__");
   if (TARGET_MMX_WITH_SSE)
     def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 69ea79e6123..b7d050a1e42 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
   switch (mode)
     {
+    case E_HFmode:
     case E_SFmode:
     case E_DFmode:
     case E_XFmode:
@@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
   bool unordered_compare = ix86_unordered_fp_compare (code);
   rtx op0 = *pop0, op1 = *pop1;
   machine_mode op_mode = GET_MODE (op0);
-  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
+  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
 
   /* All of the unordered compare instructions only work on registers.
      The same is true of the fcomi compare instructions.  The XFmode
@@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
 
-  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     {
       machine_mode cmode;
 
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index a0d46cbc892..83d9302ea3d 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -108,3 +108,4 @@ DEF_PTA(HRESET)
 DEF_PTA(KL)
 DEF_PTA(WIDEKL)
 DEF_PTA(AVXVNNI)
+DEF_PTA(AVX512FP16)
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 3416a4f1752..df191763e4b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mhreset",		OPTION_MASK_ISA2_HRESET },
   { "-mkl",		OPTION_MASK_ISA2_KL },
   { "-mwidekl", 	OPTION_MASK_ISA2_WIDEKL },
-  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI }
+  { "-mavxvnni",	OPTION_MASK_ISA2_AVXVNNI },
+  { "-mavx512fp16",	OPTION_MASK_ISA2_AVX512FP16 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
     IX86_ATTR_ISA ("hreset", OPT_mhreset),
     IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
+    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dc673c89bc8..71bbcf968c5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
     case MODE_SI:
       return "%vmovd\t{%1, %0|%0, %1}";
 
+    case MODE_HI:
+      if (GENERAL_REG_P (operands[0]))
+	return "vmovw\t{%1, %k0|%k0, %1}";
+      else if (GENERAL_REG_P (operands[1]))
+	return "vmovw\t{%k1, %0|%0, %k1}";
+      else
+	return "vmovw\t{%1, %0|%0, %1}";
+
     case MODE_DF:
       if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
 	return "vmovsd\t{%d1, %0|%0, %d1}";
@@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
       else
 	return "%vmovss\t{%1, %0|%0, %1}";
 
+    case MODE_HF:
+      if (REG_P (operands[0]) && REG_P (operands[1]))
+	return "vmovsh\t{%d1, %0|%0, %d1}";
+      else
+	return "vmovsh\t{%1, %0|%0, %1}";
+
     case MODE_V1DF:
       gcc_assert (!TARGET_AVX);
       return "movlpd\t{%1, %0|%0, %1}";
@@ -13955,7 +13969,7 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
 
   if (is_sse)
    {
-     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
+     p = (GET_MODE (operands[0]) == SFmode ? "ss" : "sd");
      strcat (buf, p);
 
      if (TARGET_AVX)
@@ -19132,10 +19146,19 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
       if (!TARGET_SSE2)
 	return true;
 
+      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
+	return true;
+
+      int msize = GET_MODE_SIZE (mode);
+
       /* Between SSE and general, we have moves no larger than word size.  */
-      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
-	  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
-	  || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+      if (msize > UNITS_PER_WORD)
+	return true;
+
+      /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
+      int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
+
+      if (msize < minsize)
 	return true;
 
       /* If the target says that inter-unit moves are more expensive
@@ -19229,21 +19252,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
 static inline int
 sse_store_index (machine_mode mode)
 {
-      switch (GET_MODE_SIZE (mode))
-	{
-	  case 4:
-	    return 0;
-	  case 8:
-	    return 1;
-	  case 16:
-	    return 2;
-	  case 32:
-	    return 3;
-	  case 64:
-	    return 4;
-	  default:
-	    return -1;
-	}
+  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
+     costs to processor_costs, which requires changes to all entries in
+     processor cost table.  */
+  if (mode == E_HFmode)
+    mode = E_SFmode;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 4:
+      return 0;
+    case 8:
+      return 1;
+    case 16:
+      return 2;
+    case 32:
+      return 3;
+    case 64:
+      return 4;
+    default:
+      return -1;
+    }
 }
 
 /* Return the cost of moving data of mode M between a
@@ -19270,6 +19298,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
       int index;
       switch (mode)
 	{
+	  case E_HFmode:
 	  case E_SFmode:
 	    index = 0;
 	    break;
@@ -19370,11 +19399,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
 	  }
 	break;
       case 2:
-	if (in == 2)
-	  return MAX (ix86_cost->hard_register.int_load[1],
-		      ix86_cost->hard_register.int_store[1]);
-	return in ? ix86_cost->hard_register.int_load[1]
-		  : ix86_cost->hard_register.int_store[1];
+	{
+	  int cost;
+	  if (in == 2)
+	    cost = MAX (ix86_cost->hard_register.int_load[1],
+			ix86_cost->hard_register.int_store[1]);
+	  else
+	    cost = in ? ix86_cost->hard_register.int_load[1]
+		      : ix86_cost->hard_register.int_store[1];
+	  if (mode == E_HFmode)
+	    {
+	      /* Prefer SSE over GPR for HFmode.  */
+	      int sse_cost;
+	      int index = sse_store_index (mode);
+	      if (in == 2)
+		sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
+				ix86_cost->hard_register.sse_store[index]);
+	      else
+		sse_cost = (in
+			    ? ix86_cost->hard_register.sse_load [index]
+			    : ix86_cost->hard_register.sse_store [index]);
+	      if (sse_cost >= cost)
+		cost = sse_cost + 1;
+	    }
+	  return cost;
+	}
       default:
 	if (in == 2)
 	  cost = MAX (ix86_cost->hard_register.int_load[2],
@@ -19548,6 +19597,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	  - XI mode
 	  - any of 512-bit wide vector mode
 	  - any scalar mode.  */
+      /* For AVX512FP16, vmovw supports movement of HImode
+	 between gpr and sse register.  */
       if (TARGET_AVX512F
 	  && (mode == XImode
 	      || VALID_AVX512F_REG_MODE (mode)
@@ -19831,7 +19882,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->mulsd : cost->mulss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fmul;
@@ -19883,7 +19934,7 @@ ix86_division_cost (const struct processor_costs *cost,
   if (VECTOR_MODE_P (mode))
     inner_mode = GET_MODE_INNER (mode);
 
-  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
     return inner_mode == DFmode ? cost->divsd : cost->divss;
   else if (X87_FLOAT_MODE_P (mode))
     return cost->fdiv;
@@ -20303,7 +20354,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	  return true;
 	}
 
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->addss;
 	  return false;
@@ -20336,7 +20387,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* FALLTHRU */
 
     case NEG:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	{
 	  *total = cost->sse_op;
 	  return false;
@@ -20418,14 +20469,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case FLOAT_EXTEND:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = 0;
       else
         *total = ix86_vec_cost (mode, cost->addss);
       return false;
 
     case FLOAT_TRUNCATE:
-      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
+      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->fadd;
       else
         *total = ix86_vec_cost (mode, cost->addss);
@@ -20435,7 +20486,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       /* SSE requires memory load for the constant operand. It may make
 	 sense to account for this.  Of course the constant operand may or
 	 may not be reused. */
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = cost->sse_op;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fabs;
@@ -20444,7 +20495,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       return false;
 
     case SQRT:
-      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	*total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
       else if (X87_FLOAT_MODE_P (mode))
 	*total = cost->fsqrt;
@@ -21928,6 +21979,10 @@ ix86_mangle_type (const_tree type)
 
   switch (TYPE_MODE (type))
     {
+    case E_HFmode:
+      /* _Float16 is "DF16_".
+	 Align with clang's decision in https://reviews.llvm.org/D33719. */
+      return "DF16_";
     case E_TFmode:
       /* __float128 is "g".  */
       return "g";
@@ -22551,7 +22606,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case MINUS_EXPR:
 	  if (kind == scalar_stmt)
 	    {
-	      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 		stmt_cost = ix86_cost->addss;
 	      else if (X87_FLOAT_MODE_P (mode))
 		stmt_cost = ix86_cost->fadd;
@@ -22569,7 +22624,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
 	  break;
 	case NEGATE_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (X87_FLOAT_MODE_P (mode))
 	    stmt_cost = ix86_cost->fchs;
@@ -22625,7 +22680,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	case BIT_XOR_EXPR:
 	case BIT_AND_EXPR:
 	case BIT_NOT_EXPR:
-	  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+	  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
 	    stmt_cost = ix86_cost->sse_op;
 	  else if (VECTOR_MODE_P (mode))
 	    stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
@@ -23327,14 +23382,18 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	/* The fastest type to promote to will always be the native type,
 	   whether that occurs with implicit excess precision or
 	   otherwise.  */
-	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+	return TARGET_AVX512FP16
+	       ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+	       : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
       case EXCESS_PRECISION_TYPE_STANDARD:
       case EXCESS_PRECISION_TYPE_IMPLICIT:
 	/* Otherwise, the excess precision we want when we are
 	   in a standards compliant mode, and the implicit precision we
 	   provide would be identical were it not for the unpredictable
 	   cases.  */
-	if (!TARGET_80387)
+	if (TARGET_AVX512FP16 && TARGET_SSE_MATH)
+	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
+	else if (!TARGET_80387)
 	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 	else if (!TARGET_MIX_SSE_I387)
 	  {
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index b1e66ee192e..8fcd5693624 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
   ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
-   || (MODE) == SFmode)
+   || (MODE) == SFmode							\
+   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
 
 #define VALID_AVX512F_REG_MODE(MODE)					\
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode	\
@@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_FP_MODE_P(MODE)						\
   ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode		\
-   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)		\
+   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
 
 #define VALID_INT_MODE_P(MODE)						\
   ((MODE) == QImode || (MODE) == HImode					\
@@ -1072,6 +1073,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
 
+#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)				\
+  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)				\
+   || (TARGET_AVX512FP16 && (MODE) == HFmode))
+
 #define FMA4_VEC_FLOAT_MODE_P(MODE) \
   (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
 		  || (MODE) == V8SFmode || (MODE) == V4DFmode))
@@ -2265,7 +2270,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
 constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
   | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
-  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
+  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
 constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
   | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
 constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d475347172d..777d11261ac 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -496,7 +496,7 @@ (define_attr "type"
 
 ;; Main data type used by the insn
 (define_attr "mode"
-  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
+  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
   V2DF,V2SF,V1DF,V8DF"
   (const_string "unknown"))
 
@@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,
-		    avx512vl,noavx512vl,
-		    avxvnni,avx512vnnivl"
+		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
   (const_string "base"))
 
 ;; Define instruction set of MMX instructions
@@ -885,6 +884,8 @@ (define_attr "enabled" ""
 	 (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
 	 (eq_attr "isa" "avx512vnnivl")
 	   (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
+	 (eq_attr "isa" "avx512fp16")
+	   (symbol_ref "TARGET_AVX512FP16")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
@@ -906,6 +907,7 @@ (define_asm_attributes
    (set_attr "type" "multi")])
 
 (define_code_iterator plusminus [plus minus])
+(define_code_iterator plusminusmultdiv [plus minus mult div])
 
 (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus])
 
@@ -921,7 +923,8 @@ (define_code_attr multdiv_mnemonic
 
 ;; Mark commutative operators as such in constraints.
 (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%")
-			(minus "") (ss_minus "") (us_minus "")])
+			(minus "") (ss_minus "") (us_minus "")
+			(mult "%") (div "")])
 
 ;; Mapping of max and min
 (define_code_iterator maxmin [smax smin umax umin])
@@ -1021,7 +1024,8 @@ (define_code_attr insn
    (minus "sub") (ss_minus "sssub") (us_minus "ussub")
    (sign_extend "extend") (zero_extend "zero_extend")
    (ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")
-   (rotate "rotl") (rotatert "rotr")])
+   (rotate "rotl") (rotatert "rotr")
+   (mult "mul") (div "div")])
 
 ;; All integer modes.
 (define_mode_iterator SWI1248x [QI HI SI DI])
@@ -1089,8 +1093,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
 ;; compile time constant, it is faster to use <MODE_SIZE> than
 ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
 ;; command line options just use GET_MODE_SIZE macro.
-(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
-			     (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
+(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
+			     (TI "16") (HF "2") (SF "4") (DF "8")
+			     (XF "GET_MODE_SIZE (XFmode)")
 			     (V16QI "16") (V32QI "32") (V64QI "64")
 			     (V8HI "16") (V16HI "32") (V32HI "64")
 			     (V4SI "16") (V8SI "32") (V16SI "64")
@@ -1222,8 +1227,8 @@ (define_mode_iterator MODEF [SF DF])
 ;; All x87 floating point modes
 (define_mode_iterator X87MODEF [SF DF XF])
 
-;; All x87 floating point modes plus HF
-(define_mode_iterator X87MODEFH [SF DF XF HF])
+;; All x87 floating point modes plus HFmode
+(define_mode_iterator X87MODEFH [HF SF DF XF])
 
 ;; All SSE floating point modes
 (define_mode_iterator SSEMODEF [SF DF TF])
@@ -1231,7 +1236,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
 
 ;; SSE instruction suffix for various modes
 (define_mode_attr ssemodesuffix
-  [(SF "ss") (DF "sd")
+  [(HF "sh") (SF "ss") (DF "sd")
    (V16SF "ps") (V8DF "pd")
    (V8SF "ps") (V4DF "pd")
    (V4SF "ps") (V2DF "pd")
@@ -1496,6 +1501,23 @@ (define_expand "cstorexf4"
   DONE;
 })
 
+(define_expand "cbranchhf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:HF 1 "cmp_fp_expander_operand")
+		    (match_operand:HF 2 "cmp_fp_expander_operand")))
+   (set (pc) (if_then_else
+              (match_operator 0 "ix86_fp_comparison_operator"
+               [(reg:CC FLAGS_REG)
+                (const_int 0)])
+              (label_ref (match_operand 3))
+              (pc)))]
+  "TARGET_AVX512FP16"
+{
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
@@ -1705,6 +1727,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
 	 (eq_attr "alternative" "0")
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
+
+(define_insn "*cmpi<unord>hf"
+  [(set (reg:CCFP FLAGS_REG)
+	(compare:CCFP
+	  (match_operand:HF 0 "register_operand" "v")
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "v<unord>comish\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecomi")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Push/pop instructions.
 
@@ -2436,8 +2469,8 @@ (define_insn "*movsi_internal"
 	   (symbol_ref "true")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
-	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 
@@ -2463,6 +2496,9 @@ (define_insn "*movhi_internal"
 	  gcc_unreachable ();
 	}
 
+    case TYPE_SSEMOV:
+      return ix86_output_ssemov (insn, operands);
+
     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
 	return "kxorw\t%0, %0, %0";
@@ -2477,8 +2513,15 @@ (define_insn "*movhi_internal"
 	return "mov{w}\t{%1, %0|%0, %1}";
     }
 }
-  [(set (attr "type")
-     (cond [(eq_attr "alternative" "4,5,6,7")
+  [(set (attr "isa")
+	(cond [(eq_attr "alternative" "9,10,11,12,13")
+		  (const_string "avx512fp16")
+	       ]
+	       (const_string "*")))
+   (set (attr "type")
+     (cond [(eq_attr "alternative" "9,10,11,12,13")
+	      (const_string "ssemov")
+	    (eq_attr "alternative" "4,5,6,7")
 	      (const_string "mskmov")
 	    (eq_attr "alternative" "8")
 	      (const_string "msklog")
@@ -2503,6 +2546,8 @@ (define_insn "*movhi_internal"
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
+	     (eq_attr "alternative" "11")
+	       (const_string "HF")
 	     (and (eq_attr "alternative" "1,2")
 		  (match_operand:HI 1 "aligned_operand"))
 	       (const_string "SI")
@@ -3727,7 +3772,10 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "sselog1")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "sselog")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "ssemov")
+		   (const_string "sselog"))
 	      ]
 	      (const_string "ssemov")))
    (set (attr "memory")
@@ -3750,9 +3798,15 @@ (define_insn "*movhf_internal"
 	       (eq_attr "alternative" "2")
 		 (const_string "V4SF")
 	       (eq_attr "alternative" "4,5,6,7")
-		 (const_string "TI")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HI")
+		   (const_string "TI"))
 	       (eq_attr "alternative" "3")
-		 (const_string "SF")
+		 (if_then_else
+		   (match_test "TARGET_AVX512FP16")
+		   (const_string "HF")
+		   (const_string "SF"))
 	      ]
 	      (const_string "*")))])
 
@@ -4493,6 +4547,17 @@ (define_split
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
+(define_insn "extendhf<mode>2"
+  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+        (float_extend:MODEF
+	  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<MODE>")])
+
+
 (define_expand "extend<mode>xf2"
   [(set (match_operand:XF 0 "nonimmediate_operand")
         (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
@@ -4670,6 +4735,18 @@ (define_insn "truncxf<mode>2"
 	      (symbol_ref "flag_unsafe_math_optimizations")
 	   ]
 	   (symbol_ref "true")))])
+
+;; Conversion from {SF,DF}mode to HFmode.
+
+(define_insn "trunc<mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+       (float_truncate:HF
+         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
 \f
 ;; Signed conversion to DImode.
 
@@ -5046,6 +5123,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
 	      (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
 	   (symbol_ref "true")))])
 
+(define_insn "float<floatunssuffix><mode>hf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(any_float:HF
+	  (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_AVX512FP16"
+  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
+  [(set_attr "type" "sseicvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*floatdi<MODEF:mode>2_i387"
   [(set (match_operand:MODEF 0 "register_operand" "=f")
 	(float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
@@ -7626,6 +7713,13 @@ (define_expand "<insn>xf3"
 	  (match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "<insn>hf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(plusminus:HF
+	  (match_operand:HF 1 "register_operand")
+	  (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "<insn><mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(plusminus:MODEF
@@ -8203,6 +8297,12 @@ (define_expand "mulxf3"
 		 (match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "mulhf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(mult:HF (match_operand:HF 1 "register_operand")
+		    (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "mul<mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(mult:MODEF (match_operand:MODEF 1 "register_operand")
@@ -8220,6 +8320,12 @@ (define_expand "divxf3"
 		(match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+(define_expand "divhf3"
+  [(set (match_operand:HF 0 "register_operand")
+	(div:HF (match_operand:HF 1 "register_operand")
+		   (match_operand:HF 2 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
 (define_expand "div<mode>3"
   [(set (match_operand:MODEF 0 "register_operand")
 	(div:MODEF (match_operand:MODEF 1 "register_operand")
@@ -16312,6 +16418,17 @@ (define_insn "*fop_<mode>_comm"
 	 (symbol_ref "true")
 	 (symbol_ref "false"))))])
 
+(define_insn "*<insn>hf"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(plusminusmultdiv:HF
+	  (match_operand:HF 1 "nonimmediate_operand" "<comm>v")
+	  (match_operand:HF 2 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "v<insn>sh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
 (define_insn "*rcpsf2_sse"
   [(set (match_operand:SF 0 "register_operand" "=x,x,x")
 	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
@@ -19178,6 +19295,15 @@ (define_peephole2
     gcc_unreachable ();
 })
 
+(define_expand "movhfcc"
+  [(set (match_operand:HF 0 "register_operand")
+	(if_then_else:HF
+	  (match_operand 1 "comparison_operator")
+	  (match_operand:HF 2 "register_operand")
+	  (match_operand:HF 3 "register_operand")))]
+  "TARGET_AVX512FP16"
+  "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
+
 (define_expand "mov<mode>cc"
   [(set (match_operand:X87MODEF 0 "register_operand")
 	(if_then_else:X87MODEF
@@ -19346,6 +19472,18 @@ (define_insn "<code><mode>3"
 ;; Their operands are not commutative, and thus they may be used in the
 ;; presence of -0.0 and NaN.
 
+(define_insn "*ieee_s<ieee_maxmin>hf3"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+	(unspec:HF
+	  [(match_operand:HF 1 "register_operand" "v")
+	   (match_operand:HF 2 "nonimmediate_operand" "vm")]
+	  IEEE_MAXMIN))]
+  "TARGET_AVX512FP16"
+  "v<ieee_maxmin>sh\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "prefix" "evex")
+   (set_attr "type" "sseadd")
+   (set_attr "mode" "HF")])
+
 (define_insn "*ieee_s<ieee_maxmin><mode>3"
   [(set (match_operand:MODEF 0 "register_operand" "=x,v")
 	(unspec:MODEF
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7b8547bb1c3..ad366974b5b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
 mmwait
 Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
 Support MWAIT and MONITOR built-in functions and code generation.
+
+mavx512fp16
+Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index f129de4bbe5..2421a78637b 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -94,6 +94,10 @@
 
 #include <avx512vp2intersectvlintrin.h>
 
+#ifdef __SSE2__
+#include <avx512fp16intrin.h>
+#endif
+
 #include <shaintrin.h>
 
 #include <fmaintrin.h>
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3a1978efc97..09040bfca33 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1164,6 +1164,14 @@ to inconsistent behavior between software emulation and AVX512-FP16
 instructions. Using @option{-fexcess-precision=16} and  will force round
 back after each operation.
 
+Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of
+software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round
+after each operation. The same is true with @option{-fexcess-precision=standard}
+and @option{-mfpmath=sse}. Without @option{-mfpmath=sse},
+@option{-fexcess-precision=standard} alone behaves as it did before, which
+is useful for code that does not use @code{_Float16} and runs on the x87
+FPU.
+
 @node Decimal Float
 @section Decimal Floating Types
 @cindex decimal floating types
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c..bb9f7ca956e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
 -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
+-mavx512fp16 @gol
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
 -mkl -mwidekl @gol
@@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @itemx -mavx512bf16
 @opindex mavx512bf16
 @need 200
+@itemx -mavx512fp16
+@opindex mavx512fp16
+@need 200
 @itemx -mgfni
 @opindex mgfni
 @need 200
@@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
 XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
 GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
-UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
-extended instruction sets. Each has a corresponding @option{-mno-} option to
-disable use of these instructions.
+UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
+or CLDEMOTE extended instruction sets. Each has a corresponding
+@option{-mno-} option to disable use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index 62b2132957a..fba3d1ac684 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 843aa2bdb2f..5cc0fa83457 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
new file mode 100644
index 00000000000..95d1ac27c4f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-1.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "does not name a type" } */
+foo (_Float16 x)
+{
+  return x;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
new file mode 100644
index 00000000000..99eb797eff1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-2.C
@@ -0,0 +1,14 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+union flt
+{
+  _Float16 flt;
+  short s;
+};
+
+_Float16
+foo (union flt x)
+{
+  return x.flt;
+}
diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
new file mode 100644
index 00000000000..940878503f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/float16-3.C
@@ -0,0 +1,10 @@
+/* { dg-do assemble { target avx512fp16 } } */
+/* { dg-options "-O0 -mavx512fp16" } */
+
+template <typename> void a(char *) {}
+char b, d;
+void c()
+{
+  a<unsigned char>(&d);
+  a<_Float16>(&b);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 6178e38ce02..f3676077743 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index 986fbd819e4..1751c52565c 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..0ad9064f637 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -87,6 +87,9 @@ main ()
 #ifdef AVX512VNNI
       && (ecx & bit_AVX512VNNI)
 #endif
+#ifdef AVX512FP16
+      && (edx & bit_AVX512FP16)
+#endif
 #ifdef VAES
       && (ecx & bit_VAES)
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
new file mode 100644
index 00000000000..88887556d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_max (_Float16 __A, _Float16 __B)
+{
+  return __A > __B ? __A : __B;
+}
+
+_Float16
+__attribute__ ((noinline, noclone))
+do_min (_Float16 __A, _Float16 __B)
+{
+  return __A < __B ? __A : __B;
+}
+
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
new file mode 100644
index 00000000000..c9e23bf95c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+#include "avx512fp16-12a.c"
+
+static void
+do_test (void)
+{
+  _Float16 x = 0.1f;
+  _Float16 y = -3.2f;
+  _Float16 z;
+
+  z = do_max (x, y);
+  if (z != x)
+    abort ();
+
+  z = do_min (x, y);
+  if (z != y)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
new file mode 100644
index 00000000000..3846c8e9b6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
new file mode 100644
index 00000000000..247dd6e7e33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
new file mode 100644
index 00000000000..631082581f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
new file mode 100644
index 00000000000..828d8530769
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+_Float16
+foo (unsigned long long x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 79265c7c94f..8499fdf2db9 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -79,6 +79,7 @@ extern void test_hreset (void)			__attribute__((__target__("hreset")));
 extern void test_keylocker (void)		__attribute__((__target__("kl")));
 extern void test_widekl (void)			__attribute__((__target__("widekl")));
 extern void test_avxvnni (void)			__attribute__((__target__("avxvnni")));
+extern void test_avx512fp16 (void)		__attribute__((__target__("avx512fp16")));
 
 extern void test_no_sgx (void)			__attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)		__attribute__((__target__("no-avx5124fmaps")));
@@ -159,6 +160,7 @@ extern void test_no_hreset (void)		__attribute__((__target__("no-hreset")));
 extern void test_no_keylocker (void)		__attribute__((__target__("no-kl")));
 extern void test_no_widekl (void)		__attribute__((__target__("no-widekl")));
 extern void test_no_avxvnni (void)		__attribute__((__target__("no-avxvnni")));
+extern void test_no_avx512fp16 (void)		__attribute__((__target__("no-avx512fp16")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void)		__attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
new file mode 100644
index 00000000000..2f8af392c83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
+/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
+
+#include <immintrin.h>
+
+_Float16
+foo (_Float16 x, _Float16 y)
+{
+  x = x > y ? x : y;
+  return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 7029771334b..f5f5c113612 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 4ce0ffffaf3..747d504cedb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 6e8b6f3fa1b..33411969901 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 7faa053ace8..86590ca5ffb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -708,6 +708,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1) 
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1) 
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 42ac9d0ac1a..10765365d7b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
 
 proc check_effective_target_float16 {} {
     return [check_no_compiler_messages_nocache float16 object {
-        _Float16 x;
+        _Float16 foo (_Float16 x) { return x; }
     } [add_options_for_float16 ""]]
 }
 
@@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
 }
 
 
+# Return 1 if avx512fp16 instructions can be compiled.
+
+proc check_effective_target_avx512fp16 { } {
+    return [check_no_compiler_messages avx512fp16 object {
+	void foo (void)
+	{
+	  asm volatile ("vmovw %edi, %xmm0");
+	}
+    } "-O2 -mavx512fp16" ]
+}
+
 # Return 1 if avx512f instructions can be compiled.
 
 proc check_effective_target_avx512f { } {
-- 
2.27.0



* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-02  6:31                     ` [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16 liuhongt
@ 2021-08-02 19:34                       ` Joseph Myers
  2021-08-03  2:44                         ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-08-02 19:34 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches

On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 7979e240426..dc673c89bc8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
>  	return (type == EXCESS_PRECISION_TYPE_STANDARD
>  		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
>  		: FLT_EVAL_METHOD_UNPREDICTABLE);
> +      case EXCESS_PRECISION_TYPE_FLOAT16:
> +	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>        default:
>  	gcc_unreachable ();
>      }

I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87 
doesn't do float or double arithmetic, but -fexcess-precision=16 implies 
that all of _Float16, float and double are represented to the range and 
precision of their type without any excess precision).
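
As an illustration of the excess-precision point (a minimal sketch, not part
of the patch): the helper names below are hypothetical, and the value 16 is
assumed to follow the TS 18661-3 numbering that GCC uses for
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.

```c
#include <float.h>

/* Hypothetical helper: reports whether `float` arithmetic may carry
   excess precision under a given FLT_EVAL_METHOD value.
   Returns 1 if it may, 0 if it does not, -1 if indeterminable.  */
static int
float_has_excess_precision (int method)
{
  switch (method)
    {
    case 0:   /* Each type is evaluated in its own range and precision.  */
    case 16:  /* Assumed TS 18661-3 value: _Float16, float and double are
                 all evaluated in their own format.  */
      return 0;
    case 1:   /* float is evaluated as double.  */
    case 2:   /* float and double are evaluated as long double (x87).  */
      return 1;
    default:  /* -1 or any other implementation-defined value.  */
      return -1;
    }
}

/* The evaluation method the current compilation was built with.  */
static int
current_eval_method (void)
{
  return (int) FLT_EVAL_METHOD;
}
```

Calling float_has_excess_precision (current_eval_method ()) in a program
built with -mfpmath=387 would be expected to report excess precision, which
is why that mode cannot honour the guarantee described above.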

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-02 19:34                       ` Joseph Myers
@ 2021-08-03  2:44                         ` Hongtao Liu
  2021-08-06  6:06                           ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-03  2:44 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches

On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
>
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 7979e240426..dc673c89bc8 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >       return (type == EXCESS_PRECISION_TYPE_STANDARD
> >               ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> >               : FLT_EVAL_METHOD_UNPREDICTABLE);
> > +      case EXCESS_PRECISION_TYPE_FLOAT16:
> > +     return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> >        default:
> >       gcc_unreachable ();
> >      }
>
> I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> that all of _Float16, float and double are represented to the range and
> precision of their type without any excess precision).
>
Yes, I'll add a change like this:

modified   gcc/config/i386/i386.c
@@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
  ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
  : FLT_EVAL_METHOD_UNPREDICTABLE);
       case EXCESS_PRECISION_TYPE_FLOAT16:
+ if (TARGET_80387
+     && !(TARGET_SSE_MATH && TARGET_SSE))
+   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
  gcc_unreachable ();
new file   gcc/testsuite/gcc.target/i386/float16-7.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
+/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+_Float16
+foo (_Float16 a, _Float16 b)
+{
+  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+}
+
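
The condition in the hunk above can be modelled in isolation as follows. This
is only a sketch: struct fp_config and its fields are hypothetical stand-ins
for the TARGET_80387, TARGET_SSE_MATH and TARGET_SSE macros, not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros used in the hunk.  */
struct fp_config
{
  bool has_x87;   /* models TARGET_80387 */
  bool sse_math;  /* models TARGET_SSE_MATH */
  bool has_sse;   /* models TARGET_SSE */
};

/* Mirrors the proposed check: -fexcess-precision=16 is rejected when
   x87 is available and SSE math is not fully in effect.  */
static bool
excess_precision_16_rejected (struct fp_config c)
{
  return c.has_x87 && !(c.sse_math && c.has_sse);
}
```

Under this model, x87 present with SSE math disabled is rejected, while x87
present with both SSE and SSE math enabled is accepted.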

> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao


* Re: [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-08-02  6:44                     ` [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
@ 2021-08-04  2:40                       ` Hongtao Liu
  2021-08-04  9:55                       ` Uros Bizjak
  1 sibling, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-04  2:40 UTC (permalink / raw)
  To: liuhongt
  Cc: GCC Patches, Uros Bizjak, Joseph Myers, Richard Biener, H. J. Lu,
	Guo, Xuepeng, H . J . Lu, Wang Hongyu, Xu Dianhong

On Mon, Aug 2, 2021 at 2:44 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (get_available_features):
>         Detect FEATURE_AVX512FP16.
>         * common/config/i386/i386-common.c
>         (OPTION_MASK_ISA_AVX512FP16_SET,
>         OPTION_MASK_ISA_AVX512FP16_UNSET,
>         OPTION_MASK_ISA2_AVX512FP16_SET,
>         OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
>         (OPTION_MASK_ISA2_AVX512BW_UNSET,
>         OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
>         (ix86_handle_option): Handle -mavx512fp16.
>         * common/config/i386/i386-cpuinfo.h (enum processor_features):
>         Add FEATURE_AVX512FP16.
>         * common/config/i386/i386-isas.h: Add entry for AVX512FP16.
>         * config.gcc: Add avx512fp16intrin.h.
>         * config/i386/avx512fp16intrin.h: New intrinsic header.
>         * config/i386/cpuid.h: Add bit_AVX512FP16.
>         * config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
>         * config/i386/i386-builtins.c: Support _Float16 type for i386
>         backend.
>         (ix86_init_float16_builtins): New function.
>         (ix86_float16_type_node): New.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Define
>         __AVX512FP16__.
>         * config/i386/i386-expand.c (ix86_expand_branch): Support
>         HFmode.
>         (ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_expand_fp_movcc): Ditto.
>         * config/i386/i386-isa.def: Add PTA define for AVX512FP16.
>         * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
>         (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
>         * config/i386/i386.c (ix86_get_ssemov): Use
>         vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
>         (ix86_get_excess_precision): Use
>         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
>         is enabled.
>         (sse_store_index): Use SFmode cost for HFmode cost.
>         (inline_memory_move_cost): Add HFmode, and prefer SSE cost over
>         GPR cost for HFmode.
>         (ix86_hard_regno_mode_ok): Allow HImode in sse register.
>         (ix86_mangle_type): Add mangling for _Float16 type.
>         (inline_secondary_memory_needed): No memory is needed for
>         16bit movement between gpr and sse reg under
>         TARGET_AVX512FP16.
>         (ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_division_cost): Ditto.
>         (ix86_rtx_costs): Ditto.
>         (ix86_add_stmt_cost): Ditto.
>         (ix86_optab_supported_p): Ditto.
>         * config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
>         (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
>         (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
>         * config/i386/i386.md (mode): Add HFmode.
>         (MODE_SIZE): Add HFmode.
>         (isa): Add avx512fp16.
>         (enabled): Handle avx512fp16.
>         (ssemodesuffix): Add sh suffix for HFmode.
>         (comm): Add mult, div.
>         (plusminusmultdiv): New code iterator.
>         (insn): Add mult, div.
>         (*movhf_internal): Adjust for avx512fp16 instruction.
>         (*movhi_internal): Ditto.
>         (*cmpi<unord>hf): New define_insn for HFmode.
>         (*ieee_s<ieee_maxmin>hf3): Likewise.
>         (extendhf<mode>2): Likewise.
>         (trunc<mode>hf2): Likewise.
>         (float<floatunssuffix><mode>hf2): Likewise.
>         (*<insn>hf): Likewise.
>         (cbranchhf4): New expander.
>         (movhfcc): Likewise.
>         (<insn>hf3): Likewise.
>         (mulhf3): Likewise.
>         (divhf3): Likewise.
>         * config/i386/i386.opt: Add mavx512fp16.
>         * config/i386/immintrin.h: Include avx512fp16intrin.h.
>         * doc/invoke.texi: Add mavx512fp16.
>         * doc/extend.texi: Add avx512fp16 Usage Notes.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
>         * gcc.target/i386/avx-2.c: Ditto.
>         * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
>         * gcc.target/i386/funcspec-56.inc: Add new target attribute check.
>         * gcc.target/i386/sse-13.c: Add -mavx512fp16.
>         * gcc.target/i386/sse-14.c: Ditto.
>         * gcc.target/i386/sse-22.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * lib/target-supports.exp: (check_effective_target_avx512fp16): New.
>         * g++.target/i386/float16-1.C: New test.
>         * g++.target/i386/float16-2.C: Ditto.
>         * g++.target/i386/float16-3.C: Ditto.
>         * gcc.target/i386/avx512fp16-12a.c: Ditto.
>         * gcc.target/i386/avx512fp16-12b.c: Ditto.
>         * gcc.target/i386/float16-3a.c: Ditto.
>         * gcc.target/i386/float16-3b.c: Ditto.
>         * gcc.target/i386/float16-4a.c: Ditto.
>         * gcc.target/i386/float16-4b.c: Ditto.
>         * gcc.target/i386/pr54855-12.c: Ditto.
>         * g++.dg/other/i386-2.C: Ditto.
>         * g++.dg/other/i386-3.C: Ditto.
>
> Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
> Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
> Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
> Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>

Ping.
Hi Uros, this is the patch updated according to your comments.
> ---
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 ++-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  53 ++++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   1 +
>  gcc/config/i386/i386-builtins.c               |  23 +++
>  gcc/config/i386/i386-c.c                      |   2 +
>  gcc/config/i386/i386-expand.c                 |   5 +-
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-options.c                |   4 +-
>  gcc/config/i386/i386.c                        | 133 ++++++++++----
>  gcc/config/i386/i386.h                        |  11 +-
>  gcc/config/i386/i386.md                       | 172 ++++++++++++++++--
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   4 +
>  gcc/doc/extend.texi                           |   8 +
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 +++
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  41 files changed, 558 insertions(+), 76 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 458f41de776..1835ac64e67 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
>             set_feature (FEATURE_AVX5124FMAPS);
>           if (edx & bit_AVX512VP2INTERSECT)
>             set_feature (FEATURE_AVX512VP2INTERSECT);
> +         if (edx & bit_AVX512FP16)
> +           set_feature (FEATURE_AVX512FP16);
>         }
>
>        __cpuid_count (7, 1, eax, ebx, ecx, edx);
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index 76ab1a14e54..00c65ba15ab 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_SET \
>    (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
> +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
> +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_SET \
>    (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
>  #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
> @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
> +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
> +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
>  #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
>  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
> @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_AVX512BF16_UNSET \
>     | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
>     | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
> +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> +   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>    (OPTION_MASK_ISA2_AVX512F_UNSET)
>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
>  #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
>
> -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
> +#define OPTION_MASK_ISA2_AVX512BW_UNSET \
> +  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> +    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>
>  /* Set 1 << value as value of -malign-FLAG option.  */
>
> @@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mavx512fp16:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +       }
> +      return true;
> +
>      case OPT_mavx512vnni:
>        if (value)
>         {
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index e68dd656046..4e0659fc7b2 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -228,6 +228,7 @@ enum processor_features
>    FEATURE_AESKLE,
>    FEATURE_WIDEKL,
>    FEATURE_AVXVNNI,
> +  FEATURE_AVX512FP16,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 898c18f3dda..a6783660278 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
>    ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
>    ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
> +  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 3df9b52cf25..a354351408c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
>                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
>                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
>                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> -                      mwaitintrin.h"
> +                      mwaitintrin.h avx512fp16intrin.h"
>         ;;
>  ia64-*-*)
>         extra_headers=ia64intrin.h
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> new file mode 100644
> index 00000000000..38d63161ba6
> --- /dev/null
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -0,0 +1,53 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _IMMINTRIN_H_INCLUDED
> +#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef __AVX512FP16INTRIN_H_INCLUDED
> +#define __AVX512FP16INTRIN_H_INCLUDED
> +
> +#ifndef __AVX512FP16__
> +#pragma GCC push_options
> +#pragma GCC target("avx512fp16")
> +#define __DISABLE_AVX512FP16__
> +#endif /* __AVX512FP16__ */
> +
> +/* Internal data types for implementing the intrinsics.  */
> +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
> +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
> +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
> +
> +/* The Intel API is flexible enough that we must allow aliasing with other
> +   vector types, and their scalar components.  */
> +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
> +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
> +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +#ifdef __DISABLE_AVX512FP16__
> +#undef __DISABLE_AVX512FP16__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX512FP16__ */
> +
> +#endif /* __AVX512FP16INTRIN_H_INCLUDED */
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..82b8050028b 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -126,6 +126,7 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT (1 << 8)
> +#define bit_AVX512FP16   (1 << 23)
>  #define bit_IBT        (1 << 20)
>  #define bit_UINTR (1 << 5)
>  #define bit_PCONFIG    (1 << 18)
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 3ca313c19ec..1768b88d748 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
>  DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
>  DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>  DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> +DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 204e2903126..668f09f12a0 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>  /* Table for the ix86 builtin non-function types.  */
>  static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>
> +tree ix86_float16_type_node = NULL_TREE;
>  /* Retrieve an element from the above table, building some of
>     the types lazily.  */
>
> @@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
>                         BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
>  }
>
> +static void
> +ix86_init_float16_builtins (void)
> +{
> +  /* Provide the _Float16 type and float16_type_node if needed so that
> +     it can be used in AVX512FP16 intrinsics and builtins.  */
> +  if (!float16_type_node)
> +    {
> +      ix86_float16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (ix86_float16_type_node) = 16;
> +      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
> +      layout_type (ix86_float16_type_node);
> +    }
> +  else
> +    ix86_float16_type_node = float16_type_node;
> +
> +  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
> +    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
> +                                           "_Float16");
> +}
> +
>  static void
>  ix86_init_builtin_types (void)
>  {
> @@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
>       it.  */
>    lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
>
> +  ix86_init_float16_builtins ();
> +
>    const_string_type_node
>      = build_pointer_type (build_qualified_type
>                           (char_type_node, TYPE_QUAL_CONST));
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index 5ed0de006fb..cc64f855ecc 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>      def_or_undef (parse_in, "__PTWRITE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
>      def_or_undef (parse_in, "__AVX512BF16__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
> +    def_or_undef (parse_in, "__AVX512FP16__");
>    if (TARGET_MMX_WITH_SSE)
>      def_or_undef (parse_in, "__MMX_WITH_SSE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 69ea79e6123..b7d050a1e42 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
>
>    switch (mode)
>      {
> +    case E_HFmode:
>      case E_SFmode:
>      case E_DFmode:
>      case E_XFmode:
> @@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
>    bool unordered_compare = ix86_unordered_fp_compare (code);
>    rtx op0 = *pop0, op1 = *pop1;
>    machine_mode op_mode = GET_MODE (op0);
> -  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
> +  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
>
>    /* All of the unordered compare instructions only work on registers.
>       The same is true of the fcomi compare instructions.  The XFmode
> @@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
>    rtx op0 = XEXP (operands[1], 0);
>    rtx op1 = XEXP (operands[1], 1);
>
> -  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      {
>        machine_mode cmode;
>
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index a0d46cbc892..83d9302ea3d 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -108,3 +108,4 @@ DEF_PTA(HRESET)
>  DEF_PTA(KL)
>  DEF_PTA(WIDEKL)
>  DEF_PTA(AVXVNNI)
> +DEF_PTA(AVX512FP16)
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 3416a4f1752..df191763e4b 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
>    { "-mhreset",                OPTION_MASK_ISA2_HRESET },
>    { "-mkl",            OPTION_MASK_ISA2_KL },
>    { "-mwidekl",        OPTION_MASK_ISA2_WIDEKL },
> -  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI }
> +  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI },
> +  { "-mavx512fp16",    OPTION_MASK_ISA2_AVX512FP16 }
>  };
>  static struct ix86_target_opts isa_opts[] =
>  {
> @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
>      IX86_ATTR_ISA ("hreset", OPT_mhreset),
>      IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
> +    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index dc673c89bc8..71bbcf968c5 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>      case MODE_SI:
>        return "%vmovd\t{%1, %0|%0, %1}";
>
> +    case MODE_HI:
> +      if (GENERAL_REG_P (operands[0]))
> +       return "vmovw\t{%1, %k0|%k0, %1}";
> +      else if (GENERAL_REG_P (operands[1]))
> +       return "vmovw\t{%k1, %0|%0, %k1}";
> +      else
> +       return "vmovw\t{%1, %0|%0, %1}";
> +
>      case MODE_DF:
>        if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>         return "vmovsd\t{%d1, %0|%0, %d1}";
> @@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>        else
>         return "%vmovss\t{%1, %0|%0, %1}";
>
> +    case MODE_HF:
> +      if (REG_P (operands[0]) && REG_P (operands[1]))
> +       return "vmovsh\t{%d1, %0|%0, %d1}";
> +      else
> +       return "vmovsh\t{%1, %0|%0, %1}";
> +
>      case MODE_V1DF:
>        gcc_assert (!TARGET_AVX);
>        return "movlpd\t{%1, %0|%0, %1}";
> @@ -13955,7 +13969,7 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
>
>    if (is_sse)
>     {
> -     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
> +     p = (GET_MODE (operands[0]) == SFmode ? "ss" : "sd");
>       strcat (buf, p);
>
>       if (TARGET_AVX)
> @@ -19132,10 +19146,19 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
>        if (!TARGET_SSE2)
>         return true;
>
> +      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
> +       return true;
> +
> +      int msize = GET_MODE_SIZE (mode);
> +
>        /* Between SSE and general, we have moves no larger than word size.  */
> -      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> -         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> -         || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +      if (msize > UNITS_PER_WORD)
> +       return true;
> +
> +      /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
> +      int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
> +
> +      if (msize < minsize)
>         return true;
>
>        /* If the target says that inter-unit moves are more expensive
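
The size checks in the hunk above can be read in isolation. The following is a minimal sketch, not the GCC code itself: the function name and its parameters are hypothetical, and it assumes SSE2 is available and that one of the two register classes is a general-register class (the earlier early returns in the hunk). It only models the "too wide" and "too narrow" tests, not the inter-unit move tuning that follows them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the size checks in
   inline_secondary_memory_needed.  Returns true when a move of MSIZE
   bytes between an SSE register and a general register has to go
   through memory.  */
bool
demo_sse_gpr_move_needs_memory (int msize, int units_per_word,
                                bool has_avx512fp16)
{
  assert (msize > 0 && units_per_word > 0);

  /* Nothing wider than a word moves directly between the units.  */
  if (msize > units_per_word)
    return true;

  /* With AVX512FP16, vmovw lowers the smallest direct move from
     4 bytes (SImode) to 2 bytes (HImode).  */
  int minsize = has_avx512fp16 ? 2 : 4;
  return msize < minsize;
}
```

Under these assumptions a 2-byte move needs memory without AVX512FP16 and does not need it with AVX512FP16, while 1-byte moves still go through memory either way.
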
> @@ -19229,21 +19252,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>  static inline int
>  sse_store_index (machine_mode mode)
>  {
> -      switch (GET_MODE_SIZE (mode))
> -       {
> -         case 4:
> -           return 0;
> -         case 8:
> -           return 1;
> -         case 16:
> -           return 2;
> -         case 32:
> -           return 3;
> -         case 64:
> -           return 4;
> -         default:
> -           return -1;
> -       }
> +  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
> +     costs to processor_costs, which requires changes to all entries in
> +     processor cost table.  */
> +  if (mode == E_HFmode)
> +    mode = E_SFmode;
> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 4:
> +      return 0;
> +    case 8:
> +      return 1;
> +    case 16:
> +      return 2;
> +    case 32:
> +      return 3;
> +    case 64:
> +      return 4;
> +    default:
> +      return -1;
> +    }
>  }
>
>  /* Return the cost of moving data of mode M between a
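
To make the HFmode aliasing in sse_store_index concrete, here is a small self-contained model. The enum, the size helper and all demo_ names are illustrative assumptions; GCC's real machine_mode and GET_MODE_SIZE are not used. The point it shows is that HFmode (2 bytes) would otherwise fall into the default case and yield -1, so it is redirected to the SFmode slot.

```c
#include <assert.h>

/* Illustrative mode tags; GCC's machine_mode enum is far larger.  */
enum demo_mode { DEMO_HF, DEMO_SF, DEMO_DF, DEMO_V4SF, DEMO_V8SF, DEMO_V16SF };

/* Size in bytes of each illustrative mode.  */
int
demo_mode_size (enum demo_mode mode)
{
  switch (mode)
    {
    case DEMO_HF: return 2;
    case DEMO_SF: return 4;
    case DEMO_DF: return 8;
    case DEMO_V4SF: return 16;
    case DEMO_V8SF: return 32;
    case DEMO_V16SF: return 64;
    }
  assert (!"unknown demo_mode");
  return 0;
}

/* Mirrors the patched sse_store_index: HFmode borrows the SFmode
   cost slot instead of getting its own entry in the cost tables.  */
int
demo_sse_store_index (enum demo_mode mode)
{
  if (mode == DEMO_HF)
    mode = DEMO_SF;
  switch (demo_mode_size (mode))
    {
    case 4: return 0;
    case 8: return 1;
    case 16: return 2;
    case 32: return 3;
    case 64: return 4;
    default: return -1;
    }
}
```

The trade-off stated in the patch comment is that this avoids touching every processor_costs entry, at the price of HFmode and SFmode loads/stores being costed identically.
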
> @@ -19270,6 +19298,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>        int index;
>        switch (mode)
>         {
> +         case E_HFmode:
>           case E_SFmode:
>             index = 0;
>             break;
> @@ -19370,11 +19399,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>           }
>         break;
>        case 2:
> -       if (in == 2)
> -         return MAX (ix86_cost->hard_register.int_load[1],
> -                     ix86_cost->hard_register.int_store[1]);
> -       return in ? ix86_cost->hard_register.int_load[1]
> -                 : ix86_cost->hard_register.int_store[1];
> +       {
> +         int cost;
> +         if (in == 2)
> +           cost = MAX (ix86_cost->hard_register.int_load[1],
> +                       ix86_cost->hard_register.int_store[1]);
> +         else
> +           cost = in ? ix86_cost->hard_register.int_load[1]
> +                     : ix86_cost->hard_register.int_store[1];
> +         if (mode == E_HFmode)
> +           {
> +             /* Prefer SSE over GPR for HFmode.  */
> +             int sse_cost;
> +             int index = sse_store_index (mode);
> +             if (in == 2)
> +               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> +                               ix86_cost->hard_register.sse_store[index]);
> +             else
> +               sse_cost = (in
> +                           ? ix86_cost->hard_register.sse_load [index]
> +                           : ix86_cost->hard_register.sse_store [index]);
> +             if (sse_cost >= cost)
> +               cost = sse_cost + 1;
> +           }
> +         return cost;
> +       }
>        default:
>         if (in == 2)
>           cost = MAX (ix86_cost->hard_register.int_load[2],
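
The "Prefer SSE over GPR for HFmode" adjustment above reduces to a small piece of arithmetic. This sketch uses a hypothetical function name and plain integer costs in place of the ix86_cost tables; it only illustrates the final comparison, not how the two input costs are looked up.

```c
#include <assert.h>

/* Given the cost of moving a 2-byte value through a general register
   (INT_COST) and through an SSE register (SSE_COST), return the cost
   reported for the general-register path when the mode is HFmode.
   The result is always strictly greater than SSE_COST.  */
int
demo_hf_gpr_move_cost (int int_cost, int sse_cost)
{
  assert (int_cost >= 0 && sse_cost >= 0);

  /* If SSE is not already cheaper, bump the GPR cost just past it.  */
  if (sse_cost >= int_cost)
    return sse_cost + 1;
  return int_cost;
}
```

For example, with int_cost 4 and sse_cost 6 the reported GPR cost becomes 7; with int_cost 6 and sse_cost 4 it stays 6. Either way the SSE path compares as cheaper.
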
> @@ -19548,6 +19597,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>           - XI mode
>           - any of 512-bit wide vector mode
>           - any scalar mode.  */
> +      /* For AVX512FP16, vmovw supports movement of HImode
> +        between gpr and sse register.  */
>        if (TARGET_AVX512F
>           && (mode == XImode
>               || VALID_AVX512F_REG_MODE (mode)
> @@ -19831,7 +19882,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->mulsd : cost->mulss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fmul;
> @@ -19883,7 +19934,7 @@ ix86_division_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->divsd : cost->divss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fdiv;
> @@ -20303,7 +20354,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>           return true;
>         }
>
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->addss;
>           return false;
> @@ -20336,7 +20387,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* FALLTHRU */
>
>      case NEG:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->sse_op;
>           return false;
> @@ -20418,14 +20469,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case FLOAT_EXTEND:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = 0;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
>        return false;
>
>      case FLOAT_TRUNCATE:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->fadd;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
> @@ -20435,7 +20486,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* SSE requires memory load for the constant operand. It may make
>          sense to account for this.  Of course the constant operand may or
>          may not be reused. */
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->sse_op;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fabs;
> @@ -20444,7 +20495,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case SQRT:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fsqrt;
> @@ -21928,6 +21979,10 @@ ix86_mangle_type (const_tree type)
>
>    switch (TYPE_MODE (type))
>      {
> +    case E_HFmode:
> +      /* _Float16 is "DF16_".
> +        Align with clang's decision in https://reviews.llvm.org/D33719.  */
> +      return "DF16_";
>      case E_TFmode:
>        /* __float128 is "g".  */
>        return "g";
> @@ -22551,7 +22606,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case MINUS_EXPR:
>           if (kind == scalar_stmt)
>             {
> -             if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +             if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>                 stmt_cost = ix86_cost->addss;
>               else if (X87_FLOAT_MODE_P (mode))
>                 stmt_cost = ix86_cost->fadd;
> @@ -22569,7 +22624,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>           break;
>         case NEGATE_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (X87_FLOAT_MODE_P (mode))
>             stmt_cost = ix86_cost->fchs;
> @@ -22625,7 +22680,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case BIT_XOR_EXPR:
>         case BIT_AND_EXPR:
>         case BIT_NOT_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (VECTOR_MODE_P (mode))
>             stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
> @@ -23327,14 +23382,18 @@ ix86_get_excess_precision (enum excess_precision_type type)
>         /* The fastest type to promote to will always be the native type,
>            whether that occurs with implicit excess precision or
>            otherwise.  */
> -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +       return TARGET_AVX512FP16
> +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>        case EXCESS_PRECISION_TYPE_STANDARD:
>        case EXCESS_PRECISION_TYPE_IMPLICIT:
>         /* Otherwise, the excess precision we want when we are
>            in a standards compliant mode, and the implicit precision we
>            provide would be identical were it not for the unpredictable
>            cases.  */
> -       if (!TARGET_80387)
> +       if (TARGET_AVX512FP16 && TARGET_SSE_MATH)
> +         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> +       else if (!TARGET_80387)
>           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>         else if (!TARGET_MIX_SSE_I387)
>           {
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index b1e66ee192e..8fcd5693624 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
>    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> -   || (MODE) == SFmode)
> +   || (MODE) == SFmode                                                 \
> +   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
>
>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> @@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_FP_MODE_P(MODE)                                          \
>    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
> -   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)                \
> +   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
>
>  #define VALID_INT_MODE_P(MODE)                                         \
>    ((MODE) == QImode || (MODE) == HImode                                        \
> @@ -1072,6 +1073,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define SSE_FLOAT_MODE_P(MODE) \
>    ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
>
> +#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)                           \
> +  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)                                \
> +   || (TARGET_AVX512FP16 && (MODE) == HFmode))
> +
>  #define FMA4_VEC_FLOAT_MODE_P(MODE) \
>    (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
>                   || (MODE) == V8SFmode || (MODE) == V4DFmode))
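
Since SSE_FLOAT_MODE_SSEMATH_OR_HF_P replaces the `SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH` test at many call sites, it may help to see the predicate as a truth function. The struct, enum and demo_ names below are assumptions made for illustration; the two functions follow the two macro definitions shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative scalar float modes and target flags.  */
enum demo_fmode { FM_HF, FM_SF, FM_DF, FM_XF };

struct demo_target
{
  bool sse;        /* TARGET_SSE */
  bool sse2;       /* TARGET_SSE2 */
  bool sse_math;   /* TARGET_SSE_MATH */
  bool avx512fp16; /* TARGET_AVX512FP16 */
};

/* Models SSE_FLOAT_MODE_P.  */
bool
demo_sse_float_mode_p (const struct demo_target *t, enum demo_fmode m)
{
  assert (t);
  return (t->sse && m == FM_SF) || (t->sse2 && m == FM_DF);
}

/* Models SSE_FLOAT_MODE_SSEMATH_OR_HF_P: SF/DF still require SSE
   math, while HF only requires AVX512FP16.  */
bool
demo_ssemath_or_hf_p (const struct demo_target *t, enum demo_fmode m)
{
  return (demo_sse_float_mode_p (t, m) && t->sse_math)
         || (t->avx512fp16 && m == FM_HF);
}
```

One consequence visible in this model: with AVX512FP16 enabled but SSE math disabled, HFmode still takes the SSE path while SFmode and DFmode do not.
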
> @@ -2265,7 +2270,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
>  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
>    | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
>    | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
> -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
> +  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
>  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
>    | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
>  constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d475347172d..777d11261ac 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -496,7 +496,7 @@ (define_attr "type"
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
> -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
>    V2DF,V2SF,V1DF,V8DF"
>    (const_string "unknown"))
>
> @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
>                     avx512bw,noavx512bw,avx512dq,noavx512dq,
> -                   avx512vl,noavx512vl,
> -                   avxvnni,avx512vnnivl"
> +                   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
>    (const_string "base"))
>
>  ;; Define instruction set of MMX instructions
> @@ -885,6 +884,8 @@ (define_attr "enabled" ""
>          (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
>          (eq_attr "isa" "avx512vnnivl")
>            (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
> +        (eq_attr "isa" "avx512fp16")
> +          (symbol_ref "TARGET_AVX512FP16")
>
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
> @@ -906,6 +907,7 @@ (define_asm_attributes
>     (set_attr "type" "multi")])
>
>  (define_code_iterator plusminus [plus minus])
> +(define_code_iterator plusminusmultdiv [plus minus mult div])
>
>  (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus])
>
> @@ -921,7 +923,8 @@ (define_code_attr multdiv_mnemonic
>
>  ;; Mark commutative operators as such in constraints.
>  (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%")
> -                       (minus "") (ss_minus "") (us_minus "")])
> +                       (minus "") (ss_minus "") (us_minus "")
> +                       (mult "%") (div "")])
>
>  ;; Mapping of max and min
>  (define_code_iterator maxmin [smax smin umax umin])
> @@ -1021,7 +1024,8 @@ (define_code_attr insn
>     (minus "sub") (ss_minus "sssub") (us_minus "ussub")
>     (sign_extend "extend") (zero_extend "zero_extend")
>     (ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")
> -   (rotate "rotl") (rotatert "rotr")])
> +   (rotate "rotl") (rotatert "rotr")
> +   (mult "mul") (div "div")])
>
>  ;; All integer modes.
>  (define_mode_iterator SWI1248x [QI HI SI DI])
> @@ -1089,8 +1093,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
>  ;; compile time constant, it is faster to use <MODE_SIZE> than
>  ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
>  ;; command line options just use GET_MODE_SIZE macro.
> -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
> -                            (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
> +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
> +                            (TI "16") (HF "2") (SF "4") (DF "8")
> +                            (XF "GET_MODE_SIZE (XFmode)")
>                              (V16QI "16") (V32QI "32") (V64QI "64")
>                              (V8HI "16") (V16HI "32") (V32HI "64")
>                              (V4SI "16") (V8SI "32") (V16SI "64")
> @@ -1222,8 +1227,8 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> -;; All x87 floating point modes plus HF
> -(define_mode_iterator X87MODEFH [SF DF XF HF])
> +;; All x87 floating point modes plus HFmode
> +(define_mode_iterator X87MODEFH [HF SF DF XF])
>
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
> @@ -1231,7 +1236,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
>
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
> -  [(SF "ss") (DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
>     (V16SF "ps") (V8DF "pd")
>     (V8SF "ps") (V4DF "pd")
>     (V4SF "ps") (V2DF "pd")
> @@ -1496,6 +1501,23 @@ (define_expand "cstorexf4"
>    DONE;
>  })
>
> +(define_expand "cbranchhf4"
> +  [(set (reg:CC FLAGS_REG)
> +       (compare:CC (match_operand:HF 1 "cmp_fp_expander_operand")
> +                   (match_operand:HF 2 "cmp_fp_expander_operand")))
> +   (set (pc) (if_then_else
> +              (match_operator 0 "ix86_fp_comparison_operator"
> +               [(reg:CC FLAGS_REG)
> +                (const_int 0)])
> +              (label_ref (match_operand 3))
> +              (pc)))]
> +  "TARGET_AVX512FP16"
> +{
> +  ix86_expand_branch (GET_CODE (operands[0]),
> +                     operands[1], operands[2], operands[3]);
> +  DONE;
> +})
> +
>  (define_expand "cbranch<mode>4"
>    [(set (reg:CC FLAGS_REG)
>         (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
> @@ -1705,6 +1727,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
>          (eq_attr "alternative" "0")
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
> +
> +(define_insn "*cmpi<unord>hf"
> +  [(set (reg:CCFP FLAGS_REG)
> +       (compare:CCFP
> +         (match_operand:HF 0 "register_operand" "v")
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "v<unord>comish\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecomi")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Push/pop instructions.
>
> @@ -2436,8 +2469,8 @@ (define_insn "*movsi_internal"
>            (symbol_ref "true")))])
>
>  (define_insn "*movhi_internal"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
> -       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
> +       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
>     && ix86_hardreg_mov_ok (operands[0], operands[1])"
>
> @@ -2463,6 +2496,9 @@ (define_insn "*movhi_internal"
>           gcc_unreachable ();
>         }
>
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
>      case TYPE_MSKLOG:
>        if (operands[1] == const0_rtx)
>         return "kxorw\t%0, %0, %0";
> @@ -2477,8 +2513,15 @@ (define_insn "*movhi_internal"
>         return "mov{w}\t{%1, %0|%0, %1}";
>      }
>  }
> -  [(set (attr "type")
> -     (cond [(eq_attr "alternative" "4,5,6,7")
> +  [(set (attr "isa")
> +       (cond [(eq_attr "alternative" "9,10,11,12,13")
> +                 (const_string "avx512fp16")
> +              ]
> +              (const_string "*")))
> +   (set (attr "type")
> +     (cond [(eq_attr "alternative" "9,10,11,12,13")
> +             (const_string "ssemov")
> +           (eq_attr "alternative" "4,5,6,7")
>               (const_string "mskmov")
>             (eq_attr "alternative" "8")
>               (const_string "msklog")
> @@ -2503,6 +2546,8 @@ (define_insn "*movhi_internal"
>      (set (attr "mode")
>        (cond [(eq_attr "type" "imovx")
>                (const_string "SI")
> +            (eq_attr "alternative" "11")
> +              (const_string "HF")
>              (and (eq_attr "alternative" "1,2")
>                   (match_operand:HI 1 "aligned_operand"))
>                (const_string "SI")
> @@ -3727,7 +3772,10 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "sselog1")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "sselog")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "ssemov")
> +                  (const_string "sselog"))
>               ]
>               (const_string "ssemov")))
>     (set (attr "memory")
> @@ -3750,9 +3798,15 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "V4SF")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "TI")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HI")
> +                  (const_string "TI"))
>                (eq_attr "alternative" "3")
> -                (const_string "SF")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HF")
> +                  (const_string "SF"))
>               ]
>               (const_string "*")))])
>
> @@ -4493,6 +4547,17 @@ (define_split
>    emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> +(define_insn "extendhf<mode>2"
> +  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +        (float_extend:MODEF
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +
>  (define_expand "extend<mode>xf2"
>    [(set (match_operand:XF 0 "nonimmediate_operand")
>          (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
> @@ -4670,6 +4735,18 @@ (define_insn "truncxf<mode>2"
>               (symbol_ref "flag_unsafe_math_optimizations")
>            ]
>            (symbol_ref "true")))])
> +
> +;; Conversion from {SF,DF}mode to HFmode.
> +
> +(define_insn "trunc<mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (float_truncate:HF
> +         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Signed conversion to DImode.
>
> @@ -5046,6 +5123,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
>            (symbol_ref "true")))])
>
> +(define_insn "float<floatunssuffix><mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (any_float:HF
> +         (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*floatdi<MODEF:mode>2_i387"
>    [(set (match_operand:MODEF 0 "register_operand" "=f")
>         (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
> @@ -7626,6 +7713,13 @@ (define_expand "<insn>xf3"
>           (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "<insn>hf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (plusminus:HF
> +         (match_operand:HF 1 "register_operand")
> +         (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "<insn><mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (plusminus:MODEF
> @@ -8203,6 +8297,12 @@ (define_expand "mulxf3"
>                  (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "mulhf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (mult:HF (match_operand:HF 1 "register_operand")
> +                   (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "mul<mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (mult:MODEF (match_operand:MODEF 1 "register_operand")
> @@ -8220,6 +8320,12 @@ (define_expand "divxf3"
>                 (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "divhf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (div:HF (match_operand:HF 1 "register_operand")
> +                  (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "div<mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (div:MODEF (match_operand:MODEF 1 "register_operand")
> @@ -16312,6 +16418,17 @@ (define_insn "*fop_<mode>_comm"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*<insn>hf"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (plusminusmultdiv:HF
> +         (match_operand:HF 1 "nonimmediate_operand" "<comm>v")
> +         (match_operand:HF 2 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "v<insn>sh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*rcpsf2_sse"
>    [(set (match_operand:SF 0 "register_operand" "=x,x,x")
>         (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
> @@ -19178,6 +19295,15 @@ (define_peephole2
>      gcc_unreachable ();
>  })
>
> +(define_expand "movhfcc"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (if_then_else:HF
> +         (match_operand 1 "comparison_operator")
> +         (match_operand:HF 2 "register_operand")
> +         (match_operand:HF 3 "register_operand")))]
> +  "TARGET_AVX512FP16"
> +  "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
> +
>  (define_expand "mov<mode>cc"
>    [(set (match_operand:X87MODEF 0 "register_operand")
>         (if_then_else:X87MODEF
> @@ -19346,6 +19472,18 @@ (define_insn "<code><mode>3"
>  ;; Their operands are not commutative, and thus they may be used in the
>  ;; presence of -0.0 and NaN.
>
> +(define_insn "*ieee_s<ieee_maxmin>hf3"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (unspec:HF
> +         [(match_operand:HF 1 "register_operand" "v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]
> +         IEEE_MAXMIN))]
> +  "TARGET_AVX512FP16"
> +  "v<ieee_maxmin>sh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "prefix" "evex")
> +   (set_attr "type" "sseadd")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*ieee_s<ieee_maxmin><mode>3"
>    [(set (match_operand:MODEF 0 "register_operand" "=x,v")
>         (unspec:MODEF
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 7b8547bb1c3..ad366974b5b 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
>  mmwait
>  Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
>  Support MWAIT and MONITOR built-in functions and code generation.
> +
> +mavx512fp16
> +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
> diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> index f129de4bbe5..2421a78637b 100644
> --- a/gcc/config/i386/immintrin.h
> +++ b/gcc/config/i386/immintrin.h
> @@ -94,6 +94,10 @@
>
>  #include <avx512vp2intersectvlintrin.h>
>
> +#ifdef __SSE2__
> +#include <avx512fp16intrin.h>
> +#endif
> +
>  #include <shaintrin.h>
>
>  #include <fmaintrin.h>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 3a1978efc97..09040bfca33 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1164,6 +1164,14 @@ to inconsistent behavior between software emulation and AVX512-FP16
>  instructions. Using @option{-fexcess-precision=16} and  will force round
>  back after each operation.
>
> +Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of
> +software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round
> +after each operation. The same is true with @option{-fexcess-precision=standard}
> +and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse},
> +@option{-fexcess-precision=standard} alone does the same thing as before.
> +It is useful for code that does not have @code{_Float16} and runs on the x87
> +FPU.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 32697e6117c..bb9f7ca956e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
>  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
> +-mavx512fp16 @gol
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
>  -mkl -mwidekl @gol
> @@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @itemx -mavx512bf16
>  @opindex mavx512bf16
>  @need 200
> +@itemx -mavx512fp16
> +@opindex mavx512fp16
> +@need 200
>  @itemx -mgfni
>  @opindex mgfni
>  @need 200
> @@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
>  XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
>  GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
> -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
> -extended instruction sets. Each has a corresponding @option{-mno-} option to
> -disable use of these instructions.
> +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
> +or CLDEMOTE extended instruction sets. Each has a corresponding
> +@option{-mno-} option to disable use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
> index 62b2132957a..fba3d1ac684 100644
> --- a/gcc/testsuite/g++.dg/other/i386-2.C
> +++ b/gcc/testsuite/g++.dg/other/i386-2.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
> index 843aa2bdb2f..5cc0fa83457 100644
> --- a/gcc/testsuite/g++.dg/other/i386-3.C
> +++ b/gcc/testsuite/g++.dg/other/i386-3.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
> new file mode 100644
> index 00000000000..95d1ac27c4f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-1.C
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "does not name a type" } */
> +foo (_Float16 x)
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
> new file mode 100644
> index 00000000000..99eb797eff1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-2.C
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
> new file mode 100644
> index 00000000000..940878503f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-3.C
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O0 -mavx512fp16" } */
> +
> +template <typename> void a(char *) {}
> +char b, d;
> +void c()
> +{
> +  a<unsigned char>(&d);
> +  a<_Float16>(&b);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index 6178e38ce02..f3676077743 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> index 986fbd819e4..1751c52565c 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..0ad9064f637 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -87,6 +87,9 @@ main ()
>  #ifdef AVX512VNNI
>        && (ecx & bit_AVX512VNNI)
>  #endif
> +#ifdef AVX512FP16
> +      && (edx & bit_AVX512FP16)
> +#endif
>  #ifdef VAES
>        && (ecx & bit_VAES)
>  #endif
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> new file mode 100644
> index 00000000000..88887556d68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_max (_Float16 __A, _Float16 __B)
> +{
> +  return __A > __B ? __A : __B;
> +}
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_min (_Float16 __A, _Float16 __B)
> +{
> +  return __A < __B ? __A : __B;
> +}
> +
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> new file mode 100644
> index 00000000000..c9e23bf95c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512fp16-12a.c"
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x = 0.1f;
> +  _Float16 y = -3.2f;
> +  _Float16 z;
> +
> +  z = do_max (x, y);
> +  if (z != x)
> +    abort ();
> +
> +  z = do_min (x, y);
> +  if (z != y)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
> new file mode 100644
> index 00000000000..3846c8e9b6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
> new file mode 100644
> index 00000000000..247dd6e7e33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
> new file mode 100644
> index 00000000000..631082581f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
> new file mode 100644
> index 00000000000..828d8530769
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 79265c7c94f..8499fdf2db9 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -79,6 +79,7 @@ extern void test_hreset (void)                        __attribute__((__target__("hreset")));
>  extern void test_keylocker (void)              __attribute__((__target__("kl")));
>  extern void test_widekl (void)                 __attribute__((__target__("widekl")));
>  extern void test_avxvnni (void)                        __attribute__((__target__("avxvnni")));
> +extern void test_avx512fp16 (void)             __attribute__((__target__("avx512fp16")));
>
>  extern void test_no_sgx (void)                 __attribute__((__target__("no-sgx")));
>  extern void test_no_avx5124fmaps(void)         __attribute__((__target__("no-avx5124fmaps")));
> @@ -159,6 +160,7 @@ extern void test_no_hreset (void)           __attribute__((__target__("no-hreset")));
>  extern void test_no_keylocker (void)           __attribute__((__target__("no-kl")));
>  extern void test_no_widekl (void)              __attribute__((__target__("no-widekl")));
>  extern void test_no_avxvnni (void)             __attribute__((__target__("no-avxvnni")));
> +extern void test_no_avx512fp16 (void)          __attribute__((__target__("no-avx512fp16")));
>
>  extern void test_arch_nocona (void)            __attribute__((__target__("arch=nocona")));
>  extern void test_arch_core2 (void)             __attribute__((__target__("arch=core2")));
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> new file mode 100644
> index 00000000000..2f8af392c83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +
> +#include <immintrin.h>
> +
> +_Float16
> +foo (_Float16 x, _Float16 y)
> +{
> +  x = x > y ? x : y;
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 7029771334b..f5f5c113612 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 4ce0ffffaf3..747d504cedb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 6e8b6f3fa1b..33411969901 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -103,7 +103,7 @@
>
>
>  #ifndef DIFFERENT_PRAGMAS
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>
>  /* Following intrinsics require immediate arguments.  They
> @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
>
>  /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
>  #ifdef DIFFERENT_PRAGMAS
> -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>  #include <immintrin.h>
>  test_1 (_cvtss_sh, unsigned short, float, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 7faa053ace8..86590ca5ffb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -708,6 +708,6 @@
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
>
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>
>  #include <x86intrin.h>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 42ac9d0ac1a..10765365d7b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
>
>  proc check_effective_target_float16 {} {
>      return [check_no_compiler_messages_nocache float16 object {
> -        _Float16 x;
> +        _Float16 foo (_Float16 x) { return x; }
>      } [add_options_for_float16 ""]]
>  }
>
> @@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
>  }
>
>
> +# Return 1 if avx512fp16 instructions can be compiled.
> +
> +proc check_effective_target_avx512fp16 { } {
> +    return [check_no_compiler_messages avx512fp16 object {
> +       void foo (void)
> +       {
> +         asm volatile ("vmovw %edi, %xmm0");
> +       }
> +    } "-O2 -mavx512fp16" ]
> +}
> +
>  # Return 1 if avx512f instructions can be compiled.
>
>  proc check_effective_target_avx512f { } {
> --
> 2.27.0
>

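For readers following avx512fp16-12a.c and pr54855-12.c in the quoted patch, here is a minimal sketch of the source pattern those tests expect to become a single vmaxsh/vminsh. It uses float as a stand-in for _Float16 (an assumption, so that it builds on compilers without _Float16 support), and half_t is a hypothetical name, not something from the patch. As far as I understand the x86 scalar max/min instructions, they return the second source operand when the comparison is unordered, which is the same result the conditional expression gives, so no separate compare (vcomish) should be needed.

```c
/* Stand-in for _Float16 so the sketch builds without AVX512FP16 support.
   half_t is a hypothetical name used only in this example.  */
typedef float half_t;

/* Same shape as do_max in avx512fp16-12a.c.  When the comparison is
   false, which includes the case where either operand is a NaN, the
   second operand B is returned.  */
static half_t
do_max (half_t a, half_t b)
{
  return a > b ? a : b;
}

/* Same shape as do_min in avx512fp16-12a.c, with the same NaN behavior.  */
static half_t
do_min (half_t a, half_t b)
{
  return a < b ? a : b;
}
```

With x = 0.1f and y = -3.2f, as in avx512fp16-12b.c, do_max (x, y) gives x and do_min (x, y) gives y.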

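The avx512-check.h hunk in the quoted patch gates the runtime tests on (edx & bit_AVX512FP16). A rough standalone sketch of that check follows. It assumes the feature is reported in CPUID leaf 7, subleaf 0, EDX bit 23; SKETCH_BIT_AVX512FP16 and has_avx512fp16 are hypothetical names for this example, not identifiers from the patch. It only looks at the CPUID bit and does not check whether the OS saves the AVX-512 register state, which a real test harness may also need to do.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* __get_cpuid_count, provided by GCC and Clang */
#endif

/* Assumption: AVX512-FP16 is CPUID.(EAX=7,ECX=0):EDX bit 23.  */
#define SKETCH_BIT_AVX512FP16 (1u << 23)

/* Return 1 if the CPU reports AVX512-FP16, 0 otherwise.
   On non-x86 targets this always returns 0.  */
static int
has_avx512fp16 (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid_count returns 0 if leaf 7 is not supported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (edx & SKETCH_BIT_AVX512FP16) != 0;
#else
  return 0;
#endif
}
```

A run test could call has_avx512fp16 () first and skip itself when it returns 0, which is the role the DO_TEST wrapper plays in avx512fp16-12b.c.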
-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-02  6:31                     ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
@ 2021-08-04  2:45                       ` Hongtao Liu
  2021-08-04 11:28                         ` Richard Biener
  2021-09-03 12:42                       ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above Jakub Jelinek
  1 sibling, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-04  2:45 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph Myers, Richard Biener, H. J. Lu

On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
>         * config/i386/i386.c (enum x86_64_reg_class): Add
>         X86_64_SSEHF_CLASS.
>         (merge_classes): Handle X86_64_SSEHF_CLASS.
>         (examine_argument): Ditto.
>         (construct_container): Ditto.
>         (classify_argument): Ditto, and set HFmode/HCmode to
>         X86_64_SSEHF_CLASS.
>         (function_value_32): Return _Float16/_Complex _Float16 by
>         %xmm0.
>         (function_value_64): Return _Float16/_Complex _Float16 by SSE
>         register.
>         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
>         (ix86_secondary_reload): Require gpr as intermediate register
>         to store _Float16 from sse register when SSE4.1 is not
>         available.
>         (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
>         sse2.
>         (ix86_scalar_mode_supported_p): Ditto.
>         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
>         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
>         (VALID_INT_MODE_P): Add HFmode and HCmode.
>         * config/i386/i386.md (*pushhf_rex64): New define_insn.
>         (*pushhf): Ditto.
>         (*movhf_internal): Ditto.
>         * doc/extend.texi (Half-Precision Floating Point): Document
>         _Float16 for x86.
>         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
>         which is used by extract_bit_field but not backends.
>
> gcc/lto/ChangeLog:
>
>         * lto-lang.c (lto_type_for_mode): Return float16_type_node
>         when mode == TYPE_MODE (float16_type_node).
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/sse2-float16-1.c: New test.
>         * gcc.target/i386/sse2-float16-2.c: Ditto.
>         * gcc.target/i386/sse2-float16-3.c: Ditto.
>         * gcc.target/i386/float16-5.c: Ditto.
> ---
>  gcc/config/i386/i386-modes.def                |   1 +
>  gcc/config/i386/i386.c                        |  91 +++++++++++++-
>  gcc/config/i386/i386.h                        |   3 +-
>  gcc/config/i386/i386.md                       | 118 +++++++++++++++++-
>  gcc/doc/extend.texi                           |  13 ++
>  gcc/emit-rtl.c                                |   5 +
>  gcc/lto/lto-lang.c                            |   3 +
>  gcc/testsuite/gcc.target/i386/float16-5.c     |  12 ++
>  .../gcc.target/i386/sse2-float16-1.c          |   8 ++
>  .../gcc.target/i386/sse2-float16-2.c          |  16 +++
>  .../gcc.target/i386/sse2-float16-3.c          |  12 ++
>  11 files changed, 274 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 4e7014be034..9232f59a925 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> +FLOAT_MODE (HF, 2, ieee_half_format);
>
>  /* In ILP32 mode, XFmode has size 12 and alignment 4.
>     In LP64 mode, XFmode has size and alignment 16.  */
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ff96134fb37..7979e240426 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -387,6 +387,7 @@ enum x86_64_reg_class
>      X86_64_INTEGER_CLASS,
>      X86_64_INTEGERSI_CLASS,
>      X86_64_SSE_CLASS,
> +    X86_64_SSEHF_CLASS,
>      X86_64_SSESF_CLASS,
>      X86_64_SSEDF_CLASS,
>      X86_64_SSEUP_CLASS,
> @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
>      return X86_64_MEMORY_CLASS;
>
>    /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> -      || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> +  if ((class1 == X86_64_INTEGERSI_CLASS
> +       && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> +      || (class2 == X86_64_INTEGERSI_CLASS
> +         && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
>      return X86_64_INTEGERSI_CLASS;
>    if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
>        || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
>             /* The partial classes are now full classes.  */
>             if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
>               subclasses[0] = X86_64_SSE_CLASS;
> +           if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
> +             subclasses[0] = X86_64_SSE_CLASS;
>             if (subclasses[0] == X86_64_INTEGERSI_CLASS
>                 && !((bit_offset % 64) == 0 && bytes == 4))
>               subclasses[0] = X86_64_INTEGER_CLASS;
> @@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
>        gcc_unreachable ();
>      case E_CTImode:
>        return 0;
> +    case E_HFmode:
> +      if (!(bit_offset % 64))
> +       classes[0] = X86_64_SSEHF_CLASS;
> +      else
> +       classes[0] = X86_64_SSE_CLASS;
> +      return 1;
>      case E_SFmode:
>        if (!(bit_offset % 64))
>         classes[0] = X86_64_SSESF_CLASS;
> @@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
>        classes[0] = X86_64_SSE_CLASS;
>        classes[1] = X86_64_SSEUP_CLASS;
>        return 2;
> +    case E_HCmode:
> +      classes[0] = X86_64_SSE_CLASS;
> +      if (!(bit_offset % 64))
> +       return 1;
> +      else
> +       {
> +         classes[1] = X86_64_SSEHF_CLASS;
> +         return 2;
> +       }
>      case E_SCmode:
>        classes[0] = X86_64_SSE_CLASS;
>        if (!(bit_offset % 64))
> @@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
>         (*int_nregs)++;
>         break;
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         (*sse_nregs)++;
> @@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>
>    /* First construct simple cases.  Avoid SCmode, since we want to use
>       single register to pass this type.  */
> -  if (n == 1 && mode != SCmode)
> +  if (n == 1 && mode != SCmode && mode != HCmode)
>      switch (regclass[0])
>        {
>        case X86_64_INTEGER_CLASS:
>        case X86_64_INTEGERSI_CLASS:
>         return gen_rtx_REG (mode, intreg[0]);
>        case X86_64_SSE_CLASS:
> +      case X86_64_SSEHF_CLASS:
>        case X86_64_SSESF_CLASS:
>        case X86_64_SSEDF_CLASS:
>         if (mode != BLKmode)
> @@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>                                    GEN_INT (i*8));
>             intreg++;
>             break;
> +         case X86_64_SSEHF_CLASS:
> +           exp [nexps++]
> +             = gen_rtx_EXPR_LIST (VOIDmode,
> +                                  gen_rtx_REG (HFmode,
> +                                               GET_SSE_REGNO (sse_regno)),
> +                                  GEN_INT (i*8));
> +           sse_regno++;
> +           break;
>           case X86_64_SSESF_CLASS:
>             exp [nexps++]
>               = gen_rtx_EXPR_LIST (VOIDmode,
> @@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
>      /* Most things go in %eax.  */
>      regno = AX_REG;
>
> +  /* Return _Float16/_Complex _Float16 by sse register.  */
> +  if (mode == HFmode)
> +    regno = FIRST_SSE_REG;
> +  if (mode == HCmode)
> +    {
> +      rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
> +      XVECEXP (ret, 0, 0)
> +       = gen_rtx_EXPR_LIST (VOIDmode,
> +                            gen_rtx_REG (SImode, FIRST_SSE_REG),
> +                            GEN_INT (0));
> +      return ret;
> +    }
> +
>    /* Override FP return register with %xmm0 for local functions when
>       SSE math is enabled or for functions with sseregparm attribute.  */
>    if ((fn || fntype) && (mode == SFmode || mode == DFmode))
> @@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
>
>        switch (mode)
>         {
> +       case E_HFmode:
> +       case E_HCmode:
>         case E_SFmode:
>         case E_SCmode:
>         case E_DFmode:
> @@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
>           (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
>      }
>
> +  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
> +    {
> +      long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
> +                              REAL_MODE_FORMAT (HFmode));
> +      if (ASSEMBLER_DIALECT == ASM_ATT)
> +       putc ('$', file);
> +      fprintf (file, "0x%04x", (unsigned int) l);
> +    }
> +
>    else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
>      {
>        long l;
> @@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
>        return NO_REGS;
>      }
>
> +  /* Require movement to gpr, and then store to memory.  */
> +  if (mode == HFmode
> +      && !TARGET_SSE4_1
> +      && SSE_CLASS_P (rclass)
> +      && !in_p && MEM_P (x))
> +    {
> +      sri->extra_cost = 1;
> +      return GENERAL_REGS;
> +    }
> +
>    /* This condition handles corner case where an expression involving
>       pointers gets vectorized.  We're trying to use the address of a
>       stack slot as a vector initializer.
> @@ -21555,10 +21619,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
>      return default_decimal_float_supported_p ();
>    else if (mode == TFmode)
>      return true;
> +  else if (mode == HFmode && TARGET_SSE2)
> +    return true;
>    else
>      return default_scalar_mode_supported_p (mode);
>  }
>
> +/* Implement TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P - return TRUE
> +   if MODE is HFmode, and punt to the generic implementation otherwise.  */
> +
> +static bool
> +ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
> +{
> +  /* NB: Always return TRUE for HFmode so that the _Float16 type will
> +     be defined by the C front-end for AVX512FP16 intrinsics.  We will
> +     issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
> +     enabled.  */
> +  return ((mode == HFmode && TARGET_SSE2)
> +         ? true
> +         : default_libgcc_floating_mode_supported_p (mode));
> +}
> +
>  /* Implements target hook vector_mode_supported_p.  */
>  static bool
>  ix86_vector_mode_supported_p (machine_mode mode)
> @@ -23820,6 +23901,10 @@ ix86_run_selftests (void)
>  #undef TARGET_SCALAR_MODE_SUPPORTED_P
>  #define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
>
> +#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
> +#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P        \
> +ix86_libgcc_floating_mode_supported_p
> +
>  #undef TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 0c2c93daf32..b1e66ee192e 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define VALID_SSE2_REG_MODE(MODE)                                      \
>    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode     \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> -   || (MODE) == V2DImode || (MODE) == DFmode)
> +   || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
>
>  #define VALID_SSE_REG_MODE(MODE)                                       \
>    ((MODE) == V1TImode || (MODE) == TImode                              \
> @@ -1047,6 +1047,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == CQImode || (MODE) == CHImode                           \
>     || (MODE) == CSImode || (MODE) == CDImode                           \
>     || (MODE) == SDmode || (MODE) == DDmode                             \
> +   || (MODE) == HFmode || (MODE) == HCmode                             \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
>     || (TARGET_64BIT                                                    \
>         && ((MODE) == TImode || (MODE) == CTImode                       \
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8b809c49fe0..d475347172d 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> +;; All x87 floating point modes plus HF
> +(define_mode_iterator X87MODEFH [SF DF XF HF])
> +
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
>  (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
> @@ -3130,6 +3133,32 @@ (define_split
>    operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
>  })
>
> +(define_insn "*pushhf_rex64"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
> +  "TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{q}\t%q1";
> +}
> +  [(set_attr "isa"  "*,sse4")
> +   (set_attr "type" "push,multi")
> +   (set_attr "mode" "DI,TI")])
> +
> +(define_insn "*pushhf"
> +  [(set (match_operand:HF 0 "push_operand" "=X,X")
> +       (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
> +  "!TARGET_64BIT"
> +{
> +  /* Anything else should be already split before reg-stack.  */
> +  gcc_assert (which_alternative == 0);
> +  return "push{l}\t%k1";
> +}
> +  [(set_attr "isa"  "*,sse4")
> +   (set_attr "type" "push,multi")
> +   (set_attr "mode" "SI,TI")])
> +
>  (define_insn "*pushsf_rex64"
>    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
>         (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
> @@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
>     (set_attr "unit" "i387,*,*")
>     (set_attr "mode" "SF,SI,SF")])
>
> +(define_mode_iterator MODESH [SF HF])
>  ;; %%% Kill this when call knows how to work this out.
>  (define_split
> -  [(set (match_operand:SF 0 "push_operand")
> -       (match_operand:SF 1 "any_fp_register_operand"))]
> +  [(set (match_operand:MODESH 0 "push_operand")
> +       (match_operand:MODESH 1 "any_fp_register_operand"))]
>    "reload_completed"
>    [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
>     (set (match_dup 0) (match_dup 1))]
> @@ -3209,8 +3239,8 @@ (define_expand "movtf"
>    "ix86_expand_move (TFmode, operands); DONE;")
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
> -       (match_operand:X87MODEF 1 "general_operand"))]
> +  [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
> +       (match_operand:X87MODEFH 1 "general_operand"))]
>    ""
>    "ix86_expand_move (<MODE>mode, operands); DONE;")
>
> @@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
>            ]
>            (const_string "*")))])
>
> +(define_insn "*movhf_internal"
> + [(set (match_operand:HF 0 "nonimmediate_operand"
> +        "=?r,?m,v,v,?r,m,?v,v")
> +       (match_operand:HF 1 "general_operand"
> +        "rmF,rF,C,v, v,v, r,m"))]
> + "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> +  && (lra_in_progress
> +      || reload_completed
> +      || !CONST_DOUBLE_P (operands[1])
> +      || (TARGET_SSE && TARGET_SSE_MATH
> +         && standard_sse_constant_p (operands[1], HFmode) == 1)
> +      || memory_operand (operands[0], HFmode))"
> +{
> +  switch (get_attr_type (insn))
> +    {
> +    case TYPE_IMOV:
> +      return "mov{w}\t{%1, %0|%0, %1}";
> +
> +    case TYPE_SSELOG1:
> +      return standard_sse_constant_opcode (insn, operands);
> +
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
> +    case TYPE_SSELOG:
> +      if (SSE_REG_P (operands[0]))
> +       return MEM_P (operands[1])
> +              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +      else
> +       return MEM_P (operands[1])
> +              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set (attr "isa")
> +       (cond [(eq_attr "alternative" "2,3,4,6,7")
> +                (const_string "sse2")
> +              (eq_attr "alternative" "5")
> +                (const_string "sse4")
> +             ]
> +             (const_string "*")))
> +   (set (attr "type")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "imov")
> +              (eq_attr "alternative" "2")
> +                (const_string "sselog1")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "sselog")
> +             ]
> +             (const_string "ssemov")))
> +   (set (attr "memory")
> +       (cond [(eq_attr "alternative" "4,6")
> +                (const_string "none")
> +              (eq_attr "alternative" "5")
> +                (const_string "store")
> +              (eq_attr "alternative" "7")
> +                (const_string "load")
> +             ]
> +             (const_string "*")))
> +   (set (attr "prefix")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "orig")
> +             ]
> +             (const_string "maybe_vex")))
> +   (set (attr "mode")
> +       (cond [(eq_attr "alternative" "0,1")
> +                (const_string "HI")
> +              (eq_attr "alternative" "2")
> +                (const_string "V4SF")
> +              (eq_attr "alternative" "4,5,6,7")
> +                (const_string "TI")
> +              (eq_attr "alternative" "3")
> +                (const_string "SF")
> +             ]
> +             (const_string "*")))])
> +
>  (define_split
>    [(set (match_operand 0 "any_fp_register_operand")
>         (match_operand 1 "memory_operand"))]
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b83cd4919bb..f42fd633725 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
>  @section Half-Precision Floating Point
>  @cindex half-precision floating point
>  @cindex @code{__fp16} data type
> +@cindex @code{_Float16} data type
>
>  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> @@ -1150,6 +1151,18 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
>
> +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> +(16-bit) floating point via the @code{_Float16} type, which is defined by
> +ISO/IEC TS 18661-3:2015.  For C++, x86 provides a built-in type named
> +@code{_Float16} that uses the same data format as in C.
> +
> +Without @option{-mavx512fp16}, the @code{_Float16} type is storage only: all
> +operations are emulated in software using @code{float} instructions.
> +The default behavior for @code{FLT_EVAL_METHOD} is to keep the intermediate
> +result of an operation in 32-bit precision.  This may lead to results
> +that differ between software emulation and AVX512-FP16
> +instructions.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types
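
For illustration only, the precision difference described in the
extend.texi text above can be modelled in portable C without using
_Float16 at all. This is a sketch under stated assumptions, not what GCC
emits: the helper names (round_to_half_precision, sum3_round_each_step,
sum3_round_once) are made up here, and the rounding helper only handles
finite values in binary16's normal range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest value with an 11-bit significand
   (binary16 precision), ties to even.  Assumption: the input is finite
   and within binary16's normal range, so overflow and subnormals are
   ignored.  */
float
round_to_half_precision (float x)
{
  uint32_t bits;

  assert (x == x);  /* NaN is out of scope for this sketch.  */
  memcpy (&bits, &x, sizeof bits);
  bits += 0xFFFu + ((bits >> 13) & 1u);  /* Round half to even.  */
  bits &= ~UINT32_C (0x1FFF);            /* Drop the 13 extra fraction bits.  */
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* a + b + c with every intermediate rounded to half precision, modelling
   arithmetic that is carried out natively in half precision.  */
float
sum3_round_each_step (float a, float b, float c)
{
  float t = round_to_half_precision (a + b);
  return round_to_half_precision (t + c);
}

/* a + b + c with the intermediate kept in float and a single final
   rounding, modelling evaluation in 32-bit precision.  */
float
sum3_round_once (float a, float b, float c)
{
  return round_to_half_precision (a + b + c);
}
```

With a = 2048, b = 1 and c = 1, rounding after every step gives 2048
(2049 is a tie and rounds to even twice), while keeping the intermediate
in float gives 2050.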

Ping. I'd like to ask for approval of the code below, which is in the
generic (target-independent) part.

It starts here:
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index ff3b4449b37..775ee397836 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>       fix them all.  */
>    if (omode == word_mode)
>      ;
> +  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI (reg:HF))
> +     here.  Though extract_bit_field is the culprit here, not the backends.  */
> +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> +    ;
>    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>       is the culprit here, and not the backends.  */
>    else if (known_ge (osize, regsize) && known_ge (isize, osize))

and ends here.
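
To make the new validate_subreg arm concrete, here is a minimal
standalone model of the size check. It is only a sketch, not GCC code:
subreg_case_allowed and its parameters are hypothetical, plain unsigned
byte counts stand in for poly_uint64 and known_gt, and the two flags
stand in for FLOAT_MODE_P (imode) and INTEGRAL_MODE_P (omode).

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the added else-if arm only: an integer outer mode that is
   wider than a float inner mode but still narrower than the natural
   register size.  All sizes are in bytes.  */
bool
subreg_case_allowed (unsigned osize, unsigned isize, unsigned regsize,
                     bool inner_is_float, bool outer_is_integral)
{
  assert (osize > 0 && isize > 0 && regsize > 0);
  return (regsize > osize
          && osize > isize
          && inner_is_float
          && outer_is_integral);
}
```

Assuming an 8-byte natural register size, (subreg:SI (reg:HF)) maps to
subreg_case_allowed (4, 2, 8, true, true), which is accepted, while
(subreg:DI (reg:SF)) maps to osize == regsize and is rejected by this
arm (it is covered by the existing word_mode case instead).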
> diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
> index c13c7e45ac1..92f499643b5 100644
> --- a/gcc/lto/lto-lang.c
> +++ b/gcc/lto/lto-lang.c
> @@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>      return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
>  #endif
>
> +  if (float16_type_node && mode == TYPE_MODE (float16_type_node))
> +    return float16_type_node;
> +
>    if (mode == TYPE_MODE (float_type_node))
>      return float_type_node;
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 00000000000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> new file mode 100644
> index 00000000000..1b645eb499d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "is not supported on this target" } */
> +foo (_Float16 x) /* { dg-error "is not supported on this target" } */
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> new file mode 100644
> index 00000000000..3da7683fc31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> new file mode 100644
> index 00000000000..60ff9d4ab80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx512f" } */
> +
> +#include<complex.h>
> +
> +_Complex _Float16
> +foo (_Complex _Float16 x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
> --
> 2.27.0
>


-- 
BR,
Hongtao


* Re: [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
  2021-08-02  6:44                     ` [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
  2021-08-04  2:40                       ` Hongtao Liu
@ 2021-08-04  9:55                       ` Uros Bizjak
  1 sibling, 0 replies; 138+ messages in thread
From: Uros Bizjak @ 2021-08-04  9:55 UTC (permalink / raw)
  To: liuhongt
  Cc: gcc-patches, Hongtao Liu, Joseph S. Myers, Richard Biener,
	H. J. Lu, Guo, Xuepeng, H . J . Lu, Wang Hongyu, Xu Dianhong

On Mon, Aug 2, 2021 at 8:44 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "Guo, Xuepeng" <xuepeng.guo@intel.com>
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (get_available_features):
>         Detect FEATURE_AVX512FP16.
>         * common/config/i386/i386-common.c
>         (OPTION_MASK_ISA_AVX512FP16_SET,
>         OPTION_MASK_ISA_AVX512FP16_UNSET,
>         OPTION_MASK_ISA2_AVX512FP16_SET,
>         OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
>         (OPTION_MASK_ISA2_AVX512BW_UNSET,
>         OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
>         (ix86_handle_option): Handle -mavx512fp16.
>         * common/config/i386/i386-cpuinfo.h (enum processor_features):
>         Add FEATURE_AVX512FP16.
>         * common/config/i386/i386-isas.h: Add entry for AVX512FP16.
>         * config.gcc: Add avx512fp16intrin.h.
>         * config/i386/avx512fp16intrin.h: New intrinsic header.
>         * config/i386/cpuid.h: Add bit_AVX512FP16.
>         * config/i386/i386-builtin-types.def: (FLOAT16): New primitive type.
>         * config/i386/i386-builtins.c: Support _Float16 type for i386
>         backend.
>         (ix86_init_float16_builtins): New function.
>         (ix86_float16_type_node): New.
>         * config/i386/i386-c.c (ix86_target_macros_internal): Define
>         __AVX512FP16__.
>         * config/i386/i386-expand.c (ix86_expand_branch): Support
>         HFmode.
>         (ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_expand_fp_movcc): Ditto.
>         * config/i386/i386-isa.def: Add PTA define for AVX512FP16.
>         * config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
>         (ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
>         * config/i386/i386.c (ix86_get_ssemov): Use
>         vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
>         (ix86_get_excess_precision): Use
>         FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
>         is enabled.
>         (sse_store_index): Use SFmode cost for HFmode cost.
>         (inline_memory_move_cost): Add HFmode, and prefer SSE cost over
>         GPR cost for HFmode.
>         (ix86_hard_regno_mode_ok): Allow HImode in sse register.
>         (ix86_mangle_type): Add mangling for _Float16 type.
>         (inline_secondary_memory_needed): No memory is needed for
>         16bit movement between gpr and sse reg under
>         TARGET_AVX512FP16.
>         (ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
>         SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
>         (ix86_division_cost): Ditto.
>         (ix86_rtx_costs): Ditto.
>         (ix86_add_stmt_cost): Ditto.
>         (ix86_optab_supported_p): Ditto.
>         * config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
>         (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
>         (PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
>         * config/i386/i386.md (mode): Add HFmode.
>         (MODE_SIZE): Add HFmode.
>         (isa): Add avx512fp16.
>         (enabled): Handle avx512fp16.
>         (ssemodesuffix): Add sh suffix for HFmode.
>         (comm): Add mult, div.
>         (plusminusmultdiv): New code iterator.
>         (insn): Add mult, div.
>         (*movhf_internal): Adjust for avx512fp16 instruction.
>         (*movhi_internal): Ditto.
>         (*cmpi<unord>hf): New define_insn for HFmode.
>         (*ieee_s<ieee_maxmin>hf3): Likewise.
>         (extendhf<mode>2): Likewise.
>         (trunc<mode>hf2): Likewise.
>         (float<floatunssuffix><mode>hf2): Likewise.
>         (*<insn>hf): Likewise.
>         (cbranchhf4): New expander.
>         (movhfcc): Likewise.
>         (<insn>hf3): Likewise.
>         (mulhf3): Likewise.
>         (divhf3): Likewise.
>         * config/i386/i386.opt: Add mavx512fp16.
>         * config/i386/immintrin.h: Include avx512fp16intrin.h.
>         * doc/invoke.texi: Add mavx512fp16.
>         * doc/extend.texi: Add avx512fp16 Usage Notes.

OK with some nits (e.g. please leave some vertical space to visually
split different functionality inside the function).

> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
>         * gcc.target/i386/avx-2.c: Ditto.
>         * gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
>         * gcc.target/i386/funcspec-56.inc: Add new target attribute check.
>         * gcc.target/i386/sse-13.c: Add -mavx512fp16.
>         * gcc.target/i386/sse-14.c: Ditto.
>         * gcc.target/i386/sse-22.c: Ditto.
>         * gcc.target/i386/sse-23.c: Ditto.
>         * lib/target-supports.exp: (check_effective_target_avx512fp16): New.
>         * g++.target/i386/float16-1.C: New test.
>         * g++.target/i386/float16-2.C: Ditto.
>         * g++.target/i386/float16-3.C: Ditto.
>         * gcc.target/i386/avx512fp16-12a.c: Ditto.
>         * gcc.target/i386/avx512fp16-12b.c: Ditto.
>         * gcc.target/i386/float16-3a.c: Ditto.
>         * gcc.target/i386/float16-3b.c: Ditto.
>         * gcc.target/i386/float16-4a.c: Ditto.
>         * gcc.target/i386/float16-4b.c: Ditto.
>         * gcc.target/i386/pr54855-12.c: Ditto.
>         * g++.dg/other/i386-2.C: Ditto.
>         * g++.dg/other/i386-3.C: Ditto.

LGTM for the testcases.

Thanks,
Uros.

>
> Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
> Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
> Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
> Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>
> ---
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 ++-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  53 ++++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   1 +
>  gcc/config/i386/i386-builtins.c               |  23 +++
>  gcc/config/i386/i386-c.c                      |   2 +
>  gcc/config/i386/i386-expand.c                 |   5 +-
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-options.c                |   4 +-
>  gcc/config/i386/i386.c                        | 133 ++++++++++----
>  gcc/config/i386/i386.h                        |  11 +-
>  gcc/config/i386/i386.md                       | 172 ++++++++++++++++--
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   4 +
>  gcc/doc/extend.texi                           |   8 +
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 ++
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 +++
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 +++
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 ++
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  41 files changed, 558 insertions(+), 76 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 458f41de776..1835ac64e67 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -731,6 +731,8 @@ get_available_features (struct __processor_model *cpu_model,
>             set_feature (FEATURE_AVX5124FMAPS);
>           if (edx & bit_AVX512VP2INTERSECT)
>             set_feature (FEATURE_AVX512VP2INTERSECT);
> +         if (edx & bit_AVX512FP16)
> +           set_feature (FEATURE_AVX512FP16);
>         }
>
>        __cpuid_count (7, 1, eax, ebx, ecx, edx);
> diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
> index 76ab1a14e54..00c65ba15ab 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_SET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_SET \
>    (OPTION_MASK_ISA_AVX512VBMI2 | OPTION_MASK_ISA_AVX512F_SET)
> +#define OPTION_MASK_ISA_AVX512FP16_SET OPTION_MASK_ISA_AVX512BW_SET
> +#define OPTION_MASK_ISA2_AVX512FP16_SET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_SET \
>    (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512F_SET)
>  #define OPTION_MASK_ISA2_AVXVNNI_SET OPTION_MASK_ISA2_AVXVNNI
> @@ -231,6 +233,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS
>  #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW
>  #define OPTION_MASK_ISA_AVX512VBMI2_UNSET OPTION_MASK_ISA_AVX512VBMI2
> +#define OPTION_MASK_ISA_AVX512FP16_UNSET OPTION_MASK_ISA_AVX512BW_UNSET
> +#define OPTION_MASK_ISA2_AVX512FP16_UNSET OPTION_MASK_ISA2_AVX512FP16
>  #define OPTION_MASK_ISA_AVX512VNNI_UNSET OPTION_MASK_ISA_AVX512VNNI
>  #define OPTION_MASK_ISA2_AVXVNNI_UNSET OPTION_MASK_ISA2_AVXVNNI
>  #define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
> @@ -313,7 +317,8 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_AVX512BF16_UNSET \
>     | OPTION_MASK_ISA2_AVX5124FMAPS_UNSET \
>     | OPTION_MASK_ISA2_AVX5124VNNIW_UNSET \
> -   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
> +   | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
> +   | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>    (OPTION_MASK_ISA2_AVX512F_UNSET)
>  #define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> @@ -326,7 +331,9 @@ along with GCC; see the file COPYING3.  If not see
>    (OPTION_MASK_ISA2_SSE3_UNSET | OPTION_MASK_ISA2_KL_UNSET)
>  #define OPTION_MASK_ISA2_SSE_UNSET OPTION_MASK_ISA2_SSE2_UNSET
>
> -#define OPTION_MASK_ISA2_AVX512BW_UNSET OPTION_MASK_ISA2_AVX512BF16_UNSET
> +#define OPTION_MASK_ISA2_AVX512BW_UNSET \
> +  (OPTION_MASK_ISA2_AVX512BF16_UNSET \
> +    | OPTION_MASK_ISA2_AVX512FP16_UNSET)
>
>  /* Set 1 << value as value of -malign-FLAG option.  */
>
> @@ -853,6 +860,21 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mavx512fp16:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> +       }
> +      return true;
> +
>      case OPT_mavx512vnni:
>        if (value)
>         {
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index e68dd656046..4e0659fc7b2 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -228,6 +228,7 @@ enum processor_features
>    FEATURE_AESKLE,
>    FEATURE_WIDEKL,
>    FEATURE_AVXVNNI,
> +  FEATURE_AVX512FP16,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 898c18f3dda..a6783660278 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -169,4 +169,5 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("aeskle", FEATURE_AESKLE, P_NONE, NULL)
>    ISA_NAMES_TABLE_ENTRY("widekl", FEATURE_WIDEKL, P_NONE, "-mwidekl")
>    ISA_NAMES_TABLE_ENTRY("avxvnni", FEATURE_AVXVNNI, P_NONE, "-mavxvnni")
> +  ISA_NAMES_TABLE_ENTRY("avx512fp16", FEATURE_AVX512FP16, P_NONE, "-mavx512fp16")
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 3df9b52cf25..a354351408c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -416,7 +416,7 @@ i[34567]86-*-* | x86_64-*-*)
>                        tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
>                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
>                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
> -                      mwaitintrin.h"
> +                      mwaitintrin.h avx512fp16intrin.h"
>         ;;
>  ia64-*-*)
>         extra_headers=ia64intrin.h
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> new file mode 100644
> index 00000000000..38d63161ba6
> --- /dev/null
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -0,0 +1,53 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _IMMINTRIN_H_INCLUDED
> +#error "Never use <avx512fp16intrin.h> directly; include <immintrin.h> instead."
> +#endif
> +
> +#ifndef __AVX512FP16INTRIN_H_INCLUDED
> +#define __AVX512FP16INTRIN_H_INCLUDED
> +
> +#ifndef __AVX512FP16__
> +#pragma GCC push_options
> +#pragma GCC target("avx512fp16")
> +#define __DISABLE_AVX512FP16__
> +#endif /* __AVX512FP16__ */
> +
> +/* Internal data types for implementing the intrinsics.  */
> +typedef _Float16 __v8hf __attribute__ ((__vector_size__ (16)));
> +typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
> +typedef _Float16 __v32hf __attribute__ ((__vector_size__ (64)));
> +
> +/* The Intel API is flexible enough that we must allow aliasing with other
> +   vector types, and their scalar components.  */
> +typedef _Float16 __m128h __attribute__ ((__vector_size__ (16), __may_alias__));
> +typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));
> +typedef _Float16 __m512h __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +#ifdef __DISABLE_AVX512FP16__
> +#undef __DISABLE_AVX512FP16__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX512FP16__ */
> +
> +#endif /* __AVX512FP16INTRIN_H_INCLUDED */
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index aebc17c6827..82b8050028b 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -126,6 +126,7 @@
>  #define bit_AVX5124VNNIW (1 << 2)
>  #define bit_AVX5124FMAPS (1 << 3)
>  #define bit_AVX512VP2INTERSECT (1 << 8)
> +#define bit_AVX512FP16   (1 << 23)
>  #define bit_IBT        (1 << 20)
>  #define bit_UINTR (1 << 5)
>  #define bit_PCONFIG    (1 << 18)
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
> index 3ca313c19ec..1768b88d748 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -68,6 +68,7 @@ DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
>  DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
>  DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
>  DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
> +DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
>  DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index 204e2903126..668f09f12a0 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -125,6 +125,7 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
>  /* Table for the ix86 builtin non-function types.  */
>  static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
>
> +tree ix86_float16_type_node = NULL_TREE;
>  /* Retrieve an element from the above table, building some of
>     the types lazily.  */
>
> @@ -1343,6 +1344,26 @@ ix86_init_builtins_va_builtins_abi (void)
>                         BUILT_IN_VA_COPY, BUILT_IN_NORMAL, NULL, fnattr_sysv);
>  }
>
> +static void
> +ix86_init_float16_builtins (void)

Maybe better to name this ix86_register_float16_builtin_type.

> +{
> +  /* Provide the _Float16 type and float16_type_node if needed so that
> +     it can be used in AVX512FP16 intrinsics and builtins.  */
> +  if (!float16_type_node)
> +    {
> +      ix86_float16_type_node = make_node (REAL_TYPE);
> +      TYPE_PRECISION (ix86_float16_type_node) = 16;
> +      SET_TYPE_MODE (ix86_float16_type_node, HFmode);
> +      layout_type (ix86_float16_type_node);
> +    }
> +  else
> +    ix86_float16_type_node = float16_type_node;
> +
> +  if (!maybe_get_identifier ("_Float16") && TARGET_SSE2)
> +    lang_hooks.types.register_builtin_type (ix86_float16_type_node,
> +                                           "_Float16");
> +}
> +
>  static void
>  ix86_init_builtin_types (void)
>  {
> @@ -1371,6 +1392,8 @@ ix86_init_builtin_types (void)
>       it.  */
>    lang_hooks.types.register_builtin_type (float128_type_node, "__float128");
>
> +  ix86_init_float16_builtins ();
> +
>    const_string_type_node
>      = build_pointer_type (build_qualified_type
>                           (char_type_node, TYPE_QUAL_CONST));
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index 5ed0de006fb..cc64f855ecc 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -598,6 +598,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>      def_or_undef (parse_in, "__PTWRITE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_AVX512BF16)
>      def_or_undef (parse_in, "__AVX512BF16__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX512FP16)
> +    def_or_undef (parse_in, "__AVX512FP16__");
>    if (TARGET_MMX_WITH_SSE)
>      def_or_undef (parse_in, "__MMX_WITH_SSE__");
>    if (isa_flag2 & OPTION_MASK_ISA2_ENQCMD)
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 69ea79e6123..b7d050a1e42 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -2314,6 +2314,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
>
>    switch (mode)
>      {
> +    case E_HFmode:
>      case E_SFmode:
>      case E_DFmode:
>      case E_XFmode:
> @@ -2627,7 +2628,7 @@ ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
>    bool unordered_compare = ix86_unordered_fp_compare (code);
>    rtx op0 = *pop0, op1 = *pop1;
>    machine_mode op_mode = GET_MODE (op0);
> -  bool is_sse = TARGET_SSE_MATH && SSE_FLOAT_MODE_P (op_mode);
> +  bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
>
>    /* All of the unordered compare instructions only work on registers.
>       The same is true of the fcomi compare instructions.  The XFmode
> @@ -4112,7 +4113,7 @@ ix86_expand_fp_movcc (rtx operands[])
>    rtx op0 = XEXP (operands[1], 0);
>    rtx op1 = XEXP (operands[1], 1);
>
> -  if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      {
>        machine_mode cmode;
>
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index a0d46cbc892..83d9302ea3d 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -108,3 +108,4 @@ DEF_PTA(HRESET)
>  DEF_PTA(KL)
>  DEF_PTA(WIDEKL)
>  DEF_PTA(AVXVNNI)
> +DEF_PTA(AVX512FP16)
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 3416a4f1752..df191763e4b 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -223,7 +223,8 @@ static struct ix86_target_opts isa2_opts[] =
>    { "-mhreset",                OPTION_MASK_ISA2_HRESET },
>    { "-mkl",            OPTION_MASK_ISA2_KL },
>    { "-mwidekl",        OPTION_MASK_ISA2_WIDEKL },
> -  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI }
> +  { "-mavxvnni",       OPTION_MASK_ISA2_AVXVNNI },
> +  { "-mavx512fp16",    OPTION_MASK_ISA2_AVX512FP16 }
>  };
>  static struct ix86_target_opts isa_opts[] =
>  {
> @@ -1045,6 +1046,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("amx-bf16", OPT_mamx_bf16),
>      IX86_ATTR_ISA ("hreset", OPT_mhreset),
>      IX86_ATTR_ISA ("avxvnni",   OPT_mavxvnni),
> +    IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index dc673c89bc8..71bbcf968c5 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5497,6 +5497,14 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>      case MODE_SI:
>        return "%vmovd\t{%1, %0|%0, %1}";
>
> +    case MODE_HI:
> +      if (GENERAL_REG_P (operands[0]))
> +       return "vmovw\t{%1, %k0|%k0, %1}";
> +      else if (GENERAL_REG_P (operands[1]))
> +       return "vmovw\t{%k1, %0|%0, %k1}";
> +      else
> +       return "vmovw\t{%1, %0|%0, %1}";
> +
>      case MODE_DF:
>        if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>         return "vmovsd\t{%d1, %0|%0, %d1}";
> @@ -5509,6 +5517,12 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>        else
>         return "%vmovss\t{%1, %0|%0, %1}";
>
> +    case MODE_HF:
> +      if (REG_P (operands[0]) && REG_P (operands[1]))
> +       return "vmovsh\t{%d1, %0|%0, %d1}";
> +      else
> +       return "vmovsh\t{%1, %0|%0, %1}";
> +
>      case MODE_V1DF:
>        gcc_assert (!TARGET_AVX);
>        return "movlpd\t{%1, %0|%0, %1}";
> @@ -13955,7 +13969,7 @@ output_387_binary_op (rtx_insn *insn, rtx *operands)
>
>    if (is_sse)
>     {
> -     p = (GET_MODE (operands[0]) == SFmode) ? "ss" : "sd";
> +     p = (GET_MODE (operands[0]) == SFmode ? "ss" : "sd");

No need for parentheses here.
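
To illustrate: assignment binds more loosely than ?:, and == binds more
tightly, so the expression parses the same way without the outer
parentheses. A minimal standalone sketch, where enum fp_mode and
sse_suffix are hypothetical stand-ins and not part of the patch:

```c
#include <string.h>

/* Hypothetical stand-in for the machine modes tested in the hunk.  */
enum fp_mode { FP_MODE_SF, FP_MODE_DF };

/* Return the SSE scalar suffix for MODE.  The right-hand side needs no
   outer parentheses: == is evaluated first, then ?:, then =.  */
static const char *
sse_suffix (enum fp_mode mode)
{
  const char *p;

  p = mode == FP_MODE_SF ? "ss" : "sd";
  return p;
}
```

Here sse_suffix (FP_MODE_SF) yields "ss" and any other mode yields "sd",
the same as with the parenthesized form.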

>       strcat (buf, p);
>
>       if (TARGET_AVX)
> @@ -19132,10 +19146,19 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
>        if (!TARGET_SSE2)
>         return true;
>
> +      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)))
> +       return true;
> +
> +      int msize = GET_MODE_SIZE (mode);
> +
>        /* Between SSE and general, we have moves no larger than word size.  */
> -      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> -         || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> -         || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +      if (msize > UNITS_PER_WORD)
> +       return true;
> +
> +      /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
> +      int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
> +
> +      if (msize < minsize)
>         return true;
>
>        /* If the target says that inter-unit moves are more expensive
> @@ -19229,21 +19252,26 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>  static inline int
>  sse_store_index (machine_mode mode)
>  {
> -      switch (GET_MODE_SIZE (mode))
> -       {
> -         case 4:
> -           return 0;
> -         case 8:
> -           return 1;
> -         case 16:
> -           return 2;
> -         case 32:
> -           return 3;
> -         case 64:
> -           return 4;
> -         default:
> -           return -1;
> -       }
> +  /* NB: Use SFmode cost for HFmode instead of adding HFmode load/store
> +     costs to processor_costs, which requires changes to all entries in
> +     processor cost table.  */
> +  if (mode == E_HFmode)
> +    mode = E_SFmode;

Vertical space here.
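
That is, a blank line between the HFmode remapping and the switch. A
standalone sketch of the intended layout, assuming a hypothetical
enum sk_mode and sk_mode_size helper in place of machine_mode and
GET_MODE_SIZE:

```c
/* Hypothetical stand-ins for machine_mode and GET_MODE_SIZE.  */
enum sk_mode { SK_HF, SK_SF, SK_DF, SK_V4SF, SK_V8SF, SK_V16SF, SK_OTHER };

static int
sk_mode_size (enum sk_mode mode)
{
  switch (mode)
    {
    case SK_HF:    return 2;
    case SK_SF:    return 4;
    case SK_DF:    return 8;
    case SK_V4SF:  return 16;
    case SK_V8SF:  return 32;
    case SK_V16SF: return 64;
    default:       return 12;   /* A size with no SSE store index.  */
    }
}

static int
sk_sse_store_index (enum sk_mode mode)
{
  /* Use the SFmode cost for HFmode, as in the quoted hunk.  */
  if (mode == SK_HF)
    mode = SK_SF;

  switch (sk_mode_size (mode))
    {
    case 4:
      return 0;
    case 8:
      return 1;
    case 16:
      return 2;
    case 32:
      return 3;
    case 64:
      return 4;
    default:
      return -1;
    }
}
```

With the remapping, the HF and SF stand-ins both map to index 0.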

> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 4:
> +      return 0;
> +    case 8:
> +      return 1;
> +    case 16:
> +      return 2;
> +    case 32:
> +      return 3;
> +    case 64:
> +      return 4;
> +    default:
> +      return -1;
> +    }
>  }
>
>  /* Return the cost of moving data of mode M between a
> @@ -19270,6 +19298,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>        int index;
>        switch (mode)
>         {
> +         case E_HFmode:
>           case E_SFmode:
>             index = 0;
>             break;
> @@ -19370,11 +19399,31 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, int in)
>           }
>         break;
>        case 2:
> -       if (in == 2)
> -         return MAX (ix86_cost->hard_register.int_load[1],
> -                     ix86_cost->hard_register.int_store[1]);
> -       return in ? ix86_cost->hard_register.int_load[1]
> -                 : ix86_cost->hard_register.int_store[1];
> +       {
> +         int cost;

Vertical space here.

> +         if (in == 2)
> +           cost = MAX (ix86_cost->hard_register.int_load[1],
> +                       ix86_cost->hard_register.int_store[1]);
> +         else
> +           cost = in ? ix86_cost->hard_register.int_load[1]
> +                     : ix86_cost->hard_register.int_store[1];

Vertical space here.
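
With both blank lines added, the case 2 block would read roughly as
below. This is only a standalone sketch: sk_two_byte_move_cost and its
plain int parameters are hypothetical and stand in for the
ix86_cost->hard_register fields, and is_hf stands in for the
mode == E_HFmode test.

```c
#define SK_MAX(a, b) ((a) > (b) ? (a) : (b))

/* IN follows the convention visible in the hunk: 2 takes the maximum
   of load and store cost, nonzero otherwise selects the load cost,
   and 0 selects the store cost.  */
static int
sk_two_byte_move_cost (int int_load, int int_store,
                       int sse_load, int sse_store,
                       int in, int is_hf)
{
  int cost;

  if (in == 2)
    cost = SK_MAX (int_load, int_store);
  else
    cost = in ? int_load : int_store;

  if (is_hf)
    {
      /* Prefer SSE over GPR for HFmode: make sure the returned GPR
         cost ends up above the SSE cost.  */
      int sse_cost;

      if (in == 2)
        sse_cost = SK_MAX (sse_load, sse_store);
      else
        sse_cost = in ? sse_load : sse_store;

      if (sse_cost >= cost)
        cost = sse_cost + 1;
    }

  return cost;
}
```

For example, with integer costs of 4 and SSE costs of 6, a non-HF load
stays at 4 while an HF load is bumped to 7.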

> +         if (mode == E_HFmode)
> +           {
> +             /* Prefer SSE over GPR for HFmode.  */
> +             int sse_cost;
> +             int index = sse_store_index (mode);
> +             if (in == 2)
> +               sse_cost = MAX (ix86_cost->hard_register.sse_load[index],
> +                               ix86_cost->hard_register.sse_store[index]);
> +             else
> +               sse_cost = (in
> +                           ? ix86_cost->hard_register.sse_load [index]
> +                           : ix86_cost->hard_register.sse_store [index]);
> +             if (sse_cost >= cost)
> +               cost = sse_cost + 1;
> +           }
> +         return cost;
> +       }
>        default:
>         if (in == 2)
>           cost = MAX (ix86_cost->hard_register.int_load[2],
> @@ -19548,6 +19597,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>           - XI mode
>           - any of 512-bit wide vector mode
>           - any scalar mode.  */
> +      /* For AVX512FP16, vmovw supports movement of HImode
> +        between gpr and sse registers.  */
>        if (TARGET_AVX512F
>           && (mode == XImode
>               || VALID_AVX512F_REG_MODE (mode)
> @@ -19831,7 +19882,7 @@ ix86_multiplication_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->mulsd : cost->mulss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fmul;
> @@ -19883,7 +19934,7 @@ ix86_division_cost (const struct processor_costs *cost,
>    if (VECTOR_MODE_P (mode))
>      inner_mode = GET_MODE_INNER (mode);
>
> -  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +  if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      return inner_mode == DFmode ? cost->divsd : cost->divss;
>    else if (X87_FLOAT_MODE_P (mode))
>      return cost->fdiv;
> @@ -20303,7 +20354,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>           return true;
>         }
>
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->addss;
>           return false;
> @@ -20336,7 +20387,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* FALLTHRU */
>
>      case NEG:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         {
>           *total = cost->sse_op;
>           return false;
> @@ -20418,14 +20469,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case FLOAT_EXTEND:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = 0;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
>        return false;
>
>      case FLOAT_TRUNCATE:
> -      if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
> +      if (!SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->fadd;
>        else
>          *total = ix86_vec_cost (mode, cost->addss);
> @@ -20435,7 +20486,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        /* SSE requires memory load for the constant operand. It may make
>          sense to account for this.  Of course the constant operand may or
>          may not be reused. */
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = cost->sse_op;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fabs;
> @@ -20444,7 +20495,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        return false;
>
>      case SQRT:
> -      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +      if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>         *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
>        else if (X87_FLOAT_MODE_P (mode))
>         *total = cost->fsqrt;
> @@ -21928,6 +21979,10 @@ ix86_mangle_type (const_tree type)
>
>    switch (TYPE_MODE (type))
>      {
> +    case E_HFmode:
> +      /* _Float16 is "DF16_".
> +        Align with clang's decision in https://reviews.llvm.org/D33719. */
> +      return "DF16_";
>      case E_TFmode:
>        /* __float128 is "g".  */
>        return "g";
> @@ -22551,7 +22606,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case MINUS_EXPR:
>           if (kind == scalar_stmt)
>             {
> -             if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +             if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>                 stmt_cost = ix86_cost->addss;
>               else if (X87_FLOAT_MODE_P (mode))
>                 stmt_cost = ix86_cost->fadd;
> @@ -22569,7 +22624,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>           break;
>         case NEGATE_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (X87_FLOAT_MODE_P (mode))
>             stmt_cost = ix86_cost->fchs;
> @@ -22625,7 +22680,7 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
>         case BIT_XOR_EXPR:
>         case BIT_AND_EXPR:
>         case BIT_NOT_EXPR:
> -         if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> +         if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>             stmt_cost = ix86_cost->sse_op;
>           else if (VECTOR_MODE_P (mode))
>             stmt_cost = ix86_vec_cost (mode, ix86_cost->sse_op);
> @@ -23327,14 +23382,18 @@ ix86_get_excess_precision (enum excess_precision_type type)
>         /* The fastest type to promote to will always be the native type,
>            whether that occurs with implicit excess precision or
>            otherwise.  */
> -       return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> +       return TARGET_AVX512FP16
> +              ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> +              : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>        case EXCESS_PRECISION_TYPE_STANDARD:
>        case EXCESS_PRECISION_TYPE_IMPLICIT:
>         /* Otherwise, the excess precision we want when we are
>            in a standards compliant mode, and the implicit precision we
>            provide would be identical were it not for the unpredictable
>            cases.  */
> -       if (!TARGET_80387)
> +       if (TARGET_AVX512FP16 && TARGET_SSE_MATH)
> +         return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> +       else if (!TARGET_80387)
>           return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
>         else if (!TARGET_MIX_SSE_I387)
>           {
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index b1e66ee192e..8fcd5693624 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1000,7 +1000,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
>    ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> -   || (MODE) == SFmode)
> +   || (MODE) == SFmode                                                 \
> +   || (TARGET_AVX512FP16 && ((MODE) == HImode || (MODE) == HFmode)))
>
>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> @@ -1039,7 +1040,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_FP_MODE_P(MODE)                                          \
>    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
> -   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)                \
> +   || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
>
>  #define VALID_INT_MODE_P(MODE)                                         \
>    ((MODE) == QImode || (MODE) == HImode                                        \
> @@ -1072,6 +1073,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define SSE_FLOAT_MODE_P(MODE) \
>    ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
>
> +#define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)                           \
> +  ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)                                \
> +   || (TARGET_AVX512FP16 && (MODE) == HFmode))
> +
>  #define FMA4_VEC_FLOAT_MODE_P(MODE) \
>    (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
>                   || (MODE) == V8SFmode || (MODE) == V4DFmode))
> @@ -2265,7 +2270,7 @@ constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
>  constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
>    | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
>    | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
> -  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI;
> +  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
>  constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
>    | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
>  constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d475347172d..777d11261ac 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -496,7 +496,7 @@ (define_attr "type"
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
> -  "unknown,none,QI,HI,SI,DI,TI,OI,XI,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
> +  "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V16SF,V8SF,V4DF,V4SF,
>    V2DF,V2SF,V1DF,V8DF"
>    (const_string "unknown"))
>
> @@ -832,8 +832,7 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
>                     avx512bw,noavx512bw,avx512dq,noavx512dq,
> -                   avx512vl,noavx512vl,
> -                   avxvnni,avx512vnnivl"
> +                   avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16"
>    (const_string "base"))
>
>  ;; Define instruction set of MMX instructions
> @@ -885,6 +884,8 @@ (define_attr "enabled" ""
>          (eq_attr "isa" "avxvnni") (symbol_ref "TARGET_AVXVNNI")
>          (eq_attr "isa" "avx512vnnivl")
>            (symbol_ref "TARGET_AVX512VNNI && TARGET_AVX512VL")
> +        (eq_attr "isa" "avx512fp16")
> +          (symbol_ref "TARGET_AVX512FP16")
>
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
> @@ -906,6 +907,7 @@ (define_asm_attributes
>     (set_attr "type" "multi")])
>
>  (define_code_iterator plusminus [plus minus])
> +(define_code_iterator plusminusmultdiv [plus minus mult div])
>
>  (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus])
>
> @@ -921,7 +923,8 @@ (define_code_attr multdiv_mnemonic
>
>  ;; Mark commutative operators as such in constraints.
>  (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%")
> -                       (minus "") (ss_minus "") (us_minus "")])
> +                       (minus "") (ss_minus "") (us_minus "")
> +                       (mult "%") (div "")])
>
>  ;; Mapping of max and min
>  (define_code_iterator maxmin [smax smin umax umin])
> @@ -1021,7 +1024,8 @@ (define_code_attr insn
>     (minus "sub") (ss_minus "sssub") (us_minus "ussub")
>     (sign_extend "extend") (zero_extend "zero_extend")
>     (ashift "ashl") (lshiftrt "lshr") (ashiftrt "ashr")
> -   (rotate "rotl") (rotatert "rotr")])
> +   (rotate "rotl") (rotatert "rotr")
> +   (mult "mul") (div "div")])
>
>  ;; All integer modes.
>  (define_mode_iterator SWI1248x [QI HI SI DI])
> @@ -1089,8 +1093,9 @@ (define_mode_iterator SWI48DWI [SI DI (TI "TARGET_64BIT")])
>  ;; compile time constant, it is faster to use <MODE_SIZE> than
>  ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
>  ;; command line options just use GET_MODE_SIZE macro.
> -(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8") (TI "16")
> -                            (SF "4") (DF "8") (XF "GET_MODE_SIZE (XFmode)")
> +(define_mode_attr MODE_SIZE [(QI "1") (HI "2") (SI "4") (DI "8")
> +                            (TI "16") (HF "2") (SF "4") (DF "8")
> +                            (XF "GET_MODE_SIZE (XFmode)")
>                              (V16QI "16") (V32QI "32") (V64QI "64")
>                              (V8HI "16") (V16HI "32") (V32HI "64")
>                              (V4SI "16") (V8SI "32") (V16SI "64")
> @@ -1222,8 +1227,8 @@ (define_mode_iterator MODEF [SF DF])
>  ;; All x87 floating point modes
>  (define_mode_iterator X87MODEF [SF DF XF])
>
> -;; All x87 floating point modes plus HF
> -(define_mode_iterator X87MODEFH [SF DF XF HF])
> +;; All x87 floating point modes plus HFmode
> +(define_mode_iterator X87MODEFH [HF SF DF XF])
>
>  ;; All SSE floating point modes
>  (define_mode_iterator SSEMODEF [SF DF TF])
> @@ -1231,7 +1236,7 @@ (define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
>
>  ;; SSE instruction suffix for various modes
>  (define_mode_attr ssemodesuffix
> -  [(SF "ss") (DF "sd")
> +  [(HF "sh") (SF "ss") (DF "sd")
>     (V16SF "ps") (V8DF "pd")
>     (V8SF "ps") (V4DF "pd")
>     (V4SF "ps") (V2DF "pd")
> @@ -1496,6 +1501,23 @@ (define_expand "cstorexf4"
>    DONE;
>  })
>
> +(define_expand "cbranchhf4"
> +  [(set (reg:CC FLAGS_REG)
> +       (compare:CC (match_operand:HF 1 "cmp_fp_expander_operand")
> +                   (match_operand:HF 2 "cmp_fp_expander_operand")))
> +   (set (pc) (if_then_else
> +              (match_operator 0 "ix86_fp_comparison_operator"
> +               [(reg:CC FLAGS_REG)
> +                (const_int 0)])
> +              (label_ref (match_operand 3))
> +              (pc)))]
> +  "TARGET_AVX512FP16"
> +{
> +  ix86_expand_branch (GET_CODE (operands[0]),
> +                     operands[1], operands[2], operands[3]);
> +  DONE;
> +})
> +
>  (define_expand "cbranch<mode>4"
>    [(set (reg:CC FLAGS_REG)
>         (compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
> @@ -1705,6 +1727,17 @@ (define_insn "*cmpi<unord><MODEF:mode>"
>          (eq_attr "alternative" "0")
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
> +
> +(define_insn "*cmpi<unord>hf"
> +  [(set (reg:CCFP FLAGS_REG)
> +       (compare:CCFP
> +         (match_operand:HF 0 "register_operand" "v")
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "v<unord>comish\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecomi")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Push/pop instructions.
>
> @@ -2436,8 +2469,8 @@ (define_insn "*movsi_internal"
>            (symbol_ref "true")))])
>
>  (define_insn "*movhi_internal"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k")
> -       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
> +       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
>     && ix86_hardreg_mov_ok (operands[0], operands[1])"
>
> @@ -2463,6 +2496,9 @@ (define_insn "*movhi_internal"
>           gcc_unreachable ();
>         }
>
> +    case TYPE_SSEMOV:
> +      return ix86_output_ssemov (insn, operands);
> +
>      case TYPE_MSKLOG:
>        if (operands[1] == const0_rtx)
>         return "kxorw\t%0, %0, %0";
> @@ -2477,8 +2513,15 @@ (define_insn "*movhi_internal"
>         return "mov{w}\t{%1, %0|%0, %1}";
>      }
>  }
> -  [(set (attr "type")
> -     (cond [(eq_attr "alternative" "4,5,6,7")
> +  [(set (attr "isa")
> +       (cond [(eq_attr "alternative" "9,10,11,12,13")
> +                 (const_string "avx512fp16")
> +              ]
> +              (const_string "*")))
> +   (set (attr "type")
> +     (cond [(eq_attr "alternative" "9,10,11,12,13")
> +             (const_string "ssemov")
> +           (eq_attr "alternative" "4,5,6,7")
>               (const_string "mskmov")
>             (eq_attr "alternative" "8")
>               (const_string "msklog")
> @@ -2503,6 +2546,8 @@ (define_insn "*movhi_internal"
>      (set (attr "mode")
>        (cond [(eq_attr "type" "imovx")
>                (const_string "SI")
> +            (eq_attr "alternative" "11")
> +              (const_string "HF")
>              (and (eq_attr "alternative" "1,2")
>                   (match_operand:HI 1 "aligned_operand"))
>                (const_string "SI")
> @@ -3727,7 +3772,10 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "sselog1")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "sselog")
> +                (if_then_else
> +                  (match_test ("TARGET_AVX512FP16"))
> +                  (const_string "ssemov")
> +                  (const_string "sselog"))
>               ]
>               (const_string "ssemov")))
>     (set (attr "memory")
> @@ -3750,9 +3798,15 @@ (define_insn "*movhf_internal"
>                (eq_attr "alternative" "2")
>                  (const_string "V4SF")
>                (eq_attr "alternative" "4,5,6,7")
> -                (const_string "TI")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HI")
> +                  (const_string "TI"))
>                (eq_attr "alternative" "3")
> -                (const_string "SF")
> +                (if_then_else
> +                  (match_test "TARGET_AVX512FP16")
> +                  (const_string "HF")
> +                  (const_string "SF"))
>               ]
>               (const_string "*")))])
>
> @@ -4493,6 +4547,17 @@ (define_split
>    emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> +(define_insn "extendhf<mode>2"
> +  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +        (float_extend:MODEF
> +         (match_operand:HF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvtsh2<ssemodesuffix>\t{%1, %0, %0|%0, %0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "<MODE>")])
> +
> +
>  (define_expand "extend<mode>xf2"
>    [(set (match_operand:XF 0 "nonimmediate_operand")
>          (float_extend:XF (match_operand:MODEF 1 "general_operand")))]
> @@ -4670,6 +4735,18 @@ (define_insn "truncxf<mode>2"
>               (symbol_ref "flag_unsafe_math_optimizations")
>            ]
>            (symbol_ref "true")))])
> +
> +;; Conversion from {SF,DF}mode to HFmode.
> +
> +(define_insn "trunc<mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (float_truncate:HF
> +         (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<ssemodesuffix>2sh\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
>
>  ;; Signed conversion to DImode.
>
> @@ -5046,6 +5123,16 @@ (define_insn "*float<SWI48:mode><MODEF:mode>2"
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
>            (symbol_ref "true")))])
>
> +(define_insn "float<floatunssuffix><mode>hf2"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (any_float:HF
> +         (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
> +  "TARGET_AVX512FP16"
> +  "vcvt<floatsuffix>si2sh<rex64suffix>\t{%1, %d0|%d0, %1}"
> +  [(set_attr "type" "sseicvt")
> +   (set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*floatdi<MODEF:mode>2_i387"
>    [(set (match_operand:MODEF 0 "register_operand" "=f")
>         (float:MODEF (match_operand:DI 1 "nonimmediate_operand" "m")))]
> @@ -7626,6 +7713,13 @@ (define_expand "<insn>xf3"
>           (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "<insn>hf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (plusminus:HF
> +         (match_operand:HF 1 "register_operand")
> +         (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "<insn><mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (plusminus:MODEF
> @@ -8203,6 +8297,12 @@ (define_expand "mulxf3"
>                  (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "mulhf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (mult:HF (match_operand:HF 1 "register_operand")
> +                   (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "mul<mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (mult:MODEF (match_operand:MODEF 1 "register_operand")
> @@ -8220,6 +8320,12 @@ (define_expand "divxf3"
>                 (match_operand:XF 2 "register_operand")))]
>    "TARGET_80387")
>
> +(define_expand "divhf3"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (div:HF (match_operand:HF 1 "register_operand")
> +                  (match_operand:HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16")
> +
>  (define_expand "div<mode>3"
>    [(set (match_operand:MODEF 0 "register_operand")
>         (div:MODEF (match_operand:MODEF 1 "register_operand")
> @@ -16312,6 +16418,17 @@ (define_insn "*fop_<mode>_comm"
>          (symbol_ref "true")
>          (symbol_ref "false"))))])
>
> +(define_insn "*<insn>hf"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (plusminusmultdiv:HF
> +         (match_operand:HF 1 "nonimmediate_operand" "<comm>v")
> +         (match_operand:HF 2 "nonimmediate_operand" "vm")))]
> +  "TARGET_AVX512FP16
> +   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> +  "v<insn>sh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "prefix" "evex")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*rcpsf2_sse"
>    [(set (match_operand:SF 0 "register_operand" "=x,x,x")
>         (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
> @@ -19178,6 +19295,15 @@ (define_peephole2
>      gcc_unreachable ();
>  })
>
> +(define_expand "movhfcc"
> +  [(set (match_operand:HF 0 "register_operand")
> +       (if_then_else:HF
> +         (match_operand 1 "comparison_operator")
> +         (match_operand:HF 2 "register_operand")
> +         (match_operand:HF 3 "register_operand")))]
> +  "TARGET_AVX512FP16"
> +  "if (ix86_expand_fp_movcc (operands)) DONE; else FAIL;")
> +
>  (define_expand "mov<mode>cc"
>    [(set (match_operand:X87MODEF 0 "register_operand")
>         (if_then_else:X87MODEF
> @@ -19346,6 +19472,18 @@ (define_insn "<code><mode>3"
>  ;; Their operands are not commutative, and thus they may be used in the
>  ;; presence of -0.0 and NaN.
>
> +(define_insn "*ieee_s<ieee_maxmin>hf3"
> +  [(set (match_operand:HF 0 "register_operand" "=v")
> +       (unspec:HF
> +         [(match_operand:HF 1 "register_operand" "v")
> +          (match_operand:HF 2 "nonimmediate_operand" "vm")]
> +         IEEE_MAXMIN))]
> +  "TARGET_AVX512FP16"
> +  "v<ieee_maxmin>sh\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "prefix" "evex")
> +   (set_attr "type" "sseadd")
> +   (set_attr "mode" "HF")])
> +
>  (define_insn "*ieee_s<ieee_maxmin><mode>3"
>    [(set (match_operand:MODEF 0 "register_operand" "=x,v")
>         (unspec:MODEF
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 7b8547bb1c3..ad366974b5b 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1166,3 +1166,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property.
>  mmwait
>  Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save
>  Support MWAIT and MONITOR built-in functions and code generation.
> +
> +mavx512fp16
> +Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation.
> diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
> index f129de4bbe5..2421a78637b 100644
> --- a/gcc/config/i386/immintrin.h
> +++ b/gcc/config/i386/immintrin.h
> @@ -94,6 +94,10 @@
>
>  #include <avx512vp2intersectvlintrin.h>
>
> +#ifdef __SSE2__
> +#include <avx512fp16intrin.h>
> +#endif
> +
>  #include <shaintrin.h>
>
>  #include <fmaintrin.h>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 3a1978efc97..09040bfca33 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1164,6 +1164,14 @@ to inconsistent behavior between software emulation and AVX512-FP16
>  instructions. Using @option{-fexcess-precision=16} and  will force round
>  back after each operation.
>
> +Using @option{-mavx512fp16} will generate AVX512-FP16 instructions instead of
> +software emulation. The default behavior of @code{FLT_EVAL_METHOD} is to round
> +after each operation. The same is true with @option{-fexcess-precision=standard}
> +and @option{-mfpmath=sse}. If there is no @option{-mfpmath=sse},
> +@option{-fexcess-precision=standard} alone does the same thing as before.
> +It is useful for code that does not have @code{_Float16} and runs on the x87
> +FPU.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 32697e6117c..bb9f7ca956e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1393,6 +1393,7 @@ See RS/6000 and PowerPC Options.
>  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni@gol
> +-mavx512fp16 @gol
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops @gol
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg} @gol
>  -mkl -mwidekl @gol
> @@ -31154,6 +31155,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @itemx -mavx512bf16
>  @opindex mavx512bf16
>  @need 200
> +@itemx -mavx512fp16
> +@opindex mavx512fp16
> +@need 200
>  @itemx -mgfni
>  @opindex mgfni
>  @need 200
> @@ -31232,9 +31236,9 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP,
>  XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2,
>  GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
> -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI or CLDEMOTE
> -extended instruction sets. Each has a corresponding @option{-mno-} option to
> -disable use of these instructions.
> +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16
> +or CLDEMOTE extended instruction sets. Each has a corresponding
> +@option{-mno-} option to disable use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
> index 62b2132957a..fba3d1ac684 100644
> --- a/gcc/testsuite/g++.dg/other/i386-2.C
> +++ b/gcc/testsuite/g++.dg/other/i386-2.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt  -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
> index 843aa2bdb2f..5cc0fa83457 100644
> --- a/gcc/testsuite/g++.dg/other/i386-3.C
> +++ b/gcc/testsuite/g++.dg/other/i386-3.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>
>  /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
>     xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
> diff --git a/gcc/testsuite/g++.target/i386/float16-1.C b/gcc/testsuite/g++.target/i386/float16-1.C
> new file mode 100644
> index 00000000000..95d1ac27c4f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-1.C
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-sse2" } */
> +
> +_Float16/* { dg-error "does not name a type" } */
> +foo (_Float16 x)
> +{
> +  return x;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-2.C b/gcc/testsuite/g++.target/i386/float16-2.C
> new file mode 100644
> index 00000000000..99eb797eff1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-2.C
> @@ -0,0 +1,14 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +union flt
> +{
> +  _Float16 flt;
> +  short s;
> +};
> +
> +_Float16
> +foo (union flt x)
> +{
> +  return x.flt;
> +}
> diff --git a/gcc/testsuite/g++.target/i386/float16-3.C b/gcc/testsuite/g++.target/i386/float16-3.C
> new file mode 100644
> index 00000000000..940878503f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/float16-3.C
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble { target avx512fp16 } } */
> +/* { dg-options "-O0 -mavx512fp16" } */
> +
> +template <typename> void a(char *) {}
> +char b, d;
> +void c()
> +{
> +  a<unsigned char>(&d);
> +  a<_Float16>(&b);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
> index 6178e38ce02..f3676077743 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
> index 986fbd819e4..1751c52565c 100644
> --- a/gcc/testsuite/gcc.target/i386/avx-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0a377dba1d5..0ad9064f637 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -87,6 +87,9 @@ main ()
>  #ifdef AVX512VNNI
>        && (ecx & bit_AVX512VNNI)
>  #endif
> +#ifdef AVX512FP16
> +      && (edx & bit_AVX512FP16)
> +#endif
>  #ifdef VAES
>        && (ecx & bit_VAES)
>  #endif
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> new file mode 100644
> index 00000000000..88887556d68
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_max (_Float16 __A, _Float16 __B)
> +{
> +  return __A > __B ? __A : __B;
> +}
> +
> +_Float16
> +__attribute__ ((noinline, noclone))
> +do_min (_Float16 __A, _Float16 __B)
> +{
> +  return __A < __B ? __A : __B;
> +}
> +
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vminsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> new file mode 100644
> index 00000000000..c9e23bf95c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target avx512fp16 } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +#include "avx512fp16-12a.c"
> +
> +static void
> +do_test (void)
> +{
> +  _Float16 x = 0.1f;
> +  _Float16 y = -3.2f;
> +  _Float16 z;
> +
> +  z = do_max (x, y);
> +  if (z != x)
> +    abort ();
> +
> +  z = do_min (x, y);
> +  if (z != y)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3a.c b/gcc/testsuite/gcc.target/i386/float16-3a.c
> new file mode 100644
> index 00000000000..3846c8e9b6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-3b.c b/gcc/testsuite/gcc.target/i386/float16-3b.c
> new file mode 100644
> index 00000000000..247dd6e7e33
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-3b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shl\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4a.c b/gcc/testsuite/gcc.target/i386/float16-4a.c
> new file mode 100644
> index 00000000000..631082581f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtsi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/float16-4b.c b/gcc/testsuite/gcc.target/i386/float16-4b.c
> new file mode 100644
> index 00000000000..828d8530769
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-4b.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +
> +_Float16
> +foo (unsigned long long x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-times "vcvtusi2shq\[ \t\]+\[^\n\r]*%xmm0" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 79265c7c94f..8499fdf2db9 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -79,6 +79,7 @@ extern void test_hreset (void)                        __attribute__((__target__("hreset")));
>  extern void test_keylocker (void)              __attribute__((__target__("kl")));
>  extern void test_widekl (void)                 __attribute__((__target__("widekl")));
>  extern void test_avxvnni (void)                        __attribute__((__target__("avxvnni")));
> +extern void test_avx512fp16 (void)             __attribute__((__target__("avx512fp16")));
>
>  extern void test_no_sgx (void)                 __attribute__((__target__("no-sgx")));
>  extern void test_no_avx5124fmaps(void)         __attribute__((__target__("no-avx5124fmaps")));
> @@ -159,6 +160,7 @@ extern void test_no_hreset (void)           __attribute__((__target__("no-hreset")));
>  extern void test_no_keylocker (void)           __attribute__((__target__("no-kl")));
>  extern void test_no_widekl (void)              __attribute__((__target__("no-widekl")));
>  extern void test_no_avxvnni (void)             __attribute__((__target__("no-avxvnni")));
> +extern void test_no_avx512fp16 (void)          __attribute__((__target__("no-avx512fp16")));
>
>  extern void test_arch_nocona (void)            __attribute__((__target__("arch=nocona")));
>  extern void test_arch_core2 (void)             __attribute__((__target__("arch=core2")));
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> new file mode 100644
> index 00000000000..2f8af392c83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16" } */
> +/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> +
> +#include <immintrin.h>
> +
> +_Float16
> +foo (_Float16 x, _Float16 y)
> +{
> +  x = x > y ? x : y;
> +  return x;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
> index 7029771334b..f5f5c113612 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-13.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
> index 4ce0ffffaf3..747d504cedb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-14.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-14.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */
> +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  #include <mm_malloc.h>
> diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
> index 6e8b6f3fa1b..33411969901 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-22.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-22.c
> @@ -103,7 +103,7 @@
>
>
>  #ifndef DIFFERENT_PRAGMAS
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>
>  /* Following intrinsics require immediate arguments.  They
> @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
>
>  /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
>  #ifdef DIFFERENT_PRAGMAS
> -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>  #endif
>  #include <immintrin.h>
>  test_1 (_cvtss_sh, unsigned short, float, 1)
> diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
> index 7faa053ace8..86590ca5ffb 100644
> --- a/gcc/testsuite/gcc.target/i386/sse-23.c
> +++ b/gcc/testsuite/gcc.target/i386/sse-23.c
> @@ -708,6 +708,6 @@
>  #define __builtin_ia32_vpclmulqdq_v2di(A, B, C)  __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
>  #define __builtin_ia32_vpclmulqdq_v8di(A, B, C)  __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
>
> -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni")
> +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16")
>
>  #include <x86intrin.h>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 42ac9d0ac1a..10765365d7b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3020,7 +3020,7 @@ proc check_effective_target_has_q_floating_suffix { } {
>
>  proc check_effective_target_float16 {} {
>      return [check_no_compiler_messages_nocache float16 object {
> -        _Float16 x;
> +        _Float16 foo (_Float16 x) { return x; }
>      } [add_options_for_float16 ""]]
>  }
>
> @@ -8714,6 +8714,17 @@ proc check_prefer_avx128 { } {
>  }
>
>
> +# Return 1 if avx512fp16 instructions can be compiled.
> +
> +proc check_effective_target_avx512fp16 { } {
> +    return [check_no_compiler_messages avx512fp16 object {
> +       void foo (void)
> +       {
> +         asm volatile ("vmovw %edi, %xmm0");
> +       }
> +    } "-O2 -mavx512fp16" ]
> +}
> +
>  # Return 1 if avx512f instructions can be compiled.
>
>  proc check_effective_target_avx512f { } {
> --
> 2.27.0
>

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-04  2:45                       ` Hongtao Liu
@ 2021-08-04 11:28                         ` Richard Biener
  2021-08-05  7:31                           ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-04 11:28 UTC (permalink / raw)
  To: Hongtao Liu, Richard Sandiford
  Cc: liuhongt, GCC Patches, Uros Bizjak, Joseph Myers, H. J. Lu

On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> >         * config/i386/i386.c (enum x86_64_reg_class): Add
> >         X86_64_SSEHF_CLASS.
> >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> >         (examine_argument): Ditto.
> >         (construct_container): Ditto.
> >         (classify_argument): Ditto, and set HFmode/HCmode to
> >         X86_64_SSEHF_CLASS.
> >         (function_value_32): Return _Float16/Complex Float16 by
> >         %xmm0.
> >         (function_value_64): Return _Float16/Complex Float16 by SSE
> >         register.
> >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> >         (ix86_secondary_reload): Require gpr as intermediate register
> >         to store _Float16 from sse register when sse4 is not
> >         available.
> >         (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> >         sse2.
> >         (ix86_scalar_mode_supported_p): Ditto.
> >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> >         (*pushhf): Ditto.
> >         (*movhf_internal): Ditto.
> >         * doc/extend.texi (Half-Precision Floating Point): Document
> >         _Float16 for x86.
> >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> >         which is used by extract_bit_field but not backends.
> >
[...]
>
> Ping, I'd like to ask for approval of the code below, which touches the
> generic part.
>
> start from ..
> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index ff3b4449b37..775ee397836 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> >       fix them all.  */
> >    if (omode == word_mode)
> >      ;
> > +  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI (reg:HF))
> > +     here.  Though extract_bit_field is the culprit here, not the backends.  */
> > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > +    ;
> >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >       is the culprit here, and not the backends.  */
> >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
>
> and end here.

So the main restriction otherwise in place is

  /* Subregs involving floating point modes are not allowed to
     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
     (subreg:SI (reg:DF) 0) isn't.  */
  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
    {
      if (! (known_eq (isize, osize)
             /* LRA can use subreg to store a floating point value in
                an integer mode.  Although the floating point and the
                integer modes need the same number of hard registers,
                the size of floating point mode can be less than the
                integer mode.  LRA also uses subregs for a register
                should be used in different mode in on insn.  */
             || lra_in_progress))
        return false;
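
As a rough illustration only (this is not GCC code), the interaction between
the clause proposed in the patch and the float-mode restriction quoted above
can be modelled in plain C.  The struct, the function name and the mode
constants are hypothetical; sizes are in bytes, word_mode is approximated as
"any integral mode of word size", and the vector/complex clauses as well as
lra_in_progress are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode: only the properties the
   size rules look at.  Sizes are in bytes.  */
struct mode_model
{
  unsigned size;
  bool is_float;
  bool is_integral;
};

static const struct mode_model MODE_HF = { 2, true, false };
static const struct mode_model MODE_DF = { 8, true, false };
static const struct mode_model MODE_SI = { 4, false, true };
static const struct mode_model MODE_DI = { 8, false, true };

/* Toy version of the size-related checks.  REGSIZE stands in for
   REGMODE_NATURAL_SIZE (imode) and WORD_SIZE for the size of word_mode.
   WITH_PATCH selects whether the proposed clause is active.  */
bool
subreg_size_ok (struct mode_model omode, struct mode_model imode,
                unsigned regsize, unsigned word_size, bool with_patch)
{
  unsigned osize = omode.size;
  unsigned isize = imode.size;

  /* word_mode subregs of anything are tolerated.  */
  if (omode.is_integral && osize == word_size)
    return true;

  /* Proposed clause: a wider integer view of a narrower float that
     still fits in one register, e.g. (subreg:SI (reg:HF)).  */
  if (with_patch && regsize > osize && osize > isize
      && imode.is_float && omode.is_integral)
    return true;

  /* Existing clause, e.g. (subreg:DF (reg:TI)).  */
  if (osize >= regsize && isize >= osize)
    return true;

  /* Float modes may not change size.  */
  if (imode.is_float || omode.is_float)
    return isize == osize;

  /* The remaining checks are not modelled.  */
  return true;
}

/* Usage example, assuming an 8-byte word and natural register size.  */
void
check_examples (void)
{
  assert (!subreg_size_ok (MODE_SI, MODE_HF, 8, 8, false));
  assert (subreg_size_ok (MODE_SI, MODE_HF, 8, 8, true));
  assert (!subreg_size_ok (MODE_SI, MODE_DF, 8, 8, true));
  assert (subreg_size_ok (MODE_DI, MODE_DF, 8, 8, true));
}
```

With an 8-byte word, (subreg:SI (reg:HF)) is rejected by the float rule
without the new clause and accepted with it, while (subreg:SI (reg:DF))
stays rejected either way.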

I'm not sure whether (subreg:SI (subreg:HI (reg:HF))) would be a possible
way to "work around" this restriction.  Alternatively, one could finally do
away with all the exceptions and simply allow all such subregs, defining
their semantics as going through an intermediate same-size subreg to an
integer mode, if the lack of such a definition is why we disallow them?

That is, any subreg with a float-mode source or destination would be
interpreted as first wrapping the source operand (if float-mode) in a
same-size integer subreg and, if the destination mode is a float mode,
performing the subreg in an integer mode first?
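
A minimal C sketch of that reading, using SF/SI/DI rather than HF because
_Float16 is not portable C: the float is first viewed through a same-size
integer, and only then is the wider integer subreg taken.  The function names
are hypothetical, and the sketch assumes float is IEEE 754 binary32.  The
upper bits of a paradoxical subreg are generally left undefined; zero-filling
them here is only a choice made to get a deterministic value.

```c
#include <stdint.h>
#include <string.h>

/* Model of (subreg:SI (reg:SF) 0): a same-size reinterpretation of the
   float's bits as an integer.  No value conversion takes place.  */
uint32_t
sf_bits_as_si (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Model of reading (subreg:DI (reg:SF) 0) as
   (subreg:DI (subreg:SI (reg:SF) 0) 0): same-size integer view first,
   then a wider integer view of that.  Only the low 32 bits are
   meaningful; the upper half is zero-filled by this sketch.  */
uint64_t
sf_as_di_via_si (float f)
{
  return (uint64_t) sf_bits_as_si (f);
}
```

The low 32 bits of sf_as_di_via_si (x) always equal sf_bits_as_si (x), which
is the property the intermediate same-size subreg is meant to pin down.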

Also, I detest that validate_subreg lists things that are not allowed as
opposed to things that are allowed.  Why is FLOAT_MODE special, but
fractional and accumulating modes are not?  The subreg documentation
also doesn't talk about the cases that are not allowed.

Richard.

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-04 11:28                         ` Richard Biener
@ 2021-08-05  7:31                           ` Hongtao Liu
  2021-08-05  7:39                             ` Hongtao Liu
  2021-08-05  9:24                             ` Richard Biener
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-05  7:31 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Sandiford, liuhongt, GCC Patches, Uros Bizjak,
	Joseph Myers, H. J. Lu

On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > >         X86_64_SSEHF_CLASS.
> > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > >         (examine_argument): Ditto.
> > >         (construct_container): Ditto.
> > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > >         X86_64_SSEHF_CLASS.
> > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > >         %xmm0.
> > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > >         register.
> > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > >         (ix86_secondary_reload): Require gpr as intermediate register
> > >         to store _Float16 from sse register when sse4 is not
> > >         available.
> > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > >         sse2.
> > >         (ix86_scalar_mode_supported_p): Ditto.
> > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > >         (*pushhf): Ditto.
> > >         (*movhf_internal): Ditto.
> > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > >         _Float16 for x86.
> > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > >         which is used by extract_bit_field but not backends.
> > >
> [...]
> >
> > Ping, i'd like to ask for approval for the below codes which is
> > related to generic part.
> >
> > start from ..
> > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > index ff3b4449b37..775ee397836 100644
> > > --- a/gcc/emit-rtl.c
> > > +++ b/gcc/emit-rtl.c
> > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > >       fix them all.  */
> > >    if (omode == word_mode)
> > >      ;
> > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > +    ;
> > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > >       is the culprit here, and not the backends.  */
> > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >
> > and end here.
>
> So the main restriction otherwise in place is
>
>   /* Subregs involving floating point modes are not allowed to
>      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>      (subreg:SI (reg:DF) 0) isn't.  */
>   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
>     {
>       if (! (known_eq (isize, osize)
>              /* LRA can use subreg to store a floating point value in
>                 an integer mode.  Although the floating point and the
>                 integer modes need the same number of hard registers,
>                 the size of floating point mode can be less than the
>                 integer mode.  LRA also uses subregs for a register
>                 should be used in different mode in on insn.  */
>              || lra_in_progress))
>         return false;
>
> I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))

After debugging, I found that (subreg:SI (reg:HF)) is not really needed;
the extraction is eventually handled by the code below:
----cut-----
  /* Find a correspondingly-sized integer field, so we can apply
     shifts and masks to it.  */
  scalar_int_mode int_mode;
  if (!int_mode_for_mode (tmode).exists (&int_mode))
    /* If this fails, we should probably push op0 out to memory and then
       do a load.  */
    int_mode = int_mode_for_mode (mode).require ();

  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
    bitnum, target, unsignedp, reverse);
-----end----

which generates RTL like the following:

---cut----
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:HI 86)
                (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (reg:CC 17 flags))
        ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 -1
     (nil))
(insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
        (subreg:HF (reg:HI 86) 0)) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 -1
     (nil))
(insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
        (reg:HF 82 [ <retval> ])) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1 -1
     (nil))
----end---
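
For readers less used to RTL, insns 6 and 7 above follow a "narrow in an
integer mode, then reinterpret at the same size" shape.  A rough C analogue
is sketched below, one size up (64-bit integer, 32-bit integer, float)
because _Float16 is not available on every compiler.  The function names are
invented for the illustration, and float is assumed to be IEEE 754 binary32:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32.  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Analogue of insn 6: keep the low part of the wider integer value.  */
static uint32_t
di_lowpart_si (uint64_t wide)
{
  return (uint32_t) (wide & UINT64_C (0xffffffff));
}

/* Analogue of insn 7: same-size integer-to-float reinterpretation.  */
static float
si_as_sf (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* The two steps combined; no subreg that changes size and crosses
   between integer and float is needed.  */
static float
extract_low_float (uint64_t wide)
{
  return si_as_sf (di_lowpart_si (wide));
}
```

The only integer/float crossing here is the same-size memcpy, which
corresponds to the (subreg:HF (reg:HI 86) 0) in insn 7.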

The real problem is that validate_subreg doesn't allow a subreg between an
integer mode and a float mode of different sizes, so we hit the gcc_assert
in gen_lowpart here:

----cut-----
      /* Don't use LHS paradoxical subreg if explicit truncation is needed
         between the mode of the extraction (word_mode) and the target
         mode.  Instead, create a temporary and use convert_move to set
         the target.  */
      if (REG_P (target)
          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
        {
          target = gen_lowpart (ext_mode, target);
          if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
            spec_target_subreg = target;
        }
----end----
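
To make the rejected case concrete, here is a toy model of just the
float-mode size rule from validate_subreg that Richard quoted.  It is a
sketch under simplifying assumptions: the struct and function names are
invented, sizes are plain byte counts rather than poly_ints, and the earlier
exceptions in the real function (such as omode == word_mode) are ignored:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a machine mode: only the two properties the quoted
   check looks at.  These names are illustrative, not GCC's.  */
struct toy_mode
{
  bool is_float;
  unsigned size;   /* in bytes */
};

/* Returns false when the float-mode size rule would reject a subreg
   whose outer mode is OMODE and inner mode is IMODE.  */
static bool
float_size_rule_ok (struct toy_mode omode, struct toy_mode imode,
                    bool lra_in_progress)
{
  assert (omode.size > 0 && imode.size > 0);
  if (imode.is_float || omode.is_float)
    return imode.size == omode.size || lra_in_progress;
  return true;
}
```

With HF = { true, 2 } and SI = { false, 4 }, float_size_rule_ok (SI, HF,
false) is false, which in this model is the case that makes gen_lowpart
assert; the same-size pair HI/HF passes.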

So how about a change like the one below: drop the validate_subreg change
and add a guard in extract_bit_field_using_extv instead?

modified   gcc/emit-rtl.c
@@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
      fix them all.  */
   if (omode == word_mode)
     ;
-  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
-     here. Though extract_bit_field is the culprit here, not the backends.  */
-  else if (known_gt (regsize, osize) && known_gt (osize, isize)
-           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
-    ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
      is the culprit here, and not the backends.  */
   else if (known_ge (osize, regsize) && known_ge (isize, osize))
modified   gcc/expmed.c
@@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
          between the mode of the extraction (word_mode) and the target
          mode.  Instead, create a temporary and use convert_move to set
          the target.  */
+      machine_mode tmode = GET_MODE (target);
       if (REG_P (target)
-          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
+          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+          /* When validate_subreg doesn't allow subreg between integer mode
+             and float mode with different size, It will hit gcc_assert in
+             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
+             not really needed, codes like below will be finally generated.
+             (set (reg:SI 1)
+                  (and:SI (reg:DI 2) -1))
+             (set (reg:SF 3)
+                  (subreg:SF (reg:SI 1)))  */
+          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
+          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
         {
           target = gen_lowpart (ext_mode, target);
           if (partial_subreg_p (GET_MODE (spec_target), ext_mode))



> to "work around" this restriction.  Alternatively one could finally do away
> with all the exceptions and simply allow all such subregs giving them
> semantics as to intermediate same-size subregs to integer modes
> if this definition issue is why we disallow them?
>
> That is, any float-mode source or destination subreg is interpreted as
> wrapping the source operand (if float-mode) in a same size int subreg
> and performing the subreg in an integer mode first if the destination
> mode is a float mode?
>
> Also I detest that validate_subreg list things not allowed as opposed
> to things allowed.  Why are FLOAT_MODE special, but
> fractional and accumulating modes not?  The subreg documentation
> also doesn't talk about cases not allowed.
>
> Richard.



-- 
BR,
Hongtao


* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-05  7:31                           ` Hongtao Liu
@ 2021-08-05  7:39                             ` Hongtao Liu
  2021-08-05  9:24                             ` Richard Biener
  1 sibling, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-05  7:39 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Sandiford, liuhongt, GCC Patches, Uros Bizjak,
	Joseph Myers, H. J. Lu

On Thu, Aug 5, 2021 at 3:31 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > > >         X86_64_SSEHF_CLASS.
> > > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > >         (examine_argument): Ditto.
> > > >         (construct_container): Ditto.
> > > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > > >         X86_64_SSEHF_CLASS.
> > > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > > >         %xmm0.
> > > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > > >         register.
> > > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > >         (ix86_secondary_reload): Require gpr as intermediate register
> > > >         to store _Float16 from sse register when sse4 is not
> > > >         available.
> > > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > > >         sse2.
> > > >         (ix86_scalar_mode_supported_p): Ditto.
> > > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > >         (*pushhf): Ditto.
> > > >         (*movhf_internal): Ditto.
> > > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > >         _Float16 for x86.
> > > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > >         which is used by extract_bit_field but not backends.
> > > >
> > [...]
> > >
> > > Ping, i'd like to ask for approval for the below codes which is
> > > related to generic part.
> > >
> > > start from ..
> > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > index ff3b4449b37..775ee397836 100644
> > > > --- a/gcc/emit-rtl.c
> > > > +++ b/gcc/emit-rtl.c
> > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > > >       fix them all.  */
> > > >    if (omode == word_mode)
> > > >      ;
> > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > +    ;
> > > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > >       is the culprit here, and not the backends.  */
> > > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >
> > > and end here.
> >
> > So the main restriction otherwise in place is
> >
> >   /* Subregs involving floating point modes are not allowed to
> >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >      (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> >     {
> >       if (! (known_eq (isize, osize)
> >              /* LRA can use subreg to store a floating point value in
> >                 an integer mode.  Although the floating point and the
> >                 integer modes need the same number of hard registers,
> >                 the size of floating point mode can be less than the
> >                 integer mode.  LRA also uses subregs for a register
> >                 should be used in different mode in on insn.  */
> >              || lra_in_progress))
> >         return false;
> >
> > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
>
> After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> would be finally handled by below cut
> ----cut-----
>   /* Find a correspondingly-sized integer field, so we can apply
>      shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
>     /* If this fails, we should probably push op0 out to memory and then
>        do a load.  */
>     int_mode = int_mode_for_mode (mode).require ();
>
>   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
>     bitnum, target, unsignedp, reverse);
> -----end----
>
> and generate things like below cut
>
> ---cut----
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:HI 86)
>                 (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
>                     (const_int -1 [0xffffffffffffffff])))
>             (clobber (reg:CC 17 flags))
>         ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>      (nil))
> (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
>         (subreg:HF (reg:HI 86) 0))
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>      (nil))
> (insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
>         (reg:HF 82 [ <retval> ]))
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
> -1
>      (nil))
> ----end---
>
> The real problem is here, when validate_subreg doesn't allow subreg
> between integer mode and float mode with different sizes. It will hit
> gcc_assert in gen_lowpart
>
> ----cut-----
>       /* Don't use LHS paradoxical subreg if explicit truncation is needed
> between the mode of the extraction (word_mode) and the target
> mode.  Instead, create a temporary and use convert_move to set
> the target.  */
>       if (REG_P (target)
>   && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> {
>   target = gen_lowpart (ext_mode, target);
>   if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>     spec_target_subreg = target;
> }
> ----end----
>
> So how about changes like below, remove changes in validate_subreg and
> add some guard in extract_bit_field_using_extv.
>
> modified   gcc/emit-rtl.c
> @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
>       fix them all.  */
>    if (omode == word_mode)
>      ;
> -  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> -     here. Though extract_bit_field is the culprit here, not the backends.  */
> -  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> -           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> -    ;
>    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>       is the culprit here, and not the backends.  */
>    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> modified   gcc/expmed.c
> @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
> extraction_insn *extv, rtx op0,
>           between the mode of the extraction (word_mode) and the target
>           mode.  Instead, create a temporary and use convert_move to set
>           the target.  */
> +      machine_mode tmode = GET_MODE (target);
>        if (REG_P (target)
> -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> +          /* When validate_subreg doesn't allow subreg between integer mode
> +             and float mode with different size, It will hit gcc_assert in
> +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> +             not really needed, codes like below will be finally generated.
> +             (set (reg:SI 1)
> +                  (and:SI (reg:DI 2) -1))
> +             (set (reg:SF 3)
> +                  (subreg:SF (reg:SI 1)))  */
> +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
>          {
>            target = gen_lowpart (ext_mode, target);
>            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>
>
>
> > to "work around" this restriction.  Alternatively one could finally do away
> > with all the exceptions and simply allow all such subregs giving them
> > semantics as to intermediate same-size subregs to integer modes
> > if this definition issue is why we disallow them?
> >
> > That is, any float-mode source or destination subreg is interpreted as
> > wrapping the source operand (if float-mode) in a same size int subreg
> > and performing the subreg in an integer mode first if the destination
> > mode is a float mode?
> >
> > Also I detest that validate_subreg list things not allowed as opposed
> > to things allowed.  Why are FLOAT_MODE special, but
> > fractional and accumulating modes not?  The subreg documentation
> > also doesn't talk about cases not allowed.

I found that there are uses like (subreg:SI (reg:TF)), which are generated
by simplify_gen_subreg_concatn in pass_subreg1 and eventually handled by
the RA.

Dump from addtf3.c:
---cut-----
(insn 5132 411 5133 2 (set (reg:SI 1400)
        (subreg:SI (reg/v:TF 747 [ b ]) 0)) 75 {*movsi_internal}
     (nil))
(insn 5133 5132 5134 2 (set (reg:SI 1401 [+4 ])
        (subreg:SI (reg/v:TF 747 [ b ]) 4)) 75 {*movsi_internal}
     (nil))
(insn 5134 5133 5135 2 (set (reg:SI 1402 [+8 ])
        (subreg:SI (reg/v:TF 747 [ b ]) 8)) 75 {*movsi_internal}
     (nil))
(insn 5135 5134 413 2 (set (reg:SI 1403 [+12 ])
        (subreg:SI (reg/v:TF 747 [ b ]) 12)) 75 {*movsi_internal}
-----end-----
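
In C terms, the four insns above read the 16-byte TFmode value as four
SImode pieces at byte offsets 0, 4, 8 and 12.  A minimal sketch of that
access pattern follows; it uses a plain byte buffer instead of a real
128-bit float type so that it builds anywhere, and the names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TF_BYTES = 16, SI_BYTES = 4, N_WORDS = TF_BYTES / SI_BYTES };

/* Copy the SImode-sized piece at byte offset OFFSET out of a 16-byte
   object, the analogue of (subreg:SI (reg:TF) OFFSET).  */
static uint32_t
piece_at (const unsigned char *obj, size_t offset)
{
  uint32_t word;
  assert (offset + sizeof word <= TF_BYTES);
  memcpy (&word, obj + offset, sizeof word);
  return word;
}

/* Split the whole object into its four pieces, lowest offset first.  */
static void
split_into_words (const unsigned char obj[TF_BYTES], uint32_t words[N_WORDS])
{
  for (size_t i = 0; i < N_WORDS; i++)
    words[i] = piece_at (obj, i * SI_BYTES);
}
```

Each piece is a raw copy of 4 bytes, so copying the four words back in
order reproduces the original 16 bytes regardless of endianness.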
> >
> > Richard.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-05  7:31                           ` Hongtao Liu
  2021-08-05  7:39                             ` Hongtao Liu
@ 2021-08-05  9:24                             ` Richard Biener
  2021-08-05  9:49                               ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-05  9:24 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Sandiford, liuhongt, GCC Patches, Uros Bizjak,
	Joseph Myers, H. J. Lu

On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > > >         X86_64_SSEHF_CLASS.
> > > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > >         (examine_argument): Ditto.
> > > >         (construct_container): Ditto.
> > > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > > >         X86_64_SSEHF_CLASS.
> > > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > > >         %xmm0.
> > > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > > >         register.
> > > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > >         (ix86_secondary_reload): Require gpr as intermediate register
> > > >         to store _Float16 from sse register when sse4 is not
> > > >         available.
> > > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > > >         sse2.
> > > >         (ix86_scalar_mode_supported_p): Ditto.
> > > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > >         (*pushhf): Ditto.
> > > >         (*movhf_internal): Ditto.
> > > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > >         _Float16 for x86.
> > > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > >         which is used by extract_bit_field but not backends.
> > > >
> > [...]
> > >
> > > Ping, i'd like to ask for approval for the below codes which is
> > > related to generic part.
> > >
> > > start from ..
> > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > index ff3b4449b37..775ee397836 100644
> > > > --- a/gcc/emit-rtl.c
> > > > +++ b/gcc/emit-rtl.c
> > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > > >       fix them all.  */
> > > >    if (omode == word_mode)
> > > >      ;
> > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > +    ;
> > > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > >       is the culprit here, and not the backends.  */
> > > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >
> > > and end here.
> >
> > So the main restriction otherwise in place is
> >
> >   /* Subregs involving floating point modes are not allowed to
> >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >      (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> >     {
> >       if (! (known_eq (isize, osize)
> >              /* LRA can use subreg to store a floating point value in
> >                 an integer mode.  Although the floating point and the
> >                 integer modes need the same number of hard registers,
> >                 the size of floating point mode can be less than the
> >                 integer mode.  LRA also uses subregs for a register
> >                 should be used in different mode in on insn.  */
> >              || lra_in_progress))
> >         return false;
> >
> > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
>
> After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> would be finally handled by below cut
> ----cut-----
>   /* Find a correspondingly-sized integer field, so we can apply
>      shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
>     /* If this fails, we should probably push op0 out to memory and then
>        do a load.  */
>     int_mode = int_mode_for_mode (mode).require ();
>
>   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
>     bitnum, target, unsignedp, reverse);
> -----end----
>
> and generate things like below cut
>
> ---cut----
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:HI 86)
>                 (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
>                     (const_int -1 [0xffffffffffffffff])))
>             (clobber (reg:CC 17 flags))
>         ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>      (nil))
> (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
>         (subreg:HF (reg:HI 86) 0))
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>      (nil))
> (insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
>         (reg:HF 82 [ <retval> ]))
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
> -1
>      (nil))
> ----end---
>
> The real problem is here, when validate_subreg doesn't allow subreg
> between integer mode and float mode with different sizes. It will hit
> gcc_assert in gen_lowpart
>
> ----cut-----
>       /* Don't use LHS paradoxical subreg if explicit truncation is needed
> between the mode of the extraction (word_mode) and the target
> mode.  Instead, create a temporary and use convert_move to set
> the target.  */
>       if (REG_P (target)
>   && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> {
>   target = gen_lowpart (ext_mode, target);
>   if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>     spec_target_subreg = target;
> }
> ----end----
>
> So how about changes like below, remove changes in validate_subreg and
> add some guard in extract_bit_field_using_extv.
>
> modified   gcc/emit-rtl.c
> @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
>       fix them all.  */
>    if (omode == word_mode)
>      ;
> -  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> -     here. Though extract_bit_field is the culprit here, not the backends.  */
> -  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> -           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> -    ;
>    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>       is the culprit here, and not the backends.  */
>    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> modified   gcc/expmed.c
> @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
> extraction_insn *extv, rtx op0,
>           between the mode of the extraction (word_mode) and the target
>           mode.  Instead, create a temporary and use convert_move to set
>           the target.  */
> +      machine_mode tmode = GET_MODE (target);
>        if (REG_P (target)
> -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)

Doesn't it simply mean that TRULY_NOOP_TRUNCATION_MODES_P may not be
true for modes that are not handled by gen_lowpart?  But it just wraps the
truly_noop_truncation target hook, which is fed the modes' precisions.  In
fact I wonder why we're using 'extv' to "extract" something larger from
something smaller at all.  Why does get_best_reg_extraction_insn return
anything here at all?

> +          /* When validate_subreg doesn't allow subreg between integer mode
> +             and float mode with different size, It will hit gcc_assert in
> +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> +             not really needed, codes like below will be finally generated.
> +             (set (reg:SI 1)
> +                  (and:SI (reg:DI 2) -1))
> +             (set (reg:SF 3)
> +                  (subreg:SF (reg:SI 1)))  */
> +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
>          {
>            target = gen_lowpart (ext_mode, target);
>            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>
>
>
> > to "work around" this restriction.  Alternatively one could finally do away
> > with all the exceptions and simply allow all such subregs giving them
> > semantics as to intermediate same-size subregs to integer modes
> > if this definition issue is why we disallow them?
> >
> > That is, any float-mode source or destination subreg is interpreted as
> > wrapping the source operand (if float-mode) in a same size int subreg
> > and performing the subreg in an integer mode first if the destination
> > mode is a float mode?
> >
> > Also I detest that validate_subreg list things not allowed as opposed
> > to things allowed.  Why are FLOAT_MODE special, but
> > fractional and accumulating modes not?  The subreg documentation
> > also doesn't talk about cases not allowed.
> >
> > Richard.
>
>
>
> --
> BR,
> Hongtao


* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-05  9:24                             ` Richard Biener
@ 2021-08-05  9:49                               ` Hongtao Liu
  2021-08-05 10:14                                 ` Richard Biener
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-05  9:49 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Sandiford, liuhongt, GCC Patches, Uros Bizjak,
	Joseph Myers, H. J. Lu

On Thu, Aug 5, 2021 at 5:24 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > >         X86_64_SSEHF_CLASS.
> > > > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > >         (examine_argument): Ditto.
> > > > >         (construct_container): Ditto.
> > > > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > > > >         X86_64_SSEHF_CLASS.
> > > > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > > > >         %xmm0.
> > > > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > >         register.
> > > > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > >         (ix86_secondary_reload): Require gpr as intermediate register
> > > > >         to store _Float16 from sse register when sse4 is not
> > > > >         available.
> > > > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > > > >         sse2.
> > > > >         (ix86_scalar_mode_supported_p): Ditto.
> > > > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > >         (*pushhf): Ditto.
> > > > >         (*movhf_internal): Ditto.
> > > > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > > >         _Float16 for x86.
> > > > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > >         which is used by extract_bit_field but not backends.
> > > > >
> > > [...]
> > > >
> > > > Ping, i'd like to ask for approval for the below codes which is
> > > > related to generic part.
> > > >
> > > > start from ..
> > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > > index ff3b4449b37..775ee397836 100644
> > > > > --- a/gcc/emit-rtl.c
> > > > > +++ b/gcc/emit-rtl.c
> > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > > > >       fix them all.  */
> > > > >    if (omode == word_mode)
> > > > >      ;
> > > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > > +    ;
> > > > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > >       is the culprit here, and not the backends.  */
> > > > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > >
> > > > and end here.
> > >
> > > So the main restriction otherwise in place is
> > >
> > >   /* Subregs involving floating point modes are not allowed to
> > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > >      (subreg:SI (reg:DF) 0) isn't.  */
> > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > >     {
> > >       if (! (known_eq (isize, osize)
> > >              /* LRA can use subreg to store a floating point value in
> > >                 an integer mode.  Although the floating point and the
> > >                 integer modes need the same number of hard registers,
> > >                 the size of floating point mode can be less than the
> > >                 integer mode.  LRA also uses subregs for a register
> > >                 should be used in different mode in on insn.  */
> > >              || lra_in_progress))
> > >         return false;
> > >
> > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
> >
> > After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> > would be finally handled by below cut
> > ----cut-----
> >   /* Find a correspondingly-sized integer field, so we can apply
> >      shifts and masks to it.  */
> >   scalar_int_mode int_mode;
> >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> >     /* If this fails, we should probably push op0 out to memory and then
> >        do a load.  */
> >     int_mode = int_mode_for_mode (mode).require ();
> >
> >   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> >     bitnum, target, unsignedp, reverse);
> > -----end----
> >
> > and generate things like below cut
> >
> > ---cut----
> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 6 3 7 2 (parallel [
> >             (set (reg:HI 86)
> >                 (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
> >                     (const_int -1 [0xffffffffffffffff])))
> >             (clobber (reg:CC 17 flags))
> >         ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > -1
> >      (nil))
> > (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
> >         (subreg:HF (reg:HI 86) 0))
> > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > -1
> >      (nil))
> > (insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
> >         (reg:HF 82 [ <retval> ]))
> > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
> > -1
> >      (nil))
> > ----end---
> >
> > The real problem is here, when validate_subreg doesn't allow subreg
> > between integer mode and float mode with different sizes. It will hit
> > gcc_assert in gen_lowpart
> >
> > ----cut-----
> >       /* Don't use LHS paradoxical subreg if explicit truncation is needed
> > between the mode of the extraction (word_mode) and the target
> > mode.  Instead, create a temporary and use convert_move to set
> > the target.  */
> >       if (REG_P (target)
> >   && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > {
> >   target = gen_lowpart (ext_mode, target);
> >   if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> >     spec_target_subreg = target;
> > }
> > ----end----
> >
> > So how about changes like below, remove changes in validate_subreg and
> > add some guard in extract_bit_field_using_extv.
> >
> > modified   gcc/emit-rtl.c
> > @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
> >       fix them all.  */
> >    if (omode == word_mode)
> >      ;
> > -  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > -     here. Though extract_bit_field is the culprit here, not the backends.  */
> > -  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > -           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > -    ;
> >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >       is the culprit here, and not the backends.  */
> >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > modified   gcc/expmed.c
> > @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
> > extraction_insn *extv, rtx op0,
> >           between the mode of the extraction (word_mode) and the target
> >           mode.  Instead, create a temporary and use convert_move to set
> >           the target.  */
> > +      machine_mode tmode = GET_MODE (target);
> >        if (REG_P (target)
> > -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
>
> doesn't it simply mean that TRULY_NOOP_TRUNCATION_MODES_P may not be
> true for modes that are not handled by gen_lowpart?  But it just wraps the
> truly_noop_truncation target hook which is fed the modes precision.  In fact
> I wonder why we're using 'extv' to "extract" sth larger from sth smaller at all.
No, target is SFmode and ext_mode is DImode, so it extracts something
smaller from something larger.
> Why do we arrive at something for get_best_reg_extraction_insn at all?
>
> > +          /* When validate_subreg doesn't allow subreg between integer mode
> > +             and float mode with different size, It will hit gcc_assert in
> > +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> > +             not really needed, codes like below will be finally generated.
> > +             (set (reg:SI 1)
> > +                  (and:SI (reg:DI 2) -1))
> > +             (set (reg:SF 3)
> > +                  (subreg:SF (reg:SI 1)))  */
> > +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> > +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
> >          {
> >            target = gen_lowpart (ext_mode, target);
> >            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
And somehow it tries to use a paradoxical subreg here, which is why we get
something like (subreg:SI (reg:HF)).
> >
> >
> >
> > > to "work around" this restriction.  Alternatively one could finally do away
> > > with all the exceptions and simply allow all such subregs giving them
> > > semantics as to intermediate same-size subregs to integer modes
> > > if this definition issue is why we disallow them?
> > >
> > > That is, any float-mode source or destination subreg is interpreted as
> > > wrapping the source operand (if float-mode) in a same size int subreg
> > > and performing the subreg in an integer mode first if the destination
> > > mode is a float mode?
> > >
> > > Also I detest that validate_subreg list things not allowed as opposed
> > > to things allowed.  Why are FLOAT_MODE special, but
> > > fractional and accumulating modes not?  The subreg documentation
> > > also doesn't talk about cases not allowed.
> > >
> > > Richard.
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-05  9:49                               ` Hongtao Liu
@ 2021-08-05 10:14                                 ` Richard Biener
  2021-08-06  3:32                                   ` [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field liuhongt
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-05 10:14 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Sandiford, liuhongt, GCC Patches, Uros Bizjak,
	Joseph Myers, H. J. Lu

On Thu, Aug 5, 2021 at 11:43 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 5:24 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > > > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > > >         X86_64_SSEHF_CLASS.
> > > > > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > > >         (examine_argument): Ditto.
> > > > > >         (construct_container): Ditto.
> > > > > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > > > > >         X86_64_SSEHF_CLASS.
> > > > > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > > > > >         %xmm0.
> > > > > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > > >         register.
> > > > > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > > >         (ix86_secondary_reload): Require gpr as intermediate register
> > > > > >         to store _Float16 from sse register when sse4 is not
> > > > > >         available.
> > > > > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > > > > >         sse2.
> > > > > >         (ix86_scalar_mode_supported_p): Ditto.
> > > > > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > > >         (*pushhf): Ditto.
> > > > > >         (*movhf_internal): Ditto.
> > > > > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > > > >         _Float16 for x86.
> > > > > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > > >         which is used by extract_bit_field but not backends.
> > > > > >
> > > > [...]
> > > > >
> > > > > Ping, i'd like to ask for approval for the below codes which is
> > > > > related to generic part.
> > > > >
> > > > > start from ..
> > > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > > > index ff3b4449b37..775ee397836 100644
> > > > > > --- a/gcc/emit-rtl.c
> > > > > > +++ b/gcc/emit-rtl.c
> > > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > > > > >       fix them all.  */
> > > > > >    if (omode == word_mode)
> > > > > >      ;
> > > > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > > > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > > > +    ;
> > > > > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > > >       is the culprit here, and not the backends.  */
> > > > > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > >
> > > > > and end here.
> > > >
> > > > So the main restriction otherwise in place is
> > > >
> > > >   /* Subregs involving floating point modes are not allowed to
> > > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > >      (subreg:SI (reg:DF) 0) isn't.  */
> > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > >     {
> > > >       if (! (known_eq (isize, osize)
> > > >              /* LRA can use subreg to store a floating point value in
> > > >                 an integer mode.  Although the floating point and the
> > > >                 integer modes need the same number of hard registers,
> > > >                 the size of floating point mode can be less than the
> > > >                 integer mode.  LRA also uses subregs for a register
> > > >                 should be used in different mode in on insn.  */
> > > >              || lra_in_progress))
> > > >         return false;
> > > >
> > > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
> > >
> > > After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> > > would be finally handled by below cut
> > > ----cut-----
> > >   /* Find a correspondingly-sized integer field, so we can apply
> > >      shifts and masks to it.  */
> > >   scalar_int_mode int_mode;
> > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > >     /* If this fails, we should probably push op0 out to memory and then
> > >        do a load.  */
> > >     int_mode = int_mode_for_mode (mode).require ();
> > >
> > >   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> > >     bitnum, target, unsignedp, reverse);
> > > -----end----
> > >
> > > and generate things like below cut
> > >
> > > ---cut----
> > > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > > (insn 6 3 7 2 (parallel [
> > >             (set (reg:HI 86)
> > >                 (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
> > >                     (const_int -1 [0xffffffffffffffff])))
> > >             (clobber (reg:CC 17 flags))
> > >         ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > > -1
> > >      (nil))
> > > (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
> > >         (subreg:HF (reg:HI 86) 0))
> > > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > > -1
> > >      (nil))
> > > (insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
> > >         (reg:HF 82 [ <retval> ]))
> > > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
> > > -1
> > >      (nil))
> > > ----end---
> > >
> > > The real problem is here, when validate_subreg doesn't allow subreg
> > > between integer mode and float mode with different sizes. It will hit
> > > gcc_assert in gen_lowpart
> > >
> > > ----cut-----
> > >       /* Don't use LHS paradoxical subreg if explicit truncation is needed
> > > between the mode of the extraction (word_mode) and the target
> > > mode.  Instead, create a temporary and use convert_move to set
> > > the target.  */
> > >       if (REG_P (target)
> > >   && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > > {
> > >   target = gen_lowpart (ext_mode, target);
> > >   if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> > >     spec_target_subreg = target;
> > > }
> > > ----end----
> > >
> > > So how about changes like below, remove changes in validate_subreg and
> > > add some guard in extract_bit_field_using_extv.
> > >
> > > modified   gcc/emit-rtl.c
> > > @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
> > >       fix them all.  */
> > >    if (omode == word_mode)
> > >      ;
> > > -  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > -     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > -  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > -           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > -    ;
> > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > >       is the culprit here, and not the backends.  */
> > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > modified   gcc/expmed.c
> > > @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
> > > extraction_insn *extv, rtx op0,
> > >           between the mode of the extraction (word_mode) and the target
> > >           mode.  Instead, create a temporary and use convert_move to set
> > >           the target.  */
> > > +      machine_mode tmode = GET_MODE (target);
> > >        if (REG_P (target)
> > > -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > > +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> >
> > doesn't it simply mean that TRULY_NOOP_TRUNCATION_MODES_P may not be
> > true for modes that are not handled by gen_lowpart?  But it just wraps the
> > truly_noop_truncation target hook which is fed the modes precision.  In fact
> > I wonder why we're using 'extv' to "extract" sth larger from sth smaller at all.
> No, target is SFmode, and ext_mode is DImode, so it extracts sth
> smaller from larger.
> > Why do we arrive at something for get_best_reg_extraction_insn at all?
> >
> > > +          /* When validate_subreg doesn't allow subreg between integer mode
> > > +             and float mode with different size, It will hit gcc_assert in
> > > +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> > > +             not really needed, codes like below will be finally generated.
> > > +             (set (reg:SI 1)
> > > +                  (and:SI (reg:DI 2) -1))
> > > +             (set (reg:SF 3)
> > > +                  (subreg:SF (reg:SI 1)))  */
> > > +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> > > +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
> > >          {
> > >            target = gen_lowpart (ext_mode, target);
> > >            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> And somehow it tries to use paradoxical_subreg here, that's why we get
> something like (subreg:SI (reg:HF).

OK, I think something is amiss here upthread.  insv/extv do look like they
are designed to work on integer modes (but the docs do not say anything
about this).  In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field.  Of course nothing seems to check what kind of
modes we're dealing with, but we're for example happily doing
expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
some integer mode and op0 is HFmode?  From the above I get it's
the other way around?  In that case we should wrap the
call to extract_integral_bit_field, extracting in an integer mode with the
same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
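
As a rough source-level illustration of "extract in an integer mode, then
pun to the float mode", here is a minimal C sketch.  It is not GCC code:
the helper names (extract_bits_u32, pun_u32_to_float, extract_float_field)
are made up for this example, and it uses a 64-bit source with a 32-bit
float result, assuming float is 32 bits wide, to mirror the
(and:SI (reg:DI)) / (subreg:SF (reg:SI)) pair quoted above.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
                "this sketch assumes a 32-bit float");

/* Hypothetical helper: extract BITSIZE bits (at most 32) starting at bit
   BITNUM (less than 64) of SRC, using only integer shift and mask.  */
static uint32_t
extract_bits_u32 (uint64_t src, unsigned bitnum, unsigned bitsize)
{
  uint64_t mask = (bitsize >= 64
                   ? ~UINT64_C (0)
                   : (UINT64_C (1) << bitsize) - 1);
  return (uint32_t) ((src >> bitnum) & mask);
}

/* Hypothetical helper: reinterpret a 32-bit integer as a float of the
   same size, the source-level analogue of a same-size subreg.  */
static float
pun_u32_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Do the extraction purely on integers first, then change the
   interpretation of the same-size result.  No step mixes a float
   type with an integer type of a different size.  */
static float
extract_float_field (uint64_t src, unsigned bitnum)
{
  return pun_u32_to_float (extract_bits_u32 (src, bitnum, 32));
}
```

The point of the split is that the size change happens only between
integer types, and the int/float reinterpretation happens only between
same-size types.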

> > >
> > >
> > >
> > > > to "work around" this restriction.  Alternatively one could finally do away
> > > > with all the exceptions and simply allow all such subregs giving them
> > > > semantics as to intermediate same-size subregs to integer modes
> > > > if this definition issue is why we disallow them?
> > > >
> > > > That is, any float-mode source or destination subreg is interpreted as
> > > > wrapping the source operand (if float-mode) in a same size int subreg
> > > > and performing the subreg in an integer mode first if the destination
> > > > mode is a float mode?
> > > >
> > > > Also I detest that validate_subreg list things not allowed as opposed
> > > > to things allowed.  Why are FLOAT_MODE special, but
> > > > fractional and accumulating modes not?  The subreg documentation
> > > > also doesn't talk about cases not allowed.
> > > >
> > > > Richard.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao


* [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-05 10:14                                 ` Richard Biener
@ 2021-08-06  3:32                                   ` liuhongt
  2021-08-06  3:44                                     ` Andrew Pinski
  2021-08-06  6:57                                     ` Richard Biener
  0 siblings, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-08-06  3:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, richard.guenther, crazylht

Hi:
---
OK, I think something is amiss here upthread.  insv/extv do look like they
are designed to work on integer modes (but the docs do not say anything
about this).  In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field.  Of course nothing seems to check what kind of
modes we're dealing with, but we're for example happily doing
expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
some integer mode and op0 is HFmode?  From the above I get it's
the other way around?  In that case we should wrap the
call to extract_integral_bit_field, extracting in an integer mode with the
same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
---
  This is a separate patch, following up on the comments above.
 
gcc/ChangeLog:

	* expmed.c (extract_bit_field_1): Wrap the call to
	extract_integral_bit_field, extracting in an integer mode with
	the same size as 'tmode' and then converting the result
	as (subreg:tmode (reg:imode)).

gcc/testsuite/ChangeLog:

	* gcc.target/i386/float16-5.c: New test.
---
 gcc/expmed.c                              | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..72790693ef0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
       op0_mode = opt_scalar_int_mode ();
     }
 
+  /* Make sure we are playing with integral modes.  Pun with subregs
+     if we aren't.  When tmode is HFmode and op0 is SImode, there would
+     otherwise be an ICE in extract_integral_bit_field.  */
+  if (int_mode_for_mode (tmode).exists (&imode)
+      && imode != tmode
+      && imode != GET_MODE (op0))
+    {
+      rtx ret = extract_integral_bit_field (op0, op0_mode,
+					    bitsize.to_constant (),
+					    bitnum.to_constant (), unsignedp,
+					    NULL, imode, imode,
+					    reverse, fallback_p);
+      gcc_assert (ret);
+
+      if (!REG_P (ret))
+	ret = force_reg (imode, ret);
+      return gen_lowpart_SUBREG (tmode, ret);
+    }
+
   /* It's possible we'll need to handle other cases here for
      polynomial bitnum and bitsize.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
-- 
2.27.0



* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  3:32                                   ` [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field liuhongt
@ 2021-08-06  3:44                                     ` Andrew Pinski
  2021-08-06  4:59                                       ` Hongtao Liu
  2021-08-06  6:57                                     ` Richard Biener
  1 sibling, 1 reply; 138+ messages in thread
From: Andrew Pinski @ 2021-08-06  3:44 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Richard Sandiford

On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).

This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
I wonder why the fix for that did not help here.

Thanks,
Andrew Pinski

> ---
>   This is a separate patch as a follow up of upper comments.
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Wrap the call to
>         extract_integral_bit_field, extracting in an integer mode with
>         the same size as 'tmode' and then converting the result
>         as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c                              | 19 +++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>        op0_mode = opt_scalar_int_mode ();
>      }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> +     in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists (&imode)
> +      && imode != tmode
> +      && imode != GET_MODE (op0))
> +    {
> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> +                                           bitsize.to_constant (),
> +                                           bitnum.to_constant (), unsignedp,
> +                                           NULL, imode, imode,
> +                                           reverse, fallback_p);
> +      gcc_assert (ret);
> +
> +      if (!REG_P (ret))
> +       ret = force_reg (imode, ret);
> +      return gen_lowpart_SUBREG (tmode, ret);
> +    }
> +
>    /* It's possible we'll need to handle other cases here for
>       polynomial bitnum and bitsize.  */
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 00000000000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> --
> 2.27.0
>


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  3:44                                     ` Andrew Pinski
@ 2021-08-06  4:59                                       ` Hongtao Liu
  2021-08-06  5:52                                         ` Hongtao Liu
  2021-08-06  6:59                                         ` Richard Biener
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-06  4:59 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: liuhongt, Richard Sandiford, GCC Patches

On Fri, Aug 6, 2021 at 11:44 AM Andrew Pinski via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi:
> > ---
> > OK, I think sth is amiss here upthread.  insv/extv do look like they
> > are designed
> > to work on integer modes (but docs do not say anything about this here).
> > In fact the caller of extract_bit_field_using_extv is named
> > extract_integral_bit_field.  Of course nothing seems to check what kind of
> > modes we're dealing with, but we're for example happily doing
> > expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > some integer mode and op0 is HFmode?  From the above I get it's
> > the other way around?  In that case we should wrap the
> > call to extract_integral_bit_field, extracting in an integer mode with the
> > same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
>
> This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
> I wonder why the fix for that did not help here.
>
aarch64 didn't hit the gcc_assert with my testcase, so I debugged it to
figure out why.

At the GIMPLE level, x86 and aarch64 are the same:
_3 = BIT_FIELD_REF <a_2(D), 16, 0>;

and both go into
extract_bit_field_using_extv

The difference is that aarch64 has ext_mode as DImode, but x86 has
ext_mode as SImode.
With ext_mode as DImode and target as (reg:HF 94), aarch64 doesn't hit
the gcc_assert in
 gen_lowpart (ext_mode, target)

since validate_subreg allows (subreg:DI (reg:HF)) but disallows
(subreg:SI (reg:HF)); DImode is word_mode here, so this exception applies:

  /* ??? This should not be here.  Temporarily continue to allow word_mode
     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
     Generally, backends are doing something sketchy but it'll take time to
     fix them all.  */
  if (omode == word_mode)
    ;

ext_mode is assigned from extv->field_mode, which is initialized in
get_best_reg_extraction_insn.
get_best_reg_extraction_insn eventually calls
get_optab_extraction_insn, which finds that
aarch64 doesn't have CODE_FOR_extzvsi but x86 does.

That's why ext_mode is DImode on aarch64 and SImode on x86.
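
To make the asymmetry easier to see, here is a toy C model of just the two
validate_subreg rules discussed in this thread.  It is a sketch under
simplifying assumptions, not the real function: struct toy_mode,
toy_validate_subreg and the TOY_* constants are invented for illustration,
modes are reduced to a byte size plus a float flag, word_mode is modelled
as "integer mode of word size", and everything else (lra_in_progress,
register-size checks, offsets) is ignored.

```c
#include <stdbool.h>

/* Toy stand-in for a machine mode: size in bytes and a float flag.  */
struct toy_mode
{
  unsigned size;
  bool is_float;
};

static const struct toy_mode TOY_HI = { 2, false };
static const struct toy_mode TOY_SI = { 4, false };
static const struct toy_mode TOY_DI = { 8, false };
static const struct toy_mode TOY_HF = { 2, true };

/* Toy model of the two rules quoted in this thread, checked in the
   same order: the word_mode exception first, then the float rule.  */
static bool
toy_validate_subreg (struct toy_mode omode, struct toy_mode imode,
                     unsigned word_size)
{
  /* "Temporarily continue to allow word_mode subregs of anything."  */
  if (!omode.is_float && omode.size == word_size)
    return true;

  /* "Subregs involving floating point modes are not allowed to
     change size."  */
  if (omode.is_float || imode.is_float)
    return omode.size == imode.size;

  /* Integer-to-integer subregs are not modelled any further here.  */
  return true;
}
```

With an 8-byte word, the model accepts (subreg:DI (reg:HF)) via the
word_mode exception and rejects (subreg:SI (reg:HF)), which matches the
aarch64 versus x86 behaviour described above.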

> Thanks,
> Andrew Pinski
>
> > ---
> >   This is a separate patch as a follow up of upper comments.
> >
> > gcc/ChangeLog:
> >
> >         * expmed.c (extract_bit_field_1): Wrap the call to
> >         extract_integral_bit_field, extracting in an integer mode with
> >         the same size as 'tmode' and then converting the result
> >         as (subreg:tmode (reg:imode)).
> >
> > gcc/testsuite/ChangeLog:
> >         * gcc.target/i386/float16-5.c: New test.
> > ---
> >  gcc/expmed.c                              | 19 +++++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> >  2 files changed, 31 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> >
> > diff --git a/gcc/expmed.c b/gcc/expmed.c
> > index 3143f38e057..72790693ef0 100644
> > --- a/gcc/expmed.c
> > +++ b/gcc/expmed.c
> > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> >        op0_mode = opt_scalar_int_mode ();
> >      }
> >
> > +  /* Make sure we are playing with integral modes.  Pun with subregs
> > +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > +     in extract_integral_bit_field.  */
> > +  if (int_mode_for_mode (tmode).exists (&imode)
> > +      && imode != tmode
> > +      && imode != GET_MODE (op0))
> > +    {
> > +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > +                                           bitsize.to_constant (),
> > +                                           bitnum.to_constant (), unsignedp,
> > +                                           NULL, imode, imode,
> > +                                           reverse, fallback_p);
> > +      gcc_assert (ret);
> > +
> > +      if (!REG_P (ret))
> > +       ret = force_reg (imode, ret);
> > +      return gen_lowpart_SUBREG (tmode, ret);
> > +    }
> > +
> >    /* It's possible we'll need to handle other cases here for
> >       polynomial bitnum and bitsize.  */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > new file mode 100644
> > index 00000000000..ebc0af1490b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-msse2 -O2" } */
> > +_Float16
> > +foo (int a)
> > +{
> > +  union {
> > +    int a;
> > +    _Float16 b;
> > +  }c;
> > +  c.a = a;
> > +  return c.b;
> > +}
> > --
> > 2.27.0
> >



-- 
BR,
Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  4:59                                       ` Hongtao Liu
@ 2021-08-06  5:52                                         ` Hongtao Liu
  2021-08-06  6:59                                         ` Richard Biener
  1 sibling, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-06  5:52 UTC (permalink / raw)
  To: Andrew Pinski, Jakub Jelinek; +Cc: liuhongt, Richard Sandiford, GCC Patches

On Fri, Aug 6, 2021 at 12:59 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 11:44 AM Andrew Pinski via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi:
> > > ---
> > > OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > are designed
> > > to work on integer modes (but docs do not say anything about this here).
> > > In fact the caller of extract_bit_field_using_extv is named
> > > extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > modes we're dealing with, but we're for example happily doing
> > > expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > some integer mode and op0 is HFmode?  From the above I get it's
> > > the other way around?  In that case we should wrap the
> > > call to extract_integral_bit_field, extracting in an integer mode with the
> > > same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> >
> > This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
CC jakub.
> > I wonder why the fix for that did not help here.
> >
> aarch64 didn't hit the gcc_assert with my testcase, and I debugged it to
> figure out why.
>
> At the gimple level, x86 and aarch64 are the same, with
> _3 = BIT_FIELD_REF <a_2(D), 16, 0>;
>
> and both go into
> extract_bit_field_using_extv
>
> The difference is that aarch64 has ext_mode as DImode, but x86 has
> ext_mode as SImode.
> With ext_mode as DImode and target as (reg:HF 94), aarch64 doesn't hit
> the gcc_assert in
>  gen_lowpart (ext_mode, target)
>
> since validate_subreg allows (subreg:DI (reg:HF)), but disallows
> (subreg:SI (reg:HF)).
>
>   /* ??? This should not be here.  Temporarily continue to allow word_mode
>      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
>      Generally, backends are doing something sketchy but it'll take time to
>      fix them all.  */
>   if (omode == word_mode)
>     ;
>
> ext_mode is assigned from extv->field_mode, which is initialized in
> get_best_reg_extraction_insn.
> get_best_reg_extraction_insn eventually calls
> get_optab_extraction_insn, which finds that
> aarch64 doesn't have CODE_FOR_extzvsi but x86 does.
>
> That's why aarch64 has ext_mode as DImode and x86 has SImode.
>
> > Thanks,
> > Andrew Pinski
> >
> > > ---
> > >   This is a separate patch as a follow up of upper comments.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * expmed.c (extract_bit_field_1): Wrap the call to
> > >         extract_integral_bit_field, extracting in an integer mode with
> > >         the same size as 'tmode' and then converting the result
> > >         as (subreg:tmode (reg:imode)).
> > >
> > > gcc/testsuite/ChangeLog:
> > >         * gcc.target/i386/float16-5.c: New test.
> > > ---
> > >  gcc/expmed.c                              | 19 +++++++++++++++++++
> > >  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > >  2 files changed, 31 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > >
> > > diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > index 3143f38e057..72790693ef0 100644
> > > --- a/gcc/expmed.c
> > > +++ b/gcc/expmed.c
> > > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > >        op0_mode = opt_scalar_int_mode ();
> > >      }
> > >
> > > +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > +     in extract_integral_bit_field.  */
> > > +  if (int_mode_for_mode (tmode).exists (&imode)
> > > +      && imode != tmode
> > > +      && imode != GET_MODE (op0))
> > > +    {
> > > +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > +                                           bitsize.to_constant (),
> > > +                                           bitnum.to_constant (), unsignedp,
> > > +                                           NULL, imode, imode,
> > > +                                           reverse, fallback_p);
> > > +      gcc_assert (ret);
> > > +
> > > +      if (!REG_P (ret))
> > > +       ret = force_reg (imode, ret);
> > > +      return gen_lowpart_SUBREG (tmode, ret);
> > > +    }
> > > +
> > >    /* It's possible we'll need to handle other cases here for
> > >       polynomial bitnum and bitsize.  */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > new file mode 100644
> > > index 00000000000..ebc0af1490b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-msse2 -O2" } */
> > > +_Float16
> > > +foo (int a)
> > > +{
> > > +  union {
> > > +    int a;
> > > +    _Float16 b;
> > > +  }c;
> > > +  c.a = a;
> > > +  return c.b;
> > > +}
> > > --
> > > 2.27.0
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-03  2:44                         ` Hongtao Liu
@ 2021-08-06  6:06                           ` Hongtao Liu
  2021-08-17  1:53                             ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-06  6:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches, schwab, Richard Sandiford, hepenner

[-- Attachment #1: Type: text/plain, Size: 2590 bytes --]

On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 7979e240426..dc673c89bc8 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > >       return (type == EXCESS_PRECISION_TYPE_STANDARD
> > >               ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > >               : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > +      case EXCESS_PRECISION_TYPE_FLOAT16:
> > > +     return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > >        default:
> > >       gcc_unreachable ();
> > >      }
> >
> > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> > doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> > that all of _Float16, float and double are represented to the range and
> > precision of their type without any excess precision).
> >
> Yes, additional changes like this.
>
> modified   gcc/config/i386/i386.c
> @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum
> excess_precision_type type)
>   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
>   : FLT_EVAL_METHOD_UNPREDICTABLE);
>        case EXCESS_PRECISION_TYPE_FLOAT16:
> + if (TARGET_80387
> +     && !(TARGET_SSE_MATH && TARGET_SSE))
> +   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
>   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
>        default:
>   gcc_unreachable ();
> new file   gcc/testsuite/gcc.target/i386/float16-7.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with
> '-mfpmath=387'" } */
> +_Float16
> +foo (_Float16 a, _Float16 b)
> +{
> +  return a + b;/* { dg-error "'-fexcess-precision=16' is not
> compatible with '-mfpmath=387'" } */
> +}
> +
>
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>
>
>
> --
> BR,
> Hongtao


Updated patch attached; ping.

Regarding the backend changes:
1. For backends such as m68k/s390, which don't support _Float16 at all,
the backend issues an error for -fexcess-precision=16. I think that
should be fine.
2. For backends such as arm/aarch64, which support _Float16, the backend
sets FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16 even
when hardware fp16 instructions are not available. Would that be OK for
arm?
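
To make the intent of the option concrete, here is a small portable
sketch of "round after each operation" versus "evaluate wider and round
once". It is only an analogy, not code from the patch: float stands in
for _Float16 and double for float, and the function names are made up
for illustration.

```c
/* Analogy only: float plays the role of _Float16, double the role of
   float.  Both function names are hypothetical.  */

/* Round back to the narrow type after every operation, which is the
   behaviour -fexcess-precision=16 asks for.  The volatile stores force
   each intermediate result to be rounded to float.  */
float sum_round_each_step(float a, float b, float c)
{
  volatile float t = a + b;
  volatile float r = t + c;
  return r;
}

/* Evaluate the whole expression in the wider type and round once at the
   end, which is what promoting to the wider type amounts to.  */
float sum_round_once(float a, float b, float c)
{
  volatile float r = (float) ((double) a + (double) b + (double) c);
  return r;
}
```

With a = 16777216.0f (2^24) and b = c = 1.0f the two functions return
different values (16777216 versus 16777218), which is the kind of
inconsistency between software emulation and native fp16 arithmetic
that the option is meant to remove.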
-- 
BR,
Hongtao

[-- Attachment #2: Support-fexcess-precision-16-which-will-enable-FLT_E.patch --]
[-- Type: text/x-patch, Size: 17353 bytes --]

From 0d0b317eabd9f2ed070111fe8401aaf1391279be Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 2 Aug 2021 10:56:45 +0800
Subject: [PATCH] Support -fexcess-precision=16 which will enable
 FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

gcc/ada/ChangeLog:

	* gcc-interface/misc.c (gnat_post_options): Issue an error for
	-fexcess-precision=16.

gcc/c-family/ChangeLog:

	* c-common.c (excess_precision_mode_join): Update below comments.
	(c_ts18661_flt_eval_method): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
	* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
	(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

	* common.opt: Support -fexcess-precision=16.
	* config/aarch64/aarch64.c (aarch64_excess_precision): Return
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
	EXCESS_PRECISION_TYPE_FLOAT16.
	* config/arm/arm.c (arm_excess_precision): Ditto.
	* config/i386/i386.c (ix86_get_excess_precision): Ditto.
	* config/m68k/m68k.c (m68k_excess_precision): Issue an error
	when EXCESS_PRECISION_TYPE_FLOAT16.
	* config/s390/s390.c (s390_excess_precision): Ditto.
	* coretypes.h (enum excess_precision_type): Add
	EXCESS_PRECISION_TYPE_FLOAT16.
	* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
	* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
	* doc/extend.texi (Half-Precision): Document
	-fexcess-precision=16.
	* flag-types.h (enum excess_precision): Add
	EXCESS_PRECISION_FLOAT16.
	* target.def (excess_precision): Update document.
	* tree.c (excess_precision_type): Set excess_precision_type to
	EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

	* options.c (gfc_post_options): Issue an error for
	-fexcess-precision=16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/float16-6.c: New test.
	* gcc.target/i386/float16-7.c: New test.
---
 gcc/ada/gcc-interface/misc.c              |  3 +++
 gcc/c-family/c-common.c                   |  6 ++++--
 gcc/c-family/c-cppbuiltin.c               |  6 ++++--
 gcc/common.opt                            |  5 ++++-
 gcc/config/aarch64/aarch64.c              |  1 +
 gcc/config/arm/arm.c                      |  1 +
 gcc/config/i386/i386.c                    |  5 +++++
 gcc/config/m68k/m68k.c                    |  2 ++
 gcc/config/s390/s390.c                    |  2 ++
 gcc/coretypes.h                           |  3 ++-
 gcc/doc/extend.texi                       |  3 ++-
 gcc/doc/tm.texi                           | 14 ++++++++++----
 gcc/doc/tm.texi.in                        |  3 +++
 gcc/flag-types.h                          |  3 ++-
 gcc/fortran/options.c                     |  3 +++
 gcc/target.def                            | 11 +++++++----
 gcc/testsuite/gcc.target/i386/float16-6.c |  8 ++++++++
 gcc/testsuite/gcc.target/i386/float16-7.c |  9 +++++++++
 gcc/tree.c                                |  3 ++-
 19 files changed, 74 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-7.c

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 186367ac6d1..96199bd4b63 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -256,6 +256,9 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   /* Excess precision other than "fast" requires front-end support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Ada");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Ada");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* No psABI change warnings for Ada.  */
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 00ac3c5278b..28a867aa06b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8777,7 +8777,7 @@ excess_precision_mode_join (enum flt_eval_method x,
 
    This relates to the effective excess precision seen by the user,
    which is the join point of the precision the target requests for
-   -fexcess-precision={standard,fast} and the implicit excess precision
+   -fexcess-precision={standard,fast,16} and the implicit excess precision
    the target uses.  */
 
 static enum flt_eval_method
@@ -8789,7 +8789,9 @@ c_ts18661_flt_eval_method (void)
   enum excess_precision_type flag_type
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method requested
     = targetm.c.excess_precision (flag_type);
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index f79f939bd10..5f30354a33c 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -753,7 +753,7 @@ cpp_atomic_builtins (cpp_reader *pfile)
 /* Return TRUE if the implicit excess precision in which the back-end will
    compute floating-point calculations is not more than the explicit
    excess precision that the front-end will apply under
-   -fexcess-precision=[standard|fast].
+   -fexcess-precision=[standard|fast|16].
 
    More intuitively, return TRUE if the excess precision proposed by the
    front-end is the excess precision that will actually be used.  */
@@ -764,7 +764,9 @@ c_cpp_flt_eval_method_iec_559 (void)
   enum excess_precision_type front_end_ept
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method back_end
     = targetm.c.excess_precision (EXCESS_PRECISION_TYPE_IMPLICIT);
diff --git a/gcc/common.opt b/gcc/common.opt
index d9da1131eda..3dd74766400 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1518,7 +1518,7 @@ Perform a number of minor, expensive optimizations.
 
 fexcess-precision=
 Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization SetByCombined
--fexcess-precision=[fast|standard]	Specify handling of excess floating-point precision.
+-fexcess-precision=[fast|standard|16]	Specify handling of excess floating-point precision.
 
 Enum
 Name(excess_precision) Type(enum excess_precision) UnknownError(unknown excess precision style %qs)
@@ -1529,6 +1529,9 @@ Enum(excess_precision) String(fast) Value(EXCESS_PRECISION_FAST)
 EnumValue
 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
 
+EnumValue
+Enum(excess_precision) String(16) Value(EXCESS_PRECISION_FLOAT16)
+
 ; Whether we permit the extended set of values for FLT_EVAL_METHOD
 ; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11.
 fpermitted-flt-eval-methods=
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e02cbcbcb38..a68555b3114 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -25112,6 +25112,7 @@ aarch64_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6d781e23ee9..e2a18615860 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25599,6 +25599,7 @@ arm_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e7921fbaca6..6a51076c145 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23518,6 +23518,11 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	return (type == EXCESS_PRECISION_TYPE_STANDARD
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
 		: FLT_EVAL_METHOD_UNPREDICTABLE);
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	if (TARGET_80387
+	    && !(TARGET_SSE_MATH && TARGET_SSE))
+	  error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 3f63c60fa92..2fef457c09e 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -7115,6 +7115,8 @@ m68k_excess_precision (enum excess_precision_type type)
 	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 
 	return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 673a1340285..37e5b2c8c6f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16549,6 +16549,8 @@ s390_excess_precision (enum excess_precision_type type)
 	   ensure consistency with the implementation in glibc, report that
 	   float is evaluated to the range and precision of double.  */
 	return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 406572e947d..07b9aa656c5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -424,7 +424,8 @@ enum excess_precision_type
 {
   EXCESS_PRECISION_TYPE_IMPLICIT,
   EXCESS_PRECISION_TYPE_STANDARD,
-  EXCESS_PRECISION_TYPE_FAST
+  EXCESS_PRECISION_TYPE_FAST,
+  EXCESS_PRECISION_TYPE_FLOAT16
 };
 
 /* Level of size optimization.  */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6cf9ebbc3e9..8a89efc4321 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1161,7 +1161,8 @@ operations will be emulated by software emulation and the @code{float}
 instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
 the intermediate result of the operation as 32-bit precision. This may lead
 to inconsistent behavior between software emulation and AVX512-FP16
-instructions.
+instructions. Using @option{-fexcess-precision=16} forces rounding
+back after each operation.
 
 @node Decimal Float
 @section Decimal Floating Types
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cb015283237..07729ab2ad5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -982,20 +982,26 @@ Do not define this macro if it would never modify @var{m}.
 Return a value, with the same meaning as the C99 macro
 @code{FLT_EVAL_METHOD} that describes which excess precision should be
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},
-@code{EXCESS_PRECISION_TYPE_FAST}, or
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For
+@code{EXCESS_PRECISION_TYPE_FAST},
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which
 precision and range operations will be implictly evaluated in regardless
 of the excess precision explicitly added.  For
-@code{EXCESS_PRECISION_TYPE_STANDARD} and
+@code{EXCESS_PRECISION_TYPE_STANDARD}, 
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the
 explicit excess precision that should be added depending on the
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.
 Note that unpredictable explicit excess precision does not make sense,
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or
 @code{EXCESS_PRECISION_TYPE_FAST}.
 @end deftypefn
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @deftypefn {Target Hook} machine_mode TARGET_PROMOTE_FUNCTION_MODE (const_tree @var{type}, machine_mode @var{mode}, int *@var{punsignedp}, const_tree @var{funtype}, int @var{for_return})
 Like @code{PROMOTE_MODE}, but it is applied to outgoing function arguments or
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4a522ae7e2e..70fe92adf5c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -929,6 +929,9 @@ Do not define this macro if it would never modify @var{m}.
 @end defmac
 
 @hook TARGET_C_EXCESS_PRECISION
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @hook TARGET_PROMOTE_FUNCTION_MODE
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e39673f6716..892d7f5fd34 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -198,7 +198,8 @@ enum excess_precision
 {
   EXCESS_PRECISION_DEFAULT,
   EXCESS_PRECISION_FAST,
-  EXCESS_PRECISION_STANDARD
+  EXCESS_PRECISION_STANDARD,
+  EXCESS_PRECISION_FLOAT16
 };
 
 /* The options for which values of FLT_EVAL_METHOD are permissible.  */
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 1723f689a57..847e20e8829 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -267,6 +267,9 @@ gfc_post_options (const char **pfilename)
      support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Fortran");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Fortran");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* Fortran allows associative math - but we cannot reassociate if
diff --git a/gcc/target.def b/gcc/target.def
index 68a46aaa832..d8482b9dcd3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6214,18 +6214,21 @@ DEFHOOK
  "Return a value, with the same meaning as the C99 macro\n\
 @code{FLT_EVAL_METHOD} that describes which excess precision should be\n\
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},\n\
-@code{EXCESS_PRECISION_TYPE_FAST}, or\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For\n\
+@code{EXCESS_PRECISION_TYPE_FAST},\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For\n\
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which\n\
 precision and range operations will be implictly evaluated in regardless\n\
 of the excess precision explicitly added.  For\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD} and\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, \n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and\n\
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the\n\
 explicit excess precision that should be added depending on the\n\
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.\n\
 Note that unpredictable explicit excess precision does not make sense,\n\
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}\n\
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or\n\
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or\n\
 @code{EXCESS_PRECISION_TYPE_FAST}.",
  enum flt_eval_method, (enum excess_precision_type type),
  default_excess_precision)
diff --git a/gcc/testsuite/gcc.target/i386/float16-6.c b/gcc/testsuite/gcc.target/i386/float16-6.c
new file mode 100644
index 00000000000..3d2503ce5e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-6.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2 -mfpmath=sse -fdump-tree-gimple -fexcess-precision=16" } */
+/* { dg-final { scan-tree-dump-not "\\(float\\)" "gimple" } } */
+_Float16
+foo (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a + b + c;
+}
diff --git a/gcc/testsuite/gcc.target/i386/float16-7.c b/gcc/testsuite/gcc.target/i386/float16-7.c
new file mode 100644
index 00000000000..86641afeba9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-7.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
+/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+_Float16
+foo (_Float16 a, _Float16 b)
+{
+  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+}
+
diff --git a/gcc/tree.c b/gcc/tree.c
index e923e67b694..547b7e53510 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7635,7 +7635,8 @@ excess_precision_type (tree type)
   enum excess_precision_type requested_type
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
-       : EXCESS_PRECISION_TYPE_STANDARD);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
-- 
2.27.0


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  3:32                                   ` [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field liuhongt
  2021-08-06  3:44                                     ` Andrew Pinski
@ 2021-08-06  6:57                                     ` Richard Biener
  2021-08-06  9:05                                       ` Richard Sandiford
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-06  6:57 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Richard Sandiford, Hongtao Liu

On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> ---
>   This is a separate patch as a follow up of upper comments.
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Wrap the call to
>         extract_integral_bit_field, extracting in an integer mode with
>         the same size as 'tmode' and then converting the result
>         as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c                              | 19 +++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>        op0_mode = opt_scalar_int_mode ();
>      }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> +     in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists (&imode)

Check !INTEGRAL_MODE_P (tmode) first; that should be slightly
cheaper.  Then imode should always be != tmode.  Maybe
even GET_MODE_CLASS (tmode) != MODE_INT, since I'm not sure
how it behaves for composite modes.

Of course the least surprises would happen when we restrict this
to FLOAT_MODE_P (tmode).
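
Whichever predicate is used, the wrapped path itself is just "extract in
an integer mode, then pun".  A rough standalone C analogy (not GCC
internals; the helper names are hypothetical, and it assumes a 32-bit
IEEE float, using float/uint32_t in place of HFmode/HImode so that it
builds without _Float16 support):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Step 1: extract BITSIZE bits starting at BITNUM using integer
   arithmetic only, as the integral extraction path would.  */
static uint32_t extract_bits_u32(uint64_t word, unsigned bitnum,
                                 unsigned bitsize)
{
  assert(bitsize >= 1 && bitsize <= 32 && bitnum + bitsize <= 64);
  uint64_t mask = (UINT64_C(1) << bitsize) - 1;
  return (uint32_t) ((word >> bitnum) & mask);
}

/* Step 2: reinterpret the same-sized integer as a float.  This is the
   C counterpart of taking (subreg:SF (reg:SI)) of the result.  */
float extract_float_field(uint64_t word, unsigned bitnum)
{
  uint32_t bits = extract_bits_u32(word, bitnum, 32);
  float result;
  memcpy(&result, &bits, sizeof result);
  return result;
}
```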

Richard - any preferences?

> +      && imode != tmode
> +      && imode != GET_MODE (op0))
> +    {
> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> +                                           bitsize.to_constant (),
> +                                           bitnum.to_constant (), unsignedp,
> +                                           NULL, imode, imode,
> +                                           reverse, fallback_p);
> +      gcc_assert (ret);
> +
> +      if (!REG_P (ret))
> +       ret = force_reg (imode, ret);
> +      return gen_lowpart_SUBREG (tmode, ret);
> +    }
> +
>    /* It's possible we'll need to handle other cases here for
>       polynomial bitnum and bitsize.  */
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 00000000000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> --
> 2.27.0
>

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  4:59                                       ` Hongtao Liu
  2021-08-06  5:52                                         ` Hongtao Liu
@ 2021-08-06  6:59                                         ` Richard Biener
  1 sibling, 0 replies; 138+ messages in thread
From: Richard Biener @ 2021-08-06  6:59 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Andrew Pinski, Richard Sandiford, liuhongt, GCC Patches

On Fri, Aug 6, 2021 at 6:54 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Aug 6, 2021 at 11:44 AM Andrew Pinski via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi:
> > > ---
> > > OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > are designed
> > > to work on integer modes (but docs do not say anything about this here).
> > > In fact the caller of extract_bit_field_using_extv is named
> > > extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > modes we're dealing with, but we're for example happily doing
> > > expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > some integer mode and op0 is HFmode?  From the above I get it's
> > > the other way around?  In that case we should wrap the
> > > call to extract_integral_bit_field, extracting in an integer mode with the
> > > same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> >
> > This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
> > I wonder why the fix for that did not help here.
> >
> aarch64 didn't hit the gcc_assert with my testcase, and I debugged it to
> figure out why.
>
> At the gimple level, x86 and aarch64 are the same, with
> _3 = BIT_FIELD_REF <a_2(D), 16, 0>;
>
> and both go into
> extract_bit_field_using_extv
>
> The difference is that aarch64 has ext_mode as DImode, but x86 has
> ext_mode as SImode.
> With ext_mode as DImode and target as (reg:HF 94), aarch64 doesn't hit
> the gcc_assert in
>  gen_lowpart (ext_mode, target)
>
> since validate_subreg allows (subreg:DI (reg:HF)), but disallows
> (subreg:SI (reg:HF)).
>
>   /* ??? This should not be here.  Temporarily continue to allow word_mode
>      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
>      Generally, backends are doing something sketchy but it'll take time to
>      fix them all.  */
>   if (omode == word_mode)
>     ;

Yeah, as said all this verification looks dubious.  This one especially so.
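
For anyone following along, the asymmetry can be reproduced outside GCC.
The snippet below is a simplified model, not the real validate_subreg:
the Mode enum, toy_validate_subreg and its three-argument signature are
invented for illustration.  It models only two checks, the word_mode
escape hatch quoted above and (as I understand the real function) the
rule that a subreg involving a floating-point mode must not change size.

```cpp
#include <cassert>

// Toy stand-ins for machine modes.  Names and sizes mirror GCC's, but
// this is NOT GCC code.
enum class Mode { HI, SI, DI, HF, SF, DF };

static unsigned size_in_bytes (Mode m)
{
  switch (m)
    {
    case Mode::HI: case Mode::HF: return 2;
    case Mode::SI: case Mode::SF: return 4;
    case Mode::DI: case Mode::DF: return 8;
    }
  return 0;
}

static bool is_float (Mode m)
{
  return m == Mode::HF || m == Mode::SF || m == Mode::DF;
}

// Hypothetical, heavily simplified check: an outer mode equal to
// word_mode is always let through; otherwise a subreg that involves a
// float mode is only accepted when the size does not change.
static bool toy_validate_subreg (Mode omode, Mode imode, Mode word_mode)
{
  if (omode == word_mode)
    return true;
  if (is_float (omode) || is_float (imode))
    return size_in_bytes (omode) == size_in_bytes (imode);
  return true;
}
```

Assuming a 64-bit target where word_mode is DImode,
toy_validate_subreg (Mode::DI, Mode::HF, Mode::DI) is accepted while
toy_validate_subreg (Mode::SI, Mode::HF, Mode::DI) is rejected, which
matches the aarch64 vs. x86 difference described above.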

> ext_mode is assigned from extv->field_mode, which is initialized in
> get_best_reg_extraction_insn.
> get_best_reg_extraction_insn eventually calls
> get_optab_extraction_insn, which finds that
> aarch64 doesn't have CODE_FOR_extzvsi but x86 does.
>
> That's why aarch64 has ext_mode as DImode and x86 has SImode.
>
> > Thanks,
> > Andrew Pinski
> >
> > > ---
> > >   This is a separate patch as a follow up of upper comments.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * expmed.c (extract_bit_field_1): Wrap the call to
> > >         extract_integral_bit_field, extracting in an integer mode with
> > >         the same size as 'tmode' and then converting the result
> > >         as (subreg:tmode (reg:imode)).
> > >
> > > gcc/testsuite/ChangeLog:
> > >         * gcc.target/i386/float16-5.c: New test.
> > > ---
> > >  gcc/expmed.c                              | 19 +++++++++++++++++++
> > >  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > >  2 files changed, 31 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > >
> > > diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > index 3143f38e057..72790693ef0 100644
> > > --- a/gcc/expmed.c
> > > +++ b/gcc/expmed.c
> > > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > >        op0_mode = opt_scalar_int_mode ();
> > >      }
> > >
> > > +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > +     in extract_integral_bit_field.  */
> > > +  if (int_mode_for_mode (tmode).exists (&imode)
> > > +      && imode != tmode
> > > +      && imode != GET_MODE (op0))
> > > +    {
> > > +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > +                                           bitsize.to_constant (),
> > > +                                           bitnum.to_constant (), unsignedp,
> > > +                                           NULL, imode, imode,
> > > +                                           reverse, fallback_p);
> > > +      gcc_assert (ret);
> > > +
> > > +      if (!REG_P (ret))
> > > +       ret = force_reg (imode, ret);
> > > +      return gen_lowpart_SUBREG (tmode, ret);
> > > +    }
> > > +
> > >    /* It's possible we'll need to handle other cases here for
> > >       polynomial bitnum and bitsize.  */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > new file mode 100644
> > > index 00000000000..ebc0af1490b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-msse2 -O2" } */
> > > +_Float16
> > > +foo (int a)
> > > +{
> > > +  union {
> > > +    int a;
> > > +    _Float16 b;
> > > +  }c;
> > > +  c.a = a;
> > > +  return c.b;
> > > +}
> > > --
> > > 2.27.0
> > >
>
>
>
> --
> BR,
> Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  6:57                                     ` Richard Biener
@ 2021-08-06  9:05                                       ` Richard Sandiford
  2021-08-06 11:27                                         ` Richard Biener
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Sandiford @ 2021-08-06  9:05 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches; +Cc: liuhongt, Richard Biener

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
>>
>> Hi:
>> ---
>> OK, I think sth is amiss here upthread.  insv/extv do look like they
>> are designed
>> to work on integer modes (but docs do not say anything about this here).
>> In fact the caller of extract_bit_field_using_extv is named
>> extract_integral_bit_field.  Of course nothing seems to check what kind of
>> modes we're dealing with, but we're for example happily doing
>> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
>> some integer mode and op0 is HFmode?  From the above I get it's
>> the other way around?  In that case we should wrap the
>> call to extract_integral_bit_field, extracting in an integer mode with the
>> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
>> ---
>>   This is a separate patch as a follow up of upper comments.
>>
>> gcc/ChangeLog:
>>
>>         * expmed.c (extract_bit_field_1): Wrap the call to
>>         extract_integral_bit_field, extracting in an integer mode with
>>         the same size as 'tmode' and then converting the result
>>         as (subreg:tmode (reg:imode)).
>>
>> gcc/testsuite/ChangeLog:
>>         * gcc.target/i386/float16-5.c: New test.
>> ---
>>  gcc/expmed.c                              | 19 +++++++++++++++++++
>>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>>  2 files changed, 31 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>>
>> diff --git a/gcc/expmed.c b/gcc/expmed.c
>> index 3143f38e057..72790693ef0 100644
>> --- a/gcc/expmed.c
>> +++ b/gcc/expmed.c
>> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>>        op0_mode = opt_scalar_int_mode ();
>>      }
>>
>> +  /* Make sure we are playing with integral modes.  Pun with subregs
>> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
>> +     in extract_integral_bit_field.  */
>> +  if (int_mode_for_mode (tmode).exists (&imode)
>
> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> cheaper.  Then imode should always be != tmode.  Maybe
> even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> how it behaves for composite modes.
>
> Of course the least surprises would happen when we restrict this
> to FLOAT_MODE_P (tmode).
>
> Richard - any preferences?

If the bug is that extract_integral_bit_field is being called with
a non-integral mode parameter, then it looks odd that we can still
fall through to it without an integral mode (when exists is false).

If calling extract_integral_bit_field without an integral mode is
a bug then I think we should have:

  int_mode_for_mode (mode).require ()

whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
Ideally we'd make the mode parameter scalar_int_mode too.

extract_integral_bit_field currently has:

  /* Find a correspondingly-sized integer field, so we can apply
     shifts and masks to it.  */
  scalar_int_mode int_mode;
  if (!int_mode_for_mode (tmode).exists (&int_mode))
    /* If this fails, we should probably push op0 out to memory and then
       do a load.  */
    int_mode = int_mode_for_mode (mode).require ();

which would seem to be redundant after this change.
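
To spell out the exists/require distinction, here is a minimal
stand-alone sketch.  It is not GCC's opt_mode implementation:
ToyOptIntMode, toy_int_mode_for_mode and the Mode enum (including BLK as
a mode with no same-sized integer mode) are hypothetical names used only
for illustration.

```cpp
#include <cassert>

// Toy stand-ins for GCC machine modes; BLK models a mode with no
// same-sized integer counterpart.  None of this is GCC code.
enum class Mode { HI, SI, DI, HF, SF, DF, BLK };

// Minimal imitation of an optional integer mode.
class ToyOptIntMode
{
public:
  ToyOptIntMode () : m_valid (false), m_mode (Mode::BLK) {}
  explicit ToyOptIntMode (Mode m) : m_valid (true), m_mode (m) {}

  // Soft query: report whether a mode exists and, if so, store it.
  bool exists (Mode *out) const
  {
    if (m_valid)
      *out = m_mode;
    return m_valid;
  }

  // Hard query: the caller claims a mode must exist; abort otherwise.
  Mode require () const
  {
    assert (m_valid);
    return m_mode;
  }

private:
  bool m_valid;
  Mode m_mode;
};

// Map a mode to the integer mode of the same size, if there is one.
static ToyOptIntMode toy_int_mode_for_mode (Mode m)
{
  switch (m)
    {
    case Mode::HI: case Mode::HF: return ToyOptIntMode (Mode::HI);
    case Mode::SI: case Mode::SF: return ToyOptIntMode (Mode::SI);
    case Mode::DI: case Mode::DF: return ToyOptIntMode (Mode::DI);
    case Mode::BLK: break;
    }
  return ToyOptIntMode ();
}
```

With exists, a failed lookup leaves the caller free to fall through to
other code; with require, the lookup failing is itself treated as the
bug.  That is the difference I have in mind for the callers of
extract_integral_bit_field.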

>> +      && imode != tmode
>> +      && imode != GET_MODE (op0))
>> +    {
>> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
>> +                                           bitsize.to_constant (),
>> +                                           bitnum.to_constant (), unsignedp,
>> +                                           NULL, imode, imode,
>> +                                           reverse, fallback_p);
>> +      gcc_assert (ret);
>> +
>> +      if (!REG_P (ret))
>> +       ret = force_reg (imode, ret);
>> +      return gen_lowpart_SUBREG (tmode, ret);
>> +    }
>> +
>>    /* It's possible we'll need to handle other cases here for
>>       polynomial bitnum and bitsize.  */

Minor nit, but since the code is using to_constant, it should go after
this comment rather than before it.

Thanks,
Richard

>>
>> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
>> new file mode 100644
>> index 00000000000..ebc0af1490b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-msse2 -O2" } */
>> +_Float16
>> +foo (int a)
>> +{
>> +  union {
>> +    int a;
>> +    _Float16 b;
>> +  }c;
>> +  c.a = a;
>> +  return c.b;
>> +}
>> --
>> 2.27.0
>>


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06  9:05                                       ` Richard Sandiford
@ 2021-08-06 11:27                                         ` Richard Biener
  2021-08-09  8:34                                           ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-06 11:27 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, liuhongt, Richard Biener,
	Richard Sandiford

On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> >>
> >> Hi:
> >> ---
> >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> >> are designed
> >> to work on integer modes (but docs do not say anything about this here).
> >> In fact the caller of extract_bit_field_using_extv is named
> >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> >> modes we're dealing with, but we're for example happily doing
> >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> >> some integer mode and op0 is HFmode?  From the above I get it's
> >> the other way around?  In that case we should wrap the
> >> call to extract_integral_bit_field, extracting in an integer mode with the
> >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> >> ---
> >>   This is a separate patch as a follow up of upper comments.
> >>
> >> gcc/ChangeLog:
> >>
> >>         * expmed.c (extract_bit_field_1): Wrap the call to
> >>         extract_integral_bit_field, extracting in an integer mode with
> >>         the same size as 'tmode' and then converting the result
> >>         as (subreg:tmode (reg:imode)).
> >>
> >> gcc/testsuite/ChangeLog:
> >>         * gcc.target/i386/float16-5.c: New test.
> >> ---
> >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> >>  2 files changed, 31 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> >>
> >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> >> index 3143f38e057..72790693ef0 100644
> >> --- a/gcc/expmed.c
> >> +++ b/gcc/expmed.c
> >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> >>        op0_mode = opt_scalar_int_mode ();
> >>      }
> >>
> >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> >> +     in extract_integral_bit_field.  */
> >> +  if (int_mode_for_mode (tmode).exists (&imode)
> >
> > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > cheaper.  Then imode should always be != tmode.  Maybe
> > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > how it behaves for composite modes.
> >
> > Of course the least surprises would happen when we restrict this
> > to FLOAT_MODE_P (tmode).
> >
> > Richard - any preferences?
>
> If the bug is that extract_integral_bit_field is being called with
> a non-integral mode parameter, then it looks odd that we can still
> fall through to it without an integral mode (when exists is false).
>
> If calling extract_integral_bit_field without an integral mode is
> a bug then I think we should have:
>
>   int_mode_for_mode (mode).require ()
>
> whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> Ideally we'd make the mode parameter scalar_int_mode too.
>
> extract_integral_bit_field currently has:
>
>   /* Find a correspondingly-sized integer field, so we can apply
>      shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
>     /* If this fails, we should probably push op0 out to memory and then
>        do a load.  */
>     int_mode = int_mode_for_mode (mode).require ();
>
> which would seem to be redundant after this change.

I'm not sure what exactly the bug is, but extract_integral_bit_field ends
up creating a lowpart subreg that's not allowed and that ICEs (and I
can't see a way to check beforehand).  So it seems to me at least
part of that function doesn't expect non-integral extraction modes.

But who knows - the code is older than I am (OK, not, but older than
my involvement in GCC ;))

Richard.

> >> +      && imode != tmode
> >> +      && imode != GET_MODE (op0))
> >> +    {
> >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> >> +                                           bitsize.to_constant (),
> >> +                                           bitnum.to_constant (), unsignedp,
> >> +                                           NULL, imode, imode,
> >> +                                           reverse, fallback_p);
> >> +      gcc_assert (ret);
> >> +
> >> +      if (!REG_P (ret))
> >> +       ret = force_reg (imode, ret);
> >> +      return gen_lowpart_SUBREG (tmode, ret);
> >> +    }
> >> +
> >>    /* It's possible we'll need to handle other cases here for
> >>       polynomial bitnum and bitsize.  */
>
> Minor nit, but since the code is using to_constant, it should go after
> this comment rather than before it.
>
> Thanks,
> Richard
>
> >>
> >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> >> new file mode 100644
> >> index 00000000000..ebc0af1490b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> >> @@ -0,0 +1,12 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-msse2 -O2" } */
> >> +_Float16
> >> +foo (int a)
> >> +{
> >> +  union {
> >> +    int a;
> >> +    _Float16 b;
> >> +  }c;
> >> +  c.a = a;
> >> +  return c.b;
> >> +}
> >> --
> >> 2.27.0
> >>


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-06 11:27                                         ` Richard Biener
@ 2021-08-09  8:34                                           ` Hongtao Liu
  2021-08-17  1:52                                             ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-09  8:34 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 6417 bytes --]

On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >>
> > >> Hi:
> > >> ---
> > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > >> are designed
> > >> to work on integer modes (but docs do not say anything about this here).
> > >> In fact the caller of extract_bit_field_using_extv is named
> > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > >> modes we're dealing with, but we're for example happily doing
> > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > >> some integer mode and op0 is HFmode?  From the above I get it's
> > >> the other way around?  In that case we should wrap the
> > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > >> ---
> > >>   This is a separate patch as a follow up of upper comments.
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > >>         extract_integral_bit_field, extracting in an integer mode with
> > >>         the same size as 'tmode' and then converting the result
> > >>         as (subreg:tmode (reg:imode)).
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>         * gcc.target/i386/float16-5.c: New test.
> > >> ---
> > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > >>  2 files changed, 31 insertions(+)
> > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > >>
> > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > >> index 3143f38e057..72790693ef0 100644
> > >> --- a/gcc/expmed.c
> > >> +++ b/gcc/expmed.c
> > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > >>        op0_mode = opt_scalar_int_mode ();
> > >>      }
> > >>
> > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > >> +     in extract_integral_bit_field.  */
> > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > >
> > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > cheaper.  Then imode should always be != tmode.  Maybe
> > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > how it behaves for composite modes.
> > >
> > > Of course the least surprises would happen when we restrict this
> > > to FLOAT_MODE_P (tmode).
> > >
> > > Richard - any preferences?
> >
> > If the bug is that extract_integral_bit_field is being called with
> > a non-integral mode parameter, then it looks odd that we can still
> > fall through to it without an integral mode (when exists is false).
> >
> > If calling extract_integral_bit_field without an integral mode is
> > a bug then I think we should have:
> >
> >   int_mode_for_mode (mode).require ()
> >
> > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > Ideally we'd make the mode parameter scalar_int_mode too.
> >
> > extract_integral_bit_field currently has:
> >
> >   /* Find a correspondingly-sized integer field, so we can apply
> >      shifts and masks to it.  */
> >   scalar_int_mode int_mode;
> >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> >     /* If this fails, we should probably push op0 out to memory and then
> >        do a load.  */
> >     int_mode = int_mode_for_mode (mode).require ();
> >
> > which would seem to be redundant after this change.
>
> I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> up creating a lowpart subreg that's not allowed and that ICEs (and I
> can't see a way to check beforehand).  So it seems to me at least
> part of that function doesn't expect non-integral extraction modes.
>
> But who knows - the code is older than I am (OK, not, but older than
> my involvement in GCC ;))
>
How about the attached patch, with the changelog below?

gcc/ChangeLog:

        * expmed.c (extract_bit_field_1): Make sure we're playing with
        integral modes before calling extract_integral_bit_field.
        (extract_integral_bit_field): Add a parameter of type
        scalar_int_mode which corresponds to tmode.
        And call extract_and_convert_fixed_bit_field instead of
        extract_fixed_bit_field and convert_extracted_bit_field.
        (extract_and_convert_fixed_bit_field): New function, it's a
        combination of extract_fixed_bit_field and
        convert_extracted_bit_field.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/float16-5.c: New test.
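
In case it helps review, the control flow that the attached patch adds
to extract_bit_field_1 boils down to the sketch below.  It is only a
model: ToyPath and toy_dispatch are hypothetical names, and the three
booleans stand in for the real checks (whether int_mode_for_mode (tmode)
exists, whether that integer mode is tmode itself, and fallback_p).

```cpp
#include <cassert>

// Which route the extraction takes.  Hypothetical names, not GCC code.
enum class ToyPath
{
  IntegralDirect,   // tmode is already an integer mode
  PunThroughSubreg, // extract in the integer mode, then subreg to tmode
  FixedBitField,    // no integer mode for tmode: shift-and-mask fallback
  GiveUp            // no integer mode and fallback not allowed
};

// has_int_mode:   a same-sized integer mode exists for tmode.
// tmode_is_int:   that integer mode is tmode itself.
// fallback_p:     the caller allows the shift-and-mask fallback.
static ToyPath toy_dispatch (bool has_int_mode, bool tmode_is_int,
                             bool fallback_p)
{
  if (!has_int_mode || !tmode_is_int)
    {
      if (has_int_mode)
        return ToyPath::PunThroughSubreg;
      if (!fallback_p)
        return ToyPath::GiveUp;
      return ToyPath::FixedBitField;
    }
  return ToyPath::IntegralDirect;
}
```

For the float16-5.c testcase (tmode is HFmode, whose integer mode is
HImode), this corresponds to toy_dispatch (true, false, ...), i.e. the
PunThroughSubreg route.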


> Richard.
>
> > >> +      && imode != tmode
> > >> +      && imode != GET_MODE (op0))
> > >> +    {
> > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > >> +                                           bitsize.to_constant (),
> > >> +                                           bitnum.to_constant (), unsignedp,
> > >> +                                           NULL, imode, imode,
> > >> +                                           reverse, fallback_p);
> > >> +      gcc_assert (ret);
> > >> +
> > >> +      if (!REG_P (ret))
> > >> +       ret = force_reg (imode, ret);
> > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > >> +    }
> > >> +
> > >>    /* It's possible we'll need to handle other cases here for
> > >>       polynomial bitnum and bitsize.  */
> >
> > Minor nit, but since the code is using to_constant, it should go after
> > this comment rather than before it.
> >
> > Thanks,
> > Richard
> >
> > >>
> > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > >> new file mode 100644
> > >> index 00000000000..ebc0af1490b
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > >> @@ -0,0 +1,12 @@
> > >> +/* { dg-do compile } */
> > >> +/* { dg-options "-msse2 -O2" } */
> > >> +_Float16
> > >> +foo (int a)
> > >> +{
> > >> +  union {
> > >> +    int a;
> > >> +    _Float16 b;
> > >> +  }c;
> > >> +  c.a = a;
> > >> +  return c.b;
> > >> +}
> > >> --
> > >> 2.27.0
> > >>



-- 
BR,
Hongtao

[-- Attachment #2: 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch --]
[-- Type: text/x-patch, Size: 7259 bytes --]

From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Fri, 6 Aug 2021 10:18:43 +0800
Subject: [PATCH] Make sure we're playing with integral modes before call
 extract_integral_bit_field.

gcc/ChangeLog:

	* expmed.c (extract_bit_field_1): Make sure we're playing with
	integral modes before calling extract_integral_bit_field.
	(extract_integral_bit_field): Add a parameter of type
	scalar_int_mode which corresponds to tmode.
	And call extract_and_convert_fixed_bit_field instead of
	extract_fixed_bit_field and convert_extracted_bit_field.
	(extract_and_convert_fixed_bit_field): New function, it's a
	combination of extract_fixed_bit_field and
	convert_extracted_bit_field.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/float16-5.c: New test.
---
 gcc/expmed.c                              | 103 ++++++++++++++++------
 gcc/testsuite/gcc.target/i386/float16-5.c |  12 +++
 2 files changed, 90 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..f083d6e86d0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -71,7 +71,14 @@ static void store_split_bit_field (rtx, opt_scalar_int_mode,
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
 				       unsigned HOST_WIDE_INT,
 				       unsigned HOST_WIDE_INT, int, rtx,
-				       machine_mode, machine_mode, bool, bool);
+				       machine_mode, machine_mode,
+				       scalar_int_mode, bool, bool);
+static rtx extract_and_convert_fixed_bit_field (scalar_int_mode,
+						machine_mode, machine_mode,
+						rtx, opt_scalar_int_mode,
+						unsigned HOST_WIDE_INT,
+						unsigned HOST_WIDE_INT, rtx,
+						int, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -1632,6 +1639,7 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
+  scalar_int_mode int_tmode;
 
   if (tmode == VOIDmode)
     tmode = mode;
@@ -1853,10 +1861,46 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* It's possible we'll need to handle other cases here for
      polynomial bitnum and bitsize.  */
 
+  /* Make sure we are playing with integral modes.  Pun with subregs
+     if we aren't.  When tmode is HFmode and op0 is SImode, there would
+     otherwise be an ICE in extract_integral_bit_field.  */
+  opt_scalar_int_mode target_imode = int_mode_for_mode (tmode);
+  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
+    {
+      if (target_imode.exists (&int_tmode))
+	{
+	  rtx ret = extract_integral_bit_field (op0, op0_mode,
+						bitsize.to_constant (),
+						bitnum.to_constant (),
+						unsignedp, NULL, int_tmode,
+						int_tmode, int_tmode,
+						reverse, fallback_p);
+	  gcc_assert (ret);
+
+	  if (!REG_P (ret))
+	    ret = force_reg (int_tmode, ret);
+	  return gen_lowpart_SUBREG (tmode, ret);
+	}
+      else
+	{
+	  if (!fallback_p)
+	    return NULL;
+
+	  int_tmode = int_mode_for_mode (mode).require ();
+	  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+						      op0, op0_mode,
+						      bitsize.to_constant (),
+						      bitnum.to_constant (),
+						      target, unsignedp,
+						      reverse);
+	}
+    }
+
   /* From here on we need to be looking at a fixed-size insertion.  */
   return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
 				     bitnum.to_constant (), unsignedp,
-				     target, mode, tmode, reverse, fallback_p);
+				     target, mode, tmode,
+				     int_tmode, reverse, fallback_p);
 }
 
 /* Subroutine of extract_bit_field_1, with the same arguments, except
@@ -1869,6 +1913,7 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 			    unsigned HOST_WIDE_INT bitsize,
 			    unsigned HOST_WIDE_INT bitnum, int unsignedp,
 			    rtx target, machine_mode mode, machine_mode tmode,
+			    scalar_int_mode int_tmode,
 			    bool reverse, bool fallback_p)
 {
   /* Handle fields bigger than a word.  */
@@ -2035,29 +2080,10 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
   if (!fallback_p)
     return NULL;
 
-  /* Find a correspondingly-sized integer field, so we can apply
-     shifts and masks to it.  */
-  scalar_int_mode int_mode;
-  if (!int_mode_for_mode (tmode).exists (&int_mode))
-    /* If this fails, we should probably push op0 out to memory and then
-       do a load.  */
-    int_mode = int_mode_for_mode (mode).require ();
-
-  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
-				    bitnum, target, unsignedp, reverse);
-
-  /* Complex values must be reversed piecewise, so we need to undo the global
-     reversal, convert to the complex mode and reverse again.  */
-  if (reverse && COMPLEX_MODE_P (tmode))
-    {
-      target = flip_storage_order (int_mode, target);
-      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-      target = flip_storage_order (tmode, target);
-    }
-  else
-    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-
-  return target;
+  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+					      op0, op0_mode, bitsize,
+					      bitnum, target, unsignedp,
+					      reverse);
 }
 
 /* Generate code to extract a byte-field from STR_RTX
@@ -2129,6 +2155,33 @@ extract_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
 			      target, mode, tmode, reverse, true, alt_rtl);
 }
+
+/* Combination of extract_fixed_bit_field and convert_extracted_bit_field.  */
+static rtx
+extract_and_convert_fixed_bit_field (scalar_int_mode int_tmode,
+				     machine_mode tmode, machine_mode mode,
+				     rtx op0, opt_scalar_int_mode op0_mode,
+				     unsigned HOST_WIDE_INT bitsize,
+				     unsigned HOST_WIDE_INT bitnum,
+				     rtx target, int unsignedp, bool reverse)
+{
+  target = extract_fixed_bit_field (int_tmode, op0, op0_mode, bitsize,
+				    bitnum, target, unsignedp, reverse);
+
+  /* Complex values must be reversed piecewise, so we need to undo the global
+     reversal, convert to the complex mode and reverse again.  */
+  if (reverse && COMPLEX_MODE_P (tmode))
+    {
+      target = flip_storage_order (int_tmode, target);
+      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+      target = flip_storage_order (tmode, target);
+    }
+  else
+    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+
+  return target;
+}
+
 \f
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.  If OP0_MODE is defined, it is the mode of OP0,
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
-- 
2.27.0



* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-09  8:34                                           ` Hongtao Liu
@ 2021-08-17  1:52                                             ` Hongtao Liu
  2021-08-24  9:40                                               ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-17  1:52 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > >>
> > > >> Hi:
> > > >> ---
> > > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > >> are designed
> > > >> to work on integer modes (but docs do not say anything about this here).
> > > >> In fact the caller of extract_bit_field_using_extv is named
> > > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > >> modes we're dealing with, but we're for example happily doing
> > > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > >> the other way around?  In that case we should wrap the
> > > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > > >> ---
> > > >>   This is a separate patch as a follow up of upper comments.
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > > >>         extract_integral_bit_field, extracting in an integer mode with
> > > >>         the same size as 'tmode' and then converting the result
> > > >>         as (subreg:tmode (reg:imode)).
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >>         * gcc.target/i386/float16-5.c: New test.
> > > >> ---
> > > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > > >>  2 files changed, 31 insertions(+)
> > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > >>
> > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > >> index 3143f38e057..72790693ef0 100644
> > > >> --- a/gcc/expmed.c
> > > >> +++ b/gcc/expmed.c
> > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > >>        op0_mode = opt_scalar_int_mode ();
> > > >>      }
> > > >>
> > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > >> +     in extract_integral_bit_field.  */
> > > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > > >
> > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > how it behaves for composite modes.
> > > >
> > > > Of course the least surprises would happen when we restrict this
> > > > to FLOAT_MODE_P (tmode).
> > > >
> > > > Richard - any preferences?
> > >
> > > If the bug is that extract_integral_bit_field is being called with
> > > a non-integral mode parameter, then it looks odd that we can still
> > > fall through to it without an integral mode (when exists is false).
> > >
> > > If calling extract_integral_bit_field without an integral mode is
> > > a bug then I think we should have:
> > >
> > >   int_mode_for_mode (mode).require ()
> > >
> > > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > > Ideally we'd make the mode parameter scalar_int_mode too.
> > >
> > > extract_integral_bit_field currently has:
> > >
> > >   /* Find a correspondingly-sized integer field, so we can apply
> > >      shifts and masks to it.  */
> > >   scalar_int_mode int_mode;
> > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > >     /* If this fails, we should probably push op0 out to memory and then
> > >        do a load.  */
> > >     int_mode = int_mode_for_mode (mode).require ();
> > >
> > > which would seem to be redundant after this change.
> >
> > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > can't see a way to check beforehand).  So it seems to me at least
> > part of that function doesn't expect non-integral extraction modes.
> >
> > But who knows - the code is older than I am (OK, not, but older than
> > my involvement in GCC ;))
> >
> How about attached patch w/ below changelog
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Make sure we're playing with
>         integral modes before calling extract_integral_bit_field.
>         (extract_integral_bit_field): Add a parameter of type
>         scalar_int_mode which corresponds to tmode.
>         And call extract_and_convert_fixed_bit_field instead of
>         extract_fixed_bit_field and convert_extracted_bit_field.
>         (extract_and_convert_fixed_bit_field): New function, it's a
>         combination of extract_fixed_bit_field and
>         convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
>
I'd like to ping this patch. Alternatively, we could go back to the
earlier patch with richi's comments addressed.
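
To illustrate the approach richi suggested (extract in an integer mode of
the same size, then reinterpret the bits as the floating-point mode), here
is a minimal source-level sketch in C. It is only an analogy and not code
from expmed.c: it uses float/uint32_t instead of _Float16/HImode so that it
builds with any C compiler, it assumes float is IEEE binary32, and the
function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the 32-bit field that starts at bit BITNUM of WORD, using
   integer shifts only.  This plays the role of the integer-mode
   extraction.  */
static uint32_t
extract_u32 (uint64_t word, unsigned bitnum)
{
  assert (bitnum <= 32);	/* The field must fit inside WORD.  */
  return (uint32_t) (word >> bitnum);
}

/* Reinterpret the bits as a float without any value conversion.  This is
   the source-level counterpart of (subreg:SF (reg:SI ...)).  Assumes
   sizeof (float) == sizeof (uint32_t).  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Integer extraction first, type pun second.  */
float
extract_float_field (uint64_t word, unsigned bitnum)
{
  return bits_to_float (extract_u32 (word, bitnum));
}
```

The point of the split is that all shifting and masking happens on an
integer value; the floating-point type only appears in the final
reinterpretation step.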
>
> > Richard.
> >
> > > >> +      && imode != tmode
> > > >> +      && imode != GET_MODE (op0))
> > > >> +    {
> > > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > >> +                                           bitsize.to_constant (),
> > > >> +                                           bitnum.to_constant (), unsignedp,
> > > >> +                                           NULL, imode, imode,
> > > >> +                                           reverse, fallback_p);
> > > >> +      gcc_assert (ret);
> > > >> +
> > > >> +      if (!REG_P (ret))
> > > >> +       ret = force_reg (imode, ret);
> > > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > > >> +    }
> > > >> +
> > > >>    /* It's possible we'll need to handle other cases here for
> > > >>       polynomial bitnum and bitsize.  */
> > >
> > > Minor nit, but since the code is using to_constant, it should go after
> > > this comment rather than before it.
> > >
> > > Thanks,
> > > Richard
> > >
> > > >>
> > > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > >> new file mode 100644
> > > >> index 00000000000..ebc0af1490b
> > > >> --- /dev/null
> > > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > >> @@ -0,0 +1,12 @@
> > > >> +/* { dg-do compile } */
> > > >> +/* { dg-options "-msse2 -O2" } */
> > > >> +_Float16
> > > >> +foo (int a)
> > > >> +{
> > > >> +  union {
> > > >> +    int a;
> > > >> +    _Float16 b;
> > > >> +  }c;
> > > >> +  c.a = a;
> > > >> +  return c.b;
> > > >> +}
> > > >> --
> > > >> 2.27.0
> > > >>
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-06  6:06                           ` Hongtao Liu
@ 2021-08-17  1:53                             ` Hongtao Liu
  2021-08-24  9:39                               ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-17  1:53 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches, schwab, Richard Sandiford

On Fri, Aug 6, 2021 at 2:06 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> > >
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 7979e240426..dc673c89bc8 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > > >       return (type == EXCESS_PRECISION_TYPE_STANDARD
> > > >               ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > >               : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > > +      case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > +     return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > >        default:
> > > >       gcc_unreachable ();
> > > >      }
> > >
> > > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> > > doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> > > that all of _Float16, float and double are represented to the range and
> > > precision of their type without any excess precision).
> > >
> > Yes, additional changes like this.
> >
> > modified   gcc/config/i386/i386.c
> > @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
> >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> >        case EXCESS_PRECISION_TYPE_FLOAT16:
> > + if (TARGET_80387
> > +     && !(TARGET_SSE_MATH && TARGET_SSE))
> > +   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
> >   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> >        default:
> >   gcc_unreachable ();
> > new file   gcc/testsuite/gcc.target/i386/float16-7.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> > +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > +_Float16
> > +foo (_Float16 a, _Float16 b)
> > +{
> > +  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > +}
> > +
> >
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
> Updated patch and ping for it.
>
> Also, regarding the backend changes:
> 1. For backends such as m68k/s390 that don't support _Float16 at all, the
> backend will issue an error for -fexcess-precision=16; I think that should
> be fine.
> 2. For backends such as arm/aarch64 that support _Float16, the backend will
> set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16 even if
> hardware instructions for fp16 are not supported. Would that be OK for
> arm?

Ping for this patch.
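
To make the intended per-backend behaviour easier to review, here is a
small stand-alone model in C of what the hooks do when asked for
-fexcess-precision=16. It is a sketch under assumptions, not the real hook
code: the enum, the function name and the bool return are invented for
illustration, and the x86 387 case is modelled as a plain rejection even
though the patch reports an error and still returns a value. The constant
16 is the FLT_EVAL_METHOD value for "promote to _Float16".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of backends, for this sketch only.  */
enum backend_kind
{
  BACKEND_WITHOUT_FLOAT16,	/* e.g. m68k, s390 */
  BACKEND_WITH_FLOAT16,		/* e.g. arm, aarch64 */
  BACKEND_X86_SSE_MATH,		/* x86 with -mfpmath=sse */
  BACKEND_X86_387_MATH		/* x86 with -mfpmath=387 */
};

/* FLT_EVAL_METHOD value meaning "evaluate to the range and precision
   of _Float16".  */
#define EVAL_METHOD_FLOAT16 16

/* Model of a backend answering a -fexcess-precision=16 request.
   Returns true and stores the evaluation method in *METHOD when the
   request is accepted; returns false where the real hook would report
   an error.  */
bool
request_float16_eval (enum backend_kind backend, int *method)
{
  assert (method != NULL);
  switch (backend)
    {
    case BACKEND_WITH_FLOAT16:
    case BACKEND_X86_SSE_MATH:
      *method = EVAL_METHOD_FLOAT16;
      return true;
    case BACKEND_WITHOUT_FLOAT16:
    case BACKEND_X86_387_MATH:
    default:
      return false;
    }
}
```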

> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-17  1:53                             ` Hongtao Liu
@ 2021-08-24  9:39                               ` Hongtao Liu
  2021-09-02  6:06                                 ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-24  9:39 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches, schwab, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 3242 bytes --]

On Tue, Aug 17, 2021 at 9:53 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 2:06 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > > >
> > > > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> > > >
> > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > index 7979e240426..dc673c89bc8 100644
> > > > > --- a/gcc/config/i386/i386.c
> > > > > +++ b/gcc/config/i386/i386.c
> > > > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > > > >       return (type == EXCESS_PRECISION_TYPE_STANDARD
> > > > >               ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > > >               : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > > > +      case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > > +     return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > > >        default:
> > > > >       gcc_unreachable ();
> > > > >      }
> > > >
> > > > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> > > > doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> > > > that all of _Float16, float and double are represented to the range and
> > > > > precision of their type without any excess precision).
> > > >
> > > Yes, additional changes like this.
> > >
> > > modified   gcc/config/i386/i386.c
> > > @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> > >        case EXCESS_PRECISION_TYPE_FLOAT16:
> > > + if (TARGET_80387
> > > +     && !(TARGET_SSE_MATH && TARGET_SSE))
> > > +   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
> > >   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > >        default:
> > >   gcc_unreachable ();
> > > new file   gcc/testsuite/gcc.target/i386/float16-7.c
> > > @@ -0,0 +1,9 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> > > +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > > +_Float16
> > > +foo (_Float16 a, _Float16 b)
> > > +{
> > > +  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > > +}
> > > +
> > >
> > > > --
> > > > Joseph S. Myers
> > > > joseph@codesourcery.com
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> > Updated patch and ping for it.
> >
> > Also, regarding the backend changes:
> > 1. For backends such as m68k/s390 that don't support _Float16 at all, the
> > backend will issue an error for -fexcess-precision=16; I think that should
> > be fine.
> > 2. For backends such as arm/aarch64 that support _Float16, the backend will
> > set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16 even if
> > hardware instructions for fp16 are not supported. Would that be OK for
> > arm?
>
> Ping for this patch.
>
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao

Rebased, and ping^3. Plenty of AVX512FP16 patches are blocked by this
one, so I'd appreciate it if someone could help review it.
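
For reviewers who want a concrete picture of why the evaluation method
matters, here is a small C sketch of the difference between rounding after
every operation and rounding once at the end. It is an analogy only: it
uses float as the narrow type and double as the wider evaluation type
(instead of _Float16 and float) so that it builds everywhere, the function
names are made up, and it assumes IEEE binary32/binary64 with
round-to-nearest-even.

```c
#include <assert.h>

/* Round back to the narrow type after each addition.  This mirrors
   what evaluating directly in the narrow type would give.  */
float
sum_round_each_step (float a, float b, float c)
{
  float t = (float) ((double) a + (double) b);
  return (float) ((double) t + (double) c);
}

/* Keep the intermediate sum in the wider type and round once at the
   end.  This mirrors evaluation with excess precision.  */
float
sum_round_once (float a, float b, float c)
{
  return (float) ((double) a + (double) b + (double) c);
}

/* 2^24 + 1 is not representable in binary32, so the two strategies
   disagree for these inputs.  */
void
check_rounding_difference (void)
{
  assert (sum_round_each_step (16777216.0f, 1.0f, 1.0f) == 16777216.0f);
  assert (sum_round_once (16777216.0f, 1.0f, 1.0f) == 16777218.0f);
}
```

The same kind of divergence is what shows up between software emulation
(which goes through float) and native AVX512-FP16 instructions unless the
intermediate results are rounded back each time.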
-- 
BR,
Hongtao

[-- Attachment #2: 0001-Support-fexcess-precision-16-which-will-enable-FLT_E.patch --]
[-- Type: text/x-patch, Size: 17373 bytes --]

From 5deedc50dde5846dff4d0bf0719a7a5facc3723e Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 2 Aug 2021 10:56:45 +0800
Subject: [PATCH] Support -fexcess-precision=16 which will enable
 FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

gcc/ada/ChangeLog:

	* gcc-interface/misc.c (gnat_post_options): Issue an error for
	-fexcess-precision=16.

gcc/c-family/ChangeLog:

	* c-common.c (excess_precision_mode_join): Update below comments.
	(c_ts18661_flt_eval_method): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
	* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
	(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

	* common.opt: Support -fexcess-precision=16.
	* config/aarch64/aarch64.c (aarch64_excess_precision): Return
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
	EXCESS_PRECISION_TYPE_FLOAT16.
	* config/arm/arm.c (arm_excess_precision): Ditto.
	* config/i386/i386.c (ix86_get_excess_precision): Ditto.
	* config/m68k/m68k.c (m68k_excess_precision): Issue an error
	when EXCESS_PRECISION_TYPE_FLOAT16.
	* config/s390/s390.c (s390_excess_precision): Ditto.
	* coretypes.h (enum excess_precision_type): Add
	EXCESS_PRECISION_TYPE_FLOAT16.
	* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
	* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
	* doc/extend.texi (Half-Precision): Document
	-fexcess-precision=16.
	* flag-types.h (enum excess_precision): Add
	EXCESS_PRECISION_FLOAT16.
	* target.def (excess_precision): Update document.
	* tree.c (excess_precision_type): Set excess_precision_type to
	EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

	* options.c (gfc_post_options): Issue an error for
	-fexcess-precision=16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/float16-6.c: New test.
	* gcc.target/i386/float16-7.c: New test.
---
 gcc/ada/gcc-interface/misc.c              |  3 +++
 gcc/c-family/c-common.c                   |  6 ++++--
 gcc/c-family/c-cppbuiltin.c               |  6 ++++--
 gcc/common.opt                            |  5 ++++-
 gcc/config/aarch64/aarch64.c              |  1 +
 gcc/config/arm/arm.c                      |  1 +
 gcc/config/i386/i386.c                    |  5 +++++
 gcc/config/m68k/m68k.c                    |  3 +++
 gcc/config/s390/s390.c                    |  3 +++
 gcc/coretypes.h                           |  3 ++-
 gcc/doc/extend.texi                       |  3 ++-
 gcc/doc/tm.texi                           | 14 ++++++++++----
 gcc/doc/tm.texi.in                        |  3 +++
 gcc/flag-types.h                          |  3 ++-
 gcc/fortran/options.c                     |  3 +++
 gcc/target.def                            | 11 +++++++----
 gcc/testsuite/gcc.target/i386/float16-6.c |  8 ++++++++
 gcc/testsuite/gcc.target/i386/float16-7.c |  9 +++++++++
 gcc/tree.c                                |  3 ++-
 19 files changed, 76 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-7.c

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 186367ac6d1..96199bd4b63 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -256,6 +256,9 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   /* Excess precision other than "fast" requires front-end support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Ada");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Ada");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* No psABI change warnings for Ada.  */
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 017e41537ac..c6757f093ac 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8778,7 +8778,7 @@ excess_precision_mode_join (enum flt_eval_method x,
 
    This relates to the effective excess precision seen by the user,
    which is the join point of the precision the target requests for
-   -fexcess-precision={standard,fast} and the implicit excess precision
+   -fexcess-precision={standard,fast,16} and the implicit excess precision
    the target uses.  */
 
 static enum flt_eval_method
@@ -8790,7 +8790,9 @@ c_ts18661_flt_eval_method (void)
   enum excess_precision_type flag_type
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method requested
     = targetm.c.excess_precision (flag_type);
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 3fa62bc4fe7..48cbefd8bf8 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -753,7 +753,7 @@ cpp_atomic_builtins (cpp_reader *pfile)
 /* Return TRUE if the implicit excess precision in which the back-end will
    compute floating-point calculations is not more than the explicit
    excess precision that the front-end will apply under
-   -fexcess-precision=[standard|fast].
+   -fexcess-precision=[standard|fast|16].
 
    More intuitively, return TRUE if the excess precision proposed by the
    front-end is the excess precision that will actually be used.  */
@@ -764,7 +764,9 @@ c_cpp_flt_eval_method_iec_559 (void)
   enum excess_precision_type front_end_ept
     = (flag_excess_precision == EXCESS_PRECISION_STANDARD
        ? EXCESS_PRECISION_TYPE_STANDARD
-       : EXCESS_PRECISION_TYPE_FAST);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16
+	  : EXCESS_PRECISION_TYPE_FAST));
 
   enum flt_eval_method back_end
     = targetm.c.excess_precision (EXCESS_PRECISION_TYPE_IMPLICIT);
diff --git a/gcc/common.opt b/gcc/common.opt
index ed8ab5fbe13..9fec349a736 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1518,7 +1518,7 @@ Perform a number of minor, expensive optimizations.
 
 fexcess-precision=
 Common Joined RejectNegative Enum(excess_precision) Var(flag_excess_precision) Init(EXCESS_PRECISION_DEFAULT) Optimization SetByCombined
--fexcess-precision=[fast|standard]	Specify handling of excess floating-point precision.
+-fexcess-precision=[fast|standard|16]	Specify handling of excess floating-point precision.
 
 Enum
 Name(excess_precision) Type(enum excess_precision) UnknownError(unknown excess precision style %qs)
@@ -1529,6 +1529,9 @@ Enum(excess_precision) String(fast) Value(EXCESS_PRECISION_FAST)
 EnumValue
 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
 
+EnumValue
+Enum(excess_precision) String(16) Value(EXCESS_PRECISION_FLOAT16)
+
 ; Whether we permit the extended set of values for FLT_EVAL_METHOD
 ; introduced in ISO/IEC TS 18661-3, or limit ourselves to those in C99/C11.
 fpermitted-flt-eval-methods=
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3213585a588..53773e58bd1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -25045,6 +25045,7 @@ aarch64_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11dafc70067..6494a0edc36 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25603,6 +25603,7 @@ arm_excess_precision (enum excess_precision_type type)
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 		: FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
       case EXCESS_PRECISION_TYPE_IMPLICIT:
+      case EXCESS_PRECISION_TYPE_FLOAT16:
 	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e069d0c0596..e1b1e09bd10 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23523,6 +23523,11 @@ ix86_get_excess_precision (enum excess_precision_type type)
 	return (type == EXCESS_PRECISION_TYPE_STANDARD
 		? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
 		: FLT_EVAL_METHOD_UNPREDICTABLE);
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	if (TARGET_80387
+	    && !(TARGET_SSE_MATH && TARGET_SSE))
+	  error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
+	return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 3f63c60fa92..0248eb719a8 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -7115,6 +7115,9 @@ m68k_excess_precision (enum excess_precision_type type)
 	  return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 
 	return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
+	break;
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 673a1340285..54dd6332c3a 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16549,6 +16549,9 @@ s390_excess_precision (enum excess_precision_type type)
 	   ensure consistency with the implementation in glibc, report that
 	   float is evaluated to the range and precision of double.  */
 	return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE;
+      case EXCESS_PRECISION_TYPE_FLOAT16:
+	error ("%<-fexcess-precision=16%> is not supported on this target");
+	break;
       default:
 	gcc_unreachable ();
     }
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 406572e947d..07b9aa656c5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -424,7 +424,8 @@ enum excess_precision_type
 {
   EXCESS_PRECISION_TYPE_IMPLICIT,
   EXCESS_PRECISION_TYPE_STANDARD,
-  EXCESS_PRECISION_TYPE_FAST
+  EXCESS_PRECISION_TYPE_FAST,
+  EXCESS_PRECISION_TYPE_FLOAT16
 };
 
 /* Level of size optimization.  */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6cf9ebbc3e9..8a89efc4321 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1161,7 +1161,8 @@ operations will be emulated by software emulation and the @code{float}
 instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
 the intermediate result of the operation as 32-bit precision. This may lead
 to inconsistent behavior between software emulation and AVX512-FP16
-instructions.
+instructions. Using @option{-fexcess-precision=16} will force the result
+to be rounded back after each operation.
 
 @node Decimal Float
 @section Decimal Floating Types
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f68f42638a1..be8148583d8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -982,20 +982,26 @@ Do not define this macro if it would never modify @var{m}.
 Return a value, with the same meaning as the C99 macro
 @code{FLT_EVAL_METHOD} that describes which excess precision should be
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},
-@code{EXCESS_PRECISION_TYPE_FAST}, or
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For
+@code{EXCESS_PRECISION_TYPE_FAST},
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which
 precision and range operations will be implictly evaluated in regardless
 of the excess precision explicitly added.  For
-@code{EXCESS_PRECISION_TYPE_STANDARD} and
+@code{EXCESS_PRECISION_TYPE_STANDARD}, 
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the
 explicit excess precision that should be added depending on the
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.
 Note that unpredictable explicit excess precision does not make sense,
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or
 @code{EXCESS_PRECISION_TYPE_FAST}.
 @end deftypefn
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @deftypefn {Target Hook} machine_mode TARGET_PROMOTE_FUNCTION_MODE (const_tree @var{type}, machine_mode @var{mode}, int *@var{punsignedp}, const_tree @var{funtype}, int @var{for_return})
 Like @code{PROMOTE_MODE}, but it is applied to outgoing function arguments or
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fdf16b901c5..d088eee4afe 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -929,6 +929,9 @@ Do not define this macro if it would never modify @var{m}.
 @end defmac
 
 @hook TARGET_C_EXCESS_PRECISION
+Return a value, with the same meaning as the C99 macro
+@code{FLT_EVAL_METHOD} that describes which excess precision should be
+applied.
 
 @hook TARGET_PROMOTE_FUNCTION_MODE
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 4fb1cb4743d..0328db1fb54 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -198,7 +198,8 @@ enum excess_precision
 {
   EXCESS_PRECISION_DEFAULT,
   EXCESS_PRECISION_FAST,
-  EXCESS_PRECISION_STANDARD
+  EXCESS_PRECISION_STANDARD,
+  EXCESS_PRECISION_FLOAT16
 };
 
 /* The options for which values of FLT_EVAL_METHOD are permissible.  */
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 1723f689a57..847e20e8829 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -267,6 +267,9 @@ gfc_post_options (const char **pfilename)
      support.  */
   if (flag_excess_precision == EXCESS_PRECISION_STANDARD)
     sorry ("%<-fexcess-precision=standard%> for Fortran");
+  else if (flag_excess_precision == EXCESS_PRECISION_FLOAT16)
+    sorry ("%<-fexcess-precision=16%> for Fortran");
+
   flag_excess_precision = EXCESS_PRECISION_FAST;
 
   /* Fortran allows associative math - but we cannot reassociate if
diff --git a/gcc/target.def b/gcc/target.def
index 28a34f1d51b..bfa819609c2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6225,18 +6225,21 @@ DEFHOOK
  "Return a value, with the same meaning as the C99 macro\n\
 @code{FLT_EVAL_METHOD} that describes which excess precision should be\n\
 applied.  @var{type} is either @code{EXCESS_PRECISION_TYPE_IMPLICIT},\n\
-@code{EXCESS_PRECISION_TYPE_FAST}, or\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD}.  For\n\
+@code{EXCESS_PRECISION_TYPE_FAST},\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, or\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}.  For\n\
 @code{EXCESS_PRECISION_TYPE_IMPLICIT}, the target should return which\n\
 precision and range operations will be implictly evaluated in regardless\n\
 of the excess precision explicitly added.  For\n\
-@code{EXCESS_PRECISION_TYPE_STANDARD} and\n\
+@code{EXCESS_PRECISION_TYPE_STANDARD}, \n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16}, and\n\
 @code{EXCESS_PRECISION_TYPE_FAST}, the target should return the\n\
 explicit excess precision that should be added depending on the\n\
 value set for @option{-fexcess-precision=@r{[}standard@r{|}fast@r{]}}.\n\
 Note that unpredictable explicit excess precision does not make sense,\n\
 so a target should never return @code{FLT_EVAL_METHOD_UNPREDICTABLE}\n\
-when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD} or\n\
+when @var{type} is @code{EXCESS_PRECISION_TYPE_STANDARD},\n\
+@code{EXCESS_PRECISION_TYPE_FLOAT16} or\n\
 @code{EXCESS_PRECISION_TYPE_FAST}.",
  enum flt_eval_method, (enum excess_precision_type type),
  default_excess_precision)
diff --git a/gcc/testsuite/gcc.target/i386/float16-6.c b/gcc/testsuite/gcc.target/i386/float16-6.c
new file mode 100644
index 00000000000..3d2503ce5e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-6.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2 -mfpmath=sse -fdump-tree-gimple -fexcess-precision=16" } */
+/* { dg-final { scan-tree-dump-not "\\(float\\)" "gimple" } } */
+_Float16
+foo (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a + b + c;
+}
diff --git a/gcc/testsuite/gcc.target/i386/float16-7.c b/gcc/testsuite/gcc.target/i386/float16-7.c
new file mode 100644
index 00000000000..86641afeba9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-7.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
+/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+_Float16
+foo (_Float16 a, _Float16 b)
+{
+  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
+}
+
diff --git a/gcc/tree.c b/gcc/tree.c
index cba3bca41b3..f5cceb07c0d 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -7637,7 +7637,8 @@ excess_precision_type (tree type)
   enum excess_precision_type requested_type
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
-       : EXCESS_PRECISION_TYPE_STANDARD);
+       : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-17  1:52                                             ` Hongtao Liu
@ 2021-08-24  9:40                                               ` Hongtao Liu
  2021-08-24  9:44                                                 ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-24  9:40 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 7494 bytes --]

On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > >>
> > > > >> Hi:
> > > > >> ---
> > > > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > > >> are designed
> > > > >> to work on integer modes (but docs do not say anything about this here).
> > > > >> In fact the caller of extract_bit_field_using_extv is named
> > > > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > > >> modes we're dealing with, but we're for example happily doing
> > > > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > > >> the other way around?  In that case we should wrap the
> > > > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > > > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > > > >> ---
> > > > > >>   This is a separate patch as a follow-up to the comments above.
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > > > >>         extract_integral_bit_field, extracting in an integer mode with
> > > > >>         the same size as 'tmode' and then converting the result
> > > > >>         as (subreg:tmode (reg:imode)).
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>         * gcc.target/i386/float16-5.c: New test.
> > > > >> ---
> > > > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > > > >>  2 files changed, 31 insertions(+)
> > > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > > >>
> > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > > >> index 3143f38e057..72790693ef0 100644
> > > > >> --- a/gcc/expmed.c
> > > > >> +++ b/gcc/expmed.c
> > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > >>        op0_mode = opt_scalar_int_mode ();
> > > > >>      }
> > > > >>
> > > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > > >> +     in extract_integral_bit_field.  */
> > > > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > > > >
> > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > > how it behaves for composite modes.
> > > > >
> > > > > Of course the least surprises would happen when we restrict this
> > > > > to FLOAT_MODE_P (tmode).
> > > > >
> > > > > Richard - any preferences?
> > > >
> > > > If the bug is that extract_integral_bit_field is being called with
> > > > a non-integral mode parameter, then it looks odd that we can still
> > > > fall through to it without an integral mode (when exists is false).
> > > >
> > > > If calling extract_integral_bit_field without an integral mode is
> > > > a bug then I think we should have:
> > > >
> > > >   int_mode_for_mode (mode).require ()
> > > >
> > > > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > > > Ideally we'd make the mode parameter scalar_int_mode too.
> > > >
> > > > extract_integral_bit_field currently has:
> > > >
> > > >   /* Find a correspondingly-sized integer field, so we can apply
> > > >      shifts and masks to it.  */
> > > >   scalar_int_mode int_mode;
> > > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > >     /* If this fails, we should probably push op0 out to memory and then
> > > >        do a load.  */
> > > >     int_mode = int_mode_for_mode (mode).require ();
> > > >
> > > > which would seem to be redundant after this change.
> > >
> > > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > > can't see a way to check beforehand).  So it seems to me at least
> > > part of that function doesn't expect non-integral extraction modes.
> > >
> > > But who knows - the code is older than I am (OK, not, but older than
> > > > my involvement in GCC ;))
> > >
> > How about attached patch w/ below changelog
> >
> > gcc/ChangeLog:
> >
> >         * expmed.c (extract_bit_field_1): Make sure we're playing with
> >         integral modes before calling extract_integral_bit_field.
> >         (extract_integral_bit_field): Add a parameter of type
> >         scalar_int_mode which corresponds to tmode.  Call
> >         extract_and_convert_fixed_bit_field instead of
> >         extract_fixed_bit_field and convert_extracted_bit_field.
> >         (extract_and_convert_fixed_bit_field): New function, a
> >         combination of extract_fixed_bit_field and
> >         convert_extracted_bit_field.
> >
> > gcc/testsuite/ChangeLog:
> >         * gcc.target/i386/float16-5.c: New test.
> >
> I'd like to ping this patch, or maybe we can use the patch before with
> richi's comments.
> >

Rebased and ping^2.  Plenty of AVX512FP16 patches are blocked by this
one, so I'd appreciate it if someone could help review it.
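
For anyone skimming the thread, the mode check that the attached patch
adds to extract_bit_field_1 can be modelled roughly as below.  This is
only a standalone sketch, not GCC code: toy_mode, opt_toy_mode,
toy_int_mode_for_mode and choose_path are made-up names, and the tiny
mode table is an assumption standing in for the real machine modes.

```cpp
// Hypothetical miniature of a machine-mode table (not GCC's real one).
enum toy_mode { TOY_HI, TOY_SI, TOY_HF, TOY_SF, TOY_WIDE };

// Size of each toy mode in bits.  TOY_WIDE is assumed to have no
// integer mode of the same size.
constexpr unsigned
toy_bitsize (toy_mode m)
{
  return (m == TOY_HI || m == TOY_HF) ? 16
	 : (m == TOY_SI || m == TOY_SF) ? 32
	 : 512;
}

// Hypothetical stand-in for an optional integer mode, with an exists ()
// accessor shaped like the one the patch uses.
struct opt_toy_mode
{
  bool present;
  toy_mode value;

  constexpr bool exists (toy_mode *out) const
  {
    if (present)
      *out = value;
    return present;
  }
};

// Same-sized integer mode for M, if the toy table has one.
constexpr opt_toy_mode
toy_int_mode_for_mode (toy_mode m)
{
  return toy_bitsize (m) == 16 ? opt_toy_mode { true, TOY_HI }
	 : toy_bitsize (m) == 32 ? opt_toy_mode { true, TOY_SI }
	 : opt_toy_mode { false, m };
}

// Which route the extraction takes in this model.
enum toy_path { DIRECT_INTEGRAL, PUN_VIA_SUBREG, FIXED_FALLBACK, GIVE_UP };

constexpr toy_path
choose_path (toy_mode tmode, bool fallback_p)
{
  toy_mode int_tmode = tmode;
  const opt_toy_mode target_imode = toy_int_mode_for_mode (tmode);
  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
    {
      // Non-integer target with a same-sized integer mode: extract in
      // the integer mode, then reinterpret the result.
      if (target_imode.exists (&int_tmode))
	return PUN_VIA_SUBREG;
      // No same-sized integer mode: only the shift-and-mask fallback is left.
      return fallback_p ? FIXED_FALLBACK : GIVE_UP;
    }
  // tmode is already an integer mode.
  return DIRECT_INTEGRAL;
}
```

In this model an HFmode-like target (TOY_HF) takes the "extract as
integer, then pun" route, an integer target goes straight to the
integral routine, and a mode without a same-sized integer mode either
uses the fallback or gives up, depending on fallback_p.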

> > > Richard.
> > >
> > > > >> +      && imode != tmode
> > > > >> +      && imode != GET_MODE (op0))
> > > > >> +    {
> > > > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > > >> +                                           bitsize.to_constant (),
> > > > >> +                                           bitnum.to_constant (), unsignedp,
> > > > >> +                                           NULL, imode, imode,
> > > > >> +                                           reverse, fallback_p);
> > > > >> +      gcc_assert (ret);
> > > > >> +
> > > > >> +      if (!REG_P (ret))
> > > > >> +       ret = force_reg (imode, ret);
> > > > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > > > >> +    }
> > > > >> +
> > > > >>    /* It's possible we'll need to handle other cases here for
> > > > >>       polynomial bitnum and bitsize.  */
> > > >
> > > > Minor nit, but since the code is using to_constant, it should go after
> > > > this comment rather than before it.
> > > >
> > > > Thanks,
> > > > Richard
> > > >
> > > > >>
> > > > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > >> new file mode 100644
> > > > >> index 00000000000..ebc0af1490b
> > > > >> --- /dev/null
> > > > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > >> @@ -0,0 +1,12 @@
> > > > >> +/* { dg-do compile } */
> > > > >> +/* { dg-options "-msse2 -O2" } */
> > > > >> +_Float16
> > > > >> +foo (int a)
> > > > >> +{
> > > > >> +  union {
> > > > >> +    int a;
> > > > >> +    _Float16 b;
> > > > >> +  }c;
> > > > >> +  c.a = a;
> > > > >> +  return c.b;
> > > > >> +}
> > > > >> --
> > > > >> 2.27.0
> > > > >>
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

[-- Attachment #2: 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch --]
[-- Type: text/x-patch, Size: 7259 bytes --]

From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Fri, 6 Aug 2021 10:18:43 +0800
Subject: [PATCH] Make sure we're playing with integral modes before call
 extract_integral_bit_field.

gcc/ChangeLog:

	* expmed.c (extract_bit_field_1): Make sure we're playing with
	integral modes before calling extract_integral_bit_field.
	(extract_integral_bit_field): Add a parameter of type
	scalar_int_mode which corresponds to tmode.  Call
	extract_and_convert_fixed_bit_field instead of
	extract_fixed_bit_field and convert_extracted_bit_field.
	(extract_and_convert_fixed_bit_field): New function, a
	combination of extract_fixed_bit_field and
	convert_extracted_bit_field.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/float16-5.c: New test.
---
 gcc/expmed.c                              | 103 ++++++++++++++++------
 gcc/testsuite/gcc.target/i386/float16-5.c |  12 +++
 2 files changed, 90 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..f083d6e86d0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -71,7 +71,14 @@ static void store_split_bit_field (rtx, opt_scalar_int_mode,
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
 				       unsigned HOST_WIDE_INT,
 				       unsigned HOST_WIDE_INT, int, rtx,
-				       machine_mode, machine_mode, bool, bool);
+				       machine_mode, machine_mode,
+				       scalar_int_mode, bool, bool);
+static rtx extract_and_convert_fixed_bit_field (scalar_int_mode,
+						machine_mode, machine_mode,
+						rtx, opt_scalar_int_mode,
+						unsigned HOST_WIDE_INT,
+						unsigned HOST_WIDE_INT, rtx,
+						int, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -1632,6 +1639,7 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
+  scalar_int_mode int_tmode;
 
   if (tmode == VOIDmode)
     tmode = mode;
@@ -1853,10 +1861,46 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* It's possible we'll need to handle other cases here for
      polynomial bitnum and bitsize.  */
 
+  /* Make sure we are playing with integral modes.  Pun with subregs
+     if we aren't.  When tmode is HFmode and op0 is SImode, calling
+     extract_integral_bit_field directly would ICE.  */
+  opt_scalar_int_mode target_imode = int_mode_for_mode (tmode);
+  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
+    {
+      if (target_imode.exists (&int_tmode))
+	{
+	  rtx ret = extract_integral_bit_field (op0, op0_mode,
+						bitsize.to_constant (),
+						bitnum.to_constant (),
+						unsignedp, NULL, int_tmode,
+						int_tmode, int_tmode,
+						reverse, fallback_p);
+	  gcc_assert (ret);
+
+	  if (!REG_P (ret))
+	    ret = force_reg (int_tmode, ret);
+	  return gen_lowpart_SUBREG (tmode, ret);
+	}
+      else
+	{
+	  if (!fallback_p)
+	    return NULL;
+
+	  int_tmode = int_mode_for_mode (mode).require ();
+	  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+						      op0, op0_mode,
+						      bitsize.to_constant (),
+						      bitnum.to_constant (),
+						      target, unsignedp,
+						      reverse);
+	}
+    }
+
   /* From here on we need to be looking at a fixed-size insertion.  */
   return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
 				     bitnum.to_constant (), unsignedp,
-				     target, mode, tmode, reverse, fallback_p);
+				     target, mode, tmode,
+				     int_tmode, reverse, fallback_p);
 }
 
 /* Subroutine of extract_bit_field_1, with the same arguments, except
@@ -1869,6 +1913,7 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 			    unsigned HOST_WIDE_INT bitsize,
 			    unsigned HOST_WIDE_INT bitnum, int unsignedp,
 			    rtx target, machine_mode mode, machine_mode tmode,
+			    scalar_int_mode int_tmode,
 			    bool reverse, bool fallback_p)
 {
   /* Handle fields bigger than a word.  */
@@ -2035,29 +2080,10 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
   if (!fallback_p)
     return NULL;
 
-  /* Find a correspondingly-sized integer field, so we can apply
-     shifts and masks to it.  */
-  scalar_int_mode int_mode;
-  if (!int_mode_for_mode (tmode).exists (&int_mode))
-    /* If this fails, we should probably push op0 out to memory and then
-       do a load.  */
-    int_mode = int_mode_for_mode (mode).require ();
-
-  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
-				    bitnum, target, unsignedp, reverse);
-
-  /* Complex values must be reversed piecewise, so we need to undo the global
-     reversal, convert to the complex mode and reverse again.  */
-  if (reverse && COMPLEX_MODE_P (tmode))
-    {
-      target = flip_storage_order (int_mode, target);
-      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-      target = flip_storage_order (tmode, target);
-    }
-  else
-    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-
-  return target;
+  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+					      op0, op0_mode, bitsize,
+					      bitnum, target, unsignedp,
+					      reverse);
 }
 
 /* Generate code to extract a byte-field from STR_RTX
@@ -2129,6 +2155,33 @@ extract_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
 			      target, mode, tmode, reverse, true, alt_rtl);
 }
+
+/* Combination of extract_fixed_bit_field and convert_extracted_bit_field.  */
+static rtx
+extract_and_convert_fixed_bit_field (scalar_int_mode int_tmode,
+				     machine_mode tmode, machine_mode mode,
+				     rtx op0, opt_scalar_int_mode op0_mode,
+				     unsigned HOST_WIDE_INT bitsize,
+				     unsigned HOST_WIDE_INT bitnum,
+				     rtx target, int unsignedp, bool reverse)
+{
+  target = extract_fixed_bit_field (int_tmode, op0, op0_mode, bitsize,
+				    bitnum, target, unsignedp, reverse);
+
+  /* Complex values must be reversed piecewise, so we need to undo the global
+     reversal, convert to the complex mode and reverse again.  */
+  if (reverse && COMPLEX_MODE_P (tmode))
+    {
+      target = flip_storage_order (int_tmode, target);
+      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+      target = flip_storage_order (tmode, target);
+    }
+  else
+    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+
+  return target;
+}
+
 \f
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.  If OP0_MODE is defined, it is the mode of OP0,
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-24  9:40                                               ` Hongtao Liu
@ 2021-08-24  9:44                                                 ` Hongtao Liu
  2021-08-24 11:38                                                   ` Richard Biener
  2021-08-25 23:16                                                   ` Jeff Law
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-24  9:44 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 8035 bytes --]

On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > > > <richard.sandiford@arm.com> wrote:
> > > > >
> > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > >>
> > > > > >> Hi:
> > > > > >> ---
> > > > > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > > > >> are designed
> > > > > >> to work on integer modes (but docs do not say anything about this here).
> > > > > >> In fact the caller of extract_bit_field_using_extv is named
> > > > > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > > > >> modes we're dealing with, but we're for example happily doing
> > > > > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > > > >> the other way around?  In that case we should wrap the
> > > > > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > > > > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > > > > >> ---
> > > > > >>   This is a separate patch as a follow up of upper comments.
> > > > > >>
> > > > > >> gcc/ChangeLog:
> > > > > >>
> > > > > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > > > > >>         extract_integral_bit_field, extracting in an integer mode with
> > > > > >>         the same size as 'tmode' and then converting the result
> > > > > >>         as (subreg:tmode (reg:imode)).
> > > > > >>
> > > > > >> gcc/testsuite/ChangeLog:
> > > > > >>         * gcc.target/i386/float16-5.c: New test.
> > > > > >> ---
> > > > > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > > > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > > > > >>  2 files changed, 31 insertions(+)
> > > > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > >>
> > > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > > > >> index 3143f38e057..72790693ef0 100644
> > > > > >> --- a/gcc/expmed.c
> > > > > >> +++ b/gcc/expmed.c
> > > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > > >>        op0_mode = opt_scalar_int_mode ();
> > > > > >>      }
> > > > > >>
> > > > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > > > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > > > >> +     in extract_integral_bit_field.  */
> > > > > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > > > > >
> > > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > > > how it behaves for composite modes.
> > > > > >
> > > > > > Of course the least surprises would happen when we restrict this
> > > > > > to FLOAT_MODE_P (tmode).
> > > > > >
> > > > > > Richard - any preferences?
> > > > >
> > > > > If the bug is that extract_integral_bit_field is being called with
> > > > > a non-integral mode parameter, then it looks odd that we can still
> > > > > fall through to it without an integral mode (when exists is false).
> > > > >
> > > > > If calling extract_integral_bit_field without an integral mode is
> > > > > a bug then I think we should have:
> > > > >
> > > > >   int_mode_for_mode (mode).require ()
> > > > >
> > > > > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > > > > Ideally we'd make the mode parameter scalar_int_mode too.
> > > > >
> > > > > extract_integral_bit_field currently has:
> > > > >
> > > > >   /* Find a correspondingly-sized integer field, so we can apply
> > > > >      shifts and masks to it.  */
> > > > >   scalar_int_mode int_mode;
> > > > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > > >     /* If this fails, we should probably push op0 out to memory and then
> > > > >        do a load.  */
> > > > >     int_mode = int_mode_for_mode (mode).require ();
> > > > >
> > > > > which would seem to be redundant after this change.
> > > >
> > > > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > > > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > > > can't see a way to check beforehand).  So it seems to me at least
> > > > part of that function doesn't expect non-integral extraction modes.
> > > >
> > > > But who knows - the code is older than I am (OK, not, but older than
> > > > > my involvement in GCC ;))
> > > >
> > > How about attached patch w/ below changelog
> > >
> > > gcc/ChangeLog:
> > >
> > >         * expmed.c (extract_bit_field_1): Make sure we're playing with
> > >         integral modes before calling extract_integral_bit_field.
> > >         (extract_integral_bit_field): Add a parameter of type
> > >         scalar_int_mode which corresponds to tmode.  Call
> > >         extract_and_convert_fixed_bit_field instead of
> > >         extract_fixed_bit_field and convert_extracted_bit_field.
> > >         (extract_and_convert_fixed_bit_field): New function, a
> > >         combination of extract_fixed_bit_field and
> > >         convert_extracted_bit_field.
> > >
> > > gcc/testsuite/ChangeLog:
> > >         * gcc.target/i386/float16-5.c: New test.
> > >
> > I'd like to ping this patch, or maybe we can use the patch before with
> > richi's comments.
> > >
>
> Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> this patch, i'd like someone to help review this patch.
>
Please ignore the previously attached patch; the correct one is attached here.
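
As a side note on what the float16-5.c test in the patch exercises: the
field is extracted with integer operations and only then reinterpreted
as a 16-bit float.  A rough standalone illustration follows.  It assumes
a little-endian layout as on x86, so that the _Float16 union member
overlays the low 16 bits of the int.  half_bits, extract_field,
extract_half and decode_half are hypothetical helpers written for this
note, not anything from GCC.

```cpp
#include <cstdint>
#include <limits>

// Extract BITSIZE bits starting at bit BITNUM of SRC using only integer
// operations (shift, then mask).  Assumes BITNUM < 32 and BITSIZE <= 32.
constexpr std::uint32_t
extract_field (std::uint32_t src, unsigned bitnum, unsigned bitsize)
{
  return (src >> bitnum)
	 & (bitsize >= 32 ? 0xffffffffu : ((1u << bitsize) - 1u));
}

// Hypothetical stand-in for a 16-bit float: just its raw bit pattern.
struct half_bits
{
  std::uint16_t bits;
};

// The "pun": take the 16-bit integer result and treat the very same bits
// as a half-precision value, without any numeric conversion.
constexpr half_bits
extract_half (std::uint32_t src, unsigned bitnum)
{
  return half_bits { static_cast<std::uint16_t> (extract_field (src, bitnum, 16)) };
}

// 2 raised to the power E, for small |E|.
constexpr float
pow2 (int e)
{
  float r = 1.0f;
  for (; e > 0; --e)
    r *= 2.0f;
  for (; e < 0; ++e)
    r *= 0.5f;
  return r;
}

// Decode an IEEE binary16 bit pattern to float, only to make the
// reinterpreted value visible.
constexpr float
decode_half (half_bits h)
{
  const unsigned exp = (h.bits >> 10) & 0x1fu;
  const unsigned mant = h.bits & 0x3ffu;
  const float sign = (h.bits & 0x8000u) ? -1.0f : 1.0f;

  if (exp == 31)
    return mant ? std::numeric_limits<float>::quiet_NaN ()
		: sign * std::numeric_limits<float>::infinity ();
  if (exp == 0)
    return sign * (mant * pow2 (-24));	/* Zero or subnormal.  */
  return sign * ((1024u + mant) * pow2 (static_cast<int> (exp) - 25));
}
```

For example, extract_half (0x12343C00u, 0) yields the bit pattern 0x3C00,
which decode_half reads as 1.0; no int-to-float conversion of the value
0x3C00 takes place.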
> > > > Richard.
> > > >
> > > > > >> +      && imode != tmode
> > > > > >> +      && imode != GET_MODE (op0))
> > > > > >> +    {
> > > > > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > > > >> +                                           bitsize.to_constant (),
> > > > > >> +                                           bitnum.to_constant (), unsignedp,
> > > > > >> +                                           NULL, imode, imode,
> > > > > >> +                                           reverse, fallback_p);
> > > > > >> +      gcc_assert (ret);
> > > > > >> +
> > > > > >> +      if (!REG_P (ret))
> > > > > >> +       ret = force_reg (imode, ret);
> > > > > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > > > > >> +    }
> > > > > >> +
> > > > > >>    /* It's possible we'll need to handle other cases here for
> > > > > >>       polynomial bitnum and bitsize.  */
> > > > >
> > > > > Minor nit, but since the code is using to_constant, it should go after
> > > > > this comment rather than before it.
> > > > >
> > > > > Thanks,
> > > > > Richard
> > > > >
> > > > > >>
> > > > > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > >> new file mode 100644
> > > > > >> index 00000000000..ebc0af1490b
> > > > > >> --- /dev/null
> > > > > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > >> @@ -0,0 +1,12 @@
> > > > > >> +/* { dg-do compile } */
> > > > > >> +/* { dg-options "-msse2 -O2" } */
> > > > > >> +_Float16
> > > > > >> +foo (int a)
> > > > > >> +{
> > > > > >> +  union {
> > > > > >> +    int a;
> > > > > >> +    _Float16 b;
> > > > > >> +  }c;
> > > > > >> +  c.a = a;
> > > > > >> +  return c.b;
> > > > > >> +}
> > > > > >> --
> > > > > >> 2.27.0
> > > > > >>
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

[-- Attachment #2: 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch --]
[-- Type: text/x-patch, Size: 7259 bytes --]

From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Fri, 6 Aug 2021 10:18:43 +0800
Subject: [PATCH] Make sure we're playing with integral modes before call
 extract_integral_bit_field.

gcc/ChangeLog:

	* expmed.c (extract_bit_field_1): Make sure we're playing with
	integral modes before calling extract_integral_bit_field.
	(extract_integral_bit_field): Add a parameter of type
	scalar_int_mode which corresponds to tmode.  Call
	extract_and_convert_fixed_bit_field instead of
	extract_fixed_bit_field and convert_extracted_bit_field.
	(extract_and_convert_fixed_bit_field): New function, a
	combination of extract_fixed_bit_field and
	convert_extracted_bit_field.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/float16-5.c: New test.
---
 gcc/expmed.c                              | 103 ++++++++++++++++------
 gcc/testsuite/gcc.target/i386/float16-5.c |  12 +++
 2 files changed, 90 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..f083d6e86d0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -71,7 +71,14 @@ static void store_split_bit_field (rtx, opt_scalar_int_mode,
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
 				       unsigned HOST_WIDE_INT,
 				       unsigned HOST_WIDE_INT, int, rtx,
-				       machine_mode, machine_mode, bool, bool);
+				       machine_mode, machine_mode,
+				       scalar_int_mode, bool, bool);
+static rtx extract_and_convert_fixed_bit_field (scalar_int_mode,
+						machine_mode, machine_mode,
+						rtx, opt_scalar_int_mode,
+						unsigned HOST_WIDE_INT,
+						unsigned HOST_WIDE_INT, rtx,
+						int, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -1632,6 +1639,7 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
+  scalar_int_mode int_tmode;
 
   if (tmode == VOIDmode)
     tmode = mode;
@@ -1853,10 +1861,46 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* It's possible we'll need to handle other cases here for
      polynomial bitnum and bitsize.  */
 
+  /* Make sure we are playing with integral modes.  Pun with subregs
+     if we aren't.  When tmode is HFmode and op0 is SImode, calling
+     extract_integral_bit_field directly would ICE.  */
+  opt_scalar_int_mode target_imode = int_mode_for_mode (tmode);
+  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
+    {
+      if (target_imode.exists (&int_tmode))
+	{
+	  rtx ret = extract_integral_bit_field (op0, op0_mode,
+						bitsize.to_constant (),
+						bitnum.to_constant (),
+						unsignedp, NULL, int_tmode,
+						int_tmode, int_tmode,
+						reverse, fallback_p);
+	  gcc_assert (ret);
+
+	  if (!REG_P (ret))
+	    ret = force_reg (int_tmode, ret);
+	  return gen_lowpart_SUBREG (tmode, ret);
+	}
+      else
+	{
+	  if (!fallback_p)
+	    return NULL;
+
+	  int_tmode = int_mode_for_mode (mode).require ();
+	  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+						      op0, op0_mode,
+						      bitsize.to_constant (),
+						      bitnum.to_constant (),
+						      target, unsignedp,
+						      reverse);
+	}
+    }
+
   /* From here on we need to be looking at a fixed-size insertion.  */
   return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
 				     bitnum.to_constant (), unsignedp,
-				     target, mode, tmode, reverse, fallback_p);
+				     target, mode, tmode,
+				     int_tmode, reverse, fallback_p);
 }
 
 /* Subroutine of extract_bit_field_1, with the same arguments, except
@@ -1869,6 +1913,7 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 			    unsigned HOST_WIDE_INT bitsize,
 			    unsigned HOST_WIDE_INT bitnum, int unsignedp,
 			    rtx target, machine_mode mode, machine_mode tmode,
+			    scalar_int_mode int_tmode,
 			    bool reverse, bool fallback_p)
 {
   /* Handle fields bigger than a word.  */
@@ -2035,29 +2080,10 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
   if (!fallback_p)
     return NULL;
 
-  /* Find a correspondingly-sized integer field, so we can apply
-     shifts and masks to it.  */
-  scalar_int_mode int_mode;
-  if (!int_mode_for_mode (tmode).exists (&int_mode))
-    /* If this fails, we should probably push op0 out to memory and then
-       do a load.  */
-    int_mode = int_mode_for_mode (mode).require ();
-
-  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
-				    bitnum, target, unsignedp, reverse);
-
-  /* Complex values must be reversed piecewise, so we need to undo the global
-     reversal, convert to the complex mode and reverse again.  */
-  if (reverse && COMPLEX_MODE_P (tmode))
-    {
-      target = flip_storage_order (int_mode, target);
-      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-      target = flip_storage_order (tmode, target);
-    }
-  else
-    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-
-  return target;
+  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
+					      op0, op0_mode, bitsize,
+					      bitnum, target, unsignedp,
+					      reverse);
 }
 
 /* Generate code to extract a byte-field from STR_RTX
@@ -2129,6 +2155,33 @@ extract_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
 			      target, mode, tmode, reverse, true, alt_rtl);
 }
+
+/* Combination of extract_fixed_bit_field and convert_extracted_bit_field.  */
+static rtx
+extract_and_convert_fixed_bit_field (scalar_int_mode int_tmode,
+				     machine_mode tmode, machine_mode mode,
+				     rtx op0, opt_scalar_int_mode op0_mode,
+				     unsigned HOST_WIDE_INT bitsize,
+				     unsigned HOST_WIDE_INT bitnum,
+				     rtx target, int unsignedp, bool reverse)
+{
+  target = extract_fixed_bit_field (int_tmode, op0, op0_mode, bitsize,
+				    bitnum, target, unsignedp, reverse);
+
+  /* Complex values must be reversed piecewise, so we need to undo the global
+     reversal, convert to the complex mode and reverse again.  */
+  if (reverse && COMPLEX_MODE_P (tmode))
+    {
+      target = flip_storage_order (int_tmode, target);
+      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+      target = flip_storage_order (tmode, target);
+    }
+  else
+    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+
+  return target;
+}
+
 \f
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.  If OP0_MODE is defined, it is the mode of OP0,
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+    int a;
+    _Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-24  9:44                                                 ` Hongtao Liu
@ 2021-08-24 11:38                                                   ` Richard Biener
  2021-08-26  1:17                                                     ` Hongtao Liu
  2021-08-25 23:16                                                   ` Jeff Law
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-24 11:38 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

On Tue, Aug 24, 2021 at 11:38 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > > > > <richard.sandiford@arm.com> wrote:
> > > > > >
> > > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > >>
> > > > > > >> Hi:
> > > > > > >> ---
> > > > > > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > > > > >> are designed
> > > > > > >> to work on integer modes (but docs do not say anything about this here).
> > > > > > >> In fact the caller of extract_bit_field_using_extv is named
> > > > > > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > > > > >> modes we're dealing with, but we're for example happily doing
> > > > > > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > > > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > > > > >> the other way around?  In that case we should wrap the
> > > > > > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > > > > > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > > > > > >> ---
> > > > > > >>   This is a separate patch as a follow up of upper comments.
> > > > > > >>
> > > > > > >> gcc/ChangeLog:
> > > > > > >>
> > > > > > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > > > > > >>         extract_integral_bit_field, extracting in an integer mode with
> > > > > > >>         the same size as 'tmode' and then converting the result
> > > > > > >>         as (subreg:tmode (reg:imode)).
> > > > > > >>
> > > > > > >> gcc/testsuite/ChangeLog:
> > > > > > >>         * gcc.target/i386/float16-5.c: New test.
> > > > > > >> ---
> > > > > > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > > > > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > > > > > >>  2 files changed, 31 insertions(+)
> > > > > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > >>
> > > > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > > > > >> index 3143f38e057..72790693ef0 100644
> > > > > > >> --- a/gcc/expmed.c
> > > > > > >> +++ b/gcc/expmed.c
> > > > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > > > >>        op0_mode = opt_scalar_int_mode ();
> > > > > > >>      }
> > > > > > >>
> > > > > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > > > > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > > > > >> +     in extract_integral_bit_field.  */
> > > > > > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > > > > > >
> > > > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > > > > how it behaves for composite modes.
> > > > > > >
> > > > > > > Of course the least surprises would happen when we restrict this
> > > > > > > to FLOAT_MODE_P (tmode).
> > > > > > >
> > > > > > > Richard - any preferences?
> > > > > >
> > > > > > If the bug is that extract_integral_bit_field is being called with
> > > > > > a non-integral mode parameter, then it looks odd that we can still
> > > > > > fall through to it without an integral mode (when exists is false).
> > > > > >
> > > > > > If calling extract_integral_bit_field without an integral mode is
> > > > > > a bug then I think we should have:
> > > > > >
> > > > > >   int_mode_for_mode (mode).require ()
> > > > > >
> > > > > > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > > > > > Ideally we'd make the mode parameter scalar_int_mode too.
> > > > > >
> > > > > > extract_integral_bit_field currently has:
> > > > > >
> > > > > >   /* Find a correspondingly-sized integer field, so we can apply
> > > > > >      shifts and masks to it.  */
> > > > > >   scalar_int_mode int_mode;
> > > > > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > > > >     /* If this fails, we should probably push op0 out to memory and then
> > > > > >        do a load.  */
> > > > > >     int_mode = int_mode_for_mode (mode).require ();
> > > > > >
> > > > > > which would seem to be redundant after this change.
> > > > >
> > > > > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > > > > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > > > > can't see a way to check beforehand).  So it seems to me at least
> > > > > part of that function doesn't expect non-integral extraction modes.
> > > > >
> > > > > But who knows - the code is older than I am (OK, not, but older than
> > > > > my involvement in GCC ;))
> > > > >
> > > > How about attached patch w/ below changelog
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * expmed.c (extract_bit_field_1): Make sure we're playing with
> > > >         integral modes before calling extract_integral_bit_field.
> > > >         (extract_integral_bit_field): Add a parameter of type
> > > >         scalar_int_mode which corresponds to tmode.
> > > >         And call extract_and_convert_fixed_bit_field instead of
> > > >         extract_fixed_bit_field and convert_extracted_bit_field.
> > > >         (extract_and_convert_fixed_bit_field): New function, it's a
> > > >         combination of extract_fixed_bit_field and
> > > >         convert_extracted_bit_field.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >         * gcc.target/i386/float16-5.c: New test.
> > > >
> > > I'd like to ping this patch, or maybe we can use the patch before with
> > > richi's comments.
> > > >
> >
> > Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> > this patch, i'd like someone to help review this patch.
> >
> Please ignore the former attached patch, should be the patch attached here.

I think the patch is reasonable.  I'm a bit worried about approving it since
my knowledge of the code is limited.  I wonder if you can confirm that the
change doesn't make a difference for the majority of cases - that is, did you
try, for example, comparing the generated code for GCC (or parts of it)?

OK if there's no objection from others within 48 hours.

Thanks,
Richard.

> > > > > Richard.
> > > > >
> > > > > > >> +      && imode != tmode
> > > > > > >> +      && imode != GET_MODE (op0))
> > > > > > >> +    {
> > > > > > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > > > > >> +                                           bitsize.to_constant (),
> > > > > > >> +                                           bitnum.to_constant (), unsignedp,
> > > > > > >> +                                           NULL, imode, imode,
> > > > > > >> +                                           reverse, fallback_p);
> > > > > > >> +      gcc_assert (ret);
> > > > > > >> +
> > > > > > >> +      if (!REG_P (ret))
> > > > > > >> +       ret = force_reg (imode, ret);
> > > > > > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > > > > > >> +    }
> > > > > > >> +
> > > > > > >>    /* It's possible we'll need to handle other cases here for
> > > > > > >>       polynomial bitnum and bitsize.  */
> > > > > >
> > > > > > Minor nit, but since the code is using to_constant, it should go after
> > > > > > this comment rather than before it.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > > > >
> > > > > > >>
> > > > > > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > >> new file mode 100644
> > > > > > >> index 00000000000..ebc0af1490b
> > > > > > >> --- /dev/null
> > > > > > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > >> @@ -0,0 +1,12 @@
> > > > > > >> +/* { dg-do compile } */
> > > > > > >> +/* { dg-options "-msse2 -O2" } */
> > > > > > >> +_Float16
> > > > > > >> +foo (int a)
> > > > > > >> +{
> > > > > > >> +  union {
> > > > > > >> +    int a;
> > > > > > >> +    _Float16 b;
> > > > > > >> +  }c;
> > > > > > >> +  c.a = a;
> > > > > > >> +  return c.b;
> > > > > > >> +}
> > > > > > >> --
> > > > > > >> 2.27.0
> > > > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-24  9:44                                                 ` Hongtao Liu
  2021-08-24 11:38                                                   ` Richard Biener
@ 2021-08-25 23:16                                                   ` Jeff Law
  2021-08-26  2:05                                                     ` Hongtao Liu
  2021-08-26  7:11                                                     ` Richard Biener
  1 sibling, 2 replies; 138+ messages in thread
From: Jeff Law @ 2021-08-25 23:16 UTC (permalink / raw)
  To: Hongtao Liu, Richard Biener
  Cc: Richard Sandiford, liuhongt, Richard Biener via Gcc-patches



On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote:
> On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
>> On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>> On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
>>>> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>> On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
>>>>> <richard.sandiford@arm.com> wrote:
>>>>>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>>>> On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
>>>>>>>> Hi:
>>>>>>>> ---
>>>>>>>> OK, I think sth is amiss here upthread.  insv/extv do look like they
>>>>>>>> are designed
>>>>>>>> to work on integer modes (but docs do not say anything about this here).
>>>>>>>> In fact the caller of extract_bit_field_using_extv is named
>>>>>>>> extract_integral_bit_field.  Of course nothing seems to check what kind of
>>>>>>>> modes we're dealing with, but we're for example happily doing
>>>>>>>> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
>>>>>>>> some integer mode and op0 is HFmode?  From the above I get it's
>>>>>>>> the other way around?  In that case we should wrap the
>>>>>>>> call to extract_integral_bit_field, extracting in an integer mode with the
>>>>>>>> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
>>>>>>>> ---
>>>>>>>>    This is a separate patch as a follow up of upper comments.
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>          * expmed.c (extract_bit_field_1): Wrap the call to
>>>>>>>>          extract_integral_bit_field, extracting in an integer mode with
>>>>>>>>          the same size as 'tmode' and then converting the result
>>>>>>>>          as (subreg:tmode (reg:imode)).
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>          * gcc.target/i386/float16-5.c: New test.
>>>>>>>> ---
>>>>>>>>   gcc/expmed.c                              | 19 +++++++++++++++++++
>>>>>>>>   gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>>>>>>>>   2 files changed, 31 insertions(+)
>>>>>>>>   create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>>>>>>>>
>>>>>>>> diff --git a/gcc/expmed.c b/gcc/expmed.c
>>>>>>>> index 3143f38e057..72790693ef0 100644
>>>>>>>> --- a/gcc/expmed.c
>>>>>>>> +++ b/gcc/expmed.c
>>>>>>>> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>>>>>>>>         op0_mode = opt_scalar_int_mode ();
>>>>>>>>       }
>>>>>>>>
>>>>>>>> +  /* Make sure we are playing with integral modes.  Pun with subregs
>>>>>>>> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
>>>>>>>> +     in extract_integral_bit_field.  */
>>>>>>>> +  if (int_mode_for_mode (tmode).exists (&imode)
>>>>>>> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
>>>>>>> cheaper.  Then imode should always be != tmode.  Maybe
>>>>>>> even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
>>>>>>> how it behaves for composite modes.
>>>>>>>
>>>>>>> Of course the least surprises would happen when we restrict this
>>>>>>> to FLOAT_MODE_P (tmode).
>>>>>>>
>>>>>>> Richard - any preferences?
>>>>>> If the bug is that extract_integral_bit_field is being called with
>>>>>> a non-integral mode parameter, then it looks odd that we can still
>>>>>> fall through to it without an integral mode (when exists is false).
>>>>>>
>>>>>> If calling extract_integral_bit_field without an integral mode is
>>>>>> a bug then I think we should have:
>>>>>>
>>>>>>    int_mode_for_mode (mode).require ()
>>>>>>
>>>>>> whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
>>>>>> Ideally we'd make the mode parameter scalar_int_mode too.
>>>>>>
>>>>>> extract_integral_bit_field currently has:
>>>>>>
>>>>>>    /* Find a correspondingly-sized integer field, so we can apply
>>>>>>       shifts and masks to it.  */
>>>>>>    scalar_int_mode int_mode;
>>>>>>    if (!int_mode_for_mode (tmode).exists (&int_mode))
>>>>>>      /* If this fails, we should probably push op0 out to memory and then
>>>>>>         do a load.  */
>>>>>>      int_mode = int_mode_for_mode (mode).require ();
>>>>>>
>>>>>> which would seem to be redundant after this change.
>>>>> I'm not sure what exactly the bug is, but extract_integral_bit_field ends
>>>>> up creating a lowpart subreg that's not allowed and that ICEs (and I
>>>>> can't see a way to check beforehand).  So it seems to me at least
>>>>> part of that function doesn't expect non-integral extraction modes.
>>>>>
>>>>> But who knows - the code is older than I am (OK, not, but older than
>>>>> my involvement in GCC ;))
>>>>>
>>>> How about attached patch w/ below changelog
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>          * expmed.c (extract_bit_field_1): Make sure we're playing with
>>>>          integral modes before calling extract_integral_bit_field.
>>>>          (extract_integral_bit_field): Add a parameter of type
>>>>          scalar_int_mode which corresponds to tmode.
>>>>          And call extract_and_convert_fixed_bit_field instead of
>>>>          extract_fixed_bit_field and convert_extracted_bit_field.
>>>>          (extract_and_convert_fixed_bit_field): New function, it's a
>>>>          combination of extract_fixed_bit_field and
>>>>          convert_extracted_bit_field.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>          * gcc.target/i386/float16-5.c: New test.
>>>>
>>> I'd like to ping this patch, or maybe we can use the patch before with
>>> richi's comments.
>> Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
>> this patch, i'd like someone to help review this patch.
>>
> Please ignore the former attached patch, should be the patch attached here.
>>>>> Richard.
>>>>>
>>>>>>>> +      && imode != tmode
>>>>>>>> +      && imode != GET_MODE (op0))
>>>>>>>> +    {
>>>>>>>> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
>>>>>>>> +                                           bitsize.to_constant (),
>>>>>>>> +                                           bitnum.to_constant (), unsignedp,
>>>>>>>> +                                           NULL, imode, imode,
>>>>>>>> +                                           reverse, fallback_p);
>>>>>>>> +      gcc_assert (ret);
>>>>>>>> +
>>>>>>>> +      if (!REG_P (ret))
>>>>>>>> +       ret = force_reg (imode, ret);
>>>>>>>> +      return gen_lowpart_SUBREG (tmode, ret);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>     /* It's possible we'll need to handle other cases here for
>>>>>>>>        polynomial bitnum and bitsize.  */
>>>>>> Minor nit, but since the code is using to_constant, it should go after
>>>>>> this comment rather than before it.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard
>>>>>>
>>>>>>>> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
>>>>>>>> new file mode 100644
>>>>>>>> index 00000000000..ebc0af1490b
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
>>>>>>>> @@ -0,0 +1,12 @@
>>>>>>>> +/* { dg-do compile } */
>>>>>>>> +/* { dg-options "-msse2 -O2" } */
>>>>>>>> +_Float16
>>>>>>>> +foo (int a)
>>>>>>>> +{
>>>>>>>> +  union {
>>>>>>>> +    int a;
>>>>>>>> +    _Float16 b;
>>>>>>>> +  }c;
>>>>>>>> +  c.a = a;
>>>>>>>> +  return c.b;
>>>>>>>> +}
>>>>>>>> --
>>>>>>>> 2.27.0
>>>>>>>>
>>>>
>>>>
>>>> --
>>>> BR,
>>>> Hongtao
>>>
>>>
>>> --
>>> BR,
>>> Hongtao
>>
>>
>> --
>> BR,
>> Hongtao
>
>
>
> 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch
>
>  From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Fri, 6 Aug 2021 10:18:43 +0800
> Subject: [PATCH] Make sure we're playing with integral modes before call
>   extract_integral_bit_field.
>
> gcc/ChangeLog:
>
> 	* expmed.c (extract_bit_field_1): Make sure we're playing with
> 	integral modes before calling extract_integral_bit_field.
> 	(extract_integral_bit_field): Add a parameter of type
> 	scalar_int_mode which corresponds to tmode.
> 	And call extract_and_convert_fixed_bit_field instead of
> 	extract_fixed_bit_field and convert_extracted_bit_field.
> 	(extract_and_convert_fixed_bit_field): New function, it's a
> 	combination of extract_fixed_bit_field and
> 	convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/i386/float16-5.c: New test.
I bet this is all getting triggered due to the introduction of HFmode.  
Wrapping with a subreg to get an integral mode may work, but I'd be more 
comfortable if we had other instances where we knew wrapping an SF/DF 
mode with SI/DI was enough to make all this code safe.  I fear we're 
just pushing the bug down in one spot and it's going to pop up elsewhere.

Another approach would be to force the object into memory, but I suspect 
y'all don't want to do that ;-)

So in the end, it may be reasonable, but I wouldn't be surprised if we 
trip over more problems in this code with FP modes.

jeff
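
As a source-level illustration of the approach discussed in this thread
(do the extraction in an integer mode of the same size, then reinterpret
the bits in the FP mode), here is a minimal sketch in C.  It is not GCC
code: extract_int_field and bits_to_float are hypothetical helper names,
and it assumes that float is a 32-bit IEEE 754 binary32 type, standing in
for the SF/SI pairing mentioned above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a GCC function: extract BITSIZE bits starting
   at bit BITNUM of CONTAINER using only integer shifts and masks.
   Assumes 1 <= BITSIZE <= 32 and BITNUM + BITSIZE <= 64.  */
static uint32_t
extract_int_field (uint64_t container, unsigned bitnum, unsigned bitsize)
{
  uint64_t mask = ((uint64_t) 1 << bitsize) - 1;
  return (uint32_t) ((container >> bitnum) & mask);
}

/* Hypothetical helper: reinterpret 32 integer bits as a float without
   changing them, loosely analogous to (subreg:SF (reg:SI ...)).
   Assumes sizeof (float) == sizeof (uint32_t).  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Under those assumptions,
bits_to_float (extract_int_field (0x3f80000040000000ULL, 0, 32)) gives 2.0f
and the same call with bitnum 32 gives 1.0f.  All shifting and masking
happens on integers; the FP type only appears in the final reinterpretation.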



* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-24 11:38                                                   ` Richard Biener
@ 2021-08-26  1:17                                                     ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-26  1:17 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, liuhongt, Richard Sandiford

On Tue, Aug 24, 2021 at 7:39 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 24, 2021 at 11:38 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > >
> > > > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > > > On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
> > > > > > > >>
> > > > > > > >> Hi:
> > > > > > > >> ---
> > > > > > > >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > > > > > >> are designed
> > > > > > > >> to work on integer modes (but docs do not say anything about this here).
> > > > > > > >> In fact the caller of extract_bit_field_using_extv is named
> > > > > > > >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > > > > > >> modes we're dealing with, but we're for example happily doing
> > > > > > > >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > > > > > >> some integer mode and op0 is HFmode?  From the above I get it's
> > > > > > > >> the other way around?  In that case we should wrap the
> > > > > > > >> call to extract_integral_bit_field, extracting in an integer mode with the
> > > > > > > >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> > > > > > > >> ---
> > > > > > > >>   This is a separate patch as a follow up of upper comments.
> > > > > > > >>
> > > > > > > >> gcc/ChangeLog:
> > > > > > > >>
> > > > > > > >>         * expmed.c (extract_bit_field_1): Wrap the call to
> > > > > > > >>         extract_integral_bit_field, extracting in an integer mode with
> > > > > > > >>         the same size as 'tmode' and then converting the result
> > > > > > > >>         as (subreg:tmode (reg:imode)).
> > > > > > > >>
> > > > > > > >> gcc/testsuite/ChangeLog:
> > > > > > > >>         * gcc.target/i386/float16-5.c: New test.
> > > > > > > >> ---
> > > > > > > >>  gcc/expmed.c                              | 19 +++++++++++++++++++
> > > > > > > >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
> > > > > > > >>  2 files changed, 31 insertions(+)
> > > > > > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > > >>
> > > > > > > >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > > > > > >> index 3143f38e057..72790693ef0 100644
> > > > > > > >> --- a/gcc/expmed.c
> > > > > > > >> +++ b/gcc/expmed.c
> > > > > > > >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> > > > > > > >>        op0_mode = opt_scalar_int_mode ();
> > > > > > > >>      }
> > > > > > > >>
> > > > > > > >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > > > > > >> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> > > > > > > >> +     in extract_integral_bit_field.  */
> > > > > > > >> +  if (int_mode_for_mode (tmode).exists (&imode)
> > > > > > > >
> > > > > > > > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > > > > > > > cheaper.  Then imode should always be != tmode.  Maybe
> > > > > > > > even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> > > > > > > > how it behaves for composite modes.
> > > > > > > >
> > > > > > > > Of course the least surprises would happen when we restrict this
> > > > > > > > to FLOAT_MODE_P (tmode).
> > > > > > > >
> > > > > > > > Richard - any preferences?
> > > > > > >
> > > > > > > If the bug is that extract_integral_bit_field is being called with
> > > > > > > a non-integral mode parameter, then it looks odd that we can still
> > > > > > > fall through to it without an integral mode (when exists is false).
> > > > > > >
> > > > > > > If calling extract_integral_bit_field without an integral mode is
> > > > > > > a bug then I think we should have:
> > > > > > >
> > > > > > >   int_mode_for_mode (mode).require ()
> > > > > > >
> > > > > > > whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> > > > > > > Ideally we'd make the mode parameter scalar_int_mode too.
> > > > > > >
> > > > > > > extract_integral_bit_field currently has:
> > > > > > >
> > > > > > >   /* Find a correspondingly-sized integer field, so we can apply
> > > > > > >      shifts and masks to it.  */
> > > > > > >   scalar_int_mode int_mode;
> > > > > > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > > > > >     /* If this fails, we should probably push op0 out to memory and then
> > > > > > >        do a load.  */
> > > > > > >     int_mode = int_mode_for_mode (mode).require ();
> > > > > > >
> > > > > > > which would seem to be redundant after this change.
> > > > > >
> > > > > > I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> > > > > > up creating a lowpart subreg that's not allowed and that ICEs (and I
> > > > > > can't see a way to check beforehand).  So it seems to me at least
> > > > > > part of that function doesn't expect non-integral extraction modes.
> > > > > >
> > > > > > But who knows - the code is older than I am (OK, not, but older than
> > > > > > my involvement in GCC ;))
> > > > > >
> > > > > How about attached patch w/ below changelog
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * expmed.c (extract_bit_field_1): Make sure we're playing with
> > > > >         integral modes before calling extract_integral_bit_field.
> > > > >         (extract_integral_bit_field): Add a parameter of type
> > > > >         scalar_int_mode which corresponds to tmode.
> > > > >         And call extract_and_convert_fixed_bit_field instead of
> > > > >         extract_fixed_bit_field and convert_extracted_bit_field.
> > > > >         (extract_and_convert_fixed_bit_field): New function, it's a
> > > > >         combination of extract_fixed_bit_field and
> > > > >         convert_extracted_bit_field.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >         * gcc.target/i386/float16-5.c: New test.
> > > > >
> > > > I'd like to ping this patch, or maybe we can use the patch before with
> > > > richi's comments.
> > > > >
> > >
> > > Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> > > this patch, i'd like someone to help review this patch.
> > >
> > Please ignore the former attached patch, should be the patch attached here.
>
> I think the patch is reasonable.  I'm a bit worried about approving it since
> my knowledge of the code is limited.  I wonder if you can confirm that the
> change doesn't make a difference for the majority of cases - that is, did you
> try, for example, comparing the generated code for GCC (or parts of it)?
The builds are the same for SPEC2017 and EEMBC.
>
> OK if there's no objection from others within 48 hours.
>
> Thanks,
> Richard.
>
> > > > > > Richard.
> > > > > >
> > > > > > > >> +      && imode != tmode
> > > > > > > >> +      && imode != GET_MODE (op0))
> > > > > > > >> +    {
> > > > > > > >> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > > > > > >> +                                           bitsize.to_constant (),
> > > > > > > >> +                                           bitnum.to_constant (), unsignedp,
> > > > > > > >> +                                           NULL, imode, imode,
> > > > > > > >> +                                           reverse, fallback_p);
> > > > > > > >> +      gcc_assert (ret);
> > > > > > > >> +
> > > > > > > >> +      if (!REG_P (ret))
> > > > > > > >> +       ret = force_reg (imode, ret);
> > > > > > > >> +      return gen_lowpart_SUBREG (tmode, ret);
> > > > > > > >> +    }
> > > > > > > >> +
> > > > > > > >>    /* It's possible we'll need to handle other cases here for
> > > > > > > >>       polynomial bitnum and bitsize.  */
> > > > > > >
> > > > > > > Minor nit, but since the code is using to_constant, it should go after
> > > > > > > this comment rather than before it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard
> > > > > > >
> > > > > > > >>
> > > > > > > >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > > >> new file mode 100644
> > > > > > > >> index 00000000000..ebc0af1490b
> > > > > > > >> --- /dev/null
> > > > > > > >> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > > > > > >> @@ -0,0 +1,12 @@
> > > > > > > >> +/* { dg-do compile } */
> > > > > > > >> +/* { dg-options "-msse2 -O2" } */
> > > > > > > >> +_Float16
> > > > > > > >> +foo (int a)
> > > > > > > >> +{
> > > > > > > >> +  union {
> > > > > > > >> +    int a;
> > > > > > > >> +    _Float16 b;
> > > > > > > >> +  }c;
> > > > > > > >> +  c.a = a;
> > > > > > > >> +  return c.b;
> > > > > > > >> +}
> > > > > > > >> --
> > > > > > > >> 2.27.0
> > > > > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-25 23:16                                                   ` Jeff Law
@ 2021-08-26  2:05                                                     ` Hongtao Liu
  2021-08-26  7:11                                                     ` Richard Biener
  1 sibling, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-26  2:05 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Richard Sandiford, liuhongt,
	Richard Biener via Gcc-patches

On Thu, Aug 26, 2021 at 7:16 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote:
>
> On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> ---
>   This is a separate patch as a follow up of upper comments.
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Wrap the call to
>         extract_integral_bit_field, extracting in an integer mode with
>         the same size as 'tmode' and then converting the result
>         as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c                              | 19 +++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>        op0_mode = opt_scalar_int_mode ();
>      }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> +     in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists (&imode)
>
> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> cheaper.  Then imode should always be != tmode.  Maybe
> even GET_MODE_CLASS (tmode) != MODE_INT since I'm not sure
> how it behaves for composite modes.
>
> Of course the least surprises would happen when we restrict this
> to FLOAT_MODE_P (tmode).
>
> Richard - any preferences?
>
> If the bug is that extract_integral_bit_field is being called with
> a non-integral mode parameter, then it looks odd that we can still
> fall through to it without an integral mode (when exists is false).
>
> If calling extract_integral_bit_field without an integral mode is
> a bug then I think we should have:
>
>   int_mode_for_mode (mode).require ()
>
> whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> Ideally we'd make the mode parameter scalar_int_mode too.
>
> extract_integral_bit_field currently has:
>
>   /* Find a correspondingly-sized integer field, so we can apply
>      shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
>     /* If this fails, we should probably push op0 out to memory and then
>        do a load.  */
>     int_mode = int_mode_for_mode (mode).require ();
>
> which would seem to be redundant after this change.
>
> I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> up creating a lowpart subreg that's not allowed and that ICEs (and I
> can't see a way to check beforehand).  So it seems to me at least
> part of that function doesn't expect non-integral extraction modes.
>
> But who knows - the code is older than I am (OK, not, but older than
> my involvement in GCC ;))
>
> How about attached patch w/ below changelog
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Make sure we're playing with
>         integral modes before calling extract_integral_bit_field.
>         (extract_integral_bit_field): Add a parameter of type
>         scalar_int_mode which corresponds to tmode.
>         And call extract_and_convert_fixed_bit_field instead of
>         extract_fixed_bit_field and convert_extracted_bit_field.
>         (extract_and_convert_fixed_bit_field): New function, it's a
>         combination of extract_fixed_bit_field and
>         convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
>
> I'd like to ping this patch, or maybe we can use the patch before with
> richi's comments.
>
> Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> this patch, i'd like someone to help review this patch.
>
> Please ignore the former attached patch, should be the patch attached here.
>
> Richard.
>
> +      && imode != tmode
> +      && imode != GET_MODE (op0))
> +    {
> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> +                                           bitsize.to_constant (),
> +                                           bitnum.to_constant (), unsignedp,
> +                                           NULL, imode, imode,
> +                                           reverse, fallback_p);
> +      gcc_assert (ret);
> +
> +      if (!REG_P (ret))
> +       ret = force_reg (imode, ret);
> +      return gen_lowpart_SUBREG (tmode, ret);
> +    }
> +
>    /* It's possible we'll need to handle other cases here for
>       polynomial bitnum and bitsize.  */
>
> Minor nit, but since the code is using to_constant, it should go after
> this comment rather than before it.
>
> Thanks,
> Richard
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 00000000000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> --
> 2.27.0
>
>
>
> --
> BR,
> Hongtao
>
>
> --
> BR,
> Hongtao
>
>
> --
> BR,
> Hongtao
>
>
>
> 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch
>
> From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Fri, 6 Aug 2021 10:18:43 +0800
> Subject: [PATCH] Make sure we're playing with integral modes before call
>  extract_integral_bit_field.
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Make sure we're playing with
> integral modes before call extract_integral_bit_field.
> (extract_integral_bit_field): Add a parameter of type
> scalar_int_mode which corresponds to of tmode.
> And call extract_and_convert_fixed_bit_field instead of
> extract_fixed_bit_field and convert_extracted_bit_field.
> (extract_and_convert_fixed_bit_field): New function, it's a
> combination of extract_fixed_bit_field and
> convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/float16-5.c: New test.
>
> I bet this is all getting triggered due to the introduction of HFmode.  Wrapping with a subreg to get an integral mode may work, but I'd be more comfortable if we had other instances where we knew wrapping an SF/DF mode with SI/DI was enough to make all this code safe.  I fear we're just pushing the bug down in one spot and it's going to pop up elsewhere.
SFmode takes the new code path and works fine, e.g.
float
foo (long long b)
{
  union{float a;
    long long b;}c;
  c.b = b;
  return c.a;
}

For DFmode, it generates (subreg:DF (reg:TI/DI)) before it even reaches
the code added by my patch, because validate_subreg allows

  /* ??? This should not be here.  Temporarily continue to allow word_mode
     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
     Generally, backends are doing something sketchy but it'll take time to
     fix them all.  */
  if (omode == word_mode)
    ;
  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
     is the culprit here, and not the backends.  */
  else if (known_ge (osize, regsize) && known_ge (isize, osize))
    ;

>
> Another approach would be to force the object into memory, but I suspect y'all don't want to do that ;-)
>
> So in the end, it may be reasonable, but I wouldn't be surprised if we trip over more problems in this code with FP modes.
I can upstream this patch separately first and then wait a week; if
no new problems show up on trunk, I will upstream the float16
patches.
>
> jeff
>


-- 
BR,
Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-25 23:16                                                   ` Jeff Law
  2021-08-26  2:05                                                     ` Hongtao Liu
@ 2021-08-26  7:11                                                     ` Richard Biener
  2021-08-26  9:06                                                       ` Richard Sandiford
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-26  7:11 UTC (permalink / raw)
  To: Jeff Law
  Cc: Hongtao Liu, Richard Sandiford, liuhongt, Richard Biener via Gcc-patches

On Thu, Aug 26, 2021 at 1:16 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/24/2021 3:44 AM, Hongtao Liu via Gcc-patches wrote:
>
> On Tue, Aug 24, 2021 at 5:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 9:52 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 4:34 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 7:27 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
> ---
>   This is a separate patch as a follow up of upper comments.
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Wrap the call to
>         extract_integral_bit_field, extracting in an integer mode with
>         the same size as 'tmode' and then converting the result
>         as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c                              | 19 +++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++++++++++++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>        op0_mode = opt_scalar_int_mode ();
>      }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> +     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> +     in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists (&imode)
>
> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> cheaper.  Then imode should always be != tmode.  Maybe
> even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
> how it behaves for composite modes.
>
> Of course the least surprises would happen when we restrict this
> to FLOAT_MODE_P (tmode).
>
> Richard - any preferences?
>
> If the bug is that extract_integral_bit_field is being called with
> a non-integral mode parameter, then it looks odd that we can still
> fall through to it without an integral mode (when exists is false).
>
> If calling extract_integral_bit_field without an integral mode is
> a bug then I think we should have:
>
>   int_mode_for_mode (mode).require ()
>
> whenever mode is not already SCALAR_INT_MODE_P/is_a<scalar_int_mode>.
> Ideally we'd make the mode parameter scalar_int_mode too.
>
> extract_integral_bit_field currently has:
>
>   /* Find a correspondingly-sized integer field, so we can apply
>      shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
>     /* If this fails, we should probably push op0 out to memory and then
>        do a load.  */
>     int_mode = int_mode_for_mode (mode).require ();
>
> which would seem to be redundant after this change.
>
> I'm not sure what exactly the bug is, but extract_integral_bit_field ends
> up creating a lowpart subreg that's not allowed and that ICEs (and I
> can't see a way to check beforehand).  So it seems to me at least
> part of that function doesn't expect non-integral extraction modes.
>
> But who knows - the code is older than I am (OK, not, but older than
> my involvment in GCC ;))
>
> How about attached patch w/ below changelog
>
> gcc/ChangeLog:
>
>         * expmed.c (extract_bit_field_1): Make sure we're playing with
>         integral modes before call extract_integral_bit_field.
>         (extract_integral_bit_field): Add a parameter of type
>         scalar_int_mode which corresponds to of tmode.
>         And call extract_and_convert_fixed_bit_field instead of
>         extract_fixed_bit_field and convert_extracted_bit_field.
>         (extract_and_convert_fixed_bit_field): New function, it's a
>         combination of extract_fixed_bit_field and
>         convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/i386/float16-5.c: New test.
>
> I'd like to ping this patch, or maybe we can use the patch before with
> richi's comments.
>
> Rebased and ping^2, there are plenty of avx512fp16 patches blocked by
> this patch, i'd like someone to help review this patch.
>
> Please ignore the former attached patch, should be the patch attached here.
>
> Richard.
>
> +      && imode != tmode
> +      && imode != GET_MODE (op0))
> +    {
> +      rtx ret = extract_integral_bit_field (op0, op0_mode,
> +                                           bitsize.to_constant (),
> +                                           bitnum.to_constant (), unsignedp,
> +                                           NULL, imode, imode,
> +                                           reverse, fallback_p);
> +      gcc_assert (ret);
> +
> +      if (!REG_P (ret))
> +       ret = force_reg (imode, ret);
> +      return gen_lowpart_SUBREG (tmode, ret);
> +    }
> +
>    /* It's possible we'll need to handle other cases here for
>       polynomial bitnum and bitsize.  */
>
> Minor nit, but since the code is using to_constant, it should go after
> this comment rather than before it.
>
> Thanks,
> Richard
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 00000000000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +    int a;
> +    _Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> --
> 2.27.0
>
>
>
> --
> BR,
> Hongtao
>
>
> --
> BR,
> Hongtao
>
>
> --
> BR,
> Hongtao
>
>
>
> 0001-Make-sure-we-re-playing-with-integral-modes-before-c.patch
>
> From 9c77ac15e69b567156a82debe45e3ced10df1110 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Fri, 6 Aug 2021 10:18:43 +0800
> Subject: [PATCH] Make sure we're playing with integral modes before call
>  extract_integral_bit_field.
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Make sure we're playing with
> integral modes before call extract_integral_bit_field.
> (extract_integral_bit_field): Add a parameter of type
> scalar_int_mode which corresponds to of tmode.
> And call extract_and_convert_fixed_bit_field instead of
> extract_fixed_bit_field and convert_extracted_bit_field.
> (extract_and_convert_fixed_bit_field): New function, it's a
> combination of extract_fixed_bit_field and
> convert_extracted_bit_field.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/float16-5.c: New test.
>
> I bet this is all getting triggered due to the introduction of HFmode.  Wrapping with a subreg to get an integral mode may work, but I'd be more comfortable if we had other instances where we knew wrapping an SF/DF mode with SI/DI was enough to make all this code safe.  I fear we're just pushing the bug down in one spot and it's going to pop up elsewhere.
>
> Another approach would be to force the object into memory, but I suspect y'all don't want to do that ;-)
>
> So in the end, it may be reasonable, but I wouldn't be surprised if we trip over more problems in this code with FP modes.

One thought I had is whether we can "fix" validate_subreg to have fewer
"weird" allowed float-int special cases.  As said upthread, I think that
we should either allow all of those, implying that subregs work
semantically as if there were subregs to same-sized integer modes in
between, or disallow them all and make sure we're actually doing that
explicitly.

For example

  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
     is the culprit here, and not the backends.  */
  else if (known_ge (osize, regsize) && known_ge (isize, osize))
    ;

I can't decipher from rtl.texi what the semantics of such a subreg are,
given that the docs hand-wave about WORDS_BIG_ENDIAN vs.
FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
when you mix those in a subreg.  So maybe the above should
explicitly require WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.

But then the world would be much simpler if subregs of non-same-size
modes had explicit documentation for the mode kinds we have.

Richard.

> jeff
>


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-26  7:11                                                     ` Richard Biener
@ 2021-08-26  9:06                                                       ` Richard Sandiford
  2021-08-26 10:14                                                         ` Richard Biener
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Sandiford @ 2021-08-26  9:06 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches; +Cc: Jeff Law, Richard Biener, liuhongt

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> One thought I had is whether we can "fix" validate_subreg to have less
> "weird" allowed float-int
> special cases.  As said upthread I think that we either should allow
> all of those, implying that
> subregs work semantically as if there's subregs to same-sized integer
> modes inbetween or
> disallow them all and make sure we're actually doing that explicitely.
>
> For example
>
>   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>      is the culprit here, and not the backends.  */
>   else if (known_ge (osize, regsize) && known_ge (isize, osize))
>     ;
>
> I can't decipther rtl.text as to what the semantics of such a subreg is
> given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> when you mix those in a subreg.  So maybe the above should
> have explicitely have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
>
> But then the world would be much simpler if subregs of non-same size
> modes have explicit documentation for the mode kinds we have.

Yeah.  Although validate_subreg was a good idea, some of the mode checks
are IMO a failed experiment.  The hope was that eventually we'd remove
all those special exceptions once the culprits had been fixed.  However,
the code is over 16 years old at this point and those changes never
happened.

Nested subregs aren't a thing (thankfully) and one of the big disadvantages
of the current validate_subreg mode-changing rules is that they aren't
transitive.  This can artificially require temporary pseudos for things
that could be expressed directly as a single subreg.

I'm not even sure we have to worry about WORDS_BIG_ENDIAN vs.
FLOAT_WORDS_BIG_ENDIAN.  The SUBREG_BYTE rules follow WORDS_BIG_ENDIAN
for all modes, so it's up to the generator of the subreg to diddle the
SUBREG_BYTE if they want to use a different interpretation for floats.

Thanks,
Richard


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-26  9:06                                                       ` Richard Sandiford
@ 2021-08-26 10:14                                                         ` Richard Biener
  2021-08-26 10:50                                                           ` Richard Sandiford
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-26 10:14 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, Jeff Law, Richard Biener,
	liuhongt, Richard Sandiford

On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > One thought I had is whether we can "fix" validate_subreg to have less
> > "weird" allowed float-int
> > special cases.  As said upthread I think that we either should allow
> > all of those, implying that
> > subregs work semantically as if there's subregs to same-sized integer
> > modes inbetween or
> > disallow them all and make sure we're actually doing that explicitely.
> >
> > For example
> >
> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >      is the culprit here, and not the backends.  */
> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >     ;
> >
> > I can't decipther rtl.text as to what the semantics of such a subreg is
> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > when you mix those in a subreg.  So maybe the above should
> > have explicitely have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> >
> > But then the world would be much simpler if subregs of non-same size
> > modes have explicit documentation for the mode kinds we have.
>
> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> are IMO a failed experiment.  The hope was that eventually we'd remove
> all those special exceptions once the culprit has been fixed.  However,
> the code is over 16 years old at this point and those changes never
> happened.
>
> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> of the current validate_subreg mode-changing rules is that they aren't
> transitive.  This can artificially require temporary pseudos for things
> that could be expressed directly as a single subreg.

And that's what the proposed patch does (it adds intermediate subregs that
pun to the same-sized integer mode).

So if that's not supposed to be necessary then why restrict subregs at all?

I mean you seem to imply that the semantics would be clear and well-defined
(to you - not to me).  The only thing is that of course not all subregs are
"implemented" by a target (or can be, w/o spilling).

Which means - we should adjust validate_subreg with another special-case
or rather generalize the existing ones to an overall set that makes more
sense?

Richard.

> I'm not even sure we have to worry about WORDS_BIG_ENDIAN vs.
> FLOAT_WORDS_BIG_ENDIAN.  The SUBREG_BYTE rules follow WORDS_BIG_ENDIAN
> for all modes, so its up to the generator of the subreg to diddle the
> SUBREG_BYTE if they want to use a different interpretation for floats.
>
> Thanks,
> Richard


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-26 10:14                                                         ` Richard Biener
@ 2021-08-26 10:50                                                           ` Richard Sandiford
  2021-08-26 11:09                                                             ` Richard Biener
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Sandiford @ 2021-08-26 10:50 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches; +Cc: Jeff Law, Richard Biener, liuhongt

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > One thought I had is whether we can "fix" validate_subreg to have less
>> > "weird" allowed float-int
>> > special cases.  As said upthread I think that we either should allow
>> > all of those, implying that
>> > subregs work semantically as if there's subregs to same-sized integer
>> > modes inbetween or
>> > disallow them all and make sure we're actually doing that explicitely.
>> >
>> > For example
>> >
>> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>> >      is the culprit here, and not the backends.  */
>> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
>> >     ;
>> >
>> > I can't decipther rtl.text as to what the semantics of such a subreg is
>> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
>> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
>> > when you mix those in a subreg.  So maybe the above should
>> > have explicitely have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
>> >
>> > But then the world would be much simpler if subregs of non-same size
>> > modes have explicit documentation for the mode kinds we have.
>>
>> Yeah.  Although validate_subreg was a good idea, some of the mode checks
>> are IMO a failed experiment.  The hope was that eventually we'd remove
>> all those special exceptions once the culprit has been fixed.  However,
>> the code is over 16 years old at this point and those changes never
>> happened.
>>
>> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
>> of the current validate_subreg mode-changing rules is that they aren't
>> transitive.  This can artificially require temporary pseudos for things
>> that could be expressed directly as a single subreg.
>
> And that's what the proposed patch does (add same-mode size integer mode
> punning intermediate subregs).
>
> So if that's not supposed to be necessary then why restrict subregs at all?

I was trying to say: I'm not sure we should.

> I mean you seem to imply that the semantics would be clear and well-defined
> (to you - not to me).  The only thing is that of course not all subregs are
> "implemented" by a target (or can be, w/o spilling).

Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
But it only comes into play during RA or when trying to take
the subreg of a particular hard register.  Transitivity doesn't
matter so much for the hard register case since the result of
simplify_gen_subreg should then be another hard register.

> Which means - we should adjust validate_subreg with another special-case
> or rather generalize the existing ones to an overall set that makes more
> sense?

Maybe it's too radical, but I wonder whether we should just get rid of:

  /* ??? This should not be here.  Temporarily continue to allow word_mode
     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
     Generally, backends are doing something sketchy but it'll take time to
     fix them all.  */
  if (omode == word_mode)
    ;
  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
     is the culprit here, and not the backends.  */
  else if (known_ge (osize, regsize) && known_ge (isize, osize))
    ;
  /* Allow component subregs of complex and vector.  Though given the below
     extraction rules, it's not always clear what that means.  */
  else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
	   && GET_MODE_INNER (imode) == omode)
    ;
  /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
     i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
     surely isn't the cleanest way to represent this.  It's questionable
     if this ought to be represented at all -- why can't this all be hidden
     in post-reload splitters that make arbitrarily mode changes to the
     registers themselves.  */
  else if (VECTOR_MODE_P (omode)
	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
    ;
  /* Subregs involving floating point modes are not allowed to
     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
     (subreg:SI (reg:DF) 0) isn't.  */
  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
    {
      if (! (known_eq (isize, osize)
	     /* LRA can use subreg to store a floating point value in
		an integer mode.  Although the floating point and the
		integer modes need the same number of hard registers,
		the size of floating point mode can be less than the
		integer mode.  LRA also uses subregs for a register
		should be used in different mode in on insn.  */
	     || lra_in_progress))
	return false;
    }

altogether.

Thanks,
Richard


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-26 10:50                                                           ` Richard Sandiford
@ 2021-08-26 11:09                                                             ` Richard Biener
  2021-08-27  4:56                                                               ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-26 11:09 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, Jeff Law, Richard Biener,
	liuhongt, Richard Sandiford

On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > One thought I had is whether we can "fix" validate_subreg to have less
> >> > "weird" allowed float-int
> >> > special cases.  As said upthread I think that we either should allow
> >> > all of those, implying that
> >> > subregs work semantically as if there's subregs to same-sized integer
> >> > modes inbetween or
> >> > disallow them all and make sure we're actually doing that explicitely.
> >> >
> >> > For example
> >> >
> >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >> >      is the culprit here, and not the backends.  */
> >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >> >     ;
> >> >
> >> > I can't decipther rtl.text as to what the semantics of such a subreg is
> >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> >> > when you mix those in a subreg.  So maybe the above should
> >> > have explicitely have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> >> >
> >> > But then the world would be much simpler if subregs of non-same size
> >> > modes have explicit documentation for the mode kinds we have.
> >>
> >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> >> are IMO a failed experiment.  The hope was that eventually we'd remove
> >> all those special exceptions once the culprit has been fixed.  However,
> >> the code is over 16 years old at this point and those changes never
> >> happened.
> >>
> >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> >> of the current validate_subreg mode-changing rules is that they aren't
> >> transitive.  This can artificially require temporary pseudos for things
> >> that could be expressed directly as a single subreg.
> >
> > And that's what the proposed patch does (add same-mode size integer mode
> > punning intermediate subregs).
> >
> > So if that's not supposed to be necessary then why restrict subregs at all?
>
> I was trying to say: I'm not sure we should.
>
> > I mean you seem to imply that the semantics would be clear and well-defined
> > (to you - not to me).  The only thing is that of course not all subregs are
> > "implemented" by a target (or can be, w/o spilling).
>
> Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> But it only comes in to play during RA or when trying to take
> the subreg of a particular hard register.  Transitivity doesn't
> matter so much for the hard register case since the result of
> simplify_gen_subreg should then be another hard register.
>
> > Which means - we should adjust validate_subreg with another special-case
> > or rather generalize the existing ones to an overall set that makes more
> > sense?
>
> Maybe it's too radical, but I would whether we should just get rid of:
>
>   /* ??? This should not be here.  Temporarily continue to allow word_mode
>      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
>      Generally, backends are doing something sketchy but it'll take time to
>      fix them all.  */
>   if (omode == word_mode)
>     ;
>   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
>      is the culprit here, and not the backends.  */
>   else if (known_ge (osize, regsize) && known_ge (isize, osize))
>     ;
>   /* Allow component subregs of complex and vector.  Though given the below
>      extraction rules, it's not always clear what that means.  */
>   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
>            && GET_MODE_INNER (imode) == omode)
>     ;
>   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
>      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
>      surely isn't the cleanest way to represent this.  It's questionable
>      if this ought to be represented at all -- why can't this all be hidden
>      in post-reload splitters that make arbitrarily mode changes to the
>      registers themselves.  */
>   else if (VECTOR_MODE_P (omode)
>            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>     ;
>   /* Subregs involving floating point modes are not allowed to
>      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>      (subreg:SI (reg:DF) 0) isn't.  */
>   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
>     {
>       if (! (known_eq (isize, osize)
>              /* LRA can use subreg to store a floating point value in
>                 an integer mode.  Although the floating point and the
>                 integer modes need the same number of hard registers,
>                 the size of floating point mode can be less than the
>                 integer mode.  LRA also uses subregs for a register
>                 should be used in different mode in on insn.  */
>              || lra_in_progress))
>         return false;
>     }
>
> altogether.

Yeah, I would fully support this.  Maybe replace it with a comment,
but I don't know what it should say.

Richard.

>
> Thanks,
> Richard


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-26 11:09                                                             ` Richard Biener
@ 2021-08-27  4:56                                                               ` Hongtao Liu
  2021-08-30 19:09                                                                 ` Joseph Myers
  2021-08-31  6:10                                                                 ` Richard Biener
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-27  4:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, Jeff Law, liuhongt, Richard Sandiford

On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > >>
> > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > >> > One thought I had is whether we can "fix" validate_subreg to have less
> > >> > "weird" allowed float-int
> > >> > special cases.  As said upthread I think that we either should allow
> > >> > all of those, implying that
> > >> > subregs work semantically as if there's subregs to same-sized integer
> > >> > modes in between or
> > >> > disallow them all and make sure we're actually doing that explicitly.
> > >> >
> > >> > For example
> > >> >
> > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > >> >      is the culprit here, and not the backends.  */
> > >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >> >     ;
> > >> >
> > >> > I can't decipher rtl.texi as to what the semantics of such a subreg is
> > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > >> > when you mix those in a subreg.  So maybe the above should
> > >> > explicitly have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> > >> >
> > >> > But then the world would be much simpler if subregs of non-same size
> > >> > modes have explicit documentation for the mode kinds we have.
> > >>
> > >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> > >> are IMO a failed experiment.  The hope was that eventually we'd remove
> > >> all those special exceptions once the culprit has been fixed.  However,
> > >> the code is over 16 years old at this point and those changes never
> > >> happened.
> > >>
> > >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> > >> of the current validate_subreg mode-changing rules is that they aren't
> > >> transitive.  This can artificially require temporary pseudos for things
> > >> that could be expressed directly as a single subreg.
> > >
> > > And that's what the proposed patch does (add same-mode size integer mode
> > > punning intermediate subregs).
> > >
> > > So if that's not supposed to be necessary then why restrict subregs at all?
> >
> > I was trying to say: I'm not sure we should.
> >
> > > I mean you seem to imply that the semantics would be clear and well-defined
> > > (to you - not to me).  The only thing is that of course not all subregs are
> > > "implemented" by a target (or can be, w/o spilling).
> >
> > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > But it only comes in to play during RA or when trying to take
> > the subreg of a particular hard register.  Transitivity doesn't
> > matter so much for the hard register case since the result of
> > simplify_gen_subreg should then be another hard register.
> >
> > > Which means - we should adjust validate_subreg with another special-case
> > > or rather generalize the existing ones to an overall set that makes more
> > > sense?
> >
> > Maybe it's too radical, but I wonder whether we should just get rid of:
> >
> >   /* ??? This should not be here.  Temporarily continue to allow word_mode
> >      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> >      Generally, backends are doing something sketchy but it'll take time to
> >      fix them all.  */
> >   if (omode == word_mode)
> >     ;
> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> >      is the culprit here, and not the backends.  */
> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >     ;
> >   /* Allow component subregs of complex and vector.  Though given the below
> >      extraction rules, it's not always clear what that means.  */
> >   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> >            && GET_MODE_INNER (imode) == omode)
> >     ;
> >   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> >      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> >      surely isn't the cleanest way to represent this.  It's questionable
> >      if this ought to be represented at all -- why can't this all be hidden
> >      in post-reload splitters that make arbitrarily mode changes to the
> >      registers themselves.  */
> >   else if (VECTOR_MODE_P (omode)
> >            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> >     ;
> >   /* Subregs involving floating point modes are not allowed to
> >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >      (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> >     {
> >       if (! (known_eq (isize, osize)
> >              /* LRA can use subreg to store a floating point value in
> >                 an integer mode.  Although the floating point and the
> >                 integer modes need the same number of hard registers,
> >                 the size of floating point mode can be less than the
> >                 integer mode.  LRA also uses subregs for a register
> >                 should be used in different mode in on insn.  */
> >              || lra_in_progress))
> >         return false;
> >     }
> >
> > altogether.
>
> Yeah, I would fully support this.  Maybe replace it with a comment
> but I don't know what it should say.
>
> Richard.
>
> >
> > Thanks,
> > Richard

I'm going to upstream the patch.

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-27  4:56                                                               ` Hongtao Liu
@ 2021-08-30 19:09                                                                 ` Joseph Myers
  2021-08-30 21:15                                                                   ` Jeff Law
  2021-08-31  6:10                                                                 ` Richard Biener
  1 sibling, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-08-30 19:09 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Biener, Richard Sandiford, liuhongt,
	Richard Biener via Gcc-patches

This commit introduces an ICE building libgcc for 32-bit SPARC.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-30 19:09                                                                 ` Joseph Myers
@ 2021-08-30 21:15                                                                   ` Jeff Law
  0 siblings, 0 replies; 138+ messages in thread
From: Jeff Law @ 2021-08-30 21:15 UTC (permalink / raw)
  To: gcc-patches



On 8/30/2021 1:09 PM, Joseph Myers wrote:
> This commit introduces an ICE building libgcc for 32-bit SPARC.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102133
I've sent Hongtao a testcase which is ICE-ing.  It was mcore-elf, but I 
saw similar failures on a half-dozen architectures before I reverted the 
change locally.

jeff

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-27  4:56                                                               ` Hongtao Liu
  2021-08-30 19:09                                                                 ` Joseph Myers
@ 2021-08-31  6:10                                                                 ` Richard Biener
  2021-08-31  6:30                                                                   ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Richard Biener @ 2021-08-31  6:10 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Biener via Gcc-patches, Jeff Law, liuhongt, Richard Sandiford

On Fri, Aug 27, 2021 at 6:50 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > > <richard.sandiford@arm.com> wrote:
> > > >>
> > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > >> > One thought I had is whether we can "fix" validate_subreg to have less
> > > >> > "weird" allowed float-int
> > > >> > special cases.  As said upthread I think that we either should allow
> > > >> > all of those, implying that
> > > >> > subregs work semantically as if there's subregs to same-sized integer
> > > >> > modes in between or
> > > >> > disallow them all and make sure we're actually doing that explicitly.
> > > >> >
> > > >> > For example
> > > >> >
> > > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > >> >      is the culprit here, and not the backends.  */
> > > >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > >> >     ;
> > > >> >
> > > >> > I can't decipher rtl.texi as to what the semantics of such a subreg is
> > > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > > >> > when you mix those in a subreg.  So maybe the above should
> > > >> > explicitly have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> > > >> >
> > > >> > But then the world would be much simpler if subregs of non-same size
> > > >> > modes have explicit documentation for the mode kinds we have.
> > > >>
> > > >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> > > >> are IMO a failed experiment.  The hope was that eventually we'd remove
> > > >> all those special exceptions once the culprit has been fixed.  However,
> > > >> the code is over 16 years old at this point and those changes never
> > > >> happened.
> > > >>
> > > >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> > > >> of the current validate_subreg mode-changing rules is that they aren't
> > > >> transitive.  This can artificially require temporary pseudos for things
> > > >> that could be expressed directly as a single subreg.
> > > >
> > > > And that's what the proposed patch does (add same-mode size integer mode
> > > > punning intermediate subregs).
> > > >
> > > > So if that's not supposed to be necessary then why restrict subregs at all?
> > >
> > > I was trying to say: I'm not sure we should.
> > >
> > > > I mean you seem to imply that the semantics would be clear and well-defined
> > > > (to you - not to me).  The only thing is that of course not all subregs are
> > > > "implemented" by a target (or can be, w/o spilling).
> > >
> > > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > > But it only comes in to play during RA or when trying to take
> > > the subreg of a particular hard register.  Transitivity doesn't
> > > matter so much for the hard register case since the result of
> > > simplify_gen_subreg should then be another hard register.
> > >
> > > > Which means - we should adjust validate_subreg with another special-case
> > > > or rather generalize the existing ones to an overall set that makes more
> > > > sense?
> > >
> > > Maybe it's too radical, but I wonder whether we should just get rid of:
> > >
> > >   /* ??? This should not be here.  Temporarily continue to allow word_mode
> > >      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> > >      Generally, backends are doing something sketchy but it'll take time to
> > >      fix them all.  */
> > >   if (omode == word_mode)
> > >     ;
> > >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > >      is the culprit here, and not the backends.  */
> > >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >     ;
> > >   /* Allow component subregs of complex and vector.  Though given the below
> > >      extraction rules, it's not always clear what that means.  */
> > >   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> > >            && GET_MODE_INNER (imode) == omode)
> > >     ;
> > >   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> > >      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> > >      surely isn't the cleanest way to represent this.  It's questionable
> > >      if this ought to be represented at all -- why can't this all be hidden
> > >      in post-reload splitters that make arbitrarily mode changes to the
> > >      registers themselves.  */
> > >   else if (VECTOR_MODE_P (omode)
> > >            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> > >     ;
> > >   /* Subregs involving floating point modes are not allowed to
> > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > >      (subreg:SI (reg:DF) 0) isn't.  */
> > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > >     {
> > >       if (! (known_eq (isize, osize)
> > >              /* LRA can use subreg to store a floating point value in
> > >                 an integer mode.  Although the floating point and the
> > >                 integer modes need the same number of hard registers,
> > >                 the size of floating point mode can be less than the
> > >                 integer mode.  LRA also uses subregs for a register
> > >                 should be used in different mode in on insn.  */
> > >              || lra_in_progress))
> > >         return false;
> > >     }
> > >
> > > altogether.
> >
> > Yeah, I would fully support this.  Maybe replace it with a comment
> > but I don't know what it should say.
> >
> > Richard.
> >
> > >
> > > Thanks,
> > > Richard
>
> I'm going to upstream the patch.

Hmm, so it looks like you pushed the variant massaging extract_bit_field.  Above
we instead supported "fixing" validate_subreg to allow the HFmode subreg.

So maybe we should revert and try that?

Richard.

> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-31  6:10                                                                 ` Richard Biener
@ 2021-08-31  6:30                                                                   ` Hongtao Liu
  2021-08-31  6:48                                                                     ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-08-31  6:30 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, Jeff Law, liuhongt, Richard Sandiford

On Tue, Aug 31, 2021 at 2:11 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Aug 27, 2021 at 6:50 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > > > <richard.sandiford@arm.com> wrote:
> > > > >>
> > > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > >> > One thought I had is whether we can "fix" validate_subreg to have less
> > > > >> > "weird" allowed float-int
> > > > >> > special cases.  As said upthread I think that we either should allow
> > > > >> > all of those, implying that
> > > > >> > subregs work semantically as if there's subregs to same-sized integer
> > > > >> > modes in between or
> > > > >> > disallow them all and make sure we're actually doing that explicitly.
> > > > >> >
> > > > >> > For example
> > > > >> >
> > > > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > >> >      is the culprit here, and not the backends.  */
> > > > >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > >> >     ;
> > > > >> >
> > > > >> > I can't decipher rtl.texi as to what the semantics of such a subreg is
> > > > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > > > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > > > >> > when you mix those in a subreg.  So maybe the above should
> > > > >> > explicitly have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> > > > >> >
> > > > >> > But then the world would be much simpler if subregs of non-same size
> > > > >> > modes have explicit documentation for the mode kinds we have.
> > > > >>
> > > > >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> > > > >> are IMO a failed experiment.  The hope was that eventually we'd remove
> > > > >> all those special exceptions once the culprit has been fixed.  However,
> > > > >> the code is over 16 years old at this point and those changes never
> > > > >> happened.
> > > > >>
> > > > >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> > > > >> of the current validate_subreg mode-changing rules is that they aren't
> > > > >> transitive.  This can artificially require temporary pseudos for things
> > > > >> that could be expressed directly as a single subreg.
> > > > >
> > > > > And that's what the proposed patch does (add same-mode size integer mode
> > > > > punning intermediate subregs).
> > > > >
> > > > > So if that's not supposed to be necessary then why restrict subregs at all?
> > > >
> > > > I was trying to say: I'm not sure we should.
> > > >
> > > > > I mean you seem to imply that the semantics would be clear and well-defined
> > > > > (to you - not to me).  The only thing is that of course not all subregs are
> > > > > "implemented" by a target (or can be, w/o spilling).
> > > >
> > > > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > > > But it only comes in to play during RA or when trying to take
> > > > the subreg of a particular hard register.  Transitivity doesn't
> > > > matter so much for the hard register case since the result of
> > > > simplify_gen_subreg should then be another hard register.
> > > >
> > > > > Which means - we should adjust validate_subreg with another special-case
> > > > > or rather generalize the existing ones to an overall set that makes more
> > > > > sense?
> > > >
> > > > Maybe it's too radical, but I wonder whether we should just get rid of:
> > > >
> > > >   /* ??? This should not be here.  Temporarily continue to allow word_mode
> > > >      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> > > >      Generally, backends are doing something sketchy but it'll take time to
> > > >      fix them all.  */
> > > >   if (omode == word_mode)
> > > >     ;
> > > >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > >      is the culprit here, and not the backends.  */
> > > >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > >     ;
> > > >   /* Allow component subregs of complex and vector.  Though given the below
> > > >      extraction rules, it's not always clear what that means.  */
> > > >   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> > > >            && GET_MODE_INNER (imode) == omode)
> > > >     ;
> > > >   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> > > >      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> > > >      surely isn't the cleanest way to represent this.  It's questionable
> > > >      if this ought to be represented at all -- why can't this all be hidden
> > > >      in post-reload splitters that make arbitrarily mode changes to the
> > > >      registers themselves.  */
> > > >   else if (VECTOR_MODE_P (omode)
> > > >            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> > > >     ;
> > > >   /* Subregs involving floating point modes are not allowed to
> > > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > >      (subreg:SI (reg:DF) 0) isn't.  */
> > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > >     {
> > > >       if (! (known_eq (isize, osize)
> > > >              /* LRA can use subreg to store a floating point value in
> > > >                 an integer mode.  Although the floating point and the
> > > >                 integer modes need the same number of hard registers,
> > > >                 the size of floating point mode can be less than the
> > > >                 integer mode.  LRA also uses subregs for a register
> > > >                 should be used in different mode in on insn.  */
> > > >              || lra_in_progress))
> > > >         return false;
> > > >     }
> > > >
> > > > altogether.
> > >
> > > Yeah, I would fully support this.  Maybe replace it with a comment
> > > but I don't know what it should say.
> > >
> > > Richard.
> > >
> > > >
> > > > Thanks,
> > > > Richard
> >
> > I'm going to upstream the patch.
>
> Hmm, so looks like you pushed the variant massaging extract_bit_field.  Above
> we supported to instead "fix" validate_subreg to allow the HFmode subreg.
>
> So maybe we should revert and try that?
This one:

> +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> +     here. Though extract_bit_field is the culprit here, not the backends.  */
> +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> +    ;

or this one

+      machine_mode tmode = GET_MODE (target);
       if (REG_P (target)
-          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
+          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+          /* When validate_subreg doesn't allow subreg between integer mode
+             and float mode with different size, It will hit gcc_assert in
+             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
+             not really needed, codes like below will be finally generated.
+             (set (reg:SI 1)
+                  (and:SI (reg:DI 2) -1))
+             (set (reg:SF 3)
+                  (subreg:SF (reg:SI 1)))  */
+          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
+          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
         {
           target = gen_lowpart (ext_mode, target);
           if (partial_subreg_p (GET_MODE (spec_target), ext_mode))

or the proposed patch in PR102133 which may risk falling down a rabbit hole?

   gcc_checking_assert (!x
        || !(TREE_CODE (t) == SSA_NAME || is_gimple_reg (t))
        || (use_register_for_decl (t)
-   ? (REG_P (x)
+   ? (REG_P (x) || SUBREG_P (x)
       || (GET_CODE (x) == CONCAT
   && (REG_P (XEXP (x, 0))
       || SUBREG_P (XEXP (x, 0)))


>
> Richard.
>
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-31  6:30                                                                   ` Hongtao Liu
@ 2021-08-31  6:48                                                                     ` Hongtao Liu
  2021-08-31 11:16                                                                       ` Richard Biener
  2021-08-31 11:17                                                                       ` [PATCH 0/2] Get rid of all float-int special cases in validate_subreg liuhongt
  0 siblings, 2 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-08-31  6:48 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener via Gcc-patches, Jeff Law, liuhongt, Richard Sandiford

On Tue, Aug 31, 2021 at 2:30 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 31, 2021 at 2:11 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Aug 27, 2021 at 6:50 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> > > > <richard.sandiford@arm.com> wrote:
> > > > >
> > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > > > > <richard.sandiford@arm.com> wrote:
> > > > > >>
> > > > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > >> > One thought I had is whether we can "fix" validate_subreg to have less
> > > > > >> > "weird" allowed float-int
> > > > > >> > special cases.  As said upthread I think that we either should allow
> > > > > >> > all of those, implying that
> > > > > >> > subregs work semantically as if there's subregs to same-sized integer
> > > > > >> > modes in between or
> > > > > >> > disallow them all and make sure we're actually doing that explicitly.
> > > > > >> >
> > > > > >> > For example
> > > > > >> >
> > > > > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > > >> >      is the culprit here, and not the backends.  */
> > > > > >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > > >> >     ;
> > > > > >> >
> > > > > >> > I can't decipher rtl.texi as to what the semantics of such a subreg is
> > > > > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > > > > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > > > > >> > when you mix those in a subreg.  So maybe the above should
> > > > > >> > explicitly have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> > > > > >> >
> > > > > >> > But then the world would be much simpler if subregs of non-same size
> > > > > >> > modes have explicit documentation for the mode kinds we have.
> > > > > >>
> > > > > >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> > > > > >> are IMO a failed experiment.  The hope was that eventually we'd remove
> > > > > >> all those special exceptions once the culprit has been fixed.  However,
> > > > > >> the code is over 16 years old at this point and those changes never
> > > > > >> happened.
> > > > > >>
> > > > > >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> > > > > >> of the current validate_subreg mode-changing rules is that they aren't
> > > > > >> transitive.  This can artificially require temporary pseudos for things
> > > > > >> that could be expressed directly as a single subreg.
> > > > > >
> > > > > > And that's what the proposed patch does (add same-mode size integer mode
> > > > > > punning intermediate subregs).
> > > > > >
> > > > > > So if that's not supposed to be necessary then why restrict subregs at all?
> > > > >
> > > > > I was trying to say: I'm not sure we should.
> > > > >
> > > > > > I mean you seem to imply that the semantics would be clear and well-defined
> > > > > > (to you - not to me).  The only thing is that of course not all subregs are
> > > > > > "implemented" by a target (or can be, w/o spilling).
> > > > >
> > > > > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > > > > But it only comes in to play during RA or when trying to take
> > > > > the subreg of a particular hard register.  Transitivity doesn't
> > > > > matter so much for the hard register case since the result of
> > > > > simplify_gen_subreg should then be another hard register.
> > > > >
> > > > > > Which means - we should adjust validate_subreg with another special-case
> > > > > > or rather generalize the existing ones to an overall set that makes more
> > > > > > sense?
> > > > >
> > > > > Maybe it's too radical, but I wonder whether we should just get rid of:
> > > > >
> > > > >   /* ??? This should not be here.  Temporarily continue to allow word_mode
> > > > >      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> > > > >      Generally, backends are doing something sketchy but it'll take time to
> > > > >      fix them all.  */
> > > > >   if (omode == word_mode)
> > > > >     ;
> > > > >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > >      is the culprit here, and not the backends.  */
> > > > >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > >     ;
> > > > >   /* Allow component subregs of complex and vector.  Though given the below
> > > > >      extraction rules, it's not always clear what that means.  */
> > > > >   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> > > > >            && GET_MODE_INNER (imode) == omode)
> > > > >     ;
> > > > >   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> > > > >      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> > > > >      surely isn't the cleanest way to represent this.  It's questionable
> > > > >      if this ought to be represented at all -- why can't this all be hidden
> > > > >      in post-reload splitters that make arbitrarily mode changes to the
> > > > >      registers themselves.  */
> > > > >   else if (VECTOR_MODE_P (omode)
> > > > >            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> > > > >     ;
> > > > >   /* Subregs involving floating point modes are not allowed to
> > > > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > > >      (subreg:SI (reg:DF) 0) isn't.  */
> > > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > > >     {
> > > > >       if (! (known_eq (isize, osize)
> > > > >              /* LRA can use subreg to store a floating point value in
> > > > >                 an integer mode.  Although the floating point and the
> > > > >                 integer modes need the same number of hard registers,
> > > > >                 the size of floating point mode can be less than the
> > > > >                 integer mode.  LRA also uses subregs for a register
> > > > >                 should be used in different mode in on insn.  */
> > > > >              || lra_in_progress))
> > > > >         return false;
> > > > >     }
> > > > >
> > > > > altogether.
Let me test a patch that removes the code above.
> > > >
> > > > Yeah, I would fully support this.  Maybe replace it with a comment
> > > > but I don't know what it should say.
> > > >
> > > > Richard.
> > > >
> > > > >
> > > > > Thanks,
> > > > > Richard
> > >
> > > I'm going to upstream the patch.
> >
> > Hmm, so looks like you pushed the variant massaging extract_bit_field.  Above
> > we supported to instead "fix" validate_subreg to allow the HFmode subreg.
> >
> > So maybe we should revert and try that?
> This one:
>
> > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > +    ;
>
> or this one
>
> +      machine_mode tmode = GET_MODE (target);
>        if (REG_P (target)
> -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> +          /* When validate_subreg doesn't allow subreg between integer mode
> +             and float mode with different size, It will hit gcc_assert in
> +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> +             not really needed, codes like below will be finally generated.
> +             (set (reg:SI 1)
> +                  (and:SI (reg:DI 2) -1))
> +             (set (reg:SF 3)
> +                  (subreg:SF (reg:SI 1)))  */
> +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
>          {
>            target = gen_lowpart (ext_mode, target);
>            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>
> or the proposed patch in PR102133 which may risk falling down a rabbit hole?
>
>    gcc_checking_assert (!x
>         || !(TREE_CODE (t) == SSA_NAME || is_gimple_reg (t))
>         || (use_register_for_decl (t)
> -   ? (REG_P (x)
> +   ? (REG_P (x) || SUBREG_P (x)
>        || (GET_CODE (x) == CONCAT
>    && (REG_P (XEXP (x, 0))
>        || SUBREG_P (XEXP (x, 0)))
>
>
> >
> > Richard.
> >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


* Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.
  2021-08-31  6:48                                                                     ` Hongtao Liu
@ 2021-08-31 11:16                                                                       ` Richard Biener
  2021-08-31 11:17                                                                       ` [PATCH 0/2] Get rid of all float-int special cases in validate_subreg liuhongt
  1 sibling, 0 replies; 138+ messages in thread
From: Richard Biener @ 2021-08-31 11:16 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Richard Biener via Gcc-patches, Jeff Law, liuhongt, Richard Sandiford

On Tue, Aug 31, 2021 at 8:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 31, 2021 at 2:30 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 31, 2021 at 2:11 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Aug 27, 2021 at 6:50 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 7:09 PM Richard Biener via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 12:50 PM Richard Sandiford
> > > > > <richard.sandiford@arm.com> wrote:
> > > > > >
> > > > > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > > On Thu, Aug 26, 2021 at 11:06 AM Richard Sandiford
> > > > > > > <richard.sandiford@arm.com> wrote:
> > > > > > >>
> > > > > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > > > >> > One thought I had is whether we can "fix" validate_subreg to have less
> > > > > > >> > "weird" allowed float-int
> > > > > > >> > special cases.  As said upthread I think that we either should allow
> > > > > > >> > all of those, implying that
> > > > > > >> > subregs work semantically as if there's subregs to same-sized integer
> > > > > > > >> > modes in between or
> > > > > > > >> > disallow them all and make sure we're actually doing that explicitly.
> > > > > > >> >
> > > > > > >> > For example
> > > > > > >> >
> > > > > > >> >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > > > >> >      is the culprit here, and not the backends.  */
> > > > > > >> >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > > > >> >     ;
> > > > > > >> >
> > > > > > > >> > I can't decipher rtl.text as to what the semantics of such a subreg is
> > > > > > >> > given the docs hand-wave about WORDS_BIG_ENDIAN vs.
> > > > > > >> > FLOAT_WORDS_BIG_ENDIAN but don't actually say what happens
> > > > > > >> > when you mix those in a subreg.  So maybe the above should
> > > > > > > >> > explicitly have WORDS_BIG_ENDIAN == FLOAT_WORDS_BIG_ENDIAN.
> > > > > > >> >
> > > > > > >> > But then the world would be much simpler if subregs of non-same size
> > > > > > >> > modes have explicit documentation for the mode kinds we have.
> > > > > > >>
> > > > > > >> Yeah.  Although validate_subreg was a good idea, some of the mode checks
> > > > > > >> are IMO a failed experiment.  The hope was that eventually we'd remove
> > > > > > >> all those special exceptions once the culprit has been fixed.  However,
> > > > > > >> the code is over 16 years old at this point and those changes never
> > > > > > >> happened.
> > > > > > >>
> > > > > > >> Nested subregs aren't a thing (thankfully) and one of the big disadvantages
> > > > > > >> of the current validate_subreg mode-changing rules is that they aren't
> > > > > > >> transitive.  This can artificially require temporary pseudos for things
> > > > > > >> that could be expressed directly as a single subreg.
> > > > > > >
> > > > > > > And that's what the proposed patch does (add same-mode size integer mode
> > > > > > > punning intermediate subregs).
> > > > > > >
> > > > > > > So if that's not supposed to be necessary then why restrict subregs at all?
> > > > > >
> > > > > > I was trying to say: I'm not sure we should.
> > > > > >
> > > > > > > I mean you seem to imply that the semantics would be clear and well-defined
> > > > > > > (to you - not to me).  The only thing is that of course not all subregs are
> > > > > > > "implemented" by a target (or can be, w/o spilling).
> > > > > >
> > > > > > Yeah.  That's for TARGET_CAN_CHANGE_MODE_CLASS to decide.
> > > > > > > But it only comes into play during RA or when trying to take
> > > > > > the subreg of a particular hard register.  Transitivity doesn't
> > > > > > matter so much for the hard register case since the result of
> > > > > > simplify_gen_subreg should then be another hard register.
> > > > > >
> > > > > > > Which means - we should adjust validate_subreg with another special-case
> > > > > > > or rather generalize the existing ones to an overall set that makes more
> > > > > > > sense?
> > > > > >
> > > > > > > Maybe it's too radical, but I wonder whether we should just get rid of:
> > > > > >
> > > > > >   /* ??? This should not be here.  Temporarily continue to allow word_mode
> > > > > >      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> > > > > >      Generally, backends are doing something sketchy but it'll take time to
> > > > > >      fix them all.  */
> > > > > >   if (omode == word_mode)
> > > > > >     ;
> > > > > >   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> > > > > >      is the culprit here, and not the backends.  */
> > > > > >   else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > > >     ;
> > > > > >   /* Allow component subregs of complex and vector.  Though given the below
> > > > > >      extraction rules, it's not always clear what that means.  */
> > > > > >   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> > > > > >            && GET_MODE_INNER (imode) == omode)
> > > > > >     ;
> > > > > >   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> > > > > >      i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> > > > > >      surely isn't the cleanest way to represent this.  It's questionable
> > > > > >      if this ought to be represented at all -- why can't this all be hidden
> > > > > >      in post-reload splitters that make arbitrarily mode changes to the
> > > > > >      registers themselves.  */
> > > > > >   else if (VECTOR_MODE_P (omode)
> > > > > >            && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> > > > > >     ;
> > > > > >   /* Subregs involving floating point modes are not allowed to
> > > > > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > > > >      (subreg:SI (reg:DF) 0) isn't.  */
> > > > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > > > >     {
> > > > > >       if (! (known_eq (isize, osize)
> > > > > >              /* LRA can use subreg to store a floating point value in
> > > > > >                 an integer mode.  Although the floating point and the
> > > > > >                 integer modes need the same number of hard registers,
> > > > > >                 the size of floating point mode can be less than the
> > > > > >                 integer mode.  LRA also uses subregs for a register
> > > > > >                 should be used in different mode in on insn.  */
> > > > > >              || lra_in_progress))
> > > > > >         return false;
> > > > > >     }
> > > > > >
> > > > > > altogether.
> Let me test the patch that removes the code above.

Yes, that's what I was referring to.

Richard.

> > > > >
> > > > > Yeah, I would fully support this.  Maybe replace it with a comment
> > > > > but I don't know what it should say.
> > > > >
> > > > > Richard.
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > >
> > > > I'm going to upstream the patch.
> > >
> > > Hmm, so looks like you pushed the variant massaging extract_bit_field.  Above
> > > we supported to instead "fix" validate_subreg to allow the HFmode subreg.
> > >
> > > So maybe we should revert and try that?
> > This one:
> >
> > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> > > +     here. Though extract_bit_field is the culprit here, not the backends.  */
> > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > +    ;
> >
> > or this one
> >
> > +      machine_mode tmode = GET_MODE (target);
> >        if (REG_P (target)
> > -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> > +          /* When validate_subreg doesn't allow subreg between integer mode
> > +             and float mode with different size, It will hit gcc_assert in
> > +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
> > +             not really needed, codes like below will be finally generated.
> > +             (set (reg:SI 1)
> > +                  (and:SI (reg:DI 2) -1))
> > +             (set (reg:SF 3)
> > +                  (subreg:SF (reg:SI 1)))  */
> > +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> > +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
> >          {
> >            target = gen_lowpart (ext_mode, target);
> >            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> >
> > or the proposed patch in PR102133 which may risk falling down a rabbit hole?
> >
> >    gcc_checking_assert (!x
> >         || !(TREE_CODE (t) == SSA_NAME || is_gimple_reg (t))
> >         || (use_register_for_decl (t)
> > -   ? (REG_P (x)
> > +   ? (REG_P (x) || SUBREG_P (x)
> >        || (GET_CODE (x) == CONCAT
> >    && (REG_P (XEXP (x, 0))
> >        || SUBREG_P (XEXP (x, 0)))
> >
> >
> > >
> > > Richard.
> > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao


* [PATCH 0/2] Get rid of all float-int special cases in validate_subreg.
  2021-08-31  6:48                                                                     ` Hongtao Liu
  2021-08-31 11:16                                                                       ` Richard Biener
@ 2021-08-31 11:17                                                                       ` liuhongt
  2021-08-31 11:17                                                                         ` [PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field." liuhongt
  2021-08-31 11:17                                                                         ` [PATCH 2/2] Get rid of all float-int special cases in validate_subreg liuhongt
  1 sibling, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-08-31 11:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, richard.sandiford, crazylht

Hi:
  There are 2 patches: the first reverts my r12-3218, which caused an ICE
in PR102133; the second removes all float-int special cases in
validate_subreg, as suggested in [1].

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

PS: I am building SPEC2017 and EEMBC to check whether the binaries are the
same as with HEAD~2; I guess they are.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578189.html.
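
For anyone following along, here is a toy model of the scalar part of the
mode checks that patch 2/2 deletes.  It is only a sketch under stated
assumptions: struct toy_mode, the toy_* constants and old_mode_check are
made-up names for illustration, not GCC APIs; the real code works on
machine_mode and poly_uint64 sizes, and the complex/vector branches are
left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scalar machine mode; not GCC's machine_mode.  */
struct toy_mode
{
  const char *name;   /* e.g. "SI", "HF" */
  unsigned size;      /* size in bytes */
  bool is_float;      /* true for HF/SF/DF */
};

const struct toy_mode toy_HF = { "HF", 2, true };
const struct toy_mode toy_SF = { "SF", 4, true };
const struct toy_mode toy_DF = { "DF", 8, true };
const struct toy_mode toy_SI = { "SI", 4, false };
const struct toy_mode toy_DI = { "DI", 8, false };

/* Model of the removed checks for (subreg:OMODE (reg:IMODE)), scalar modes
   only.  REGSIZE stands in for REGMODE_NATURAL_SIZE (imode).  Returns false
   where the old code returned false early; true only means "not filtered
   here", since the real validate_subreg goes on to check offsets.  */
bool
old_mode_check (const struct toy_mode *omode, const struct toy_mode *imode,
                const struct toy_mode *word_mode, unsigned regsize,
                bool lra_in_progress)
{
  assert (omode && imode && word_mode);

  /* word_mode subregs of anything were let through.  */
  if (omode == word_mode)
    return true;
  /* So was e.g. (subreg:DF (reg:TI)).  */
  if (omode->size >= regsize && imode->size >= omode->size)
    return true;
  /* Otherwise float subregs could not change size, except during LRA.  */
  if (imode->is_float || omode->is_float)
    return imode->size == omode->size || lra_in_progress;
  return true;
}
```

Assuming an x86_64-like setup (word_mode is DI, regsize is 8),
old_mode_check (&toy_SI, &toy_HF, &toy_DI, 8, false) is false while
old_mode_check (&toy_DI, &toy_SF, &toy_DI, 8, false) is true.  With a
32-bit-like setup (word_mode is SI, regsize is 4) even
(subreg:SI (reg:DF)) passes.  That asymmetry is what the thread is about;
with patch 2/2 applied none of these cases is filtered at this stage.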

liuhongt (2):
  Revert "Make sure we're playing with integral modes before call
    extract_integral_bit_field."
  Get rid of all float-int special cases in validate_subreg.

 gcc/emit-rtl.c |  40 -------------------
 gcc/expmed.c   | 103 ++++++++++++-------------------------------------
 2 files changed, 25 insertions(+), 118 deletions(-)

-- 
2.27.0



* [PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field."
  2021-08-31 11:17                                                                       ` [PATCH 0/2] Get rid of all float-int special cases in validate_subreg liuhongt
@ 2021-08-31 11:17                                                                         ` liuhongt
  2021-08-31 11:17                                                                         ` [PATCH 2/2] Get rid of all float-int special cases in validate_subreg liuhongt
  1 sibling, 0 replies; 138+ messages in thread
From: liuhongt @ 2021-08-31 11:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, richard.sandiford, crazylht

This reverts commit 7218c2ec365ce95f5a1012a6eb425b0a36aec6bf.

     PR middle-end/102133
---
 gcc/expmed.c | 103 +++++++++++++--------------------------------------
 1 file changed, 25 insertions(+), 78 deletions(-)

diff --git a/gcc/expmed.c b/gcc/expmed.c
index f083d6e86d0..3143f38e057 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -71,14 +71,7 @@ static void store_split_bit_field (rtx, opt_scalar_int_mode,
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
 				       unsigned HOST_WIDE_INT,
 				       unsigned HOST_WIDE_INT, int, rtx,
-				       machine_mode, machine_mode,
-				       scalar_int_mode, bool, bool);
-static rtx extract_and_convert_fixed_bit_field (scalar_int_mode,
-						machine_mode, machine_mode,
-						rtx, opt_scalar_int_mode,
-						unsigned HOST_WIDE_INT,
-						unsigned HOST_WIDE_INT, rtx,
-						int, bool);
+				       machine_mode, machine_mode, bool, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -1639,7 +1632,6 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
-  scalar_int_mode int_tmode;
 
   if (tmode == VOIDmode)
     tmode = mode;
@@ -1861,46 +1853,10 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   /* It's possible we'll need to handle other cases here for
      polynomial bitnum and bitsize.  */
 
-  /* Make sure we are playing with integral modes.  Pun with subregs
-     if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
-     in extract_integral_bit_field.  */
-  opt_scalar_int_mode target_imode = int_mode_for_mode (tmode);
-  if (!target_imode.exists (&int_tmode) || int_tmode != tmode)
-    {
-      if (target_imode.exists (&int_tmode))
-	{
-	  rtx ret = extract_integral_bit_field (op0, op0_mode,
-						bitsize.to_constant (),
-						bitnum.to_constant (),
-						unsignedp, NULL, int_tmode,
-						int_tmode, int_tmode,
-						reverse, fallback_p);
-	  gcc_assert (ret);
-
-	  if (!REG_P (ret))
-	    ret = force_reg (int_tmode, ret);
-	  return gen_lowpart_SUBREG (tmode, ret);
-	}
-      else
-	{
-	  if (!fallback_p)
-	    return NULL;
-
-	  int_tmode = int_mode_for_mode (mode).require ();
-	  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
-						      op0, op0_mode,
-						      bitsize.to_constant (),
-						      bitnum.to_constant (),
-						      target, unsignedp,
-						      reverse);
-	}
-    }
-
   /* From here on we need to be looking at a fixed-size insertion.  */
   return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
 				     bitnum.to_constant (), unsignedp,
-				     target, mode, tmode,
-				     int_tmode, reverse, fallback_p);
+				     target, mode, tmode, reverse, fallback_p);
 }
 
 /* Subroutine of extract_bit_field_1, with the same arguments, except
@@ -1913,7 +1869,6 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 			    unsigned HOST_WIDE_INT bitsize,
 			    unsigned HOST_WIDE_INT bitnum, int unsignedp,
 			    rtx target, machine_mode mode, machine_mode tmode,
-			    scalar_int_mode int_tmode,
 			    bool reverse, bool fallback_p)
 {
   /* Handle fields bigger than a word.  */
@@ -2080,10 +2035,29 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
   if (!fallback_p)
     return NULL;
 
-  return extract_and_convert_fixed_bit_field (int_tmode, tmode, mode,
-					      op0, op0_mode, bitsize,
-					      bitnum, target, unsignedp,
-					      reverse);
+  /* Find a correspondingly-sized integer field, so we can apply
+     shifts and masks to it.  */
+  scalar_int_mode int_mode;
+  if (!int_mode_for_mode (tmode).exists (&int_mode))
+    /* If this fails, we should probably push op0 out to memory and then
+       do a load.  */
+    int_mode = int_mode_for_mode (mode).require ();
+
+  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
+				    bitnum, target, unsignedp, reverse);
+
+  /* Complex values must be reversed piecewise, so we need to undo the global
+     reversal, convert to the complex mode and reverse again.  */
+  if (reverse && COMPLEX_MODE_P (tmode))
+    {
+      target = flip_storage_order (int_mode, target);
+      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+      target = flip_storage_order (tmode, target);
+    }
+  else
+    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
+
+  return target;
 }
 
 /* Generate code to extract a byte-field from STR_RTX
@@ -2155,33 +2129,6 @@ extract_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
 			      target, mode, tmode, reverse, true, alt_rtl);
 }
-
-/* Combination of extract_fixed_bit_field and convert_extracted_bit_field.  */
-static rtx
-extract_and_convert_fixed_bit_field (scalar_int_mode int_tmode,
-				     machine_mode tmode, machine_mode mode,
-				     rtx op0, opt_scalar_int_mode op0_mode,
-				     unsigned HOST_WIDE_INT bitsize,
-				     unsigned HOST_WIDE_INT bitnum,
-				     rtx target, int unsignedp, bool reverse)
-{
-  target = extract_fixed_bit_field (int_tmode, op0, op0_mode, bitsize,
-				    bitnum, target, unsignedp, reverse);
-
-  /* Complex values must be reversed piecewise, so we need to undo the global
-     reversal, convert to the complex mode and reverse again.  */
-  if (reverse && COMPLEX_MODE_P (tmode))
-    {
-      target = flip_storage_order (int_tmode, target);
-      target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-      target = flip_storage_order (tmode, target);
-    }
-  else
-    target = convert_extracted_bit_field (target, mode, tmode, unsignedp);
-
-  return target;
-}
-
 \f
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
    from bit BITNUM of OP0.  If OP0_MODE is defined, it is the mode of OP0,
-- 
2.27.0



* [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-08-31 11:17                                                                       ` [PATCH 0/2] Get rid of all float-int special cases in validate_subreg liuhongt
  2021-08-31 11:17                                                                         ` [PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field." liuhongt
@ 2021-08-31 11:17                                                                         ` liuhongt
  2021-08-31 11:57                                                                           ` Richard Biener
  2021-09-02 17:55                                                                           ` Segher Boessenkool
  1 sibling, 2 replies; 138+ messages in thread
From: liuhongt @ 2021-08-31 11:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, richard.sandiford, crazylht

gcc/ChangeLog:

	* emit-rtl.c (validate_subreg): Get rid of all float-int
	special cases.
---
 gcc/emit-rtl.c | 40 ----------------------------------------
 1 file changed, 40 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index ff3b4449b37..77ea8948ee8 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -922,46 +922,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
 
   poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
 
-  /* ??? This should not be here.  Temporarily continue to allow word_mode
-     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
-     Generally, backends are doing something sketchy but it'll take time to
-     fix them all.  */
-  if (omode == word_mode)
-    ;
-  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
-     is the culprit here, and not the backends.  */
-  else if (known_ge (osize, regsize) && known_ge (isize, osize))
-    ;
-  /* Allow component subregs of complex and vector.  Though given the below
-     extraction rules, it's not always clear what that means.  */
-  else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
-	   && GET_MODE_INNER (imode) == omode)
-    ;
-  /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
-     i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
-     surely isn't the cleanest way to represent this.  It's questionable
-     if this ought to be represented at all -- why can't this all be hidden
-     in post-reload splitters that make arbitrarily mode changes to the
-     registers themselves.  */
-  else if (VECTOR_MODE_P (omode)
-	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
-    ;
-  /* Subregs involving floating point modes are not allowed to
-     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
-     (subreg:SI (reg:DF) 0) isn't.  */
-  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
-    {
-      if (! (known_eq (isize, osize)
-	     /* LRA can use subreg to store a floating point value in
-		an integer mode.  Although the floating point and the
-		integer modes need the same number of hard registers,
-		the size of floating point mode can be less than the
-		integer mode.  LRA also uses subregs for a register
-		should be used in different mode in on insn.  */
-	     || lra_in_progress))
-	return false;
-    }
-
   /* Paradoxical subregs must have offset zero.  */
   if (maybe_gt (osize, isize))
     return known_eq (offset, 0U);
-- 
2.27.0



* Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-08-31 11:17                                                                         ` [PATCH 2/2] Get rid of all float-int special cases in validate_subreg liuhongt
@ 2021-08-31 11:57                                                                           ` Richard Biener
  2021-09-02 17:55                                                                           ` Segher Boessenkool
  1 sibling, 0 replies; 138+ messages in thread
From: Richard Biener @ 2021-08-31 11:57 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Richard Sandiford, Hongtao Liu

On Tue, Aug 31, 2021 at 1:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> gcc/ChangeLog:

OK.

Thanks,
Richard.

>         * emit-rtl.c (validate_subreg): Get rid of all float-int
>         special cases.
> ---
>  gcc/emit-rtl.c | 40 ----------------------------------------
>  1 file changed, 40 deletions(-)
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index ff3b4449b37..77ea8948ee8 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -922,46 +922,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
>
>    poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
>
> -  /* ??? This should not be here.  Temporarily continue to allow word_mode
> -     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
> -     Generally, backends are doing something sketchy but it'll take time to
> -     fix them all.  */
> -  if (omode == word_mode)
> -    ;
> -  /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
> -     is the culprit here, and not the backends.  */
> -  else if (known_ge (osize, regsize) && known_ge (isize, osize))
> -    ;
> -  /* Allow component subregs of complex and vector.  Though given the below
> -     extraction rules, it's not always clear what that means.  */
> -  else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
> -          && GET_MODE_INNER (imode) == omode)
> -    ;
> -  /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
> -     i.e. (subreg:V4SF (reg:SF) 0) or (subreg:V4SF (reg:V2SF) 0).  This
> -     surely isn't the cleanest way to represent this.  It's questionable
> -     if this ought to be represented at all -- why can't this all be hidden
> -     in post-reload splitters that make arbitrarily mode changes to the
> -     registers themselves.  */
> -  else if (VECTOR_MODE_P (omode)
> -          && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> -    ;
> -  /* Subregs involving floating point modes are not allowed to
> -     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> -     (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> -    {
> -      if (! (known_eq (isize, osize)
> -            /* LRA can use subreg to store a floating point value in
> -               an integer mode.  Although the floating point and the
> -               integer modes need the same number of hard registers,
> -               the size of floating point mode can be less than the
> -               integer mode.  LRA also uses subregs for a register
> -               should be used in different mode in on insn.  */
> -            || lra_in_progress))
> -       return false;
> -    }
> -
>    /* Paradoxical subregs must have offset zero.  */
>    if (maybe_gt (osize, isize))
>      return known_eq (offset, 0U);
> --
> 2.27.0
>


* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
                                       ` (5 preceding siblings ...)
  2021-08-02  6:44                     ` [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
@ 2021-09-02  6:06                     ` Hongtao Liu
  2021-09-02 11:30                       ` Iain Sandoe
                                         ` (2 more replies)
  6 siblings, 3 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-09-02  6:06 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph Myers, Richard Biener, H. J. Lu

I'm going to check in the first 3 patches which are already approved.

  Update hf soft-fp from glibc.
  [i386] Enable _Float16 type for TARGET_SSE2 and above.
  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
    truncations.

On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> Update from v2:
>
> 1. Support -fexcess-precision=16 which will enable
> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> should not do anything different from -fexcess-precision=fast
>  regarding _Float16.
> 3. Avoiding macroization of HFmode patterns.
> 4. Allow (subreg:SI (reg:HF)).
> 5. Update documents corresponding exactly to the code changes in
> the same patch.
> 6. According to 32bit abi, pass vector _Float16 by sse registers
> for 32-bit mode, not stack.
>
> Guo, Xuepeng (1):
>   AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
>     instructions.
>
> liuhongt (5):
>   Update hf soft-fp from glibc.
>   [i386] Enable _Float16 type for TARGET_SSE2 and above.
>   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>     truncations.
>   Support -fexcess-precision=16 which will enable
>     FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
>
>  gcc/ada/gcc-interface/misc.c                  |   3 +
>  gcc/c-family/c-common.c                       |   6 +-
>  gcc/c-family/c-cppbuiltin.c                   |   6 +-
>  gcc/common.opt                                |   5 +-
>  gcc/common/config/i386/cpuinfo.h              |   2 +
>  gcc/common/config/i386/i386-common.c          |  26 +-
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config.gcc                                |   2 +-
>  gcc/config/aarch64/aarch64.c                  |   1 +
>  gcc/config/arm/arm.c                          |   1 +
>  gcc/config/i386/avx512fp16intrin.h            | 225 ++++++++++
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-builtin-types.def        |   7 +-
>  gcc/config/i386/i386-builtins.c               |  23 +
>  gcc/config/i386/i386-c.c                      |   2 +
>  gcc/config/i386/i386-expand.c                 | 129 +++++-
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-modes.def                |  13 +-
>  gcc/config/i386/i386-options.c                |   4 +-
>  gcc/config/i386/i386.c                        | 243 +++++++++--
>  gcc/config/i386/i386.h                        |  29 +-
>  gcc/config/i386/i386.md                       | 291 ++++++++++++-
>  gcc/config/i386/i386.opt                      |   4 +
>  gcc/config/i386/immintrin.h                   |   4 +
>  gcc/config/i386/sse.md                        | 397 +++++++++++++-----
>  gcc/config/m68k/m68k.c                        |   2 +
>  gcc/config/s390/s390.c                        |   2 +
>  gcc/coretypes.h                               |   3 +-
>  gcc/doc/extend.texi                           |  22 +
>  gcc/doc/invoke.texi                           |  10 +-
>  gcc/doc/tm.texi                               |  14 +-
>  gcc/doc/tm.texi.in                            |   3 +
>  gcc/emit-rtl.c                                |   5 +
>  gcc/flag-types.h                              |   3 +-
>  gcc/fortran/options.c                         |   3 +
>  gcc/lto/lto-lang.c                            |   3 +
>  gcc/target.def                                |  11 +-
>  gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |  14 +
>  gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c          |  21 +
>  .../gcc.target/i386/avx512fp16-12b.c          |  27 ++
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>  gcc/testsuite/gcc.target/i386/float16-5.c     |  12 +
>  gcc/testsuite/gcc.target/i386/float16-6.c     |   8 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 +
>  gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>  .../gcc.target/i386/sse2-float16-1.c          |   8 +
>  .../gcc.target/i386/sse2-float16-2.c          |  16 +
>  .../gcc.target/i386/sse2-float16-3.c          |  12 +
>  gcc/testsuite/lib/target-supports.exp         |  13 +-
>  gcc/tree.c                                    |   3 +-
>  libgcc/config.host                            |   5 +-
>  libgcc/config/i386/32/sfp-machine.h           |   1 +
>  libgcc/config/i386/32/t-softfp                |   1 +
>  libgcc/config/i386/64/sfp-machine.h           |   1 +
>  libgcc/config/i386/64/t-softfp                |   1 +
>  libgcc/config/i386/sfp-machine.h              |   1 +
>  libgcc/config/i386/t-softfp                   |   5 +
>  libgcc/soft-fp/eqhf2.c                        |  49 +++
>  libgcc/soft-fp/extendhfdf2.c                  |  53 +++
>  libgcc/soft-fp/extendhfsf2.c                  |  49 +++
>  libgcc/soft-fp/half.h                         |   1 +
>  libgcc/soft-fp/truncdfhf2.c                   |  52 +++
>  libgcc/soft-fp/truncsfhf2.c                   |  48 +++
>  78 files changed, 1781 insertions(+), 223 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>  create mode 100644 libgcc/config/i386/64/t-softfp
>  create mode 100644 libgcc/soft-fp/eqhf2.c
>  create mode 100644 libgcc/soft-fp/extendhfdf2.c
>  create mode 100644 libgcc/soft-fp/extendhfsf2.c
>  create mode 100644 libgcc/soft-fp/truncdfhf2.c
>  create mode 100644 libgcc/soft-fp/truncsfhf2.c
>
> --
> 2.27.0
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
  2021-08-24  9:39                               ` Hongtao Liu
@ 2021-09-02  6:06                                 ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-09-02  6:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches, schwab, Richard Sandiford

On Tue, Aug 24, 2021 at 5:39 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 9:53 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Aug 6, 2021 at 2:06 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > > > >
> > > > > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> > > > >
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index 7979e240426..dc673c89bc8 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > > > > >       return (type == EXCESS_PRECISION_TYPE_STANDARD
> > > > > >               ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > > > >               : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > > > > +      case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > > > +     return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > > > >        default:
> > > > > >       gcc_unreachable ();
> > > > > >      }
> > > > >
> > > > > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 (since x87
> > > > > doesn't do float or double arithmetic, but -fexcess-precision=16 implies
> > > > > that all of _Float16, float and double are represented to the range and
> > > > > > precision of their type without any excess precision).
> > > > >
> > > > Yes, additional changes like this.
> > > >
> > > > modified   gcc/config/i386/i386.c
> > > > @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum excess_precision_type type)
> > > >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > >        case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > + if (TARGET_80387
> > > > +     && !(TARGET_SSE_MATH && TARGET_SSE))
> > > > +   error ("%<-fexcess-precision=16%> is not compatible with %<-mfpmath=387%>");
> > > >   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > >        default:
> > > >   gcc_unreachable ();
> > > > new file   gcc/testsuite/gcc.target/i386/float16-7.c
> > > > @@ -0,0 +1,9 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> > > > +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > > > +_Float16
> > > > +foo (_Float16 a, _Float16 b)
> > > > +{
> > > > +  return a + b;/* { dg-error "'-fexcess-precision=16' is not compatible with '-mfpmath=387'" } */
> > > > +}
> > > > +
> > > >
> > > > > --
> > > > > Joseph S. Myers
> > > > > joseph@codesourcery.com
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > > Updated patch and ping for it.
> > >
> > > Also for backend changes.
> > > 1. For backends such as m68k/s390 that don't support _Float16 at all,
> > > the backend will issue an error for -fexcess-precision=16; I think
> > > that should be fine.
> > > 2. For backends like arm/aarch64 that do support _Float16, the backend
> > > will set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16
> > > even when hardware fp16 instructions are not supported. Would that be
> > > OK for arm?
> >
> > Ping for this patch.
> >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
> Rebased and ping^3. There are plenty of avx512fp16 patches blocked by
> this patch, so I'd like someone to help review it.
I'm going to check in this patch if there are no objections in the next 48 hours.
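
For anyone following along: the evaluation method selected by the option is what user code sees through FLT_EVAL_METHOD in <float.h>. The following is a minimal sketch, not taken from the patch, that puts the possible values into words. The names describe_eval_method, current_eval_method and check_descriptions are made up for illustration, and the value 16 assumes the TS 18661-3 numbering that FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 corresponds to.

```c
/* Ask <float.h> for the TS 18661-3 flavour of FLT_EVAL_METHOD, which can
   report 16; this has to be defined before the header is included.  */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>
#include <assert.h>
#include <string.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value in words.  */
const char *
describe_eval_method (int method)
{
  switch (method)
    {
    case 0:
      return "types no wider than float evaluated as float";
    case 1:
      return "types no wider than double evaluated as double";
    case 2:
      return "types no wider than long double evaluated as long double";
    case 16:
      return "every type, _Float16 included, evaluated in its own range and precision";
    default:
      return "indeterminable or implementation-defined";
    }
}

/* What the current compiler and options report.  */
const char *
current_eval_method (void)
{
  return describe_eval_method (FLT_EVAL_METHOD);
}

/* Small self-check: the 16 and 0 cases must not share a description.  */
void
check_descriptions (void)
{
  assert (strcmp (describe_eval_method (16), describe_eval_method (0)) != 0);
  assert (current_eval_method () != NULL);
}
```

Building this with and without -fexcess-precision=16 on a target that supports _Float16 would be expected to change what current_eval_method returns; no output is listed here because it depends on the target and options.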

> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
@ 2021-09-02 11:30                       ` Iain Sandoe
  2021-09-02 15:18                         ` Hongtao Liu
  2021-09-02 15:30                       ` H.J. Lu
  2021-09-02 19:45                       ` Joseph Myers
  2 siblings, 1 reply; 138+ messages in thread
From: Iain Sandoe @ 2021-09-02 11:30 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches, Joseph Myers

Hi Hongtao.

> On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> I'm going to check in the first 3 patches which are already approved.
> 
>  Update hf soft-fp from glibc.
>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>    truncations.

Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 cpus at revision
r12-3311-g1e6267b33526.

"fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean 'HFtype'?"

any immediate ideas on what might be the issue?
thanks
Iain

> 
> On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
>> 
>> Update from v2:
>> 
>> 1. Support -fexcess-precision=16 which will enable
>> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
>> should not do anything different from -fexcess-precision=fast
>> regarding _Float16.
>> 3. Avoiding macroization of HFmode patterns.
>> 4. Allow (subreg:SI (reg:HF)).
>> 5. Update documents corresponding exactly to the code changes in
>> the same patch.
>> 6. According to 32bit abi, pass vector _Float16 by sse registers
>> for 32-bit mode, not stack.
>> 
>> Guo, Xuepeng (1):
>>  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
>>    instructions.
>> 
>> liuhongt (5):
>>  Update hf soft-fp from glibc.
>>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
>>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>>    truncations.
>>  Support -fexcess-precision=16 which will enable
>>    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>>  AVX512FP16: Support vector init/broadcast/set/extract for FP16.
>> 
>> gcc/ada/gcc-interface/misc.c                  |   3 +
>> gcc/c-family/c-common.c                       |   6 +-
>> gcc/c-family/c-cppbuiltin.c                   |   6 +-
>> gcc/common.opt                                |   5 +-
>> gcc/common/config/i386/cpuinfo.h              |   2 +
>> gcc/common/config/i386/i386-common.c          |  26 +-
>> gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>> gcc/common/config/i386/i386-isas.h            |   1 +
>> gcc/config.gcc                                |   2 +-
>> gcc/config/aarch64/aarch64.c                  |   1 +
>> gcc/config/arm/arm.c                          |   1 +
>> gcc/config/i386/avx512fp16intrin.h            | 225 ++++++++++
>> gcc/config/i386/cpuid.h                       |   1 +
>> gcc/config/i386/i386-builtin-types.def        |   7 +-
>> gcc/config/i386/i386-builtins.c               |  23 +
>> gcc/config/i386/i386-c.c                      |   2 +
>> gcc/config/i386/i386-expand.c                 | 129 +++++-
>> gcc/config/i386/i386-isa.def                  |   1 +
>> gcc/config/i386/i386-modes.def                |  13 +-
>> gcc/config/i386/i386-options.c                |   4 +-
>> gcc/config/i386/i386.c                        | 243 +++++++++--
>> gcc/config/i386/i386.h                        |  29 +-
>> gcc/config/i386/i386.md                       | 291 ++++++++++++-
>> gcc/config/i386/i386.opt                      |   4 +
>> gcc/config/i386/immintrin.h                   |   4 +
>> gcc/config/i386/sse.md                        | 397 +++++++++++++-----
>> gcc/config/m68k/m68k.c                        |   2 +
>> gcc/config/s390/s390.c                        |   2 +
>> gcc/coretypes.h                               |   3 +-
>> gcc/doc/extend.texi                           |  22 +
>> gcc/doc/invoke.texi                           |  10 +-
>> gcc/doc/tm.texi                               |  14 +-
>> gcc/doc/tm.texi.in                            |   3 +
>> gcc/emit-rtl.c                                |   5 +
>> gcc/flag-types.h                              |   3 +-
>> gcc/fortran/options.c                         |   3 +
>> gcc/lto/lto-lang.c                            |   3 +
>> gcc/target.def                                |  11 +-
>> gcc/testsuite/g++.dg/other/i386-2.C           |   2 +-
>> gcc/testsuite/g++.dg/other/i386-3.C           |   2 +-
>> gcc/testsuite/g++.target/i386/float16-1.C     |   8 +
>> gcc/testsuite/g++.target/i386/float16-2.C     |  14 +
>> gcc/testsuite/g++.target/i386/float16-3.C     |  10 +
>> gcc/testsuite/gcc.target/i386/avx-1.c         |   2 +-
>> gcc/testsuite/gcc.target/i386/avx-2.c         |   2 +-
>> gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>> .../gcc.target/i386/avx512fp16-12a.c          |  21 +
>> .../gcc.target/i386/avx512fp16-12b.c          |  27 ++
>> gcc/testsuite/gcc.target/i386/float16-3a.c    |  10 +
>> gcc/testsuite/gcc.target/i386/float16-3b.c    |  10 +
>> gcc/testsuite/gcc.target/i386/float16-4a.c    |  10 +
>> gcc/testsuite/gcc.target/i386/float16-4b.c    |  10 +
>> gcc/testsuite/gcc.target/i386/float16-5.c     |  12 +
>> gcc/testsuite/gcc.target/i386/float16-6.c     |   8 +
>> gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>> gcc/testsuite/gcc.target/i386/pr54855-12.c    |  14 +
>> gcc/testsuite/gcc.target/i386/sse-13.c        |   2 +-
>> gcc/testsuite/gcc.target/i386/sse-14.c        |   2 +-
>> gcc/testsuite/gcc.target/i386/sse-22.c        |   4 +-
>> gcc/testsuite/gcc.target/i386/sse-23.c        |   2 +-
>> .../gcc.target/i386/sse2-float16-1.c          |   8 +
>> .../gcc.target/i386/sse2-float16-2.c          |  16 +
>> .../gcc.target/i386/sse2-float16-3.c          |  12 +
>> gcc/testsuite/lib/target-supports.exp         |  13 +-
>> gcc/tree.c                                    |   3 +-
>> libgcc/config.host                            |   5 +-
>> libgcc/config/i386/32/sfp-machine.h           |   1 +
>> libgcc/config/i386/32/t-softfp                |   1 +
>> libgcc/config/i386/64/sfp-machine.h           |   1 +
>> libgcc/config/i386/64/t-softfp                |   1 +
>> libgcc/config/i386/sfp-machine.h              |   1 +
>> libgcc/config/i386/t-softfp                   |   5 +
>> libgcc/soft-fp/eqhf2.c                        |  49 +++
>> libgcc/soft-fp/extendhfdf2.c                  |  53 +++
>> libgcc/soft-fp/extendhfsf2.c                  |  49 +++
>> libgcc/soft-fp/half.h                         |   1 +
>> libgcc/soft-fp/truncdfhf2.c                   |  52 +++
>> libgcc/soft-fp/truncsfhf2.c                   |  48 +++
>> 78 files changed, 1781 insertions(+), 223 deletions(-)
>> create mode 100644 gcc/config/i386/avx512fp16intrin.h
>> create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>> create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>> create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>> create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/float16-6.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>> create mode 100644 libgcc/config/i386/64/t-softfp
>> create mode 100644 libgcc/soft-fp/eqhf2.c
>> create mode 100644 libgcc/soft-fp/extendhfdf2.c
>> create mode 100644 libgcc/soft-fp/extendhfsf2.c
>> create mode 100644 libgcc/soft-fp/truncdfhf2.c
>> create mode 100644 libgcc/soft-fp/truncsfhf2.c
>> 
>> --
>> 2.27.0
>> 
> 
> 
> -- 
> BR,
> Hongtao


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02 11:30                       ` Iain Sandoe
@ 2021-09-02 15:18                         ` Hongtao Liu
  2021-09-02 16:44                           ` Iain Sandoe
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-09-02 15:18 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: liuhongt, GCC Patches, Joseph Myers

On Thursday, September 2, 2021, Iain Sandoe <idsandoe@googlemail.com> wrote:

> Hi Hongtao.
>
> > On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >
> > I'm going to check in the first 3 patches which are already approved.
> >
> >  Update hf soft-fp from glibc.
> >  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >    truncations.
>
> Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 cpus at
> revision
> r12-3311-g1e6267b33526.
>
> "fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean
> 'HFtype'?"
>
> any immediate ideas on what might be the issue?
> thanks


Seems to be related to the part below, which is not changed by my patch;
TFtype is defined in quad.h.

From libgcc/config/i386/sfp-machine.h
<https://gcc.gnu.org/git?p=gcc.git;a=blob;f=libgcc/config/i386/sfp-machine.h;h=f15d29d37550936c060c4caed4182c58c43ee221;hb=f15d29d37550936c060c4caed4182c58c43ee221#l76>:

  76 /* Define ALIASNAME as a strong alias for NAME.  */
  77 #if defined __MACH__
  78 /* Mach-O doesn't support aliasing.  If these functions ever return
  79    anything but CMPtype we need to revisit this... */
  80 #define strong_alias(name, aliasname) \
  81   CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
  82 #else

Would you try adding
  typedef float TFtype __attribute__ ((mode (TF)));
there to see if it fixes the issue?

> Iain
>
>

> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> >>
> >> Update from v2:
> >>
> >> 1. Support -fexcess-precision=16 which will enable
> >> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> >> should not do anything different from -fexcess-precision=fast
> >> regarding _Float16.
> >> 3. Avoiding macroization of HFmode patterns.
> >> 4. Allow (subreg:SI (reg:HF)).
> >> 5. Update documents corresponding exactly to the code changes in
> >> the same patch.
> >> 6. According to 32bit abi, pass vector _Float16 by sse registers
> >> for 32-bit mode, not stack.
> >>
> >> Guo, Xuepeng (1):
> >>  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> >>    instructions.
> >>
> >> liuhongt (5):
> >>  Update hf soft-fp from glibc.
> >>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >>    truncations.
> >>  Support -fexcess-precision=16 which will enable
> >>    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >>  AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> >>
> >> --
> >> 2.27.0
> >>
> >
> >
> > --
> > BR,
> > Hongtao
>
>

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
  2021-09-02 11:30                       ` Iain Sandoe
@ 2021-09-02 15:30                       ` H.J. Lu
  2021-09-02 15:50                         ` Hongtao Liu
  2021-09-02 19:45                       ` Joseph Myers
  2 siblings, 1 reply; 138+ messages in thread
From: H.J. Lu @ 2021-09-02 15:30 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: liuhongt, GCC Patches, Uros Bizjak, Joseph Myers, Richard Biener

On Wed, Sep 1, 2021 at 11:00 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> I'm going to check in the first 3 patches which are already approved.
>
>   Update hf soft-fp from glibc.
>   [i386] Enable _Float16 type for TARGET_SSE2 and above.
>   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>     truncations.
>
> On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Update from v2:
> >
> > 1. Support -fexcess-precision=16 which will enable
> > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> > should not do anything different from -fexcess-precision=fast
> >  regarding _Float16.
> > 3. Avoiding macroization of HFmode patterns.
> > 4. Allow (subreg:SI (reg:HF)).
> > 5. Update documents corresponding exactly to the code changes in
> > the same patch.
> > 6. According to 32bit abi, pass vector _Float16 by sse registers
> > for 32-bit mode, not stack.
> >
> > Guo, Xuepeng (1):
> >   AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> >     instructions.
> >
> > liuhongt (5):
> >   Update hf soft-fp from glibc.
> >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >     truncations.
> >   Support -fexcess-precision=16 which will enable
> >     FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> >

I got

FAIL: gcc.dg/torture/fp-int-convert-float16.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c   -Os  execution test

with -m32:

[hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -m32
/export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/torture/fp-int-convert-float16.c
 -m32  -Os -march=i686 -mfpmath=sse -msse2
[hjl@gnu-skx-1 gcc]$ ./a.out
Aborted (core dumped)
[hjl@gnu-skx-1 gcc]$

H.J.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02 15:30                       ` H.J. Lu
@ 2021-09-02 15:50                         ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-09-02 15:50 UTC (permalink / raw)
  To: H.J. Lu; +Cc: liuhongt, GCC Patches, Uros Bizjak, Joseph Myers, Richard Biener

On Thursday, September 2, 2021, H.J. Lu <hjl.tools@gmail.com> wrote:

> On Wed, Sep 1, 2021 at 11:00 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > I'm going to check in the first 3 patches which are already approved.
> >
> >   Update hf soft-fp from glibc.
> >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >     truncations.
> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Update from v2:
> > >
> > > 1. Support -fexcess-precision=16 which will enable
> > > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > > 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> > > should not do anything different from -fexcess-precision=fast
> > >  regarding _Float16.
> > > 3. Avoiding macroization of HFmode patterns.
> > > 4. Allow (subreg:SI (reg:HF)).
> > > 5. Update documents corresponding exactly to the code changes in
> > > the same patch.
> > > 6. According to 32bit abi, pass vector _Float16 by sse registers
> > > for 32-bit mode, not stack.
> > >
> > > Guo, Xuepeng (1):
> > >   AVX512FP16: Initial support for AVX512FP16 feature and scalar
> _Float16
> > >     instructions.
> > >
> > > liuhongt (5):
> > >   Update hf soft-fp from glibc.
> > >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> > >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> > >     truncations.
> > >   Support -fexcess-precision=16 which will enable
> > >     FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > >   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> > >
>
> I got
>
> FAIL: gcc.dg/torture/fp-int-convert-float16.c   -Os  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c   -Os  execution test
>
> with -m32:

I guess it hit some excess-precision issue w/ the x87 FPU.

>
> [hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -m32
> /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/
> torture/fp-int-convert-float16.c
>  -m32  -Os -march=i686 -mfpmath=sse -msse2
> [hjl@gnu-skx-1 gcc]$ ./a.out
> Aborted (core dumped)
> [hjl@gnu-skx-1 gcc]$
>
> H.J.
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02 15:18                         ` Hongtao Liu
@ 2021-09-02 16:44                           ` Iain Sandoe
  2021-09-02 20:03                             ` Joseph Myers
  0 siblings, 1 reply; 138+ messages in thread
From: Iain Sandoe @ 2021-09-02 16:44 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches, Joseph Myers

Patch below fixes bootstrap,

OK if it passes testing on x86_64 darwin/linux?
(if !OK .. then suggestions welcome)

thanks
Iain

> On 2 Sep 2021, at 16:18, Hongtao Liu <crazylht@gmail.com> wrote:
> 
> 
> 
> On Thursday, September 2, 2021, Iain Sandoe <idsandoe@googlemail.com> wrote:
> Hi Hongtao.
> 
> > On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > 
> > I'm going to check in the first 3 patches which are already approved.
> > 
> >  Update hf soft-fp from glibc.
> >  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >    truncations.
> 
> Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 cpus at revision
> r12-3311-g1e6267b33526.
> 
> "fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean 'HFtype'?"
> 
> any immediate ideas on what might be the issue?
> thanks
>  
> Seems to be related to the part below, which is not changed by my patch; TFtype is defined in quad.h
> 
>  76 /* Define ALIASNAME as a strong alias for NAME.  */
>   77 #if defined __MACH__
>   78 /* Mach-O doesn't support aliasing.  If these functions ever return
>   79    anything but CMPtype we need to revisit this... */
>   80 #define strong_alias(name, aliasname) \
>   81   CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
>   82 #else
> 
> Would you try adding
> typedef float TFtype __attribute__ ((mode (TF)));
> here to see if it fixes the issue?

I don’t think it’s quite as simple as that - this is what I’m testing:

[PATCH] libgcc, soft-float: Fix strong_alias macro use for Darwin.

Darwin does not support strong symbol aliases and a work-
around is provided in sfp-machine.h where a second function
is created that simply calls the original.  However this
needs the arguments to the synthesized function to track
the mode of the original function.

So the fix here is to adjust the macro to allow the mode to
be provided and then to set it as needed before the header
is included.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

	* config/i386/sfp-exceptions.c (DarwinMode): Set
	arbitrarily to DF mode (the strong_alias macro is
	not used here).
	* config/i386/sfp-machine.h: Adjust strong_alias macro
	so that the type can be provided per case.
	* soft-fp/eqdf2.c (DarwinMode): Set to DF mode.
	* soft-fp/eqhf2.c (DarwinMode): Set to HF mode.
	* soft-fp/eqsf2.c (DarwinMode): Set to SF mode.
	* soft-fp/eqtf2.c (DarwinMode): Set to TF mode.
	* soft-fp/gedf2.c (DarwinMode): Set to DF mode.
	* soft-fp/gesf2.c (DarwinMode): Set to SF mode.
	* soft-fp/getf2.c (DarwinMode): Set to TF mode.
	* soft-fp/ledf2.c (DarwinMode): Set to DF mode.
	* soft-fp/lesf2.c (DarwinMode): Set to SF mode.
	* soft-fp/letf2.c (DarwinMode): Set to TF mode.
---
 libgcc/config/i386/sfp-exceptions.c | 1 +
 libgcc/config/i386/sfp-machine.h    | 9 ++++++---
 libgcc/soft-fp/eqdf2.c              | 1 +
 libgcc/soft-fp/eqhf2.c              | 1 +
 libgcc/soft-fp/eqsf2.c              | 1 +
 libgcc/soft-fp/eqtf2.c              | 1 +
 libgcc/soft-fp/gedf2.c              | 1 +
 libgcc/soft-fp/gesf2.c              | 1 +
 libgcc/soft-fp/getf2.c              | 1 +
 libgcc/soft-fp/ledf2.c              | 1 +
 libgcc/soft-fp/lesf2.c              | 1 +
 libgcc/soft-fp/letf2.c              | 1 +
 12 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/i386/sfp-exceptions.c b/libgcc/config/i386/sfp-exceptions.c
index edb6a57bb35..7431cf93e33 100644
--- a/libgcc/config/i386/sfp-exceptions.c
+++ b/libgcc/config/i386/sfp-exceptions.c
@@ -22,6 +22,7 @@
  */
 
 #ifndef _SOFT_FLOAT
+#define DarwinMode DF
 #include "sfp-machine.h"
 
 struct fenv
diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index f15d29d3755..2cb6119b8f8 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -75,10 +75,13 @@ void __sfp_handle_exceptions (int);
 
 /* Define ALIASNAME as a strong alias for NAME.  */
 #if defined __MACH__
-/* Mach-O doesn't support aliasing.  If these functions ever return
-   anything but CMPtype we need to revisit this... */
+/* Mach-O doesn't support aliasing, so we build a secondary function for
+   the alias - this needs the type of the arguments to be provided as
+   DarwinFtype.  If these functions ever return anything but CMPtype
+   we need to revisit this... */
 #define strong_alias(name, aliasname) \
-  CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
+  typedef float DarwinFtype __attribute__((mode (DarwinMode))); \
+  CMPtype aliasname (DarwinFtype a, DarwinFtype b) { return name(a, b); }
 #else
 # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
 # define _strong_alias(name, aliasname) \
diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
index 2a44ee377ce..a3bb664f5f1 100644
--- a/libgcc/soft-fp/eqdf2.c
+++ b/libgcc/soft-fp/eqdf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode DF
 #include "soft-fp.h"
 #include "double.h"
 
diff --git a/libgcc/soft-fp/eqhf2.c b/libgcc/soft-fp/eqhf2.c
index 6d6634e5c54..73a3b0a13d8 100644
--- a/libgcc/soft-fp/eqhf2.c
+++ b/libgcc/soft-fp/eqhf2.c
@@ -26,6 +26,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode HF
 #include "soft-fp.h"
 #include "half.h"
 
diff --git a/libgcc/soft-fp/eqsf2.c b/libgcc/soft-fp/eqsf2.c
index c515044d7bf..84f3ba63958 100644
--- a/libgcc/soft-fp/eqsf2.c
+++ b/libgcc/soft-fp/eqsf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode SF
 #include "soft-fp.h"
 #include "single.h"
 
diff --git a/libgcc/soft-fp/eqtf2.c b/libgcc/soft-fp/eqtf2.c
index 5feac41a0de..3a44e006943 100644
--- a/libgcc/soft-fp/eqtf2.c
+++ b/libgcc/soft-fp/eqtf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode TF
 #include "soft-fp.h"
 #include "quad.h"
 
diff --git a/libgcc/soft-fp/gedf2.c b/libgcc/soft-fp/gedf2.c
index bcefb61aa80..551356a0ea4 100644
--- a/libgcc/soft-fp/gedf2.c
+++ b/libgcc/soft-fp/gedf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode DF
 #include "soft-fp.h"
 #include "double.h"
 
diff --git a/libgcc/soft-fp/gesf2.c b/libgcc/soft-fp/gesf2.c
index 22f0b6a24be..57cae4c3fd7 100644
--- a/libgcc/soft-fp/gesf2.c
+++ b/libgcc/soft-fp/gesf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode SF
 #include "soft-fp.h"
 #include "single.h"
 
diff --git a/libgcc/soft-fp/getf2.c b/libgcc/soft-fp/getf2.c
index 6c7e38f36fc..3b23c17322e 100644
--- a/libgcc/soft-fp/getf2.c
+++ b/libgcc/soft-fp/getf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode TF
 #include "soft-fp.h"
 #include "quad.h"
 
diff --git a/libgcc/soft-fp/ledf2.c b/libgcc/soft-fp/ledf2.c
index c36148e2f12..ecf05ce1554 100644
--- a/libgcc/soft-fp/ledf2.c
+++ b/libgcc/soft-fp/ledf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode DF
 #include "soft-fp.h"
 #include "double.h"
 
diff --git a/libgcc/soft-fp/lesf2.c b/libgcc/soft-fp/lesf2.c
index e3233535c8f..e0105600592 100644
--- a/libgcc/soft-fp/lesf2.c
+++ b/libgcc/soft-fp/lesf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode SF
 #include "soft-fp.h"
 #include "single.h"
 
diff --git a/libgcc/soft-fp/letf2.c b/libgcc/soft-fp/letf2.c
index 43d9f77bca9..80bfa085e70 100644
--- a/libgcc/soft-fp/letf2.c
+++ b/libgcc/soft-fp/letf2.c
@@ -28,6 +28,7 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#define DarwinMode TF
 #include "soft-fp.h"
 #include "quad.h"
 
-- 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-08-31 11:17                                                                         ` [PATCH 2/2] Get rid of all float-int special cases in validate_subreg liuhongt
  2021-08-31 11:57                                                                           ` Richard Biener
@ 2021-09-02 17:55                                                                           ` Segher Boessenkool
  2021-09-03 15:05                                                                             ` Andreas Schwab
  1 sibling, 1 reply; 138+ messages in thread
From: Segher Boessenkool @ 2021-09-02 17:55 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, richard.sandiford

On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
> 	* emit-rtl.c (validate_subreg): Get rid of all float-int
> 	special cases.

This caused various regressions on powerpc.  Please revert this until
this can be done safely (the comment this patch deletes says why it can
not be done yet).


Segher

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
  2021-09-02 11:30                       ` Iain Sandoe
  2021-09-02 15:30                       ` H.J. Lu
@ 2021-09-02 19:45                       ` Joseph Myers
  2 siblings, 0 replies; 138+ messages in thread
From: Joseph Myers @ 2021-09-02 19:45 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches

One of the committed changes breaks the build of libgcc for 32-bit x86 
configurations without SSE2 enabled by default:

In file included from /scratch/jmyers/glibc-bot/src/gcc/libgcc/soft-fp/extendhfsf2.c:31:
/scratch/jmyers/glibc-bot/src/gcc/libgcc/soft-fp/half.h:62:1: error: unable to emulate 'HF'
   62 | typedef float HFtype __attribute__ ((mode (HF)));
      | ^~~~~~~

(this showed up with my glibc bot building for i686-gnu).

Such a configuration should still support HFmode when you build user code 
with appropriate options.  I.e., the functions in question do need to be 
built into libgcc, so that user code can link against them, so you need to 
arrange for an explicit -msse2 to be used when building the HFmode libgcc 
functions (but not any other libgcc functions).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02 16:44                           ` Iain Sandoe
@ 2021-09-02 20:03                             ` Joseph Myers
  2021-09-03  7:51                               ` Iain Sandoe
  0 siblings, 1 reply; 138+ messages in thread
From: Joseph Myers @ 2021-09-02 20:03 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Hongtao Liu, liuhongt, GCC Patches

On Thu, 2 Sep 2021, Iain Sandoe via Gcc-patches wrote:

> diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
> index 2a44ee377ce..a3bb664f5f1 100644
> --- a/libgcc/soft-fp/eqdf2.c
> +++ b/libgcc/soft-fp/eqdf2.c
> @@ -28,6 +28,7 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> +#define DarwinMode DF
>  #include "soft-fp.h"
>  #include "double.h"

All these files are supposed to be taken unmodified from glibc.  They 
shouldn't contain any OS-specific code, such as a define of DarwinMode.  
sfp-machine.h, however, is libgcc-local, hence putting the definition of 
strong_alias there.

So you need some other way to extract the argument type of name in order 
to use it in a declaration of aliasname.  E.g.

__typeof (_Generic (name,
                    CMPtype (*) (HFtype, HFtype): (HFtype) 0,
                    CMPtype (*) (SFtype, SFtype): (SFtype) 0,
                    CMPtype (*) (DFtype, DFtype): (DFtype) 0,
                    CMPtype (*) (TFtype, TFtype): (TFtype) 0))

Now in fact I think the include ordering means none of the *type macros 
are defined here.  But if you do e.g.

typedef float alias_SFtype __attribute__ ((mode (SF)));

and similar, you could use alias_SFtype in the above.  And so keep the 
changes to the Darwin-specific parts of the libgcc-local sfp-machine.h.
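
For illustration, a minimal self-contained sketch of that approach follows.
It is not the libgcc code: plain float and double stand in for the
mode-attributed typedefs so that it compiles on any target, and
CMPtype_sketch, ARG_SELECTOR, forwarding_alias, eq_sf, eq_df, ne_sf, ne_df
and alias_selfcheck are all hypothetical names.  It assumes a compiler with
C11 _Generic and the GNU __typeof extension (GCC or Clang).

```c
#include <assert.h>

/* Hypothetical stand-ins: in libgcc these would be CMPtype and typedefs
   declared with __attribute__ ((mode (SF))) and similar.  */
typedef int CMPtype_sketch;
typedef float alias_sf;
typedef double alias_df;

/* Map the function-pointer type of NAME to a value of its argument type.  */
#define ARG_SELECTOR \
  CMPtype_sketch (*) (alias_sf, alias_sf): (alias_sf) 0, \
  CMPtype_sketch (*) (alias_df, alias_df): (alias_df) 0

/* No real alias: define a second function that forwards to NAME, taking
   its parameter type from the _Generic selection.  */
#define forwarding_alias(name, aliasname) \
  CMPtype_sketch aliasname (__typeof (_Generic (name, ARG_SELECTOR)) a, \
			    __typeof (_Generic (name, ARG_SELECTOR)) b) \
  { return name (a, b); }

/* Return 0 when the operands are equal, 1 otherwise.  */
CMPtype_sketch eq_sf (float a, float b) { return a == b ? 0 : 1; }
CMPtype_sketch eq_df (double a, double b) { return a == b ? 0 : 1; }

forwarding_alias (eq_sf, ne_sf)
forwarding_alias (eq_df, ne_df)

/* 1.0 + 1e-12 differs from 1.0 as a double but not as a float, so the
   second check fails if the wrapper narrowed its parameters to float.  */
void
alias_selfcheck (void)
{
  assert (ne_sf (1.0f, 1.0f) == 0);
  assert (ne_df (1.0, 1.0 + 1e-12) == 1);
}
```

In this sketch ne_df gets double parameters and ne_sf gets float ones,
without the files that define eq_sf and eq_df having to name a mode.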

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-02 20:03                             ` Joseph Myers
@ 2021-09-03  7:51                               ` Iain Sandoe
  2021-09-03 15:33                                 ` Iain Sandoe
  0 siblings, 1 reply; 138+ messages in thread
From: Iain Sandoe @ 2021-09-03  7:51 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Hongtao Liu, liuhongt, GCC Patches

Hi Joseph,

> On 2 Sep 2021, at 21:03, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Thu, 2 Sep 2021, Iain Sandoe via Gcc-patches wrote:
> 
>> diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
>> index 2a44ee377ce..a3bb664f5f1 100644
>> --- a/libgcc/soft-fp/eqdf2.c
>> +++ b/libgcc/soft-fp/eqdf2.c
>> @@ -28,6 +28,7 @@
>>    License along with the GNU C Library; if not, see
>>    <http://www.gnu.org/licenses/>.  */
>> 
>> +#define DarwinMode DF
>> #include "soft-fp.h"
>> #include "double.h"
> 
> All these files are supposed to be taken unmodified from glibc.  They 
> shouldn't contain any OS-specific code, such as a define of DarwinMode.  
> sfp-machine.h, however, is libgcc-local, hence putting the definition of 
> strong_alias there.

OK, that makes sense.
> 
> So you need some other way to extract the argument type of name in order 
> to use it in a declaration of aliasname.  E.g.
> 
> __typeof (_Generic (name,
>                    CMPtype (*) (HFtype, HFtype): (HFtype) 0,
>                    CMPtype (*) (SFtype, SFtype): (SFtype) 0,
>                    CMPtype (*) (DFtype, DFtype): (DFtype) 0,
>                    CMPtype (*) (TFtype, TFtype): (TFtype) 0))

thanks for the suggestion

> Now in fact I think the include ordering means none of the *type macros 
> are defined here.  But if you do e.g.
> 
> typedef float alias_SFtype __attribute__ ((mode (SF)));
> 
> and similar, you could use alias_SFtype in the above.  And so keep the 
> changes to the Darwin-specific parts of the libgcc-local sfp-machine.h.

this is what I’m testing - OK if it bootstraps on x86_64-darwin, linux?

thanks
Iain


[PATCH] libgcc, soft-float: Fix strong_alias macro use for Darwin.

Darwin does not support strong symbol aliases and a work-
around is provided in sfp-machine.h where a second function
is created that simply calls the original.  However this
needs the arguments to the synthesized function to track
the mode of the original function.

So the fix here is to match known floating point modes from
the incoming function and apply the one found to the new
function args.

The matching is highly specific to the current set of modes
and will need adjusting should more cases be added.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

	* config/i386/sfp-machine.h (alias_HFtype, alias_SFtype,
	alias_DFtype, alias_TFtype): New.
	(ALIAS_SELECTOR): New.
	(strong_alias): Use __typeof and a _Generic selector to
	provide the type to the synthesized function.
---
 libgcc/config/i386/sfp-machine.h | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index f15d29d3755..172ebc70c8d 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -75,10 +75,24 @@ void __sfp_handle_exceptions (int);
 
 /* Define ALIASNAME as a strong alias for NAME.  */
 #if defined __MACH__
-/* Mach-O doesn't support aliasing.  If these functions ever return
-   anything but CMPtype we need to revisit this... */
+/* Mach-O doesn't support aliasing, so we build a secondary function for
+   the alias - we need to do a bit of a dance to find out what the type of
+   the arguments is and then apply that to the secondary function.
+   If these functions ever return anything but CMPtype we need to revisit
+   this... */
+typedef float alias_HFtype __attribute__ ((mode (HF)));
+typedef float alias_SFtype __attribute__ ((mode (SF)));
+typedef float alias_DFtype __attribute__ ((mode (DF)));
+typedef float alias_TFtype __attribute__ ((mode (TF)));
+#define ALIAS_SELECTOR \
+  CMPtype (*) (alias_HFtype, alias_HFtype): (alias_HFtype) 0, \
+  CMPtype (*) (alias_SFtype, alias_SFtype): (alias_SFtype) 0, \
+  CMPtype (*) (alias_DFtype, alias_DFtype): (alias_DFtype) 0, \
+  CMPtype (*) (alias_TFtype, alias_TFtype): (alias_TFtype) 0
 #define strong_alias(name, aliasname) \
-  CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
+  CMPtype aliasname (__typeof (_Generic (name, ALIAS_SELECTOR)) a, \
+		     __typeof (_Generic (name, ALIAS_SELECTOR)) b) \
+		    { return name (a, b); }
 #else
 # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
 # define _strong_alias(name, aliasname) \
-- 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-08-02  6:31                     ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
  2021-08-04  2:45                       ` Hongtao Liu
@ 2021-09-03 12:42                       ` Jakub Jelinek
  2021-09-06  2:05                         ` Hongtao Liu
  1 sibling, 1 reply; 138+ messages in thread
From: Jakub Jelinek @ 2021-09-03 12:42 UTC (permalink / raw)
  To: liuhongt; +Cc: gcc-patches, joseph

On Mon, Aug 02, 2021 at 02:31:12PM +0800, liuhongt via Gcc-patches wrote:
> 	* doc/extend.texi (Half-Precision Floating Point): Document
> 	_Float16 for x86.

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
>  @section Half-Precision Floating Point
>  @cindex half-precision floating point
>  @cindex @code{__fp16} data type
> +@cindex @code{__Float16} data type
>  
>  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> @@ -1150,6 +1151,18 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
>  
> +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> +(16-bit) floating point via the @code{_Float16} type which is defined by
> +18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> +which contains same data format as C.
> +
> +Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> +operations will be emulated by software emulation and the @code{float}
> +instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> +the intermediate result of the operation as 32-bit precision. This may lead
> +to inconsistent behavior between software emulation and AVX512-FP16
> +instructions.
> +
>  @node Decimal Float
>  @section Decimal Floating Types
>  @cindex decimal floating types

Shouldn't there be more changes for this in doc/extend.texi?

I'd say that x86 with -msse2 should be mentioned in
The @code{_Float16} type is supported on AArch64
systems by default, and on ARM systems when the IEEE format for 16-bit
floating-point types is selected with @option{-mfp16-format=ieee}.

and in
@node Half-Precision
I'd say that one sentence about the x86 support should go already in the
first paragraph, perhaps after:
On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
point via the @code{__fp16} type defined in the ARM C Language Extensions.
On ARM systems, you must enable this type explicitly with the
@option{-mfp16-format} command-line option in order to use it.
because users just won't scroll down to immediately find out that in the
10th/11th paragraph it talks about x86.
Just mention there that on all 3 arches it is available using the _Float16 type
in C, on x86 in C++ too and then on ARM/AArch64 using __fp16.

	Jakub


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-09-02 17:55                                                                           ` Segher Boessenkool
@ 2021-09-03 15:05                                                                             ` Andreas Schwab
  2021-09-07 23:19                                                                               ` Segher Boessenkool
  0 siblings, 1 reply; 138+ messages in thread
From: Andreas Schwab @ 2021-09-03 15:05 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: liuhongt, richard.sandiford, gcc-patches

On Sep 02 2021, Segher Boessenkool wrote:

> On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
>> 	* emit-rtl.c (validate_subreg): Get rid of all float-int
>> 	special cases.
>
> This caused various regressions on powerpc.  Please revert this until
> this can be done safely (the comment this patch deletes says why it can
> not be done yet).

This also breaks ada on riscv64.

s-fatgen.adb: In function 'System.Fat_Flt.Attr_Float.Scaling':
s-fatgen.adb:830:8: error: unable to find a register to spill
s-fatgen.adb:830:8: error: this is the insn:
(insn 215 321 216 26 (set (reg:SF 88 [ xx.26_39 ])
        (mult:SF (reg:SF 190)
            (subreg:SF (reg:DI 221 [164]) 0))) "s-fatgen.adb":821:25 17 {mulsf3}
     (expr_list:REG_DEAD (reg:DI 221 [164])
        (expr_list:REG_DEAD (reg:SF 190)
            (nil))))
during RTL pass: reload

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-03  7:51                               ` Iain Sandoe
@ 2021-09-03 15:33                                 ` Iain Sandoe
  2021-09-21 20:11                                   ` Joseph Myers
  0 siblings, 1 reply; 138+ messages in thread
From: Iain Sandoe @ 2021-09-03 15:33 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Hongtao Liu, liuhongt, GCC Patches



> On 3 Sep 2021, at 08:51, Iain Sandoe <idsandoe@googlemail.com> wrote:
> 
> 
>> On 2 Sep 2021, at 21:03, Joseph Myers <joseph@codesourcery.com> wrote:
>> 
>> On Thu, 2 Sep 2021, Iain Sandoe via Gcc-patches wrote:
>> 
>>> diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
>>> index 2a44ee377ce..a3bb664f5f1 100644
>>> --- a/libgcc/soft-fp/eqdf2.c
>>> +++ b/libgcc/soft-fp/eqdf2.c
>>> @@ -28,6 +28,7 @@
>>>   License along with the GNU C Library; if not, see
>>>   <http://www.gnu.org/licenses/>.  */
>>> 
>>> +#define DarwinMode DF
>>> #include "soft-fp.h"
>>> #include "double.h"
>> 
>> All these files are supposed to be taken unmodified from glibc.  They 
>> shouldn't contain any OS-specific code, such as a define of DarwinMode.  
>> sfp-machine.h, however, is libgcc-local, hence putting the definition of 
>> strong_alias there.
> 
> OK, that makes sense.
>> 
>> So you need some other way to extract the argument type of name in order 
>> to use it in a declaration of aliasname.  E.g.
>> 
>> __typeof (_Generic (name,
>>                   CMPtype (*) (HFtype, HFtype): (HFtype) 0,
>>                   CMPtype (*) (SFtype, SFtype): (SFtype) 0,
>>                   CMPtype (*) (DFtype, DFtype): (DFtype) 0,
>>                   CMPtype (*) (TFtype, TFtype): (TFtype) 0))
> 
> thanks for the suggestion
> 
>> Now in fact I think the include ordering means none of the *type macros 
>> are defined here.  But if you do e.g.
>> 
>> typedef float alias_SFtype __attribute__ ((mode (SF)));
>> 
>> and similar, you could use alias_SFtype in the above.  And so keep the 
>> changes to the Darwin-specific parts of the libgcc-local sfp-machine.h.
> 
> this is what I’m testing - OK if it bootstraps on x86_64-darwin, linux?

(those bootstraps were successful)

given that:

a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
b) the patch is now Darwin-local.

I’ve pushed the patch below to fix the bootstrap break - but if there are any further
recommendations I’m happy to apply a follow-on.  It seems that there will be more
changes for the half-float support anyway.

thanks
Iain

> 
> thanks
> Iain
> 
> 
> [PATCH] libgcc, soft-float: Fix strong_alias macro use for Darwin.
> 
> Darwin does not support strong symbol aliases and a work-
> around is provided in sfp-machine.h where a second function
> is created that simply calls the original.  However this
> needs the arguments to the synthesized function to track
> the mode of the original function.
> 
> So the fix here is to match known floating point modes from
> the incoming function and apply the one found to the new
> function args.
> 
> The matching is highly specific to the current set of modes
> and will need adjusting should more cases be added.
> 
> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
> 
> libgcc/ChangeLog:
> 
> 	* config/i386/sfp-machine.h (alias_HFtype, alias_SFtype,
> 	alias_DFtype, alias_TFtype): New.
> 	(ALIAS_SELECTOR): New.
> 	(strong_alias): Use __typeof and a _Generic selector to
> 	provide the type to the synthesized function.
> ---
> libgcc/config/i386/sfp-machine.h | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> index f15d29d3755..172ebc70c8d 100644
> --- a/libgcc/config/i386/sfp-machine.h
> +++ b/libgcc/config/i386/sfp-machine.h
> @@ -75,10 +75,24 @@ void __sfp_handle_exceptions (int);
> 
> /* Define ALIASNAME as a strong alias for NAME.  */
> #if defined __MACH__
> -/* Mach-O doesn't support aliasing.  If these functions ever return
> -   anything but CMPtype we need to revisit this... */
> +/* Mach-O doesn't support aliasing, so we build a secondary function for
> +   the alias - we need to do a bit of a dance to find out what the type of
> +   the arguments is and then apply that to the secondary function.
> +   If these functions ever return anything but CMPtype we need to revisit
> +   this... */
> +typedef float alias_HFtype __attribute__ ((mode (HF)));
> +typedef float alias_SFtype __attribute__ ((mode (SF)));
> +typedef float alias_DFtype __attribute__ ((mode (DF)));
> +typedef float alias_TFtype __attribute__ ((mode (TF)));
> +#define ALIAS_SELECTOR \
> +  CMPtype (*) (alias_HFtype, alias_HFtype): (alias_HFtype) 0, \
> +  CMPtype (*) (alias_SFtype, alias_SFtype): (alias_SFtype) 0, \
> +  CMPtype (*) (alias_DFtype, alias_DFtype): (alias_DFtype) 0, \
> +  CMPtype (*) (alias_TFtype, alias_TFtype): (alias_TFtype) 0
> #define strong_alias(name, aliasname) \
> -  CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
> +  CMPtype aliasname (__typeof (_Generic (name, ALIAS_SELECTOR)) a, \
> +		     __typeof (_Generic (name, ALIAS_SELECTOR)) b) \
> +		    { return name (a, b); }
> #else
> # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
> # define _strong_alias(name, aliasname) \
> -- 
> 


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-03 12:42                       ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above Jakub Jelinek
@ 2021-09-06  2:05                         ` Hongtao Liu
  2021-09-06 12:13                           ` Jakub Jelinek
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-09-06  2:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: liuhongt, GCC Patches, Joseph Myers

On Fri, Sep 3, 2021 at 8:42 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Aug 02, 2021 at 02:31:12PM +0800, liuhongt via Gcc-patches wrote:
> >       * doc/extend.texi (Half-Precision Floating Point): Document
> >       _Float16 for x86.
>
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
> >  @section Half-Precision Floating Point
> >  @cindex half-precision floating point
> >  @cindex @code{__fp16} data type
> > +@cindex @code{__Float16} data type
> >
> >  On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> >  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> > @@ -1150,6 +1151,18 @@ calls.
> >  It is recommended that portable code use the @code{_Float16} type defined
> >  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> >
> > +On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> > +(16-bit) floating point via the @code{_Float16} type which is defined by
> > +18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> > +which contains same data format as C.
> > +
> > +Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> > +operations will be emulated by software emulation and the @code{float}
> > +instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> > +the intermediate result of the operation as 32-bit precision. This may lead
> > +to inconsistent behavior between software emulation and AVX512-FP16
> > +instructions.
> > +
> >  @node Decimal Float
> >  @section Decimal Floating Types
> >  @cindex decimal floating types
>
> Shouldn't there be more changes for this in doc/extend.texi?
>
> I'd say that x86 with -msse2 should be mentioned in
> The @code{_Float16} type is supported on AArch64
> systems by default, and on ARM systems when the IEEE format for 16-bit
> floating-point types is selected with @option{-mfp16-format=ieee}.
>
> and in
> @node Half-Precision
> I'd say that one sentence about the x86 support should go already in the
> first paragraph, perhaps after:
> On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> point via the @code{__fp16} type defined in the ARM C Language Extensions.
> On ARM systems, you must enable this type explicitly with the
> @option{-mfp16-format} command-line option in order to use it.
> because users just won't scroll down to immediately find out that in the
> 10th/11th paragraph it talks about x86.
> Just mention there that on all 3 arches it is available using the _Float16 type
> in C, on x86 in C++ too and then on ARM/AArch64 using __fp16.
>
>         Jakub
>

How about this?

Adjust the wording for x86 _Float16 type.

gcc/ChangeLog:

        * doc/extend.texi: (@node Floating Types): Adjust the wording.
        (@node Half-Precision): Ditto.

1 file changed, 14 insertions(+), 13 deletions(-)
gcc/doc/extend.texi | 27 ++++++++++++++-------------

modified   gcc/doc/extend.texi
@@ -1076,9 +1076,11 @@ systems where @code{__float128} is supported.  The @code{_Float32}
 type is supported on all systems supporting IEEE binary32; the
 @code{_Float64} and @code{_Float32x} types are supported on all systems
 supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
-systems by default, and on ARM systems when the IEEE format for 16-bit
-floating-point types is selected with @option{-mfp16-format=ieee}.
-GCC does not currently support @code{_Float128x} on any systems.
+systems by default, and also on x86 systems with @code{target("sse2")}
+for both C and C++.
+On ARM systems when the IEEE format for 16-bit floating-point types is
+selected with @option{-mfp16-format=ieee}. GCC does not currently support
+@code{_Float128x} on any systems.

 On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
 types using the corresponding internal complex type, @code{XCmode} for
@@ -1108,6 +1110,10 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
 point via the @code{__fp16} type defined in the ARM C Language Extensions.
 On ARM systems, you must enable this type explicitly with the
 @option{-mfp16-format} command-line option in order to use it.
+On x86 targets with @code{target("sse2")} and above,  GCC supports
+half-precision (16-bit) floating point via the @code{_Float16} type.
+For C++, x86 provides a builtin type named @code{_Float16} which contains
+same data format as C.

 ARM targets support two incompatible representations for half-precision
 floating-point values.  You must choose one of the representations and
@@ -1151,16 +1157,11 @@ calls.
 It is recommended that portable code use the @code{_Float16} type defined
 by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.

-On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
-(16-bit) floating point via the @code{_Float16} type which is defined by
-18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
-which contains same data format as C.
-
-Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
-operations will be emulated by software emulation and the @code{float}
-instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
-the intermediate result of the operation as 32-bit precision. This may lead
-to inconsistent behavior between software emulation and AVX512-FP16
+On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
+storage only, all operations will be emulated by software emulation and the
+@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
+to keep the intermediate result of the operation as 32-bit precision. This may
+lead to inconsistent behavior between software emulation and AVX512-FP16
 instructions.

 @node Decimal Float

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-06  2:05                         ` Hongtao Liu
@ 2021-09-06 12:13                           ` Jakub Jelinek
  2021-09-07  1:52                             ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Jakub Jelinek @ 2021-09-06 12:13 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches, Joseph Myers

On Mon, Sep 06, 2021 at 10:05:00AM +0800, Hongtao Liu wrote:

> @@ -1076,9 +1076,11 @@ systems where @code{__float128} is supported. The @code{_Float32}
>  type is supported on all systems supporting IEEE binary32; the
>  @code{_Float64} and @code{_Float32x} types are supported on all systems
>  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> -systems by default, and on ARM systems when the IEEE format for 16-bit
> -floating-point types is selected with @option{-mfp16-format=ieee}.
> -GCC does not currently support @code{_Float128x} on any systems.
> +systems by default, and also on x86 systems with @code{target("sse2")}
> +for both C and C++.
> +On ARM systems when the IEEE format for 16-bit floating-point types is
> +selected with @option{-mfp16-format=ieee}.

This isn't a sentence.  I think it should be:

The @code{_Float16} type is supported on AArch64 systems by default,
on ARM systems when the IEEE format for 16-bit floating-point types is
selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
systems with SSE2 enabled.

>  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
>  types using the corresponding internal complex type, @code{XCmode} for
> @@ -1108,6 +1110,10 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
>  On ARM systems, you must enable this type explicitly with the
>  @option{-mfp16-format} command-line option in order to use it.
> +On x86 targets with @code{target("sse2")} and above,  GCC supports
> +half-precision (16-bit) floating point via the @code{_Float16} type.
> +For C++, x86 provides a builtin type named @code{_Float16} which contains
> +same data format as C.

Again, I'd write with SSE2 enabled, there are many ways to enable SSE2,
-msse2, -mavx, -mavx512f, ... on the command line, or various target
attributes.

	Jakub


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-06 12:13                           ` Jakub Jelinek
@ 2021-09-07  1:52                             ` Hongtao Liu
  2021-09-07  7:17                               ` Jakub Jelinek
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-09-07  1:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: liuhongt, GCC Patches, Joseph Myers

On Mon, Sep 6, 2021 at 8:13 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Sep 06, 2021 at 10:05:00AM +0800, Hongtao Liu wrote:
>
> > @@ -1076,9 +1076,11 @@ systems where @code{__float128} is supported. The @code{_Float32}
> >  type is supported on all systems supporting IEEE binary32; the
> >  @code{_Float64} and @code{_Float32x} types are supported on all systems
> >  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> > -systems by default, and on ARM systems when the IEEE format for 16-bit
> > -floating-point types is selected with @option{-mfp16-format=ieee}.
> > -GCC does not currently support @code{_Float128x} on any systems.
> > +systems by default, and also on x86 systems with @code{target("sse2")}
> > +for both C and C++.
> > +On ARM systems when the IEEE format for 16-bit floating-point types is
> > +selected with @option{-mfp16-format=ieee}.
>
> This isn't a sentence.  I think it should be:
>
> The @code{_Float16} type is supported on AArch64 systems by default,
> on ARM systems when the IEEE format for 16-bit floating-point types is
> selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
> systems with SSE2 enabled.
>
> >  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
> >  types using the corresponding internal complex type, @code{XCmode} for
> > @@ -1108,6 +1110,10 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> >  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> >  On ARM systems, you must enable this type explicitly with the
> >  @option{-mfp16-format} command-line option in order to use it.
> > +On x86 targets with @code{target("sse2")} and above,  GCC supports
> > +half-precision (16-bit) floating point via the @code{_Float16} type.
> > +For C++, x86 provides a builtin type named @code{_Float16} which contains
> > +same data format as C.
>
> Again, I'd write with SSE2 enabled, there are many ways to enable SSE2,
> -msse2, -mavx, -mavx512f, ... on the command line, or various target
> attributes.

Adjust the wording for x86 _Float16 type.

gcc/ChangeLog:

* doc/extend.texi: (@node Floating Types): Adjust the wording.
(@node Half-Precision): Ditto.

1 file changed, 15 insertions(+), 13 deletions(-)
gcc/doc/extend.texi | 28 +++++++++++++++-------------

modified   gcc/doc/extend.texi
@@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported. The @code{_Float32}
 type is supported on all systems supporting IEEE binary32; the
 @code{_Float64} and @code{_Float32x} types are supported on all systems
 supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
-systems by default, and on ARM systems when the IEEE format for 16-bit
-floating-point types is selected with @option{-mfp16-format=ieee}.
-GCC does not currently support @code{_Float128x} on any systems.
+systems by default when the IEEE format for 16-bit floating-point types is
+selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
+systems with SSE2 enabled. GCC does not currently support
+@code{_Float128x} on any systems.

 On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
 types using the corresponding internal complex type, @code{XCmode} for
@@ -1108,6 +1109,12 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
 point via the @code{__fp16} type defined in the ARM C Language Extensions.
 On ARM systems, you must enable this type explicitly with the
 @option{-mfp16-format} command-line option in order to use it.
+On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
+floating point via the @code{_Float16} type, there are many ways to enable
+SSE2, @option{-msse2, -mavx, -mavx512f, ...} on the command line, or various
+target attributes.
+For C++, x86 provides a builtin type named @code{_Float16} which contains
+same data format as C.

 ARM targets support two incompatible representations for half-precision
 floating-point values.  You must choose one of the representations and
@@ -1151,16 +1158,11 @@ calls.
 It is recommended that portable code use the @code{_Float16} type defined
 by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.

-On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
-(16-bit) floating point via the @code{_Float16} type which is defined by
-18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
-which contains same data format as C.
-
-Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
-operations will be emulated by software emulation and the @code{float}
-instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
-the intermediate result of the operation as 32-bit precision. This may lead
-to inconsistent behavior between software emulation and AVX512-FP16
+On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
+storage only, all operations will be emulated by software emulation and the
+@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
+to keep the intermediate result of the operation as 32-bit precision. This may
+lead to inconsistent behavior between software emulation and AVX512-FP16
 instructions.

 @node Decimal Float

>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-07  1:52                             ` Hongtao Liu
@ 2021-09-07  7:17                               ` Jakub Jelinek
  2021-09-07 10:08                                 ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Jakub Jelinek @ 2021-09-07  7:17 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches, Joseph Myers

On Tue, Sep 07, 2021 at 09:52:57AM +0800, Hongtao Liu wrote:
> Adjust the wording for x86 _Float16 type.
> 
> gcc/ChangeLog:
> 
> * doc/extend.texi: (@node Floating Types): Adjust the wording.
> (@node Half-Precision): Ditto.
> 
> 1 file changed, 15 insertions(+), 13 deletions(-)
> gcc/doc/extend.texi | 28 +++++++++++++++-------------
> 
> modified   gcc/doc/extend.texi
> @@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported. The @code{_Float32}
>  type is supported on all systems supporting IEEE binary32; the
>  @code{_Float64} and @code{_Float32x} types are supported on all systems
>  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> -systems by default, and on ARM systems when the IEEE format for 16-bit
> -floating-point types is selected with @option{-mfp16-format=ieee}.
> -GCC does not currently support @code{_Float128x} on any systems.
> +systems by default when the IEEE format for 16-bit floating-point types is

The AArch64 case now has the ARM case restriction and ARM is lost.  It
should be

+systems by default, on ARM systems when the IEEE format for 16-bit
+floating-point types is

> +selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
> +systems with SSE2 enabled. GCC does not currently support
> +@code{_Float128x} on any systems.
> 
>  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
>  types using the corresponding internal complex type, @code{XCmode} for
> @@ -1108,6 +1109,12 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
>  On ARM systems, you must enable this type explicitly with the
>  @option{-mfp16-format} command-line option in order to use it.
> +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
> +floating point via the @code{_Float16} type, there are many ways to enable
> +SSE2, @option{-msse2, -mavx, -mavx512f, ...} on the command line, or various
> +target attributes.

The ", there are many ways ... attributes" was just meant as explanation for
the "with SSE2 enabled" wording, not something that should be literally in
the documentation.  It is documented elsewhere...

> +For C++, x86 provides a builtin type named @code{_Float16} which contains
> +same data format as C.
> 
>  ARM targets support two incompatible representations for half-precision
>  floating-point values.  You must choose one of the representations and
> @@ -1151,16 +1158,11 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> 
> -On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> -(16-bit) floating point via the @code{_Float16} type which is defined by
> -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> -which contains same data format as C.
> -
> -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> -operations will be emulated by software emulation and the @code{float}
> -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> -the intermediate result of the operation as 32-bit precision. This may lead
> -to inconsistent behavior between software emulation and AVX512-FP16
> +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
> +storage only, all operations will be emulated by software emulation and the
> +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> +to keep the intermediate result of the operation as 32-bit precision. This may
> +lead to inconsistent behavior between software emulation and AVX512-FP16
>  instructions.
> 
>  @node Decimal Float

	Jakub


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-07  7:17                               ` Jakub Jelinek
@ 2021-09-07 10:08                                 ` Hongtao Liu
  2021-09-07 10:10                                   ` Jakub Jelinek
  0 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-09-07 10:08 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: liuhongt, GCC Patches, Joseph Myers

On Tue, Sep 7, 2021 at 3:18 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Sep 07, 2021 at 09:52:57AM +0800, Hongtao Liu wrote:
> > Adjust the wording for x86 _Float16 type.
> >
> > gcc/ChangeLog:
> >
> > * doc/extend.texi: (@node Floating Types): Adjust the wording.
> > (@node Half-Precision): Ditto.
> >
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> > gcc/doc/extend.texi | 28 +++++++++++++++-------------
> >
> > modified   gcc/doc/extend.texi
> > @@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported. The @code{_Float32}
> >  type is supported on all systems supporting IEEE binary32; the
> >  @code{_Float64} and @code{_Float32x} types are supported on all systems
> >  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> > -systems by default, and on ARM systems when the IEEE format for 16-bit
> > -floating-point types is selected with @option{-mfp16-format=ieee}.
> > -GCC does not currently support @code{_Float128x} on any systems.
> > +systems by default when the IEEE format for 16-bit floating-point types is
>
> The AArch64 case now has the ARM case restriction and ARM is lost.  It
> should be
>
> +systems by default, on ARM systems when the IEEE format for 16-bit
> +floating-point types is
>
> > +selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
> > +systems with SSE2 enabled. GCC does not currently support
> > +@code{_Float128x} on any systems.
> >
> >  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
> >  types using the corresponding internal complex type, @code{XCmode} for
> > @@ -1108,6 +1109,12 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> >  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> >  On ARM systems, you must enable this type explicitly with the
> >  @option{-mfp16-format} command-line option in order to use it.
> > +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
> > +floating point via the @code{_Float16} type, there are many ways to enable
> > +SSE2, @option{-msse2, -mavx, -mavx512f, ...} on the command line, or various
> > +target attributes.
>
> The ", there are many ways ... attributes" was just meant as explanation for
> the "with SSE2 enabled" wording, not something that should be literally in
> the documentation.  It is documented elsewhere...
>
> > +For C++, x86 provides a builtin type named @code{_Float16} which contains
> > +same data format as C.
> >
> >  ARM targets support two incompatible representations for half-precision
> >  floating-point values.  You must choose one of the representations and
> > @@ -1151,16 +1158,11 @@ calls.
> >  It is recommended that portable code use the @code{_Float16} type defined
> >  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> >
> > -On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> > -(16-bit) floating point via the @code{_Float16} type which is defined by
> > -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> > -which contains same data format as C.
> > -
> > -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> > -operations will be emulated by software emulation and the @code{float}
> > -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> > -the intermediate result of the operation as 32-bit precision. This may lead
> > -to inconsistent behavior between software emulation and AVX512-FP16
> > +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
> > +storage only, all operations will be emulated by software emulation and the
> > +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> > +to keep the intermediate result of the operation as 32-bit precision. This may
> > +lead to inconsistent behavior between software emulation and AVX512-FP16
> >  instructions.
> >
> >  @node Decimal Float
>
>         Jakub
>
Like this?


1 file changed, 12 insertions(+), 13 deletions(-)
gcc/doc/extend.texi | 25 ++++++++++++-------------

modified   gcc/doc/extend.texi
@@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported. The @code{_Float32}
 type is supported on all systems supporting IEEE binary32; the
 @code{_Float64} and @code{_Float32x} types are supported on all systems
 supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
-systems by default, and on ARM systems when the IEEE format for 16-bit
-floating-point types is selected with @option{-mfp16-format=ieee}.
-GCC does not currently support @code{_Float128x} on any systems.
+systems by default, on ARM systems when the IEEE format for 16-bit
+floating-point types is selected with @option{-mfp16-format=ieee} and,
+for both C and C++, on x86 systems with SSE2 enabled. GCC does not currently
+support @code{_Float128x} on any systems.

 On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
 types using the corresponding internal complex type, @code{XCmode} for
@@ -1108,6 +1109,9 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
 point via the @code{__fp16} type defined in the ARM C Language Extensions.
 On ARM systems, you must enable this type explicitly with the
 @option{-mfp16-format} command-line option in order to use it.
+On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
+floating point via the @code{_Float16} type. For C++, x86 provides a builtin
+type named @code{_Float16} which contains the same data format as C.

 ARM targets support two incompatible representations for half-precision
 floating-point values.  You must choose one of the representations and
@@ -1151,16 +1155,11 @@ calls.
 It is recommended that portable code use the @code{_Float16} type defined
 by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.

-On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
-(16-bit) floating point via the @code{_Float16} type which is defined by
-18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
-which contains same data format as C.
-
-Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
-operations will be emulated by software emulation and the @code{float}
-instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
-the intermediate result of the operation as 32-bit precision. This may lead
-to inconsistent behavior between software emulation and AVX512-FP16
+On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
+storage only, all operations will be emulated by software emulation and the
+@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
+to keep the intermediate result of the operation as 32-bit precision. This may
+lead to inconsistent behavior between software emulation and AVX512-FP16
 instructions.

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.
  2021-09-07 10:08                                 ` Hongtao Liu
@ 2021-09-07 10:10                                   ` Jakub Jelinek
  0 siblings, 0 replies; 138+ messages in thread
From: Jakub Jelinek @ 2021-09-07 10:10 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: liuhongt, GCC Patches, Joseph Myers

On Tue, Sep 07, 2021 at 06:08:44PM +0800, Hongtao Liu wrote:
> -On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> -(16-bit) floating point via the @code{_Float16} type which is defined by
> -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> -which contains same data format as C.
> -
> -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> -operations will be emulated by software emulation and the @code{float}
> -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> -the intermediate result of the operation as 32-bit precision. This may lead
> -to inconsistent behavior between software emulation and AVX512-FP16
> +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is

I'd write
	  targets with SSE2 enabled, without ...

> +storage only, all operations will be emulated by software emulation and the
> +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> +to keep the intermediate result of the operation as 32-bit precision. This may
> +lead to inconsistent behavior between software emulation and AVX512-FP16
>  instructions.

Ok for trunk with that change, thanks.

	Jakub


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-09-03 15:05                                                                             ` Andreas Schwab
@ 2021-09-07 23:19                                                                               ` Segher Boessenkool
  2021-09-08  0:55                                                                                 ` Hongtao Liu
  0 siblings, 1 reply; 138+ messages in thread
From: Segher Boessenkool @ 2021-09-07 23:19 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: richard.sandiford, liuhongt, gcc-patches

On Fri, Sep 03, 2021 at 05:05:47PM +0200, Andreas Schwab wrote:
> On Sep 02 2021, Segher Boessenkool wrote:
> > On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
> >> 	* emit-rtl.c (validate_subreg): Get rid of all float-int
> >> 	special cases.
> >
> > This caused various regressions on powerpc.  Please revert this until
> > this can be done safely (the comment this patch deletes says why it can
> > not be done yet).
> 
> This also breaks ada on riscv64.
> 
> s-fatgen.adb: In function 'System.Fat_Flt.Attr_Float.Scaling':
> s-fatgen.adb:830:8: error: unable to find a register to spill
> s-fatgen.adb:830:8: error: this is the insn:
> (insn 215 321 216 26 (set (reg:SF 88 [ xx.26_39 ])
>         (mult:SF (reg:SF 190)
>             (subreg:SF (reg:DI 221 [164]) 0))) "s-fatgen.adb":821:25 17 {mulsf3}
>      (expr_list:REG_DEAD (reg:DI 221 [164])
>         (expr_list:REG_DEAD (reg:SF 190)
>             (nil))))
> during RTL pass: reload

It still is broken on rs6000.  This breaks when building SPEC for
example (but in many more places as well).

This needs to be fixed somehow.

I sent <https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579026.html>
(Message-ID: <20210907230730.GM1583@gate.crashing.org>) that may be a
start discussing this somewhat.  The idea of the change looks fine, but
the time isn't ripe for it yet (if it was intentional!)

In the meantime, various targets still are broken.  This needs a real
fix.  How many *other* targets have been broken, just not detected yet?


Segher

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.
  2021-09-07 23:19                                                                               ` Segher Boessenkool
@ 2021-09-08  0:55                                                                                 ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-09-08  0:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Andreas Schwab, Richard Sandiford, liuhongt, GCC Patches

On Wed, Sep 8, 2021 at 7:20 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Sep 03, 2021 at 05:05:47PM +0200, Andreas Schwab wrote:
> > On Sep 02 2021, Segher Boessenkool wrote:
> > > On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
> > >>    * emit-rtl.c (validate_subreg): Get rid of all float-int
> > >>    special cases.
> > >
> > > This caused various regressions on powerpc.  Please revert this until
> > > this can be done safely (the comment this patch deletes says why it can
> > > not be done yet).
> >
> > This also breaks ada on riscv64.
> >
> > s-fatgen.adb: In function 'System.Fat_Flt.Attr_Float.Scaling':
> > s-fatgen.adb:830:8: error: unable to find a register to spill
> > s-fatgen.adb:830:8: error: this is the insn:
> > (insn 215 321 216 26 (set (reg:SF 88 [ xx.26_39 ])
> >         (mult:SF (reg:SF 190)
> >             (subreg:SF (reg:DI 221 [164]) 0))) "s-fatgen.adb":821:25 17 {mulsf3}
> >      (expr_list:REG_DEAD (reg:DI 221 [164])
> >         (expr_list:REG_DEAD (reg:SF 190)
> >             (nil))))
> > during RTL pass: reload
>
> It still is broken on rs6000.  This breaks when building SPEC for
> example (but in many more places as well).
>
> This needs to be fixed somehow.
>
> I sent <https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579026.html>
> (Message-ID: <20210907230730.GM1583@gate.crashing.org>) that may be a
> start discussing this somewhat.  The idea of the change looks fine, but
> the time isn't ripe for it yet (if it was intentional!)
>
> In the meantime, various targets still are broken.  This needs a real
> fix.  How many *other* targets have been broken, just not detected yet?
riscv64 has reported a related bug.
Other than that, no other target has reported a related regression yet.
>
>
> Segher



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V2 00/10] Initial support for AVX512FP16
  2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
                           ` (9 preceding siblings ...)
  2021-07-21  7:43         ` [PATCH 10/10] AVX512FP16: Add abi test for zmm liuhongt
@ 2021-09-08  2:54         ` Hongtao Liu
  2021-09-08  3:02           ` Hongtao Liu
  10 siblings, 1 reply; 138+ messages in thread
From: Hongtao Liu @ 2021-09-08  2:54 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph Myers, H. J. Lu, Richard Biener

On Wed, Jul 21, 2021 at 3:43 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
>   As discussed in [1], this patch series supports _Float16 under target sse2
> and above. Without avx512fp16, the _Float16 type is storage only: all operations
> are emulated by soft-fp and float instructions. Soft-fp keeps the intermediate
> result of the operation at 32-bit precision by default, which may lead to
> inconsistent behavior between soft-fp and avx512fp16 instructions; using option
> -fexcess-precision=standard will force rounding back after every operation.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html
>
> There are 10 patches in this series:
>
> 1)  Update hf soft-fp from glibc.
> 2)  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> 3)  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>     truncations.
> 4) AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> instructions.
> 5) AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> 6) AVX512FP16: Add testcase for vector init and broadcast intrinsics.
> 7) AVX512FP16: Add tests for vector passing in variable arguments.
> 8) AVX512FP16: Add ABI tests for xmm.
> 9) AVX512FP16: Add ABI test for ymm.
> 10) AVX512FP16: Add abi test for zmm

I'm going to check in patches 4-10 plus [1].


[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578654.html


>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
>   Pass 300+ new tests under gcc.dg/torture/*float16*
>
>   On SPR, there are regressions related to FLT_EVAL_METHOD for pr69225-[1234567].c
>  since TARGET_AVX512FP16 will set FLT_EVAL_METHOD to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
>
>  gcc/common/config/i386/cpuinfo.h              |    2 +
>  gcc/common/config/i386/i386-common.c          |   26 +-
>  gcc/common/config/i386/i386-cpuinfo.h         |    1 +
>  gcc/common/config/i386/i386-isas.h            |    1 +
>  gcc/config.gcc                                |    2 +-
>  gcc/config/i386/avx512fp16intrin.h            |  225 ++++
>  gcc/config/i386/cpuid.h                       |    1 +
>  gcc/config/i386/i386-builtin-types.def        |    7 +-
>  gcc/config/i386/i386-builtins.c               |   23 +
>  gcc/config/i386/i386-c.c                      |    2 +
>  gcc/config/i386/i386-expand.c                 |  129 +-
>  gcc/config/i386/i386-isa.def                  |    1 +
>  gcc/config/i386/i386-modes.def                |   13 +-
>  gcc/config/i386/i386-options.c                |    4 +-
>  gcc/config/i386/i386.c                        |  238 +++-
>  gcc/config/i386/i386.h                        |   28 +-
>  gcc/config/i386/i386.md                       |  304 ++++-
>  gcc/config/i386/i386.opt                      |    4 +
>  gcc/config/i386/immintrin.h                   |    4 +
>  gcc/config/i386/sse.md                        |  395 ++++--
>  gcc/doc/extend.texi                           |   16 +
>  gcc/doc/invoke.texi                           |   10 +-
>  gcc/lto/lto-lang.c                            |    3 +
>  gcc/optabs-query.c                            |   10 +-
>  gcc/testsuite/g++.dg/other/i386-2.C           |    2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C           |    2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C     |    8 +
>  gcc/testsuite/g++.target/i386/float16-2.C     |   14 +
>  gcc/testsuite/g++.target/i386/float16-3.C     |   10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c         |    2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c         |    2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |    3 +
>  .../gcc.target/i386/avx512fp16-10a.c          |   14 +
>  .../gcc.target/i386/avx512fp16-10b.c          |   25 +
>  .../gcc.target/i386/avx512fp16-12a.c          |   21 +
>  .../gcc.target/i386/avx512fp16-12b.c          |   27 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 ++
>  gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 ++
>  gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
>  .../gcc.target/i386/avx512fp16-vararg-1.c     |  122 ++
>  .../gcc.target/i386/avx512fp16-vararg-2.c     |  107 ++
>  .../gcc.target/i386/avx512fp16-vararg-3.c     |  114 ++
>  .../gcc.target/i386/avx512fp16-vararg-4.c     |  115 ++
>  .../gcc.target/i386/avx512fp16-vec_set_var.c  |   30 +
>  gcc/testsuite/gcc.target/i386/float16-3a.c    |   10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c    |   10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c    |   10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c    |   10 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |    2 +
>  gcc/testsuite/gcc.target/i386/m512-check.h    |   38 +-
>  gcc/testsuite/gcc.target/i386/pr54855-12.c    |   14 +
>  gcc/testsuite/gcc.target/i386/pr54855-13.c    |   14 +
>  gcc/testsuite/gcc.target/i386/sse-13.c        |    2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c        |    2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c        |    4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c        |    2 +-
>  .../gcc.target/i386/sse2-float16-1.c          |    8 +
>  .../gcc.target/i386/sse2-float16-2.c          |   16 +
>  .../gcc.target/i386/sse2-float16-3.c          |   12 +
>  .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
>  .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +++
>  .../x86_64/abi/avx512fp16/asm-support.S       |   81 ++
>  .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 ++
>  .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
>  .../x86_64/abi/avx512fp16/defines.h           |  150 +++
>  .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |   45 +
>  .../x86_64/abi/avx512fp16/m256h/args.h        |  182 +++
>  .../x86_64/abi/avx512fp16/m256h/asm-support.S |   81 ++
>  .../avx512fp16/m256h/avx512fp16-ymm-check.h   |    3 +
>  .../avx512fp16/m256h/test_m256_returning.c    |   54 +
>  .../abi/avx512fp16/m256h/test_passing_m256.c  |  370 ++++++
>  .../avx512fp16/m256h/test_passing_structs.c   |  113 ++
>  .../avx512fp16/m256h/test_passing_unions.c    |  337 ++++++
>  .../abi/avx512fp16/m256h/test_varargs-m256.c  |  160 +++
>  .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |   48 +
>  .../x86_64/abi/avx512fp16/m512h/args.h        |  186 +++
>  .../x86_64/abi/avx512fp16/m512h/asm-support.S |   97 ++
>  .../avx512fp16/m512h/avx512fp16-zmm-check.h   |    4 +
>  .../avx512fp16/m512h/test_m512_returning.c    |   62 +
>  .../abi/avx512fp16/m512h/test_passing_m512.c  |  380 ++++++
>  .../avx512fp16/m512h/test_passing_structs.c   |  123 ++
>  .../avx512fp16/m512h/test_passing_unions.c    |  415 +++++++
>  .../abi/avx512fp16/m512h/test_varargs-m512.c  |  164 +++
>  .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
>  .../test_3_element_struct_and_unions.c        |  692 +++++++++++
>  .../abi/avx512fp16/test_basic_alignment.c     |   45 +
>  .../test_basic_array_size_and_align.c         |   43 +
>  .../abi/avx512fp16/test_basic_returning.c     |   87 ++
>  .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
>  .../test_basic_struct_size_and_align.c        |   42 +
>  .../test_basic_union_size_and_align.c         |   40 +
>  .../abi/avx512fp16/test_complex_returning.c   |  104 ++
>  .../abi/avx512fp16/test_m64m128_returning.c   |   73 ++
>  .../abi/avx512fp16/test_passing_floats.c      | 1066 +++++++++++++++++
>  .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++++++++
>  .../abi/avx512fp16/test_passing_structs.c     |  332 +++++
>  .../abi/avx512fp16/test_passing_unions.c      |  335 ++++++
>  .../abi/avx512fp16/test_struct_returning.c    |  274 +++++
>  .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +++
>  gcc/testsuite/lib/target-supports.exp         |   13 +-
>  libgcc/config.host                            |    5 +-
>  libgcc/config/i386/32/sfp-machine.h           |    1 +
>  libgcc/config/i386/64/sfp-machine.h           |    1 +
>  libgcc/config/i386/64/t-softfp                |    1 +
>  libgcc/config/i386/sfp-machine.h              |    1 +
>  libgcc/config/i386/t-softfp                   |    5 +
>  libgcc/soft-fp/eqhf2.c                        |   49 +
>  libgcc/soft-fp/extendhfdf2.c                  |   53 +
>  libgcc/soft-fp/extendhfsf2.c                  |   49 +
>  libgcc/soft-fp/half.h                         |    1 +
>  libgcc/soft-fp/truncdfhf2.c                   |   52 +
>  libgcc/soft-fp/truncsfhf2.c                   |   48 +
>  127 files changed, 10324 insertions(+), 238 deletions(-)
>  create mode 100644 gcc/config/i386/avx512fp16intrin.h
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
>  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-13.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
>  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
>  create mode 100644 libgcc/config/i386/64/t-softfp
>  create mode 100644 libgcc/soft-fp/eqhf2.c
>  create mode 100644 libgcc/soft-fp/extendhfdf2.c
>  create mode 100644 libgcc/soft-fp/extendhfsf2.c
>  create mode 100644 libgcc/soft-fp/truncdfhf2.c
>  create mode 100644 libgcc/soft-fp/truncsfhf2.c
>
> --
> 2.18.1
>


--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V2 00/10] Initial support for AVX512FP16
  2021-09-08  2:54         ` [PATCH V2 00/10] Initial support for AVX512FP16 Hongtao Liu
@ 2021-09-08  3:02           ` Hongtao Liu
  0 siblings, 0 replies; 138+ messages in thread
From: Hongtao Liu @ 2021-09-08  3:02 UTC (permalink / raw)
  To: liuhongt; +Cc: GCC Patches, Uros Bizjak, Joseph Myers, H. J. Lu, Richard Biener

On Wed, Sep 8, 2021 at 10:54 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 3:43 PM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Hi:
> >   As discussed in [1], this patch supports _Float16 for target sse2
> > and above. Without avx512fp16, the _Float16 type is storage only, and all
> > operations are emulated by soft-fp and float instructions. Soft-fp keeps the
> > intermediate result of an operation at 32-bit precision by default, which may
> > lead to inconsistent behavior between soft-fp and avx512fp16 instructions; the
> > option -fexcess-precision=standard forces rounding back after every operation.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html
> >
> > There are 10 patches in this series:
> >
> > 1)  Update hf soft-fp from glibc.
> > 2)  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> > 3)  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >     truncations.
> > 4)  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> >     instructions.
> > 5)  AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> > 6)  AVX512FP16: Add testcase for vector init and broadcast intrinsics.
> > 7)  AVX512FP16: Add tests for vector passing in variable arguments.
> > 8)  AVX512FP16: Add ABI tests for xmm.
> > 9)  AVX512FP16: Add ABI test for ymm.
> > 10) AVX512FP16: Add abi test for zmm
>
> I'm going to check in patches 4-10 plus [1].
Patch 4 introduces a new failure, which seems only to expose the latent
bug already recorded in PR99936 and PR98531:

g++.dg/modules/xtreme-header_b.C -std=c++17 (internal compiler error)

>
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578654.html
>
>
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
> >   Passes 300+ new tests under gcc.dg/torture/*float16*
> >
> >   On SPR, there are regressions related to FLT_EVAL_METHOD for pr69225-[1234567].c,
> >  since TARGET_AVX512FP16 sets FLT_EVAL_METHOD to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
> >
> >  gcc/common/config/i386/cpuinfo.h              |    2 +
> >  gcc/common/config/i386/i386-common.c          |   26 +-
> >  gcc/common/config/i386/i386-cpuinfo.h         |    1 +
> >  gcc/common/config/i386/i386-isas.h            |    1 +
> >  gcc/config.gcc                                |    2 +-
> >  gcc/config/i386/avx512fp16intrin.h            |  225 ++++
> >  gcc/config/i386/cpuid.h                       |    1 +
> >  gcc/config/i386/i386-builtin-types.def        |    7 +-
> >  gcc/config/i386/i386-builtins.c               |   23 +
> >  gcc/config/i386/i386-c.c                      |    2 +
> >  gcc/config/i386/i386-expand.c                 |  129 +-
> >  gcc/config/i386/i386-isa.def                  |    1 +
> >  gcc/config/i386/i386-modes.def                |   13 +-
> >  gcc/config/i386/i386-options.c                |    4 +-
> >  gcc/config/i386/i386.c                        |  238 +++-
> >  gcc/config/i386/i386.h                        |   28 +-
> >  gcc/config/i386/i386.md                       |  304 ++++-
> >  gcc/config/i386/i386.opt                      |    4 +
> >  gcc/config/i386/immintrin.h                   |    4 +
> >  gcc/config/i386/sse.md                        |  395 ++++--
> >  gcc/doc/extend.texi                           |   16 +
> >  gcc/doc/invoke.texi                           |   10 +-
> >  gcc/lto/lto-lang.c                            |    3 +
> >  gcc/optabs-query.c                            |   10 +-
> >  gcc/testsuite/g++.dg/other/i386-2.C           |    2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C           |    2 +-
> >  gcc/testsuite/g++.target/i386/float16-1.C     |    8 +
> >  gcc/testsuite/g++.target/i386/float16-2.C     |   14 +
> >  gcc/testsuite/g++.target/i386/float16-3.C     |   10 +
> >  gcc/testsuite/gcc.target/i386/avx-1.c         |    2 +-
> >  gcc/testsuite/gcc.target/i386/avx-2.c         |    2 +-
> >  gcc/testsuite/gcc.target/i386/avx512-check.h  |    3 +
> >  .../gcc.target/i386/avx512fp16-10a.c          |   14 +
> >  .../gcc.target/i386/avx512fp16-10b.c          |   25 +
> >  .../gcc.target/i386/avx512fp16-12a.c          |   21 +
> >  .../gcc.target/i386/avx512fp16-12b.c          |   27 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 ++
> >  gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 ++
> >  gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
> >  .../gcc.target/i386/avx512fp16-vararg-1.c     |  122 ++
> >  .../gcc.target/i386/avx512fp16-vararg-2.c     |  107 ++
> >  .../gcc.target/i386/avx512fp16-vararg-3.c     |  114 ++
> >  .../gcc.target/i386/avx512fp16-vararg-4.c     |  115 ++
> >  .../gcc.target/i386/avx512fp16-vec_set_var.c  |   30 +
> >  gcc/testsuite/gcc.target/i386/float16-3a.c    |   10 +
> >  gcc/testsuite/gcc.target/i386/float16-3b.c    |   10 +
> >  gcc/testsuite/gcc.target/i386/float16-4a.c    |   10 +
> >  gcc/testsuite/gcc.target/i386/float16-4b.c    |   10 +
> >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |    2 +
> >  gcc/testsuite/gcc.target/i386/m512-check.h    |   38 +-
> >  gcc/testsuite/gcc.target/i386/pr54855-12.c    |   14 +
> >  gcc/testsuite/gcc.target/i386/pr54855-13.c    |   14 +
> >  gcc/testsuite/gcc.target/i386/sse-13.c        |    2 +-
> >  gcc/testsuite/gcc.target/i386/sse-14.c        |    2 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c        |    4 +-
> >  gcc/testsuite/gcc.target/i386/sse-23.c        |    2 +-
> >  .../gcc.target/i386/sse2-float16-1.c          |    8 +
> >  .../gcc.target/i386/sse2-float16-2.c          |   16 +
> >  .../gcc.target/i386/sse2-float16-3.c          |   12 +
> >  .../abi/avx512fp16/abi-avx512fp16-xmm.exp     |   48 +
> >  .../gcc.target/x86_64/abi/avx512fp16/args.h   |  190 +++
> >  .../x86_64/abi/avx512fp16/asm-support.S       |   81 ++
> >  .../x86_64/abi/avx512fp16/avx512fp16-check.h  |   74 ++
> >  .../abi/avx512fp16/avx512fp16-xmm-check.h     |    3 +
> >  .../x86_64/abi/avx512fp16/defines.h           |  150 +++
> >  .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |   45 +
> >  .../x86_64/abi/avx512fp16/m256h/args.h        |  182 +++
> >  .../x86_64/abi/avx512fp16/m256h/asm-support.S |   81 ++
> >  .../avx512fp16/m256h/avx512fp16-ymm-check.h   |    3 +
> >  .../avx512fp16/m256h/test_m256_returning.c    |   54 +
> >  .../abi/avx512fp16/m256h/test_passing_m256.c  |  370 ++++++
> >  .../avx512fp16/m256h/test_passing_structs.c   |  113 ++
> >  .../avx512fp16/m256h/test_passing_unions.c    |  337 ++++++
> >  .../abi/avx512fp16/m256h/test_varargs-m256.c  |  160 +++
> >  .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |   48 +
> >  .../x86_64/abi/avx512fp16/m512h/args.h        |  186 +++
> >  .../x86_64/abi/avx512fp16/m512h/asm-support.S |   97 ++
> >  .../avx512fp16/m512h/avx512fp16-zmm-check.h   |    4 +
> >  .../avx512fp16/m512h/test_m512_returning.c    |   62 +
> >  .../abi/avx512fp16/m512h/test_passing_m512.c  |  380 ++++++
> >  .../avx512fp16/m512h/test_passing_structs.c   |  123 ++
> >  .../avx512fp16/m512h/test_passing_unions.c    |  415 +++++++
> >  .../abi/avx512fp16/m512h/test_varargs-m512.c  |  164 +++
> >  .../gcc.target/x86_64/abi/avx512fp16/macros.h |   53 +
> >  .../test_3_element_struct_and_unions.c        |  692 +++++++++++
> >  .../abi/avx512fp16/test_basic_alignment.c     |   45 +
> >  .../test_basic_array_size_and_align.c         |   43 +
> >  .../abi/avx512fp16/test_basic_returning.c     |   87 ++
> >  .../x86_64/abi/avx512fp16/test_basic_sizes.c  |   43 +
> >  .../test_basic_struct_size_and_align.c        |   42 +
> >  .../test_basic_union_size_and_align.c         |   40 +
> >  .../abi/avx512fp16/test_complex_returning.c   |  104 ++
> >  .../abi/avx512fp16/test_m64m128_returning.c   |   73 ++
> >  .../abi/avx512fp16/test_passing_floats.c      | 1066 +++++++++++++++++
> >  .../abi/avx512fp16/test_passing_m64m128.c     |  510 ++++++++
> >  .../abi/avx512fp16/test_passing_structs.c     |  332 +++++
> >  .../abi/avx512fp16/test_passing_unions.c      |  335 ++++++
> >  .../abi/avx512fp16/test_struct_returning.c    |  274 +++++
> >  .../x86_64/abi/avx512fp16/test_varargs-m128.c |  164 +++
> >  gcc/testsuite/lib/target-supports.exp         |   13 +-
> >  libgcc/config.host                            |    5 +-
> >  libgcc/config/i386/32/sfp-machine.h           |    1 +
> >  libgcc/config/i386/64/sfp-machine.h           |    1 +
> >  libgcc/config/i386/64/t-softfp                |    1 +
> >  libgcc/config/i386/sfp-machine.h              |    1 +
> >  libgcc/config/i386/t-softfp                   |    5 +
> >  libgcc/soft-fp/eqhf2.c                        |   49 +
> >  libgcc/soft-fp/extendhfdf2.c                  |   53 +
> >  libgcc/soft-fp/extendhfsf2.c                  |   49 +
> >  libgcc/soft-fp/half.h                         |    1 +
> >  libgcc/soft-fp/truncdfhf2.c                   |   52 +
> >  libgcc/soft-fp/truncsfhf2.c                   |   48 +
> >  127 files changed, 10324 insertions(+), 238 deletions(-)
> >  create mode 100644 gcc/config/i386/avx512fp16intrin.h
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-1.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-2.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/float16-3.C
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-12b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-3b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4a.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-4b.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-12.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-13.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/args.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/asm-support.S
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/defines.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/macros.h
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c
> >  create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c
> >  create mode 100644 libgcc/config/i386/64/t-softfp
> >  create mode 100644 libgcc/soft-fp/eqhf2.c
> >  create mode 100644 libgcc/soft-fp/extendhfdf2.c
> >  create mode 100644 libgcc/soft-fp/extendhfsf2.c
> >  create mode 100644 libgcc/soft-fp/truncdfhf2.c
> >  create mode 100644 libgcc/soft-fp/truncsfhf2.c
> >
> > --
> > 2.18.1
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-03 15:33                                 ` Iain Sandoe
@ 2021-09-21 20:11                                   ` Joseph Myers
  2021-09-21 20:25                                     ` Iain Sandoe
  2021-09-22  7:08                                     ` Iain Sandoe
  0 siblings, 2 replies; 138+ messages in thread
From: Joseph Myers @ 2021-09-21 20:11 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: GCC Patches, liuhongt

On Fri, 3 Sep 2021, Iain Sandoe wrote:

> given that:
> 
> a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
> b) the patch is now Darwin-local.

Actually, it's not Darwin-local.  It uses __MACH__, which is also defined 
for Hurd.  And because sfp-machine.h gets included in files that aren't 
specific to HFmode (and so aren't built with explicit -msse2), the build 
for i686-gnu fails with:

In file included from /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-exceptions.c:25:
/scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-machine.h:83:1: error: unable to emulate 'HF'
   83 | typedef float alias_HFtype __attribute__ ((mode (HF)));
      | ^~~~~~~

I think some conditional that is genuinely Darwin-specific should be used, 
so that Hurd keeps using normal ELF aliases and doesn't get these HFmode 
references in sfp-machine.h at all.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-21 20:11                                   ` Joseph Myers
@ 2021-09-21 20:25                                     ` Iain Sandoe
  2021-09-22  7:08                                     ` Iain Sandoe
  1 sibling, 0 replies; 138+ messages in thread
From: Iain Sandoe @ 2021-09-21 20:25 UTC (permalink / raw)
  To: Joseph Myers; +Cc: liuhongt, GCC Patches

Hello Joseph,

> On 21 Sep 2021, at 21:11, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Fri, 3 Sep 2021, Iain Sandoe wrote:
> 
>> given that:
>> 
>> a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
>> b) the patch is now Darwin-local.
> 
> Actually, it's not Darwin-local.  It uses __MACH__, which is also defined 
> for Hurd.  And because sfp-machine.h gets included in files that aren't 
> specific to HFmode (and so aren't built with explicit -msse2), the build 
> for i686-gnu fails with:
> 
> In file included from /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-exceptions.c:25:
> /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/i386/sfp-machine.h:83:1: error: unable to emulate 'HF'
>   83 | typedef float alias_HFtype __attribute__ ((mode (HF)));
>      | ^~~~~~~
> 
> I think some conditional that is genuinely Darwin-specific should be used, 
> so that Hurd keeps using normal ELF aliases and doesn't get these HFmode 
> references in sfp-machine.h at all.

Sorry about this.  (I usually use __APPLE__ as the flag, but __MACH__ was already there
in this case, I think.)

I'll fix this with s/__MACH__/__APPLE__/, since in this case we definitely mean Mach-O
rather than anything to do with the microkernel (tomorrow now).

Note that I suspect there are (maybe a small number of) other places in the code where __MACH__ has been taken to mean that Mach-O is in use.

thanks
Iain



* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-21 20:11                                   ` Joseph Myers
  2021-09-21 20:25                                     ` Iain Sandoe
@ 2021-09-22  7:08                                     ` Iain Sandoe
  2021-09-22 19:50                                       ` Joseph Myers
  1 sibling, 1 reply; 138+ messages in thread
From: Iain Sandoe @ 2021-09-22  7:08 UTC (permalink / raw)
  To: Joseph Myers, Thomas Schwinge; +Cc: GCC Patches

Hi Joseph, Thomas,

> On 21 Sep 2021, at 21:11, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Fri, 3 Sep 2021, Iain Sandoe wrote:
> 
>> given that:
>> 
>> a) this fixes Darwin x86-64 bootstrap which has been broken for more than 24h
>> b) the patch is now Darwin-local.
> 
> Actually, it's not Darwin-local.  It uses __MACH__, which is also defined 
> for Hurd.  And because sfp-machine.h gets included in files that aren't 
> specific to HFmode (and so aren't built with explicit -msse2), the build 
> for i686-gnu fails with:

Fixed for master with:

https://gcc.gnu.org/pipermail/gcc-cvs/2021-September/353935.html

However, note that the use of __MACH__ to guard the Mach-O code has been
there for a long time (it is present in all open branches).  So it's possible that it has
been silently doing the wrong thing for some time.

So maybe the fix at r12-3777-g578b7687338 should be back-ported?

thanks
Iain



* Re: [PATCH V3 0/6] Initial support for AVX512FP16
  2021-09-22  7:08                                     ` Iain Sandoe
@ 2021-09-22 19:50                                       ` Joseph Myers
  0 siblings, 0 replies; 138+ messages in thread
From: Joseph Myers @ 2021-09-22 19:50 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Thomas Schwinge, GCC Patches

On Wed, 22 Sep 2021, Iain Sandoe wrote:

> However, note that the use of __MACH__ to guard the Mach-O code has been
> there for a long time (it is present in all open branches).  So it's possible that it has
> been silently doing the wrong thing for some time.

That's "wrong thing" as in having previously been suboptimal; the 
semantics of the function aliases would still have been correct (until the 
HFmode changes introduced a build failure), it would just have been less 
efficient for them to be wrappers rather than proper aliases.

-- 
Joseph S. Myers
joseph@codesourcery.com



Thread overview: 138+ messages
     [not found] <20210701054808.39000-1-hongtao.liu@intel.com>
2021-07-01  5:55 ` [PATCH 0/2] Initial support for AVX512FP16 Hongtao Liu
2021-07-01 20:46   ` Joseph Myers
2021-07-06  8:53     ` Hongtao Liu
     [not found] ` <20210701054808.39000-3-hongtao.liu@intel.com>
2021-07-01  5:55   ` [PATCH 2/2] AVX512FP16: Add HFmode support in libgcc Hongtao Liu
     [not found] ` <20210701054808.39000-2-hongtao.liu@intel.com>
2021-07-01  5:55   ` [PATCH 1/2] AVX512FP16: Initial support for _Float16 type and AVX512FP16 feature Hongtao Liu
2021-07-01 11:10 ` [PATCH 0/2] Initial support for AVX512FP16 Uros Bizjak
2021-07-01 12:39   ` H.J. Lu
2021-07-01 12:58     ` Richard Biener
2021-07-01 13:03       ` Jakub Jelinek
2021-07-06  8:51         ` Hongtao Liu
2021-07-06 10:14           ` Richard Biener
2021-07-06 12:11             ` H.J. Lu
2021-07-06 18:20               ` Joseph Myers
2021-07-06 18:18             ` Joseph Myers
2021-07-06 18:11           ` Joseph Myers
2021-07-07  1:24             ` Hongtao Liu
2021-07-14  7:50               ` Hongtao Liu
2021-07-14 15:32                 ` [llvm-dev] " Craig Topper
2021-07-15  2:07                   ` Wang, Pengfei
2021-07-15  6:34                     ` Hongtao Liu
2021-07-15  6:57                       ` Wang, Pengfei
2021-07-15  7:49                         ` Hongtao Liu
2021-07-21  7:43       ` [PATCH V2 00/10] " liuhongt
2021-07-21  7:43         ` [PATCH 01/10] Update hf soft-fp from glibc liuhongt
2021-07-21  7:43         ` [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
2021-07-21 10:35           ` Uros Bizjak
2021-07-22  5:21             ` Hongtao Liu
2021-07-22 11:56           ` Richard Biener
2021-07-28 21:56           ` Joseph Myers
2021-07-29  4:53             ` Hongtao Liu
2021-07-29  5:34               ` Hongtao Liu
2021-07-29 21:30               ` Joseph Myers
2021-08-02  5:23                 ` Hongtao Liu
2021-08-02  6:31                   ` [PATCH V3 0/6] Initial support for AVX512FP16 liuhongt
2021-08-02  6:31                     ` [PATCH 1/6] Update hf soft-fp from glibc liuhongt
2021-08-02  6:31                     ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above liuhongt
2021-08-04  2:45                       ` Hongtao Liu
2021-08-04 11:28                         ` Richard Biener
2021-08-05  7:31                           ` Hongtao Liu
2021-08-05  7:39                             ` Hongtao Liu
2021-08-05  9:24                             ` Richard Biener
2021-08-05  9:49                               ` Hongtao Liu
2021-08-05 10:14                                 ` Richard Biener
2021-08-06  3:32                                   ` [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field liuhongt
2021-08-06  3:44                                     ` Andrew Pinski
2021-08-06  4:59                                       ` Hongtao Liu
2021-08-06  5:52                                         ` Hongtao Liu
2021-08-06  6:59                                         ` Richard Biener
2021-08-06  6:57                                     ` Richard Biener
2021-08-06  9:05                                       ` Richard Sandiford
2021-08-06 11:27                                         ` Richard Biener
2021-08-09  8:34                                           ` Hongtao Liu
2021-08-17  1:52                                             ` Hongtao Liu
2021-08-24  9:40                                               ` Hongtao Liu
2021-08-24  9:44                                                 ` Hongtao Liu
2021-08-24 11:38                                                   ` Richard Biener
2021-08-26  1:17                                                     ` Hongtao Liu
2021-08-25 23:16                                                   ` Jeff Law
2021-08-26  2:05                                                     ` Hongtao Liu
2021-08-26  7:11                                                     ` Richard Biener
2021-08-26  9:06                                                       ` Richard Sandiford
2021-08-26 10:14                                                         ` Richard Biener
2021-08-26 10:50                                                           ` Richard Sandiford
2021-08-26 11:09                                                             ` Richard Biener
2021-08-27  4:56                                                               ` Hongtao Liu
2021-08-30 19:09                                                                 ` Joseph Myers
2021-08-30 21:15                                                                   ` Jeff Law
2021-08-31  6:10                                                                 ` Richard Biener
2021-08-31  6:30                                                                   ` Hongtao Liu
2021-08-31  6:48                                                                     ` Hongtao Liu
2021-08-31 11:16                                                                       ` Richard Biener
2021-08-31 11:17                                                                       ` [PATCH 0/2] Get rid of all float-int special cases in validate_subreg liuhongt
2021-08-31 11:17                                                                         ` [PATCH 1/2] Revert "Make sure we're playing with integral modes before call extract_integral_bit_field." liuhongt
2021-08-31 11:17                                                                         ` [PATCH 2/2] Get rid of all float-int special cases in validate_subreg liuhongt
2021-08-31 11:57                                                                           ` Richard Biener
2021-09-02 17:55                                                                           ` Segher Boessenkool
2021-09-03 15:05                                                                             ` Andreas Schwab
2021-09-07 23:19                                                                               ` Segher Boessenkool
2021-09-08  0:55                                                                                 ` Hongtao Liu
2021-09-03 12:42                       ` [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above Jakub Jelinek
2021-09-06  2:05                         ` Hongtao Liu
2021-09-06 12:13                           ` Jakub Jelinek
2021-09-07  1:52                             ` Hongtao Liu
2021-09-07  7:17                               ` Jakub Jelinek
2021-09-07 10:08                                 ` Hongtao Liu
2021-09-07 10:10                                   ` Jakub Jelinek
2021-08-02  6:31                     ` [PATCH 3/6] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
2021-08-02  6:31                     ` [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16 liuhongt
2021-08-02 19:34                       ` Joseph Myers
2021-08-03  2:44                         ` Hongtao Liu
2021-08-06  6:06                           ` Hongtao Liu
2021-08-17  1:53                             ` Hongtao Liu
2021-08-24  9:39                               ` Hongtao Liu
2021-09-02  6:06                                 ` Hongtao Liu
2021-08-02  6:39                     ` [PATCH 6/6] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
2021-08-02  6:44                     ` [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
2021-08-04  2:40                       ` Hongtao Liu
2021-08-04  9:55                       ` Uros Bizjak
2021-09-02  6:06                     ` [PATCH V3 0/6] Initial support for AVX512FP16 Hongtao Liu
2021-09-02 11:30                       ` Iain Sandoe
2021-09-02 15:18                         ` Hongtao Liu
2021-09-02 16:44                           ` Iain Sandoe
2021-09-02 20:03                             ` Joseph Myers
2021-09-03  7:51                               ` Iain Sandoe
2021-09-03 15:33                                 ` Iain Sandoe
2021-09-21 20:11                                   ` Joseph Myers
2021-09-21 20:25                                     ` Iain Sandoe
2021-09-22  7:08                                     ` Iain Sandoe
2021-09-22 19:50                                       ` Joseph Myers
2021-09-02 15:30                       ` H.J. Lu
2021-09-02 15:50                         ` Hongtao Liu
2021-09-02 19:45                       ` Joseph Myers
2021-07-21  7:43         ` [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations liuhongt
2021-07-21 10:51           ` Uros Bizjak
2021-07-22 12:14           ` Richard Biener
2021-07-27  5:32             ` Hongtao Liu
2021-07-29 20:57               ` Joseph Myers
2021-08-02  5:10                 ` Hongtao Liu
2021-07-21  7:43         ` [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions liuhongt
2021-07-22  8:49           ` Uros Bizjak
2021-07-27  7:31             ` Hongtao Liu
2021-07-21  7:43         ` [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16 liuhongt
2021-07-22  5:24           ` Hongtao Liu
2021-07-21  7:43         ` [PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics liuhongt
2021-07-21  7:43         ` [PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments liuhongt
2021-07-21  7:43         ` [PATCH 08/10] AVX512FP16: Add ABI tests for xmm liuhongt
2021-07-21  7:43         ` [PATCH 09/10] AVX512FP16: Add ABI test for ymm liuhongt
2021-07-21  7:43         ` [PATCH 10/10] AVX512FP16: Add abi test for zmm liuhongt
2021-09-08  2:54         ` [PATCH V2 00/10] Initial support for AVX512FP16 Hongtao Liu
2021-09-08  3:02           ` Hongtao Liu
2021-07-01 12:58     ` [PATCH 0/2] " Uros Bizjak
2021-07-01 21:40     ` Joseph Myers
2021-07-02  6:30   ` Hongtao Liu
2021-07-02  8:03     ` Uros Bizjak
2021-07-02  8:19       ` Richard Biener
2021-07-03 14:44         ` Hongtao Liu
2021-07-05  1:25       ` Hongtao Liu
2021-07-05 11:02         ` Richard Biener
