From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20210927104251.81107-1-hongyu.wang@intel.com>
From: Uros Bizjak
Date: Tue, 28 Sep 2021 08:27:25 +0200
Subject: Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.
To: Hongyu Wang
Cc: Hongyu Wang, Hongtao Liu, "gcc-patches@gcc.gnu.org"
Content-Type: text/plain; charset="UTF-8"
List-Id: Gcc-patches mailing list

On Tue, Sep 28, 2021 at 6:48 AM Hongyu Wang wrote:
>
> > ia32 ABI declares that __m64 values pass via MMX registers. Due to
> > this, we are not able to fully disable MMX register usage, as is the
> > case with x86_64. So, V4HFmode values will pass to functions via MMX
> > registers on ia32 targets.
> >
> > So, there should be no additional define_insn, the addition to the
> > existing MMXMODE mode iterator should be enough. V4HFmode should be
> > handled in the same way as e.g. V8QImode.
> >
> > This is not the case with 4-byte values, which should be passed using
> > integer ABI.
>
> Thanks for the explanation. I updated the patch by removing the extra
> define_insn and dropping V4HFmode from VALID_AVX512FP16_REG_MODE. Now
> v4hf behaves the same as v8qi.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> OK for master with the updated one?

I'd put this new pattern in mmx.md to keep 64bit/32bit modes in
mmx.md, similar to e.g. FMA patterns among others.

OK with the eventual above change.

Thanks,
Uros.
>
> Uros Bizjak via Gcc-patches wrote on Mon, Sep 27, 2021 at 7:35 PM:
> >
> > On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang wrote:
> > >
> > > Hi Uros,
> > >
> > > This patch intends to support V4HF/V2HF vector types and basic operations.
> > >
> > > For 32bit targets, a V4HF vector is passed the same way as the __m64 type;
> > > V2HF is passed on the stack and returned in a GPR since it is not
> > > specified by the ABI.
> > >
> > > We found that for 64bit vectors in ia32, when mmx is disabled there seems
> > > to be no mov_internal, so we add a define_insn for v4hf mode. It would be
> > > very appreciated if you could explain why the handling of 64bit vectors
> > > looks as it does and give some advice.
> >
> > ia32 ABI declares that __m64 values pass via MMX registers. Due to
> > this, we are not able to fully disable MMX register usage, as is the
> > case with x86_64. So, V4HFmode values will pass to functions via MMX
> > registers on ia32 targets.
> >
> > So, there should be no additional define_insn, the addition to the
> > existing MMXMODE mode iterator should be enough. V4HFmode should be
> > handled in the same way as e.g. V8QImode.
> >
> > This is not the case with 4-byte values, which should be passed using
> > integer ABI.
> >
> > Uros.
> >
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR target/102230
> > >         * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
> > >         V4HF and V2HF mode check.
> > >         (VALID_SSE2_REG_VHF_MODE): Likewise.
> > >         (VALID_MMX_REG_MODE): Likewise.
> > >         (SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
> > >         vector mode condition.
> > >         * config/i386/i386.c (classify_argument): Parse V4HF/V2HF
> > >         via sse regs.
> > >         (function_arg_32): Add V4HFmode.
> > >         (function_arg_advance_32): Likewise.
> > >         * config/i386/i386.md (mode): Add V4HF/V2HF.
> > >         (MODE_SIZE): Likewise.
> > >         * config/i386/mmx.md (MMXMODE): Add V4HF mode.
> > >         (V_32): Add V2HF mode.
> > >         (*mov_internal): Adjust sse alternatives to support
> > >         V4HF mode vector move.
> > >         (*mov_internal): Adjust sse alternatives
> > >         to support V2HF mode move.
> > >         * config/i386/sse.md (VHF_32_64): New mode iterator.
> > >         (3): New define_insn for add/sub/mul/div.
> > >         (*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         PR target/102230
> > >         * gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
> > >         * gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
> > >         * gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
> > >         * gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
> > >         * gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
> > >         * gcc.target/i386/pr102230.c: Ditto.
> > > ---
> > >  gcc/config/i386/i386.c                        |  4 +
> > >  gcc/config/i386/i386.h                        | 12 ++-
> > >  gcc/config/i386/i386.md                       |  5 +-
> > >  gcc/config/i386/mmx.md                        | 27 ++++---
> > >  gcc/config/i386/sse.md                        | 49 ++++++++++++
> > >  .../i386/avx512fp16-64-32-vecop-1.c           | 30 ++++++++
> > >  .../i386/avx512fp16-64-32-vecop-2.c           | 75 +++++++++++++++++++
> > >  .../gcc.target/i386/avx512fp16-floatvnhf.c    | 12 +--
> > >  .../i386/avx512fp16-trunc-extendvnhf.c        | 12 +--
> > >  .../gcc.target/i386/avx512fp16-truncvnhf.c    | 12 +--
> > >  gcc/testsuite/gcc.target/i386/pr102230.c      | 38 ++++++++++
> > >  11 files changed, 243 insertions(+), 33 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102230.c
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index ba89e111d28..b3e4add4b9e 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -2462,6 +2462,8 @@ classify_argument (machine_mode mode, const_tree type,
> > >      case E_V2SFmode:
> > >      case E_V2SImode:
> > >      case E_V4HImode:
> > > +    case E_V4HFmode:
> > > +    case E_V2HFmode:
> > >      case E_V8QImode:
> > >        classes[0] = X86_64_SSE_CLASS;
> > >        return 1;
> > > @@ -2902,6 +2904,7 @@ pass_in_reg:
> > >
> > >      case E_V8QImode:
> > >      case E_V4HImode:
> > > +    case E_V4HFmode:
> > >      case E_V2SImode:
> > >      case E_V2SFmode:
> > >      case E_V1TImode:
> > > @@ -3149,6 +3152,7 @@ pass_in_reg:
> > >
> > >      case E_V8QImode:
> > >      case E_V4HImode:
> > > +    case E_V4HFmode:
> > >      case E_V2SImode:
> > >      case E_V2SFmode:
> > >      case E_V1TImode:
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index 8a4251b4926..9f3cad31f96 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -1033,7 +1033,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >     || (MODE) == TImode)
> > >
> > >  #define VALID_AVX512FP16_REG_MODE(MODE) \
> > > -  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode)
> > > +  ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode \
> > > +   || (MODE) == V4HFmode || (MODE) == V2HFmode)
> > >
> > >  #define VALID_SSE2_REG_MODE(MODE) \
> > >    ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \
> > > @@ -1041,7 +1042,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >     || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
> > >
> > >  #define VALID_SSE2_REG_VHF_MODE(MODE) \
> > > -  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode)
> > > +  (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode \
> > > +   || (MODE) == V4HFmode || (MODE) == V2HFmode)
> > >
> > >  #define VALID_SSE_REG_MODE(MODE) \
> > >    ((MODE) == V1TImode || (MODE) == TImode \
> > > @@ -1054,7 +1056,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >  #define VALID_MMX_REG_MODE(MODE) \
> > >    ((MODE) == V1DImode || (MODE) == DImode \
> > >     || (MODE) == V2SImode || (MODE) == SImode \
> > > -   || (MODE) == V4HImode || (MODE) == V8QImode)
> > > +   || (MODE) == V4HImode || (MODE) == V8QImode \
> > > +   || (MODE) == V4HFmode)
> > >
> > >  #define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
> > >
> > > @@ -1087,7 +1090,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >     || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode \
> > >     || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode \
> > >     || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
> > > -   || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE))
> > > +   || (MODE) == V16SFmode || (MODE) == V32HFmode || (MODE) == V16HFmode \
> > > +   || (MODE) == V8HFmode)
> > >
> > >  #define X87_FLOAT_MODE_P(MODE) \
> > >    (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index c6279e620c9..758d7d1e3c0 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -498,7 +498,7 @@
> > >  ;; Main data type used by the insn
> > >  (define_attr "mode"
> > >    "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF,
> > > -   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF"
> > > +   V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF,V4HF,V2HF"
> > >    (const_string "unknown"))
> > >
> > >  ;; The CPU unit operations uses.
> > > @@ -1106,7 +1106,8 @@
> > >                               (V1TI "16") (V2TI "32") (V4TI "64")
> > >                               (V2DF "16") (V4DF "32") (V8DF "64")
> > >                               (V4SF "16") (V8SF "32") (V16SF "64")
> > > -                             (V8HF "16") (V16HF "32") (V32HF "64")])
> > > +                             (V8HF "16") (V16HF "32") (V32HF "64")
> > > +                             (V4HF "8") (V2HF "4")])
> > >
> > >  ;; Double word integer modes as mode attribute.
> > >  (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")])
> > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > index b0093778fc6..68e1c4b2dbd 100644
> > > --- a/gcc/config/i386/mmx.md
> > > +++ b/gcc/config/i386/mmx.md
> > > @@ -48,7 +48,7 @@
> > >  (define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
> > >
> > >  ;; All 8-byte vector modes handled by MMX
> > > -(define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
> > > +(define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF V4HF])
> > >  (define_mode_iterator MMXMODE124 [V8QI V4HI V2SI V2SF])
> > >
> > >  ;; Mix-n-match
> > > @@ -57,8 +57,8 @@
> > >  (define_mode_iterator MMXMODE24 [V4HI V2SI])
> > >  (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
> > >
> > > -;; All 4-byte integer vector modes
> > > -(define_mode_iterator V_32 [V4QI V2HI V1SI])
> > > +;; All 4-byte integer/float16 vector modes
> > > +(define_mode_iterator V_32 [V4QI V2HI V1SI V2HF])
> > >
> > >  ;; 4-byte integer vector modes
> > >  (define_mode_iterator VI_32 [V4QI V2HI])
> > > @@ -191,6 +191,8 @@
> > >             (eq_attr "alternative" "11,12")
> > >               (cond [(match_test "mode == V2SFmode")
> > >                        (const_string "V4SF")
> > > +                    (match_test "mode == V4HFmode")
> > > +                      (const_string "V4SF")
> > >                      (ior (not (match_test "TARGET_SSE2"))
> > >                           (match_test "optimize_function_for_size_p (cfun)"))
> > >                        (const_string "V4SF")
> > > @@ -198,14 +200,16 @@
> > >                     (const_string "TI"))
> > >
> > >             (and (eq_attr "alternative" "13")
> > > -                (ior (and (match_test "mode == V2SFmode")
> > > -                          (not (match_test "TARGET_MMX_WITH_SSE")))
> > > -                     (not (match_test "TARGET_SSE2"))))
> > > +                (ior (ior (and (match_test "mode == V2SFmode")
> > > +                               (not (match_test "TARGET_MMX_WITH_SSE")))
> > > +                          (not (match_test "TARGET_SSE2")))
> > > +                     (match_test "mode == V4HFmode")))
> > >               (const_string "V2SF")
> > >
> > >             (and (eq_attr "alternative" "14")
> > > -                (ior (match_test "mode == V2SFmode")
> > > -                     (not (match_test "TARGET_SSE2"))))
> > > +                (ior (ior (match_test "mode == V2SFmode")
> > > +                          (not (match_test "TARGET_SSE2")))
> > > +                     (match_test "mode == V4HFmode")))
> > >               (const_string "V2SF")
> > >            ]
> > >            (const_string "DI")))
> > > @@ -289,12 +293,17 @@
> > >             (const_string "*")))
> > >     (set (attr "mode")
> > >       (cond [(eq_attr "alternative" "2,3")
> > > -              (cond [(match_test "TARGET_AVX")
> > > +              (cond [(match_test "mode == V2HFmode")
> > > +                       (const_string "V4SF")
> > > +                     (match_test "TARGET_AVX")
> > >                         (const_string "TI")
> > >                       (match_test "optimize_function_for_size_p (cfun)")
> > >                         (const_string "V4SF")
> > >                      ]
> > >                      (const_string "TI"))
> > > +            (and (eq_attr "alternative" "4,5")
> > > +                 (match_test "mode == V2HFmode"))
> > > +              (const_string "SF")
> > >             ]
> > >             (const_string "SI")))
> > >     (set (attr "preferred_for_speed")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index a446dedb2ec..b7832926287 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -671,6 +671,9 @@
> > >    [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL")
> > >     (V16QI "TARGET_AVX512VL")])
> > >
> > > +(define_mode_iterator VHF_32_64
> > > +  [V4HF V2HF])
> > > +
> > >  (define_mode_attr avx512
> > >    [(V16QI "avx512vl") (V32QI "avx512vl") (V64QI "avx512bw")
> > >     (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")
> > > @@ -1313,6 +1316,36 @@
> > >            ]
> > >            (symbol_ref "true")))])
> > >
> > > +(define_insn "*movv4hf_internal_sse"
> > > +  [(set (match_operand:V4HF 0 "nonimmediate_operand"
> > > +         "=v,v,v,m")
> > > +        (match_operand:V4HF 1 "nonimmediate_or_sse_const_operand"
> > > +         " C,v,m,v"))]
> > > +  "!TARGET_MMX && TARGET_SSE2
> > > +   && (register_operand (operands[0], V4HFmode)
> > > +       || register_operand (operands[1], V4HFmode))"
> > > +{
> > > +  switch (get_attr_type (insn))
> > > +    {
> > > +    case TYPE_SSELOG1:
> > > +      return standard_sse_constant_opcode (insn, operands);
> > > +
> > > +    case TYPE_SSEMOV:
> > > +      return ix86_output_ssemov (insn, operands);
> > > +
> > > +    default:
> > > +      gcc_unreachable ();
> > > +    }
> > > +}
> > > +  [(set_attr "type" "sselog1,ssemov,ssemov,ssemov")
> > > +   (set_attr "prefix" "maybe_vex")
> > > +   (set (attr "mode")
> > > +        (cond [(eq_attr "alternative" "1")
> > > +                 (const_string "V4SF")]
> > > +              (const_string "V2SF")))]
> > > +)
> > > +
> > > +
> > >  ;; If mem_addr points to a memory region with less than whole vector size bytes
> > >  ;; of accessible memory and k is a mask that would prevent reading the inaccessible
> > >  ;; bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it to be transformed to vpblendd
> > > @@ -2165,6 +2198,22 @@
> > >     (set_attr "prefix" "")
> > >     (set_attr "mode" "")])
> > >
> > > +(define_insn "3"
> > > +  [(set (match_operand:VHF_32_64 0 "register_operand" "=v")
> > > +        (plusminusmultdiv:VHF_32_64
> > > +          (match_operand:VHF_32_64 1 "register_operand" "v")
> > > +          (match_operand:VHF_32_64 2 "register_operand" "v")))]
> > > +  "TARGET_AVX512FP16 && TARGET_AVX512VL"
> > > +  "vph\t{%2, %1, %0|%0, %1, %2}"
> > > +  [(set (attr "type")
> > > +      (cond [(match_test " == MULT")
> > > +                (const_string "ssemul")
> > > +             (match_test " == DIV")
> > > +                (const_string "ssediv")]
> > > +             (const_string "sseadd")))
> > > +   (set_attr "prefix" "evex")
> > > +   (set_attr "mode" "V8HF")])
> > > +
> > >  ;; Standard scalar operation patterns which preserve the rest of the
> > >  ;; vector for combiner.
> > >  (define_insn "*_vm3"
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
> > > new file mode 100644
> > > index 00000000000..754e909d77b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> > > +
> > > +/* { dg-final { scan-assembler-times "vaddph" 2 } } */
> > > +/* { dg-final { scan-assembler-times "vsubph" 2 } } */
> > > +/* { dg-final { scan-assembler-times "vmulph" 2 } } */
> > > +/* { dg-final { scan-assembler-times "vdivph" 2 } } */
> > > +
> > > +#define DO_PRAGMA(X) _Pragma(#X)
> > > +
> > > +#define VEC_OP_VV(size, op, name) \
> > > +void \
> > > +__attribute__ ((noinline, noclone, optimize("tree-slp-vectorize"))) \
> > > +vecop_v##size##hf##name (_Float16 * restrict dst, \
> > > +  _Float16 * restrict src1, _Float16 * restrict src2) \
> > > +{ \
> > > +  int i; \
> > > +  DO_PRAGMA (GCC unroll size) \
> > > +  for (i = 0; i < size; i++) \
> > > +    dst[i] = src1[i] op src2[i]; \
> > > +}
> > > +
> > > +VEC_OP_VV(4, +, add)
> > > +VEC_OP_VV(2, +, add)
> > > +VEC_OP_VV(4, -, sub)
> > > +VEC_OP_VV(2, -, sub)
> > > +VEC_OP_VV(4, *, mul)
> > > +VEC_OP_VV(2, *, mul)
> > > +VEC_OP_VV(4, /, div)
> > > +VEC_OP_VV(2, /, div)
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
> > > new file mode 100644
> > > index 00000000000..4dc6f9fb92e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c
> > > @@ -0,0 +1,75 @@
> > > +/* { dg-do run { target avx512fp16 } } */
> > > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> > > +
> > > +static void vec_op_test (void);
> > > +#define DO_TEST vec_op_test
> > > +#define AVX512FP16
> > > +#define AVX512VL
> > > +#include "avx512f-check.h"
> > > +#include "avx512fp16-64-32-vecop-1.c"
> > > +
> > > +_Float16 a[4], b[4], fexp[4], fref[4];
> > > +
> > > +#define EMULATE_VEC_OP_VV(size, op, name) \
> > > +void \
> > > +__attribute__ ((noinline, noclone)) \
> > > +scalar_vecop_v##size##hf##name ( \
> > > +  _Float16 * restrict dst, _Float16 * restrict src1, \
> > > +  _Float16 * restrict src2) \
> > > +{ \
> > > +  int i; \
> > > +  for (i = 0; i < size; i++) \
> > > +    dst[i] = src1[i] op src2[i]; \
> > > +}
> > > +
> > > +EMULATE_VEC_OP_VV (4, +, add)
> > > +EMULATE_VEC_OP_VV (2, +, add)
> > > +EMULATE_VEC_OP_VV (4, -, sub)
> > > +EMULATE_VEC_OP_VV (2, -, sub)
> > > +EMULATE_VEC_OP_VV (4, *, mul)
> > > +EMULATE_VEC_OP_VV (2, *, mul)
> > > +EMULATE_VEC_OP_VV (4, /, div)
> > > +EMULATE_VEC_OP_VV (2, /, div)
> > > +
> > > +void init()
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < 4; i++)
> > > +    {
> > > +      a[i] = i + 0.5;
> > > +      b[i] = i * 1.5;
> > > +      fexp[i] = fref[i] = 2.75 * i;
> > > +    }
> > > +}
> > > +
> > > +int check_cond(void *a, void *b, int size)
> > > +{
> > > +  int i;
> > > +  unsigned short *pa = (unsigned short *)a,
> > > +                 *pb = (unsigned short *)b;
> > > +  for (i = 0; i < size; i++)
> > > +    if (pa[i] != pb[i])
> > > +      return 0;
> > > +  return 1;
> > > +}
> > > +
> > > +#define TEST_VEC_OP_VV(size, name) \
> > > +{ \
> > > +  init (); \
> > > +  scalar_vecop_v##size##hf##name (a, b, fexp); \
> > > +  vecop_v##size##hf##name (a, b, fref); \
> > > +  if (!check_cond ((void *)fexp, (void *)fref, size)) \
> > > +    abort(); \
> > > +}
> > > +
> > > +static void vec_op_test()
> > > +{
> > > +  TEST_VEC_OP_VV (4, add)
> > > +  TEST_VEC_OP_VV (2, add)
> > > +  TEST_VEC_OP_VV (4, sub)
> > > +  TEST_VEC_OP_VV (2, sub)
> > > +  TEST_VEC_OP_VV (4, mul)
> > > +  TEST_VEC_OP_VV (2, mul)
> > > +  TEST_VEC_OP_VV (4, div)
> > > +  TEST_VEC_OP_VV (2, div)
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c
> > > index 112ac3e74d5..8471a1d1d10 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c
> > > @@ -43,16 +43,16 @@ FLOATHFVV(2, udi)
> > >
> > >  /* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c
> > > index 286ea9f2624..2ef901a0375 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c
> > > @@ -41,15 +41,15 @@ EXTENDHFVV(8, sf)
> > >  EXTENDHFVV(4, sf)
> > >
> > >  /* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c
> > > index ee55cd12300..7a51c9dd077 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c
> > > @@ -43,16 +43,16 @@ FIX_TRUNCHFVV(2, udi)
> > >
> > >  /* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > -/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr102230.c b/gcc/testsuite/gcc.target/i386/pr102230.c
> > > new file mode 100644
> > > index 00000000000..60cf1c32afe
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr102230.c
> > > @@ -0,0 +1,38 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mavx512fp16" } */
> > > +
> > > +typedef _Float16 v4hf __attribute__ ((vector_size (8)));
> > > +typedef _Float16 v2hf __attribute__ ((vector_size (4)));
> > > +
> > > +v4hf
> > > +v4hf_abi_1 (v4hf a)
> > > +{
> > > +  return a;
> > > +}
> > > +
> > > +v4hf
> > > +v4hf_abi_3 (v4hf a, v4hf b, v4hf c)
> > > +{
> > > +  return c;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "movq\[[\\t \]*%mm2, %mm0" 1 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm2, %xmm0" 1 { target { ! ia32 } } } } */
> > > +
> > > +v4hf
> > > +v4hf_abi_4 (v4hf a, v4hf b, v4hf c, v4hf d)
> > > +{
> > > +  return d;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "movq\[[\\t \]*4\[(\]%esp\[)\], %mm0" 1 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm3, %xmm0" 1 { target { ! ia32 } } } } */
> > > +
> > > +v2hf
> > > +v2hf_test (v2hf a, v2hf b, v2hf c, v2hf d)
> > > +{
> > > +  return b;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "movl\[[\\t \]*8\[(\]%esp\[)\], %eax" 1 { target { ia32 } } } } */
> > > +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm1, %xmm0" 1 { target { ! ia32 } } } } */
> > > --
> > > 2.18.1
> > >