From: Uros Bizjak
Date: Mon, 7 Aug 2023 11:18:36 +0200
Subject: Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]
To: liuhongt
Cc: gcc-patches@gcc.gnu.org
In-Reply-To: <20230807085701.302936-1-hongtao.liu@intel.com>

On Mon, Aug 7, 2023 at 10:57 AM liuhongt wrote:
>
> Similar to r14-2786-gade30fad6669e5, the patch is for V4HF/V2HFmode.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR target/110762
>         * config/i386/mmx.md (3): Changed from define_insn
>         to define_expand and break into ..
>         (v4hf3): .. this.
>         (divv4hf3): .. this.
>         (v2hf3): .. this.
>         (divv2hf3): .. this.
>         (movd_v2hf_to_sse): New define_expand.
>         (movq__to_sse): Extend to V4HFmode.
>         (mmxdoublevecmode): Ditto.
>         (V2FI_V4HF): New mode iterator.
>         * config/i386/sse.md (*vec_concatv4sf): Extend to handle V8HF
>         by using mode iterator V4SF_V8HF, renamed to ..
>         (*vec_concat): .. this.
>         (*vec_concatv4sf_0): Extend to handle V8HF by using mode
>         iterator V4SF_V8HF, renamed to ..
>         (*vec_concat_0): .. this.
>         (*vec_concatv8hf_movss): New define_insn.
>         (V4SF_V8HF): New mode iterator.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr110762-v4hf.c: New test.

LGTM.

Please also note the RFC patch [1] that relaxes clears for V2SFmode
with -fno-trapping-math. The patched compiler will then emit the same
code as clang does for -O2. Which raises another question - should gcc
default to -fno-trapping-math?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625795.html

Thanks,
Uros.

> ---
>  gcc/config/i386/mmx.md                        | 109 +++++++++++++++---
>  gcc/config/i386/sse.md                        |  40 +++++--
>  gcc/testsuite/gcc.target/i386/pr110762-v4hf.c |  57 +++++++++
>  3 files changed, 177 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110762-v4hf.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 896af76a33f..88bdf084f54 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -79,9 +79,7 @@ (define_mode_iterator V_16_32_64
>  ;; V2S* modes
>  (define_mode_iterator V2FI [V2SF V2SI])
>
> -;; 4-byte and 8-byte float16 vector modes
> -(define_mode_iterator VHF_32_64 [V4HF V2HF])
> -
> +(define_mode_iterator V2FI_V4HF [V2SF V2SI V4HF])
>  ;; Mapping from integer vector mode to mnemonic suffix
>  (define_mode_attr mmxvecsize
>    [(V8QI "b") (V4QI "b") (V2QI "b")
> @@ -108,7 +106,7 @@ (define_mode_attr mmxintvecmodelower
>
>  ;; Mapping of vector modes to a vector mode of double size
>  (define_mode_attr mmxdoublevecmode
> -  [(V2SF "V4SF") (V2SI "V4SI")])
> +  [(V2SF "V4SF") (V2SI "V4SI") (V4HF "V8HF")])
>
>  ;; Mapping of vector modes back to the scalar modes
>  (define_mode_attr mmxscalarmode
> @@ -594,7 +592,7 @@ (define_insn "sse_movntq"
>  (define_expand "movq__to_sse"
>    [(set (match_operand: 0 "register_operand")
>          (vec_concat:
> -          (match_operand:V2FI 1 "nonimmediate_operand")
> +          (match_operand:V2FI_V4HF 1 "nonimmediate_operand")
>            (match_dup 2)))]
>    "TARGET_SSE2"
>    "operands[2] = CONST0_RTX (mode);")
> @@ -1927,21 +1925,94 @@ (define_expand "lroundv2sfv2si2"
>  ;;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
> -(define_insn "3"
> -  [(set (match_operand:VHF_32_64 0 "register_operand" "=v")
> -        (plusminusmultdiv:VHF_32_64
> -          (match_operand:VHF_32_64 1 "register_operand" "v")
> -          (match_operand:VHF_32_64 2 "register_operand" "v")))]
> +(define_expand "v4hf3"
> +  [(set (match_operand:V4HF 0 "register_operand")
> +        (plusminusmult:V4HF
> +          (match_operand:V4HF 1 "nonimmediate_operand")
> +          (match_operand:V4HF 2 "nonimmediate_operand")))]
>    "TARGET_AVX512FP16 && TARGET_AVX512VL"
> -  "vph\t{%2, %1, %0|%0, %1, %2}"
> -  [(set (attr "type")
> -      (cond [(match_test " == MULT")
> -               (const_string "ssemul")
> -             (match_test " == DIV")
> -               (const_string "ssediv")]
> -             (const_string "sseadd")))
> -   (set_attr "prefix" "evex")
> -   (set_attr "mode" "V8HF")])
> +{
> +  rtx op2 = gen_reg_rtx (V8HFmode);
> +  rtx op1 = gen_reg_rtx (V8HFmode);
> +  rtx op0 = gen_reg_rtx (V8HFmode);
> +
> +  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
> +  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
> +
> +  emit_insn (gen_v8hf3 (op0, op1, op2));
> +
> +  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
> +  DONE;
> +})
> +
> +(define_expand "divv4hf3"
> +  [(set (match_operand:V4HF 0 "register_operand")
> +        (div:V4HF
> +          (match_operand:V4HF 1 "nonimmediate_operand")
> +          (match_operand:V4HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16 && TARGET_AVX512VL"
> +{
> +  rtx op2 = gen_reg_rtx (V8HFmode);
> +  rtx op1 = gen_reg_rtx (V8HFmode);
> +  rtx op0 = gen_reg_rtx (V8HFmode);
> +
> +  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
> +  rtx tmp = gen_rtx_VEC_CONCAT (V8HFmode, operands[2],
> +                                force_reg (V4HFmode, CONST1_RTX (V4HFmode)));
> +  emit_insn (gen_rtx_SET (op2, tmp));
> +  emit_insn (gen_divv8hf3 (op0, op1, op2));
> +  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
> +  DONE;
> +})
> +
> +(define_expand "movd_v2hf_to_sse"
> +  [(set (match_operand:V8HF 0 "register_operand")
> +        (vec_merge:V8HF
> +          (vec_duplicate:V8HF
> +            (match_operand:V2HF 1 "nonimmediate_operand"))
> +          (match_operand:V8HF 2 "reg_or_0_operand")
> +          (const_int 3)))]
> +  "TARGET_SSE")
> +
> +(define_expand "v2hf3"
> +  [(set (match_operand:V2HF 0 "register_operand")
> +        (plusminusmult:V2HF
> +          (match_operand:V2HF 1 "nonimmediate_operand")
> +          (match_operand:V2HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16 && TARGET_AVX512VL"
> +{
> +  rtx op2 = gen_reg_rtx (V8HFmode);
> +  rtx op1 = gen_reg_rtx (V8HFmode);
> +  rtx op0 = gen_reg_rtx (V8HFmode);
> +
> +  emit_insn (gen_movd_v2hf_to_sse (op2, operands[2], CONST0_RTX (V8HFmode)));
> +  emit_insn (gen_movd_v2hf_to_sse (op1, operands[1], CONST0_RTX (V8HFmode)));
> +  emit_insn (gen_v8hf3 (op0, op1, op2));
> +
> +  emit_move_insn (operands[0], lowpart_subreg (V2HFmode, op0, V8HFmode));
> +  DONE;
> +})
> +
> +(define_expand "divv2hf3"
> +  [(set (match_operand:V2HF 0 "register_operand")
> +        (div:V2HF
> +          (match_operand:V2HF 1 "nonimmediate_operand")
> +          (match_operand:V2HF 2 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16 && TARGET_AVX512VL"
> +{
> +  rtx op2 = gen_reg_rtx (V8HFmode);
> +  rtx op1 = gen_reg_rtx (V8HFmode);
> +  rtx op0 = gen_reg_rtx (V8HFmode);
> +
> +  emit_insn (gen_movd_v2hf_to_sse (op2, operands[2],
> +                                   force_reg (V8HFmode, CONST1_RTX (V8HFmode))));
> +  emit_insn (gen_movd_v2hf_to_sse (op1, operands[1], CONST0_RTX (V8HFmode)));
> +  emit_insn (gen_divv8hf3 (op0, op1, op2));
> +
> +  emit_move_insn (operands[0], lowpart_subreg (V2HFmode, op0, V8HFmode));
> +  DONE;
> +})
> +
>
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index ab455c3e297..7383a50ee0d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -430,6 +430,9 @@ (define_mode_iterator VF_512
>  (define_mode_iterator VFB_512
>    [V32HF V16SF V8DF])
>
> +(define_mode_iterator V4SF_V8HF
> +  [V4SF V8HF])
> +
>  (define_mode_iterator VI48_AVX512VL
>    [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
>     V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
> @@ -10873,11 +10876,11 @@ (define_insn "*vec_concatv2sf_sse"
>     (set_attr "type" "sselog,ssemov,mmxcvt,mmxmov")
>     (set_attr "mode" "V4SF,SF,DI,DI")])
>
> -(define_insn "*vec_concatv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=x,v,x,v")
> -        (vec_concat:V4SF
> -          (match_operand:V2SF 1 "register_operand" " 0,v,0,v")
> -          (match_operand:V2SF 2 "nonimmediate_operand" " x,v,m,m")))]
> +(define_insn "*vec_concat"
> +  [(set (match_operand:V4SF_V8HF 0 "register_operand" "=x,v,x,v")
> +        (vec_concat:V4SF_V8HF
> +          (match_operand: 1 "register_operand" " 0,v,0,v")
> +          (match_operand: 2 "nonimmediate_operand" " x,v,m,m")))]
>    "TARGET_SSE"
>    "@
>     movlhps\t{%2, %0|%0, %2}
> @@ -10889,17 +10892,34 @@ (define_insn "*vec_concatv4sf"
>     (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex")
>     (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
>
> -(define_insn "*vec_concatv4sf_0"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v")
> -        (vec_concat:V4SF
> -          (match_operand:V2SF 1 "nonimmediate_operand" "vm")
> -          (match_operand:V2SF 2 "const0_operand")))]
> +(define_insn "*vec_concat_0"
> +  [(set (match_operand:V4SF_V8HF 0 "register_operand" "=v")
> +        (vec_concat:V4SF_V8HF
> +          (match_operand: 1 "nonimmediate_operand" "vm")
> +          (match_operand: 2 "const0_operand")))]
>    "TARGET_SSE2"
>    "%vmovq\t{%1, %0|%0, %1}"
>    [(set_attr "type" "ssemov")
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "DF")])
>
> +(define_insn "*vec_concatv8hf_movss"
> +  [(set (match_operand:V8HF 0 "register_operand" "=x,v,v")
> +        (vec_merge:V8HF
> +          (vec_duplicate:V8HF
> +            (match_operand:V2HF 2 "nonimmediate_operand" "x,m,v"))
> +          (match_operand:V8HF 1 "reg_or_0_operand" "0,C,v" )
> +          (const_int 3)))]
> +  "TARGET_SSE"
> +  "@
> +   movss\t{%2, %0|%0, %2}
> +   %vmovss\t{%2, %0|%0, %2}
> +   vmovss\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "isa" "noavx,*,avx")
> +   (set_attr "type" "ssemov")
> +   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
> +   (set_attr "mode" "SF")])
> +
>  ;; Avoid combining registers from different units in a single alternative,
>  ;; see comment above inline_secondary_memory_needed function in i386.cc
>  (define_insn "vec_set_0"
> diff --git a/gcc/testsuite/gcc.target/i386/pr110762-v4hf.c b/gcc/testsuite/gcc.target/i386/pr110762-v4hf.c
> new file mode 100644
> index 00000000000..332784ac694
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr110762-v4hf.c
> @@ -0,0 +1,57 @@
> +/* PR target/110762 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -dp" } */
> +
> +typedef _Float16 v4hf __attribute__((vector_size(8)));
> +typedef _Float16 v2hf __attribute__((vector_size(4)));
> +
> +v4hf
> +foo (v4hf a, v4hf b)
> +{
> +  return a + b;
> +}
> +
> +v4hf
> +foo2 (v4hf a, v4hf b)
> +{
> +  return a - b;
> +}
> +
> +v4hf
> +foo3 (v4hf a, v4hf b)
> +{
> +  return a * b;
> +}
> +
> +v4hf
> +foo1 (v4hf a, v4hf b)
> +{
> +  return a / b;
> +}
> +
> +v2hf
> +foo4 (v2hf a, v2hf b)
> +{
> +  return a + b;
> +}
> +
> +v2hf
> +foo5 (v2hf a, v2hf b)
> +{
> +  return a - b;
> +}
> +
> +v2hf
> +foo6 (v2hf a, v2hf b)
> +{
> +  return a * b;
> +}
> +
> +v2hf
> +foo7 (v2hf a, v2hf b)
> +{
> +  return a / b;
> +}
> +
> +/* { dg-final { scan-assembler-times "\\*vec_concatv8hf_0" 7 } } */
> +/* { dg-final { scan-assembler-times "\\*vec_concatv8hf_movss" 8 } } */
> --
> 2.31.1
>
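
The padding choice in the quoted div expanders (zeros in the upper lanes
of the dividend, CONST1_RTX in the upper lanes of the divisor) can be
illustrated in plain C. This is only a sketch under stated assumptions:
float stands in for _Float16 so it builds without AVX512-FP16, the
8-lane "register" is an ordinary array, and all names (widen, div_full,
count_nan, nan_lanes_after_div) are hypothetical and not part of the
patch. It observes the NaN that 0/0 produces in the padding lanes; on
real hardware that same lane would presumably also raise the
invalid-operation flag, which is what matters with trapping math.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an 8-lane XMM register; float replaces
   _Float16 so the sketch builds without AVX512-FP16 support.  */
enum { LANES = 8 };

/* Copy N live lanes into the low part of DST and fill the remaining
   lanes with PAD, similar in spirit to widening before a full-width op.  */
static void
widen (float dst[LANES], const float *src, size_t n, float pad)
{
  assert (n <= LANES);
  for (size_t i = 0; i < LANES; i++)
    dst[i] = i < n ? src[i] : pad;
}

/* Full-width division: every lane is computed, padding included.  */
static void
div_full (float q[LANES], const float a[LANES], const float b[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    q[i] = a[i] / b[i];
}

/* Count NaN lanes; NaN is the only value that compares unequal to itself.  */
static int
count_nan (const float v[LANES])
{
  int n = 0;
  for (size_t i = 0; i < LANES; i++)
    n += v[i] != v[i];
  return n;
}

/* Divide two 4-lane vectors through an 8-lane operation, padding the
   divisor with DIVISOR_PAD, and return how many result lanes are NaN.  */
int
nan_lanes_after_div (float divisor_pad)
{
  const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  const float b[4] = { 2.0f, 2.0f, 2.0f, 2.0f };
  float wa[LANES], wb[LANES], q[LANES];

  widen (wa, a, 4, 0.0f);         /* dividend: upper lanes zeroed */
  widen (wb, b, 4, divisor_pad);  /* divisor: upper lanes set to pad */
  div_full (q, wa, wb);
  return count_nan (q);
}
```

With a zero-padded divisor, nan_lanes_after_div (0.0f) reports four NaN
lanes (0/0 in each padding lane); with 1.0 padding,
nan_lanes_after_div (1.0f) reports none, since the padding lanes compute
0/1.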