From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x112c.google.com (mail-yw1-x112c.google.com [IPv6:2607:f8b0:4864:20::112c]) by sourceware.org (Postfix) with ESMTPS id 1DEA73858D20 for ; Mon, 12 Jun 2023 09:16:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1DEA73858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-x112c.google.com with SMTP id 00721157ae682-565cd2fc9acso42924637b3.0 for ; Mon, 12 Jun 2023 02:16:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686561370; x=1689153370; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4XUyOEgTr2FfIAtuZmNesgi0VNOfv2VNX6ZAWrgMlnU=; b=fQxkCMzmCVFXBocz4m5e07qxmzwsesi6SYtQEQWLCRmKhxOHrsGROOzrOk4kCO6ACW 3XCwfhrWVAuR62Bz5Aou47glD5yI9IUqVxc64CGNnNzgefTuKvhuIsM6ZVYAtcfKKJZN GK8XS9BMYKAoNp0QJNvZ6cNhon1GOw7iZ5td3oV9VhgubfN8ELA3cF4y2JmibFAdsvaL yG4UinvjuUybbjkQbK3UamjM5bEryAKkopDgRd9vXpKmgF9eOC4IqhFrpmLf+XFPkS6G gc047PbWObQDFniNXEQuWf48gvev6Ggvd3xF28avNikPT/+4CrNZsXBLsibZ166HiStK 7CFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686561370; x=1689153370; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4XUyOEgTr2FfIAtuZmNesgi0VNOfv2VNX6ZAWrgMlnU=; b=bIMIkFVaoospU0x3zPtokrOTSLXdhiI9ZIbPnsHi8equr2wf3V8WiB0W0JsyHwCFlw f5TQ+aj3EO4Nt6wg5LVFBrB6enK8PN2tnNvnQrcGiH0xSttQ1yUJi5IKn1cwwZz5hu7j xC1rJisiiG1OgjlrJr8P1UZp2ERZp67yWABGBhIRifcodkfAAcqUNGqBlVPzyqAcsYH9 VPAQAxkXK5Nr2ZgoUj20Ghj8hbubi+SMAVQiWYSu2u6hNOJE6hHeTLoOEyT6tIbx/fhe VudygiMZLWGuTBdbN0kvXjbDh8uTJTHRRboKmd0nTgeMkUALB5vUwzAxLsKkIusTmd5A 3TKw== X-Gm-Message-State: AC+VfDx8YfkBPedMit5aoPtJeRv6/sD17ZRWWo7F3ZaPGFFG58GLBV1M geuqlz+UZJHOlzLFbJ7dCJ6xgAuX6B89HqYyqJA= X-Google-Smtp-Source: ACHHUZ7iBoDPrgyD0rhTssgH/s54CJQyFYf09hBmaQoo/FRv1RCGjgSikMs8PCXkV4/MsxBcwgQ/SXT9PmR9N8gMTzw= X-Received: by 2002:a81:83c9:0:b0:56c:fe54:4183 with SMTP id t192-20020a8183c9000000b0056cfe544183mr4877808ywf.52.1686561370267; Mon, 12 Jun 2023 02:16:10 -0700 (PDT) MIME-Version: 1.0 References: <20230605012635.2292889-1-hongtao.liu@intel.com> In-Reply-To: <20230605012635.2292889-1-hongtao.liu@intel.com> From: Hongtao Liu Date: Mon, 12 Jun 2023 17:15:59 +0800 Message-ID: Subject: Re: [PATCH] [x86] Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion. To: liuhongt Cc: gcc-patches@gcc.gnu.org, hjl.tools@gmail.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Jun 5, 2023 at 9:26=E2=80=AFAM liuhongt wro= te: > > This patch only support vec_pack/unpacks optabs for vector modes whose le= nth >=3D 128. > For 32/64-bit vector, they're more hanlded by BB vectorizer with > truncmn2/extendmn2/fix{,uns}_truncmn2. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready to push to trunk. Committed. > > gcc/ChangeLog: > > * config/i386/sse.md (vec_pack_float_): New ex= pander. > (vec_unpack_fix_trunc_lo_): Ditto. > (vec_unpack_fix_trunc_hi_): Ditto. > (vec_unpacks_lo_: Ditto. > (vec_unpacks_hi_: Ditto. > (sse_movlhps_): New define_insn. > (ssse3_palignr_perm): Extend to V_128H. > (V_128H): New mode iterator. > (ssepackPHmode): New mode attribute. > (vunpck_extract_mode>: Ditto. > (vpckfloat_concat_mode): Extend to VxSI/VxSF for _Float16. > (vpckfloat_temp_mode): Ditto. > (vpckfloat_op_mode): Ditto. > (vunpckfixt_mode): Extend to VxHF. > (vunpckfixt_model): Ditto. > (vunpckfixt_extract_mode): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vec_pack_fp16-1.c: New test. > * gcc.target/i386/vec_pack_fp16-2.c: New test. > * gcc.target/i386/vec_pack_fp16-3.c: New test. > --- > gcc/config/i386/sse.md | 216 +++++++++++++++++- > .../gcc.target/i386/vec_pack_fp16-1.c | 34 +++ > .../gcc.target/i386/vec_pack_fp16-2.c | 9 + > .../gcc.target/i386/vec_pack_fp16-3.c | 8 + > 4 files changed, 258 insertions(+), 9 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-3.c > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index a92f50e96b5..1eb2dd077ff 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -291,6 +291,9 @@ (define_mode_iterator V > (define_mode_iterator V_128 > [V16QI V8HI V4SI V2DI V4SF (V2DF "TARGET_SSE2")]) > > +(define_mode_iterator V_128H > + [V16QI V8HI V8HF V8BF V4SI V2DI V4SF (V2DF "TARGET_SSE2")]) > + > ;; All 256bit vector modes > (define_mode_iterator V_256 > [V32QI V16HI V8SI V4DI V8SF V4DF]) > @@ -1076,6 +1079,12 @@ (define_mode_attr ssePHmodelower > (V8DI "v8hf") (V4DI "v4hf") (V2DI "v2hf") > (V8DF "v8hf") (V16SF "v16hf") (V8SF "v8hf")]) > > + > +;; Mapping of vector modes to packed vector hf modes of same sized. > +(define_mode_attr ssepackPHmode > + [(V16SI "V32HF") (V8SI "V16HF") (V4SI "V8HF") > + (V16SF "V32HF") (V8SF "V16HF") (V4SF "V8HF")]) > + > ;; Mapping of vector modes to packed single mode of the same size > (define_mode_attr ssePSmode > [(V16SI "V16SF") (V8DF "V16SF") > @@ -6918,6 +6927,61 @@ (define_mode_attr qq2phsuff > (V16SF "") (V8SF "{y}") (V4SF "{x}") > (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")]) > > +(define_mode_attr vunpck_extract_mode > + [(V32HF "v32hf") (V16HF "v16hf") (V8HF "v16hf")]) > + > +(define_expand "vec_unpacks_lo_" > + [(match_operand: 0 "register_operand") > + (match_operand:VF_AVX512FP16VL 1 "register_operand")] > + "TARGET_AVX512FP16" > +{ > + rtx tem =3D operands[1]; > + rtx (*gen) (rtx, rtx); > + if (mode !=3D V8HFmode) > + { > + tem =3D gen_reg_rtx (mode); > + emit_insn (gen_vec_extract_lo_ (tem, > + operands[1])); > + gen =3D gen_extend2; > + } > + else > + gen =3D gen_avx512fp16_float_extend_phv4sf2; > + > + emit_insn (gen (operands[0], tem)); > + DONE; > +}) > + > +(define_expand "vec_unpacks_hi_" > + [(match_operand: 0 "register_operand") > + (match_operand:VF_AVX512FP16VL 1 "register_operand")] > + "TARGET_AVX512FP16" > +{ > + rtx tem =3D operands[1]; > + rtx (*gen) (rtx, rtx); > + if (mode !=3D V8HFmode) > + { > + tem =3D gen_reg_rtx (mode); > + emit_insn (gen_vec_extract_hi_ (tem, > + operands[1])); > + gen =3D gen_extend2; > + } > + else > + { > + tem =3D gen_reg_rtx (V8HFmode); > + rtvec tmp =3D rtvec_alloc (8); > + for (int i =3D 0; i !=3D 8; i++) > + RTVEC_ELT (tmp, i) =3D GEN_INT((i+4)%8); > + > + rtx selector =3D gen_rtx_PARALLEL (VOIDmode, tmp); > + emit_move_insn (tem, > + gen_rtx_VEC_SELECT (V8HFmode, operands[1], selector)= ); > + gen =3D gen_avx512fp16_float_extend_phv4sf2; > + } > + > + emit_insn (gen (operands[0], tem)); > + DONE; > +}) > + > (define_insn "avx512fp16_vcvtph2= _" > [(set (match_operand:VI248_AVX512VL 0 "register_operand" "=3Dv") > (unspec:VI248_AVX512VL > @@ -8314,11 +8378,17 @@ (define_expand "floatv2div2sf2" > }) > > (define_mode_attr vpckfloat_concat_mode > - [(V8DI "v16sf") (V4DI "v8sf") (V2DI "v8sf")]) > + [(V8DI "v16sf") (V4DI "v8sf") (V2DI "v8sf") > + (V16SI "v32hf") (V8SI "v16hf") (V4SI "v16hf") > + (V16SF "v32hf") (V8SF "v16hf") (V4SF "v16hf")]) > (define_mode_attr vpckfloat_temp_mode > - [(V8DI "V8SF") (V4DI "V4SF") (V2DI "V4SF")]) > + [(V8DI "V8SF") (V4DI "V4SF") (V2DI "V4SF") > + (V16SI "V16HF") (V8SI "V8HF") (V4SI "V8HF") > + (V16SF "V16HF") (V8SF "V8HF") (V4SF "V8HF")]) > (define_mode_attr vpckfloat_op_mode > - [(V8DI "v8sf") (V4DI "v4sf") (V2DI "v2sf")]) > + [(V8DI "v8sf") (V4DI "v4sf") (V2DI "v2sf") > + (V16SI "v16hf") (V8SI "v8hf") (V4SI "v4hf") > + (V16SF "v16hf") (V8SF "v8hf") (V4SF "v4hf")]) > > (define_expand "vec_pack_float_" > [(match_operand: 0 "register_operand") > @@ -8345,6 +8415,31 @@ (define_expand "vec_pack_float_= " > DONE; > }) > > +(define_expand "vec_pack_float_" > + [(match_operand: 0 "register_operand") > + (any_float: > + (match_operand:VI4_AVX512VL 1 "register_operand")) > + (match_operand:VI4_AVX512VL 2 "register_operand")] > + "TARGET_AVX512FP16" > +{ > + rtx r1 =3D gen_reg_rtx (mode); > + rtx r2 =3D gen_reg_rtx (mode); > + rtx (*gen) (rtx, rtx); > + > + if (mode =3D=3D V4SImode) > + gen =3D gen_avx512fp16_floatv4siv4hf2; > + else > + gen =3D gen_float2; > + emit_insn (gen (r1, operands[1])); > + emit_insn (gen (r2, operands[2])); > + if (mode =3D=3D V4SImode) > + emit_insn (gen_sse_movlhps_v8hf (operands[0], r1, r2)); > + else > + emit_insn (gen_avx_vec_concat (operands[0], > + r1, r2)); > + DONE; > +}) > + > (define_expand "floatv2div2sf2_mask" > [(set (match_operand:V4SF 0 "register_operand" "=3Dv") > (vec_concat:V4SF > @@ -8747,11 +8842,14 @@ (define_expand "fix_truncv2sfv2di2" > }) > > (define_mode_attr vunpckfixt_mode > - [(V16SF "V8DI") (V8SF "V4DI") (V4SF "V2DI")]) > + [(V16SF "V8DI") (V8SF "V4DI") (V4SF "V2DI") > + (V32HF "V16SI") (V16HF "V8SI") (V8HF "V4SI")]) > (define_mode_attr vunpckfixt_model > - [(V16SF "v8di") (V8SF "v4di") (V4SF "v2di")]) > + [(V16SF "v8di") (V8SF "v4di") (V4SF "v2di") > + (V32HF "v16si") (V16HF "v8si") (V8HF "v4si")]) > (define_mode_attr vunpckfixt_extract_mode > - [(V16SF "v16sf") (V8SF "v8sf") (V4SF "v8sf")]) > + [(V16SF "v16sf") (V8SF "v8sf") (V4SF "v8sf") > + (V32HF "v32hf") (V16HF "v16hf") (V8HF "v16hf")]) > > (define_expand "vec_unpack_fix_trunc_lo_" > [(match_operand: 0 "register_operand") > @@ -8803,6 +8901,60 @@ (define_expand "vec_unpack_fix_trunc_hi= _" > DONE; > }) > > +(define_expand "vec_unpack_fix_trunc_lo_" > + [(match_operand: 0 "register_operand") > + (any_fix: > + (match_operand:VF_AVX512FP16VL 1 "register_operand"))] > + "TARGET_AVX512FP16" > +{ > + rtx tem =3D operands[1]; > + rtx (*gen) (rtx, rtx); > + if (mode !=3D V8HFmode) > + { > + tem =3D gen_reg_rtx (mode); > + emit_insn (gen_vec_extract_lo_ (tem, > + operands[1= ])); > + gen =3D gen_fix_trunc2; > + } > + else > + gen =3D gen_avx512fp16_fix_trunc2; > + > + emit_insn (gen (operands[0], tem)); > + DONE; > +}) > + > +(define_expand "vec_unpack_fix_trunc_hi_" > + [(match_operand: 0 "register_operand") > + (any_fix: > + (match_operand:VF_AVX512FP16VL 1 "register_operand"))] > + "TARGET_AVX512FP16" > +{ > + rtx tem =3D operands[1]; > + rtx (*gen) (rtx, rtx); > + if (mode !=3D V8HFmode) > + { > + tem =3D gen_reg_rtx (mode); > + emit_insn (gen_vec_extract_hi_ (tem, > + operands[1= ])); > + gen =3D gen_fix_trunc2; > + } > + else > + { > + tem =3D gen_reg_rtx (V8HFmode); > + rtvec tmp =3D rtvec_alloc (8); > + for (int i =3D 0; i !=3D 8; i++) > + RTVEC_ELT (tmp, i) =3D GEN_INT((i+4)%8); > + > + rtx selector =3D gen_rtx_PARALLEL (VOIDmode, tmp); > + emit_move_insn (tem, > + gen_rtx_VEC_SELECT (V8HFmode, operands[1], selector)= ); > + gen =3D gen_avx512fp16_fix_trunc2; > + } > + > + emit_insn (gen (operands[0], tem)); > + DONE; > +}) > + > (define_insn "fixuns_trunc2" > [(set (match_operand: 0 "register_operand" "=3Dv") > (unsigned_fix: > @@ -9616,6 +9768,31 @@ (define_expand "vec_pack_trunc_" > operands[4] =3D gen_reg_rtx (mode); > }) > > +(define_expand "vec_pack_trunc_" > + [(match_operand: 0 "register_operand") > + (match_operand:VF1_AVX512VL 1 "register_operand") > + (match_operand:VF1_AVX512VL 2 "register_operand")] > + "TARGET_AVX512FP16" > +{ > + rtx r1 =3D gen_reg_rtx (mode); > + rtx r2 =3D gen_reg_rtx (mode); > + rtx (*gen) (rtx, rtx); > + > + if (mode =3D=3D V4SFmode) > + gen =3D gen_avx512fp16_truncv4sfv4hf2; > + else > + gen =3D gen_trunc2; > + emit_insn (gen (r1, operands[1])); > + emit_insn (gen (r2, operands[2])); > + if (mode =3D=3D V4SFmode) > + emit_insn (gen_sse_movlhps_v8hf (operands[0], r1, r2)); > + else > + emit_insn (gen_avx_vec_concat (operands[0], > + r1, r2)); > + DONE; > + > +}) > + > (define_expand "vec_pack_trunc_v2df" > [(match_operand:V4SF 0 "register_operand") > (match_operand:V2DF 1 "vector_operand") > @@ -9921,6 +10098,27 @@ (define_insn "sse_movlhps" > (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex") > (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) > > +(define_insn "sse_movlhps_" > + [(set (match_operand:V8_128 0 "nonimmediate_operand" "=3Dx,v,x,o") > + (vec_select:V8_128 > + (vec_concat: > + (match_operand:V8_128 1 "nonimmediate_operand" " 0,v,0,0") > + (match_operand:V8_128 2 "nonimmediate_operand" " x,v,m,v")) > + (parallel [(const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 8) (const_int 9) > + (const_int 10) (const_int 11)])))] > + "TARGET_SSE && ix86_binary_operator_ok (UNKNOWN, mode, operands)= " > + "@ > + movlhps\t{%2, %0|%0, %2} > + vpunpcklqdq\t{%2, %1, %0|%0, %1, %2} > + movhps\t{%2, %0|%0, %q2} > + %vmovlps\t{%2, %H0|%H0, %2}" > + [(set_attr "isa" "noavx,avx,noavx,*") > + (set_attr "type" "ssemov") > + (set_attr "prefix" "orig,maybe_evex,orig,maybe_vex") > + (set_attr "mode" "V4SF,TI,V2SF,V2SF")]) > + > (define_insn "avx512f_unpckhps512" > [(set (match_operand:V16SF 0 "register_operand" "=3Dv") > (vec_select:V16SF > @@ -26239,9 +26437,9 @@ (define_insn "*avx_vperm2f128_nozero" > (set_attr "mode" "")]) > > (define_insn "*ssse3_palignr_perm" > - [(set (match_operand:V_128 0 "register_operand" "=3Dx,Yw") > - (vec_select:V_128 > - (match_operand:V_128 1 "register_operand" "0,Yw") > + [(set (match_operand:V_128H 0 "register_operand" "=3Dx,Yw") > + (vec_select:V_128H > + (match_operand:V_128H 1 "register_operand" "0,Yw") > (match_parallel 2 "palignr_operand" > [(match_operand 3 "const_int_operand")])))] > "TARGET_SSSE3" > diff --git a/gcc/testsuite/gcc.target/i386/vec_pack_fp16-1.c b/gcc/testsu= ite/gcc.target/i386/vec_pack_fp16-1.c > new file mode 100644 > index 00000000000..9eca9c71645 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vec_pack_fp16-1.c > @@ -0,0 +1,34 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512fp16 -mavx512vl -Ofast -mprefer-vector-width=3D= 512" } */ > +/* { dg-final { scan-assembler-times "vcvttph2dq" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtdq2ph" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtph2ps" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtps2ph" "2" } } */ > + > +void > +foo (int* __restrict a, _Float16* b) > +{ > + for (int i =3D 0; i !=3D 1000000; i++) > + a[i] =3D b[i]; > +} > + > +void > +foo1 (int* __restrict a, _Float16* b) > +{ > + for (int i =3D 0; i !=3D 100000; i++) > + b[i] =3D a[i]; > +} > + > +void > +foo2 (float* __restrict a, _Float16* b) > +{ > + for (int i =3D 0; i !=3D 1000000; i++) > + a[i] =3D b[i]; > +} > + > +void > +foo3 (float* __restrict a, _Float16* b) > +{ > + for (int i =3D 0; i !=3D 100000; i++) > + b[i] =3D a[i]; > +} > diff --git a/gcc/testsuite/gcc.target/i386/vec_pack_fp16-2.c b/gcc/testsu= ite/gcc.target/i386/vec_pack_fp16-2.c > new file mode 100644 > index 00000000000..0fd0325c193 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vec_pack_fp16-2.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512fp16 -mavx512vl -Ofast -mprefer-vector-width=3D= 256" } */ > +/* { dg-final { scan-assembler-times "vcvttph2dq" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtdq2ph" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtph2ps" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtps2ph" "2" } } */ > + > + > +#include "vec_pack_fp16-1.c" > diff --git a/gcc/testsuite/gcc.target/i386/vec_pack_fp16-3.c b/gcc/testsu= ite/gcc.target/i386/vec_pack_fp16-3.c > new file mode 100644 > index 00000000000..f6d3fa0bf65 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vec_pack_fp16-3.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512fp16 -mavx512vl -Ofast -mprefer-vector-width=3D= 128" } */ > +/* { dg-final { scan-assembler-times "vcvttph2dq" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtdq2ph" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtph2ps" "2" } } */ > +/* { dg-final { scan-assembler-times "vcvtps2ph" "2" } } */ > + > +#include "vec_pack_fp16-1.c" > -- > 2.39.1.388.g2fc9e9ca3c > --=20 BR, Hongtao