From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20220916005443.3305032-1-hongtao.liu@intel.com>
In-Reply-To: <20220916005443.3305032-1-hongtao.liu@intel.com>
From: Hongtao Liu
Date: Fri, 16 Sep 2022 08:58:27 +0800
Subject: Re: [PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.
To: Uros Bizjak , Richard Biener
Cc: gcc-patches@gcc.gnu.org, hjl.tools@gmail.com
Content-Type: text/plain; charset="UTF-8"

On Fri, Sep 16, 2022 at 8:55 AM liuhongt wrote:
>
> For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of
> in_mode is not equal to the size of out_mode, the vectorizer doesn't go
> the internal fn way, so that part is still left in
> ix86_builtin_vectorized_function.
>
> Remove the other builtins and add corresponding expanders.
> Note the patch only refactors the code; it doesn't solve the related case
> in the PR, which needs an extra expander for 64-bit vectors.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> gcc/ChangeLog:
>
>         PR target/106910
>         * config/i386/i386-builtins.cc
>         (ix86_builtin_vectorized_function): Modernized with
>         corresponding expanders.
>         * config/i386/sse.md (lrint2): New
>         expander.
>         (floor2): Ditto.
>         (lfloor2): Ditto.
>         (ceil2): Ditto.
>         (lceil2): Ditto.
>         (btrunc2): Ditto.
>         (lround2): Ditto.
>         (exp22): Ditto.
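To make the description above a bit more concrete, here is a minimal
sketch in C of which ifloor/lfloor-style cases would still be served by
ix86_builtin_vectorized_function after the patch. It is only a toy model:
the enum and function names are hypothetical and are not GCC's, the
flag_trapping_math / TARGET_SSE4_1 checks are left out, and the
(out_n == 8, in_n == 4) case is assumed by analogy because it falls
between the hunks of the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; not GCC's real enum.  */
enum toy_mode { TOY_SFmode, TOY_DFmode, TOY_SImode };

/* Toy model of the cases kept in the target hook: only double -> int
   conversions, where the output vector has twice as many lanes as the
   input vector.  Same-size conversions (e.g. float -> int) are assumed
   to go through the new expanders instead.  */
bool
toy_hook_still_handles (enum toy_mode out_mode, enum toy_mode in_mode,
                        int out_n, int in_n)
{
  if (out_mode != TOY_SImode || in_mode != TOY_DFmode)
    return false;

  /* The (8, 4) pair is an assumption; it is not visible in the diff.  */
  return (out_n == 4 && in_n == 2)
         || (out_n == 8 && in_n == 4)
         || (out_n == 16 && in_n == 8);
}
```

For example, a V2DF -> V4SI request (out_n == 4, in_n == 2) stays in the
hook in this model, while V4SF -> V4SI (out_n == 4, in_n == 4) does not.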
> ---
>  gcc/config/i386/i386-builtins.cc | 185 +------------------------------
>  gcc/config/i386/sse.md           |  80 +++++++++++++
>  2 files changed, 84 insertions(+), 181 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
> index 6a04fb57e65..af2faee245b 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -1540,21 +1540,16 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>
>    switch (fn)
>      {
> -    CASE_CFN_EXP2:
> -      if (out_mode == SFmode && in_mode == SFmode)
> -        {
> -          if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
> -        }
> -      break;
> -
>      CASE_CFN_IFLOOR:
>      CASE_CFN_LFLOOR:
> -    CASE_CFN_LLFLOOR:
>        /* The round insn does not trap on denormals.  */
>        if (flag_trapping_math || !TARGET_SSE4_1)
>          break;
>
> +      /* PR106910, currently vectorizer doesn't go direct internal fn way
> +         when out_n != in_n, so let's still keep this.
> +         Otherwise, it relies on expander of
> +         lceilmn2/lfloormn2/lroundmn2/lrintmn2.  */
>        if (out_mode == SImode && in_mode == DFmode)
>          {
>            if (out_n == 4 && in_n == 2)
> @@ -1564,20 +1559,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>            else if (out_n == 16 && in_n == 8)
>              return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
>          }
> -      if (out_mode == SImode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX512);
> -        }
>        break;
>
>      CASE_CFN_ICEIL:
>      CASE_CFN_LCEIL:
> -    CASE_CFN_LLCEIL:
>        /* The round insn does not trap on denormals.  */
>        if (flag_trapping_math || !TARGET_SSE4_1)
>          break;
> @@ -1591,20 +1576,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>            else if (out_n == 16 && in_n == 8)
>              return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
>          }
> -      if (out_mode == SImode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX512);
> -        }
>        break;
>
>      CASE_CFN_IRINT:
>      CASE_CFN_LRINT:
> -    CASE_CFN_LLRINT:
>        if (out_mode == SImode && in_mode == DFmode)
>          {
>            if (out_n == 4 && in_n == 2)
> @@ -1614,20 +1589,10 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>            else if (out_n == 16 && in_n == 8)
>              return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX512);
>          }
> -      if (out_mode == SImode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ512);
> -        }
>        break;
>
>      CASE_CFN_IROUND:
>      CASE_CFN_LROUND:
> -    CASE_CFN_LLROUND:
>        /* The round insn does not trap on denormals.  */
>        if (flag_trapping_math || !TARGET_SSE4_1)
>          break;
> @@ -1641,150 +1606,8 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>            else if (out_n == 16 && in_n == 8)
>              return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512);
>          }
> -      if (out_mode == SImode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ_SFIX);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ_SFIX256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ_SFIX512);
> -        }
>        break;
>
> -    CASE_CFN_FLOOR:
> -      /* The round insn does not trap on denormals.  */
> -      if (flag_trapping_math || !TARGET_SSE4_1)
> -        break;
> -
> -      if (out_mode == DFmode && in_mode == DFmode)
> -        {
> -          if (out_n == 2 && in_n == 2)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPD);
> -          else if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPD256);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPD512);
> -        }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPS512);
> -        }
> -      if (out_mode == HFmode && in_mode == HFmode)
> -        {
> -          /* V8HF/V16HF is supported in ix86_vector_mode_supported_p
> -             under TARGET_AVX512FP16, TARGET_AVX512VL is needed here.  */
> -          if (out_n < 32 && !TARGET_AVX512VL)
> -            break;
> -
> -          if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPH);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPH256);
> -          else if (out_n == 32 && in_n == 32)
> -            return ix86_get_builtin (IX86_BUILTIN_FLOORPH512);
> -        }
> -      break;
> -
> -    CASE_CFN_CEIL:
> -      /* The round insn does not trap on denormals.  */
> -      if (flag_trapping_math || !TARGET_SSE4_1)
> -        break;
> -
> -      if (out_mode == DFmode && in_mode == DFmode)
> -        {
> -          if (out_n == 2 && in_n == 2)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPD);
> -          else if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPD256);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPD512);
> -        }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPS512);
> -        }
> -      if (out_mode == HFmode && in_mode == HFmode)
> -        {
> -          /* V8HF/V16HF is supported in ix86_vector_mode_supported_p
> -             under TARGET_AVX512FP16, TARGET_AVX512VL is needed here.  */
> -          if (out_n < 32 && !TARGET_AVX512VL)
> -            break;
> -
> -          if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPH);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPH256);
> -          else if (out_n == 32 && in_n == 32)
> -            return ix86_get_builtin (IX86_BUILTIN_CEILPH512);
> -        }
> -      break;
> -
> -    CASE_CFN_TRUNC:
> -      /* The round insn does not trap on denormals.  */
> -      if (flag_trapping_math || !TARGET_SSE4_1)
> -        break;
> -
> -      if (out_mode == DFmode && in_mode == DFmode)
> -        {
> -          if (out_n == 2 && in_n == 2)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPD);
> -          else if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPD256);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPD512);
> -        }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPS);
> -          else if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPS256);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPS512);
> -        }
> -      if (out_mode == HFmode && in_mode == HFmode)
> -        {
> -          /* V8HF/V16HF is supported in ix86_vector_mode_supported_p
> -             under TARGET_AVX512FP16, TARGET_AVX512VL is needed here.  */
> -          if (out_n < 32 && !TARGET_AVX512VL)
> -            break;
> -
> -          if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPH);
> -          else if (out_n == 16 && in_n == 16)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPH256);
> -          else if (out_n == 32 && in_n == 32)
> -            return ix86_get_builtin (IX86_BUILTIN_TRUNCPH512);
> -        }
> -      break;
> -
> -    CASE_CFN_FMA:
> -      if (out_mode == DFmode && in_mode == DFmode)
> -        {
> -          if (out_n == 2 && in_n == 2)
> -            return ix86_get_builtin (IX86_BUILTIN_VFMADDPD);
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_VFMADDPD256);
> -        }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -        {
> -          if (out_n == 4 && in_n == 4)
> -            return ix86_get_builtin (IX86_BUILTIN_VFMADDPS);
> -          if (out_n == 8 && in_n == 8)
> -            return ix86_get_builtin (IX86_BUILTIN_VFMADDPS256);
> -        }
> -      break;
>
>      default:
>        break;
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d535c0af043..dd6c94dce05 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -321,6 +321,11 @@ (define_mode_iterator VF
>    [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")])
>
> +(define_mode_iterator VF1_VF2_AVX512DQ
> +  [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> +   (V8DF "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ && TARGET_AVX512VL")
> +   (V2DF "TARGET_AVX512DQ && TARGET_AVX512VL")])
> +
>  (define_mode_iterator VFH
>    [(V32HF "TARGET_AVX512FP16")
>     (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
> @@ -23177,6 +23182,14 @@ (define_expand "rint2"
>    "TARGET_SSE4_1"
>    "operands[2] = GEN_INT (ROUND_MXCSR);")
>
> +;; Note vcvtpd2qq require avx512dq for all vector lengths.
> +(define_expand "lrint2"
> +  [(set (match_operand: 0 "register_operand")
> +        (unspec:
> +          [(match_operand:VF1_VF2_AVX512DQ 1 "register_operand")]
> +          UNSPEC_FIX_NOTRUNC))]
> +  "TARGET_SSE2")
> +
>  (define_insn "_round"
>    [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
>          (unspec:VF_128_256
> @@ -23316,6 +23329,55 @@ (define_insn "*sse4_1_round"
>     (set_attr "prefix" "orig,orig,vex,evex")
>     (set_attr "mode" "")])
>
> +(define_expand "floor2"
> +  [(set (match_operand:VFH 0 "register_operand")
> +        (unspec:VFH
> +          [(match_operand:VFH 1 "vector_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +  "operands[2] = GEN_INT (ROUND_FLOOR);")
> +
> +(define_expand "lfloor2"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VF1_VF2_AVX512DQ 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +{
> +  rtx tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_floor2 (tmp, operands[1]));
> +  emit_insn (gen_fix_trunc2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "ceil2"
> +  [(set (match_operand:VFH 0 "register_operand")
> +        (unspec:VFH
> +          [(match_operand:VFH 1 "vector_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +  "operands[2] = GEN_INT (ROUND_CEIL);")
> +
> +(define_expand "lceil2"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VF1_VF2_AVX512DQ 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +{
> +  rtx tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_ceil2 (tmp, operands[1]));
> +  emit_insn (gen_fix_trunc2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "btrunc2"
> +  [(set (match_operand:VFH 0 "register_operand")
> +        (unspec:VFH
> +          [(match_operand:VFH 1 "vector_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +  "operands[2] = GEN_INT (ROUND_TRUNC);")
> +
>  (define_expand "round2"
>    [(set (match_dup 3)
>          (plus:VF
> @@ -23350,6 +23412,17 @@ (define_expand "round2"
>    operands[4] = GEN_INT (ROUND_TRUNC);
>  })
>
> +(define_expand "lround2"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VF1_VF2_AVX512DQ 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +{
> +  rtx tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_round2 (tmp, operands[1]));
> +  emit_insn (gen_fix_trunc2 (operands[0], tmp));
> +  DONE;
> +})
> +
>  (define_expand "round2_sfix"
>    [(match_operand: 0 "register_operand")
>     (match_operand:VF1 1 "register_operand")]
> @@ -23868,6 +23941,13 @@ (define_insn "*avx512pf_scatterpfdf_mask"
>     (set_attr "prefix" "evex")
>     (set_attr "mode" "XI")])
>
> +(define_expand "exp22"
> +  [(set (match_operand:VF_512 0 "register_operand")
> +        (unspec:VF_512
> +          [(match_operand:VF_512 1 "vector_operand")]
> +          UNSPEC_EXP2))]
> +  "TARGET_AVX512ER")
> +
>  (define_insn "avx512er_exp2"
>    [(set (match_operand:VF_512 0 "register_operand" "=v")
>          (unspec:VF_512
> --
> 2.27.0
>

--
BR,
Hongtao