From: Uros Bizjak
Date: Tue, 20 Sep 2022 08:35:38 +0200
Subject: Re: [PATCH] Support 64-bit vectorization for single-precision floating rounding operation.
To: liuhongt
Cc: gcc-patches@gcc.gnu.org
References: <20220920021235.2634748-1-hongtao.liu@intel.com>
In-Reply-To: <20220920021235.2634748-1-hongtao.liu@intel.com>

On Tue, Sep 20, 2022 at 4:15 AM liuhongt via Gcc-patches wrote:
>
> Here's the list of operations the patch supports:
> rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> 	PR target/106910
> 	* config/i386/mmx.md (nearbyintv2sf2): New expander.
> 	(rintv2sf2): Ditto.
> 	(ceilv2sf2): Ditto.
> 	(lceilv2sfv2si2): Ditto.
> 	(floorv2sf2): Ditto.
> 	(lfloorv2sfv2si2): Ditto.
> 	(btruncv2sf2): Ditto.
> 	(lrintv2sfv2si2): Ditto.
> 	(roundv2sf2): Ditto.
> 	(lroundv2sfv2si2): Ditto.
> 	(*mmx_roundv2sf2): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/pr106910-1.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/mmx.md                     | 154 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr106910-1.c |  77 +++++++++++
>  2 files changed, 231 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106910-1.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index dda4b43f5c1..222a041de58 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1627,6 +1627,160 @@ (define_expand "vec_initv2sfsf"
>    DONE;
>  })
>
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +;;
> +;; Parallel single-precision floating point rounding operations.
> +;;
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +(define_expand "nearbyintv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "register_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_MXCSR | ROUND_NO_EXC);")
> +
> +(define_expand "rintv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "register_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_MXCSR);")
> +
> +(define_expand "ceilv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "register_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_CEIL | ROUND_NO_EXC);")
> +
> +(define_expand "lceilv2sfv2si2"
> +  [(match_operand:V2SI 0 "register_operand")
> +   (match_operand:V2SF 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_ceilv2sf2 (tmp, operands[1]));
> +  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "floorv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "vector_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_FLOOR | ROUND_NO_EXC);")
> +
> +(define_expand "lfloorv2sfv2si2"
> +  [(match_operand:V2SI 0 "register_operand")
> +   (match_operand:V2SF 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_floorv2sf2 (tmp, operands[1]));
> +  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "btruncv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "register_operand")
> +           (match_dup 2)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +  "operands[2] = GEN_INT (ROUND_TRUNC | ROUND_NO_EXC);")
> +
> +(define_insn "*mmx_roundv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,v")
> +        (unspec:V2SF
> +          [(match_operand:V2SF 1 "register_operand" "Yr,x,v")
> +           (match_operand:SI 2 "const_0_to_15_operand")]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "%vroundps\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssecvt")
> +   (set_attr "prefix_data16" "1,1,*")
> +   (set_attr "prefix_extra" "1")
> +   (set_attr "length_immediate" "1")
> +   (set_attr "prefix" "orig,orig,vex")
> +   (set_attr "mode" "V4SF")])
> +
> +(define_insn "lrintv2sfv2si2"
> +  [(set (match_operand:V2SI 0 "register_operand" "=v")
> +        (unspec:V2SI
> +          [(match_operand:V2SF 1 "register_operand" "v")]
> +          UNSPEC_FIX_NOTRUNC))]
> +  "TARGET_MMX_WITH_SSE"
> +  "%vcvtps2dq\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set (attr "prefix_data16")
> +     (if_then_else
> +       (match_test "TARGET_AVX")
> +     (const_string "*")
> +     (const_string "1")))
> +   (set_attr "prefix" "maybe_vex")
> +   (set_attr "mode" "TI")])
> +
> +(define_expand "roundv2sf2"
> +  [(set (match_dup 3)
> +        (plus:V2SF
> +          (match_operand:V2SF 1 "register_operand")
> +          (match_dup 2)))
> +   (set (match_operand:V2SF 0 "register_operand")
> +        (unspec:V2SF
> +          [(match_dup 3) (match_dup 4)]
> +          UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +{
> +  const struct real_format *fmt;
> +  REAL_VALUE_TYPE pred_half, half_minus_pred_half;
> +  rtx half, vec_half;
> +
> +  /* load nextafter (0.5, 0.0) */
> +  fmt = REAL_MODE_FORMAT (SFmode);
> +  real_2expN (&half_minus_pred_half, -(fmt->p) - 1, SFmode);
> +  real_arithmetic (&pred_half, MINUS_EXPR, &dconsthalf, &half_minus_pred_half);
> +  half = const_double_from_real_value (pred_half, SFmode);
> +
> +  vec_half = ix86_build_const_vector (V2SFmode, true, half);
> +  vec_half = force_reg (V2SFmode, vec_half);
> +
> +  operands[2] = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_copysignv2sf3 (operands[2], vec_half, operands[1]));
> +
> +  operands[3] = gen_reg_rtx (V2SFmode);
> +  operands[4] = GEN_INT (ROUND_TRUNC);
> +})
> +
> +(define_expand "lroundv2sfv2si2"
> +  [(match_operand:V2SI 0 "register_operand")
> +   (match_operand:V2SF 1 "register_operand")]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_roundv2sf2 (tmp, operands[1]));
> +  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel half-precision floating point arithmetic
> diff --git a/gcc/testsuite/gcc.target/i386/pr106910-1.c b/gcc/testsuite/gcc.target/i386/pr106910-1.c
> new file mode 100644
> index 00000000000..c7685a32183
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106910-1.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-msse4.1 -O2 -Ofast" } */
> +/* { dg-final { scan-assembler-times "roundps" 9 } } */
> +/* { dg-final { scan-assembler-times "cvtps2dq" 1 } } */
> +/* { dg-final { scan-assembler-times "cvttps2dq" 3 } } */
> +
> +#include
> +
> +void
> +foo (float* p, float* __restrict q)
> +{
> +  p[0] = truncf (q[0]);
> +  p[1] = truncf (q[1]);
> +}
> +
> +void
> +foo1 (float* p, float* __restrict q)
> +{
> +  p[0] = floorf (q[0]);
> +  p[1] = floorf (q[1]);
> +}
> +
> +void
> +foo1i (int* p, float* __restrict q)
> +{
> +  p[0] = (int) floorf (q[0]);
> +  p[1] = (int) floorf (q[1]);
> +}
> +
> +void
> +foo2 (float* p, float* __restrict q)
> +{
> +  p[0] = ceilf (q[0]);
> +  p[1] = ceilf (q[1]);
> +}
> +
> +void
> +foo2i (int* p, float* __restrict q)
> +{
> +  p[0] = (int) ceilf (q[0]);
> +  p[1] = (int) ceilf (q[1]);
> +}
> +
> +void
> +foo3 (float* p, float* __restrict q)
> +{
> +  p[0] = rintf (q[0]);
> +  p[1] = rintf (q[1]);
> +}
> +
> +void
> +foo3i (int* p, float* __restrict q)
> +{
> +  p[0] = (int) rintf (q[0]);
> +  p[1] = (int) rintf (q[1]);
> +}
> +
> +void
> +foo4 (float* p, float* __restrict q)
> +{
> +  p[0] = nearbyintf (q[0]);
> +  p[1] = nearbyintf (q[1]);
> +}
> +
> +void
> +foo5(float* p, float* __restrict q)
> +{
> +  p[0] = roundf (q[0]);
> +  p[1] = roundf (q[1]);
> +}
> +
> +void
> +foo5i(int* p, float* __restrict q)
> +{
> +  p[0] = (int) roundf (q[0]);
> +  p[1] = (int) roundf (q[1]);
> +}
> --
> 2.27.0
>