From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id B3BA13858421 for ; Wed, 21 Jun 2023 11:20:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B3BA13858421 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 55B1B1063; Wed, 21 Jun 2023 04:21:11 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A4B5D3F663; Wed, 21 Jun 2023 04:20:26 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,Richard Biener via Gcc-patches , liuhongt , crazylht@gmail.com, hjl.tools@gmail.com, richard.sandiford@arm.com Cc: Richard Biener via Gcc-patches , liuhongt , crazylht@gmail.com, hjl.tools@gmail.com Subject: Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed. References: <20230602010015.2571612-1-hongtao.liu@intel.com> Date: Wed, 21 Jun 2023 12:20:25 +0100 In-Reply-To: (Richard Biener's message of "Wed, 21 Jun 2023 13:04:28 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-27.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener writes: > On Wed, Jun 21, 2023 at 11:32=E2=80=AFAM Richard Sandiford > wrote: >> >> Richard Sandiford writes: >> > Richard Biener via Gcc-patches writes: >> >> On Fri, Jun 2, 2023 at 3:01=E2=80=AFAM liuhongt via Gcc-patches >> >> wrote: >> >>> >> >>> We have already use intermidate type in case WIDEN, but not for NONE, >> >>> this patch extended that. >> >>> >> >>> I didn't do that in pattern recog since we need to know whether the >> >>> stmt belongs to any slp_node to decide the vectype, the related opta= bs >> >>> are checked according to vectype_in and vectype_out. For non-slp cas= e, >> >>> vec_pack/unpack are always used when lhs has different size from rhs, >> >>> for slp case, sometimes vec_pack/unpack is used, somethings >> >>> direct conversion is used. >> >>> >> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> >>> Ok for trunk? >> >>> >> >>> gcc/ChangeLog: >> >>> >> >>> PR target/110018 >> >>> * tree-vect-stmts.cc (vectorizable_conversion): Use >> >>> intermiediate integer type for float_expr/fix_trunc_expr when >> >>> direct optab is not existed. >> >>> >> >>> gcc/testsuite/ChangeLog: >> >>> >> >>> * gcc.target/i386/pr110018-1.c: New test. >> >>> --- >> >>> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 +++++++++++++++++++= +++ >> >>> gcc/tree-vect-stmts.cc | 56 ++++++++++++- >> >>> 2 files changed, 149 insertions(+), 1 deletion(-) >> >>> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> >> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c b/gcc/testsu= ite/gcc.target/i386/pr110018-1.c >> >>> new file mode 100644 >> >>> index 00000000000..b1baffd7af1 >> >>> --- /dev/null >> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> @@ -0,0 +1,94 @@ >> >>> +/* { dg-do compile } */ >> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } = */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } = */ >> >>> + >> >>> +void >> >>> +foo (double* __restrict a, char* b) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo1 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo2 (_Float16* __restrict a, char* b) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> + a[4] =3D b[4]; >> >>> + a[5] =3D b[5]; >> >>> + a[6] =3D b[6]; >> >>> + a[7] =3D b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo3 (double* __restrict a, short* b) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo4 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo5 (double* __restrict b, char* a) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo6 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo7 (_Float16* __restrict b, char* a) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> + a[4] =3D b[4]; >> >>> + a[5] =3D b[5]; >> >>> + a[6] =3D b[6]; >> >>> + a[7] =3D b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo8 (double* __restrict b, short* a) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo9 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] =3D b[0]; >> >>> + a[1] =3D b[1]; >> >>> + a[2] =3D b[2]; >> >>> + a[3] =3D b[3]; >> >>> +} >> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> >>> index bd3b07a3aa1..1118c89686d 100644 >> >>> --- a/gcc/tree-vect-stmts.cc >> >>> +++ b/gcc/tree-vect-stmts.cc >> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, >> >>> return false; >> >>> if (supportable_convert_operation (code, vectype_out, vectype= _in, &code1)) >> >>> break; >> >> >> >> A comment would be nice here. Like >> >> >> >> /* For conversions between float and smaller integer types try whe= ther we can >> >> use intermediate signed integer types to support the conversion= . */ >> >> >> >>> + if ((code =3D=3D FLOAT_EXPR >> >>> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) >> >>> + || (code =3D=3D FIX_TRUNC_EXPR >> >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)= )) >> > >> > Is the FIX_TRUNC_EXPR case safe without some flag? >> > >> > #include >> > int32_t x =3D (int32_t)0x1.0p32; >> > int32_t y =3D (int32_t)(int64_t)0x1.0p32; >> > >> > sets x to 2147483647 and y to 0. > > Hmm, good question. GENERIC has a direct truncation to unsigned char > for example, the C standard generally says if the integral part cannot > be represented then the behavior is undefined. So I think we should be > safe here (0x1.0p32 doesn't fit an int). OK. >> Also, I think multi_step_cvt should influence the costs, since at >> the moment we cost one statement but generate two. This makes a >> difference for SVE with VECT_COMPARE_COSTS. Would changing it to: >> >> vect_model_simple_cost (vinfo, stmt_info, >> ncopies * (multi_step_cvt + 1), >> dt, ndts, slp_node, >> cost_vec); >> >> be OK? > > Yeah, I guess so. Thanks, will send a patch. >> There again, I wonder if we should handle this using patterns instead. >> That makes both conversions explicit and therefore easier to cost. > > But we don't do this for the other multi-step conversions either ... Yeah, agree it's not a new problem. > but sure, I also suggested that but the complaint was that with > BB SLP this would get us a vector type with off lanes? Ah, sorry, missed the previous discussion. I only became interested once some SVE tests started failing :) >> E.g. for SVE, an integer extension is free if the source is a load, >> and we do try to model that. But it's difficult to handle if the >> conversion is only implicit. > > In the distant future I hope that vectorizable_conversion will > upon analysis produce an SLP sub-graph for the suggested > code generation which we'd then cost. Sounds good. Agree we can live with the compound operation until then. > My current idea is > to get this part live before switching the analysis to all-SLP > because it "seems" easier (fingers crossing). Still I'm > struggling to even find time to start that effort. Yeah, can sympthaise :/ Don't know how you find time for all the reviews you currently do. Thanks, Richard