From: Hongtao Liu
Date: Tue, 6 Jun 2023 16:15:44 +0800
Subject: Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.
To: Andrew Pinski
Cc: liuhongt, gcc-patches@gcc.gnu.org, hjl.tools@gmail.com
References: <20230606043121.24843-1-hongtao.liu@intel.com>

On Tue, Jun 6, 2023 at 12:49 PM Andrew Pinski wrote:
>
> On Mon, Jun 5, 2023 at 9:34 PM liuhongt via Gcc-patches wrote:
> >
> > r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for
> > TYPE_MIN, but PABSB will store unsigned result into dst. The patch
> > uses ABSU_EXPR + VCE instead of ABS_EXPR.
> >
> > Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit
> > vector absm2 is guarded with TARGET_MMX_WITH_SSE.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> >
> > gcc/ChangeLog:
> >
> >         PR target/110108
> >         * config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
> >         _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
> >         ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
> >         TARGET_64BIT.
> >         * config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
> >         real codename for __builtin_ia32_pabs{b,w,d}.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr110108.c: New test.
> > ---
> >  gcc/config/i386/i386-builtin.def         |  6 ++--
> >  gcc/config/i386/i386.cc                  | 44 ++++++++++++++++++++----
> >  gcc/testsuite/gcc.target/i386/pr110108.c | 16 +++++++++
> >  3 files changed, 56 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr110108.c
> >
> > diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> > index 383b68a9bb8..7ba5b6a9d11 100644
> > --- a/gcc/config/i386/i386-builtin.def
> > +++ b/gcc/config/i386/i386-builtin.def
> > @@ -900,11 +900,11 @@ BDESC (OPTION_MASK_ISA_SSE3, 0, CODE_FOR_sse3_hsubv2df3, "__builtin_ia32_hsubpd"
> >
> >  /* SSSE3 */
> >  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsb128", IX86_BUILTIN_PABSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI)
> > -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, (int) V8QI_FTYPE_V8QI)
> > +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_absv8qi2, "__builtin_ia32_pabsb", IX86_BUILTIN_PABSB, UNKNOWN, (int) V8QI_FTYPE_V8QI)
> >  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsw128", IX86_BUILTIN_PABSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI)
> > -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, (int) V4HI_FTYPE_V4HI)
> > +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_absv4hi2, "__builtin_ia32_pabsw", IX86_BUILTIN_PABSW, UNKNOWN, (int) V4HI_FTYPE_V4HI)
> >  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_nothing, "__builtin_ia32_pabsd128", IX86_BUILTIN_PABSD128, UNKNOWN, (int) V4SI_FTYPE_V4SI)
> > -BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, (int) V2SI_FTYPE_V2SI)
> > +BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_absv2si2, "__builtin_ia32_pabsd", IX86_BUILTIN_PABSD, UNKNOWN, (int) V2SI_FTYPE_V2SI)
> >
> >  BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_phaddwv8hi3, "__builtin_ia32_phaddw128", IX86_BUILTIN_PHADDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
> >  BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_phaddwv4hi3, "__builtin_ia32_phaddw", IX86_BUILTIN_PHADDW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index d4ff56ee8dd..b09b3c79e99 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -18433,6 +18433,7 @@ bool
> >  ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >  {
> >    gimple *stmt = gsi_stmt (*gsi), *g;
> > +  gimple_seq stmts = NULL;
> >    tree fndecl = gimple_call_fndecl (stmt);
> >    gcc_checking_assert (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD));
> >    int n_args = gimple_call_num_args (stmt);
> > @@ -18555,7 +18556,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >          {
> >            loc = gimple_location (stmt);
> >            tree type = TREE_TYPE (arg2);
> > -          gimple_seq stmts = NULL;
> >            if (VECTOR_FLOAT_TYPE_P (type))
> >              {
> >                tree itype = GET_MODE_INNER (TYPE_MODE (type)) == E_SFmode
> > @@ -18610,7 +18610,6 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >            tree zero_vec = build_zero_cst (type);
> >            tree minus_one_vec = build_minus_one_cst (type);
> >            tree cmp_type = truth_type_for (type);
> > -          gimple_seq stmts = NULL;
> >            tree cmp = gimple_build (&stmts, tcode, cmp_type, arg0, arg1);
> >            gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >            g = gimple_build_assign (gimple_call_lhs (stmt),
> > @@ -18904,14 +18903,18 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >        break;
> >
> >      case IX86_BUILTIN_PABSB:
> > +    case IX86_BUILTIN_PABSW:
> > +    case IX86_BUILTIN_PABSD:
> > +      /* 64-bit vector abs2 is only supported under TARGET_MMX_WITH_SSE.  */
> > +      if (!TARGET_64BIT)
> > +        break;
> > +      /* FALLTHRU.  */
> >      case IX86_BUILTIN_PABSB128:
> >      case IX86_BUILTIN_PABSB256:
> >      case IX86_BUILTIN_PABSB512:
> > -    case IX86_BUILTIN_PABSW:
> >      case IX86_BUILTIN_PABSW128:
> >      case IX86_BUILTIN_PABSW256:
> >      case IX86_BUILTIN_PABSW512:
> > -    case IX86_BUILTIN_PABSD:
> >      case IX86_BUILTIN_PABSD128:
> >      case IX86_BUILTIN_PABSD256:
> >      case IX86_BUILTIN_PABSD512:
> > @@ -18933,9 +18936,36 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >        if (n_args > 1
> >            && !ix86_masked_all_ones (elems, gimple_call_arg (stmt, n_args - 1)))
> >          break;
> > -      loc = gimple_location (stmt);
> > -      g = gimple_build_assign (gimple_call_lhs (stmt), ABS_EXPR, arg0);
> > -      gsi_replace (gsi, g, false);
> > +      {
> > +        tree utype, ures, vce;
> > +        switch (GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0))))
> > +          {
> > +          case E_QImode:
> > +            utype = unsigned_intQI_type_node;
> > +            break;
> > +          case E_HImode:
> > +            utype = unsigned_intHI_type_node;
> > +            break;
> > +          case E_SImode:
> > +            utype = unsigned_intSI_type_node;
> > +            break;
> > +          case E_DImode:
> > +            utype = long_long_unsigned_type_node;
> > +            break;
> > +          default:
> > +            gcc_unreachable ();
> > +          }
> > +        utype = get_same_sized_vectype (utype, TREE_TYPE (arg0));
>
> The above switch can be replaced with just simply
> utype = unsigned_type_for (TREE_TYPE (arg0));

Yes, thanks.

>
> > +        /* PABSB/W/D/Q store the unsigned result in dst, use ABSU_EXPR
> > +           instead of ABS_EXPR to hanlde overflow case(TYPE_MIN).  */
> > +        ures = gimple_build (&stmts, ABSU_EXPR, utype, arg0);
> > +        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> > +        loc = gimple_location (stmt);
> > +        vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg0), ures);
> > +        g = gimple_build_assign (gimple_call_lhs (stmt),
> > +                                 VIEW_CONVERT_EXPR, vce);
> > +        gsi_replace (gsi, g, false);
> > +      }
> >        return true;
> >
> >      default:
> > diff --git a/gcc/testsuite/gcc.target/i386/pr110108.c b/gcc/testsuite/gcc.target/i386/pr110108.c
> > new file mode 100644
> > index 00000000000..cd05763b9bf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr110108.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O2" } */
> > +/* { dg-final { scan-assembler-times "vpblendvb" 2 } } */
> > +#include <immintrin.h>
> > +
> > +__m128i do_stuff_128(__m128i X0, __m128i X1) {
> > +  __m128i AbsX0 = _mm_abs_epi8(X0);
> > +  __m128i Result = _mm_blendv_epi8(AbsX0, X1, AbsX0);
> > +  return Result;
> > +}
> > +
> > +__m256i do_stuff_256(__m256i X0, __m256i X1) {
> > +  __m256i AbsX0 = _mm256_abs_epi8(X0);
> > +  __m256i Result = _mm256_blendv_epi8(AbsX0, X1, AbsX0);
> > +  return Result;
> > +}
> > --
> > 2.39.1.388.g2fc9e9ca3c
> >

--
BR,
Hongtao