From: Andrew Pinski
Date: Mon, 5 Jun 2023 21:46:43 -0700
Subject: Re: [PATCH] Don't fold _mm{,256}_blendv_epi8 into (mask < 0 ? src1 : src2) when -funsigned-char.
To: liuhongt
Cc: gcc-patches@gcc.gnu.org, crazylht@gmail.com, hjl.tools@gmail.com
References: <20230606043121.24843-1-hongtao.liu@intel.com> <20230606043121.24843-2-hongtao.liu@intel.com>
In-Reply-To: <20230606043121.24843-2-hongtao.liu@intel.com>

On Mon, Jun 5, 2023 at 9:34 PM liuhongt via Gcc-patches wrote:
>
> Since mask < 0 will be always false when -funsigned-char, but
> vpblendvb needs to check the most significant bit.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport to GCC12/GCC13 release branch?

I think the following is a better patch: it will always be correct, and
the builtin still gets folded (correctly) at the gimple level:

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d4ff56ee8dd..02bf5ba93a5 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18561,8 +18561,10 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
               tree itype = GET_MODE_INNER (TYPE_MODE (type)) == E_SFmode
                 ? intSI_type_node : intDI_type_node;
               type = get_same_sized_vectype (itype, type);
-              arg2 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, arg2);
             }
+          else
+            type = signed_type_for (type);
+          arg2 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, arg2);
           tree zero_vec = build_zero_cst (type);
           tree cmp_type = truth_type_for (type);
           tree cmp = gimple_build (&stmts, LT_EXPR, cmp_type, arg2, zero_vec);

Thanks,
Andrew Pinski

>
> gcc/ChangeLog:
>
>         PR target/110108
>         * config/i386/i386-builtin.def (BDESC): Replace
>         CODE_FOR_nothing with real code name for blendvb builtins.
>         * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
>         _mm{,256}_blendv_epi8 into (mask < 0 ? src1 : src2) when
>         -funsigned-char.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr110108-2.c: New test.
> ---
>  gcc/config/i386/i386-builtin.def           |  4 ++--
>  gcc/config/i386/i386.cc                    |  7 +++++++
>  gcc/testsuite/gcc.target/i386/pr110108-2.c | 14 ++++++++++++++
>  3 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110108-2.c
>
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 7ba5b6a9d11..b4c99ff62a2 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -944,7 +944,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", I
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps_v4sf, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT)
> -BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI)
> +BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT)
>
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_sign_extendv8qiv8hi2, "__builtin_ia32_pmovsxbw128", IX86_BUILTIN_PMOVSXBW128, UNKNOWN, (int) V8HI_FTYPE_V16QI)
> @@ -1198,7 +1198,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_andv4di3, "__builtin_ia32_andsi256", IX
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_andnotv4di3, "__builtin_ia32_andnotsi256", IX86_BUILTIN_ANDNOT256I, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv32qi3, "__builtin_ia32_pavgb256", IX86_BUILTIN_PAVGB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_uavgv16hi3, "__builtin_ia32_pavgw256", IX86_BUILTIN_PAVGW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI)
> -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, "__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI)
> +BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pblendvb, "__builtin_ia32_pblendvb256", IX86_BUILTIN_PBLENDVB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pblendw, "__builtin_ia32_pblendw256", IX86_BUILTIN_PBLENDVW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_INT)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqb256", IX86_BUILTIN_PCMPEQB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqw256", IX86_BUILTIN_PCMPEQW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI)
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b09b3c79e99..f8f6c26c8eb 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -18548,6 +18548,13 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>        /* FALLTHRU.  */
>      case IX86_BUILTIN_PBLENDVB128:
>      case IX86_BUILTIN_BLENDVPS:
> +      /* Don't fold PBLENDVB when funsigned-char since mask < 0
> +         will always be false in the gimple level.  */
> +      if ((fn_code == IX86_BUILTIN_PBLENDVB128
> +           || fn_code == IX86_BUILTIN_PBLENDVB256)
> +          && !flag_signed_char)
> +        break;
> +
>        gcc_assert (n_args == 3);
>        arg0 = gimple_call_arg (stmt, 0);
>        arg1 = gimple_call_arg (stmt, 1);
> diff --git a/gcc/testsuite/gcc.target/i386/pr110108-2.c b/gcc/testsuite/gcc.target/i386/pr110108-2.c
> new file mode 100644
> index 00000000000..2d1d2fd4991
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr110108-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2 -funsigned-char" } */
> +/* { dg-final { scan-assembler-times "vpblendvb" 2 } } */
> +
> +#include <immintrin.h>
> +__m128i do_stuff_128(__m128i X0, __m128i X1, __m128i X2) {
> +  __m128i Result = _mm_blendv_epi8(X0, X1, X2);
> +  return Result;
> +}
> +
> +__m256i do_stuff_256(__m256i X0, __m256i X1, __m256i X2) {
> +  __m256i Result = _mm256_blendv_epi8(X0, X1, X2);
> +  return Result;
> +}
> --
> 2.39.1.388.g2fc9e9ca3c
>
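
For readers following the thread, the issue can be illustrated with a
minimal scalar sketch in C. It is not part of either patch and does not
use GCC internals. It models a single byte lane of the blend, and the
helper names (blend_msb, blend_lt0_unsigned, blend_lt0_signed,
count_mismatches) are hypothetical. It also assumes the modular
conversion to int8_t that GCC implements. The point it shows: a
"mask < 0" test on an unsigned element can never be true, whereas the
same test on the mask viewed as a signed element matches the
most-significant-bit check that vpblendvb performs.

```c
#include <stdint.h>

/* One byte lane of (v)pblendvb: select b when the mask's top bit is set. */
uint8_t blend_msb(uint8_t a, uint8_t b, uint8_t mask)
{
  return (mask & 0x80) ? b : a;
}

/* The folded form "mask < 0 ? b : a" with an unsigned element type, as
   plain char is under -funsigned-char.  The lane value is in 0..255, so
   the comparison is never true and a is always returned.  */
uint8_t blend_lt0_unsigned(uint8_t a, uint8_t b, uint8_t mask)
{
  int lane = mask;
  return (lane < 0) ? b : a;
}

/* The same fold after viewing the mask as a signed element.  Assumes
   the conversion to int8_t wraps modulo 256, as GCC does.  */
uint8_t blend_lt0_signed(uint8_t a, uint8_t b, uint8_t mask)
{
  int8_t lane = (int8_t) mask;
  return (lane < 0) ? b : a;
}

/* Count the mask values for which a fold disagrees with blend_msb.  */
int count_mismatches(uint8_t (*fold)(uint8_t, uint8_t, uint8_t))
{
  int bad = 0;
  for (int m = 0; m < 256; m++)
    if (fold(0x11, 0x22, (uint8_t) m) != blend_msb(0x11, 0x22, (uint8_t) m))
      bad++;
  return bad;
}
```

In this model, count_mismatches (blend_lt0_unsigned) reports a
disagreement for every mask byte with the top bit set, while
count_mismatches (blend_lt0_signed) reports none.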