* [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Hongtao Liu @ 2021-04-27 9:58 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 660 bytes --]
Hi:
As described in the subject line, this patch performs the
transformation below.
- vpcmpeqd %ymm3, %ymm3, %ymm3
- vpandn %ymm3, %ymm2, %ymm2
- vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
+ vpblendvb %ymm2, %ymm0, %ymm1, %ymm0
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
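Why the swap is valid: for each byte, (v)pblendvb takes the second source when the top bit of the mask byte is set and the first source otherwise, so inverting the mask gives the same result as exchanging the two sources. A minimal scalar sketch in C follows; the function names and the 16-byte lane count are illustrative assumptions and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16  /* bytes in one XMM register (assumed for the sketch) */

/* Scalar model of (v)pblendvb: per byte, pick b when the top bit of
   the mask byte is set, otherwise pick a.  */
void blendv_epi8_model(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, const uint8_t *mask)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (mask[i] & 0x80) ? b[i] : a[i];
}

/* Returns 1 if blendv(a, b, ~mask) equals blendv(b, a, mask) for a
   range of sample masks, 0 otherwise.  */
int inverted_mask_equals_swapped_operands(void)
{
    uint8_t a[LANES], b[LANES], mask[LANES], not_mask[LANES];
    uint8_t inverted[LANES], swapped[LANES];

    for (unsigned seed = 0; seed < 256; seed++) {
        for (int i = 0; i < LANES; i++) {
            a[i] = (uint8_t)i;
            b[i] = (uint8_t)(0xF0 + i);
            mask[i] = (uint8_t)(seed * 37u + (unsigned)i * 101u);
            not_mask[i] = (uint8_t)~mask[i];
        }
        blendv_epi8_model(inverted, a, b, not_mask); /* old sequence */
        blendv_epi8_model(swapped, b, a, mask);      /* new sequence */
        if (memcmp(inverted, swapped, LANES) != 0)
            return 0;
    }
    return 1;
}
```

Calling inverted_mask_equals_swapped_operands() compares the two forms over 256 sample masks and returns 1 when they agree on every byte.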
gcc/ChangeLog:
PR target/99908
* config/i386/sse.md (<sse4_1_avx2>_pblendvb): Add
splitters for pblendvb of NOT mask register.
gcc/testsuite/ChangeLog:
PR target/99908
* gcc.target/i386/avx2-pr99908.c: New test.
* gcc.target/i386/sse4_1-pr99908.c: New test.
--
BR,
Hongtao
[-- Attachment #2: 0001-i386-Optimize-vpblendvb-on-inverted-mask-register-to.patch --]
[-- Type: text/x-patch, Size: 4474 bytes --]
From e1daa651d201f9ab3a85a80a635746fcf4be70ab Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 7 Apr 2021 09:58:54 +0800
Subject: [PATCH] i386: Optimize vpblendvb on inverted mask register to
vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
- vpcmpeqd %ymm3, %ymm3, %ymm3
- vpandn %ymm3, %ymm2, %ymm2
- vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
+ vpblendvb %ymm2, %ymm0, %ymm1, %ymm0
gcc/ChangeLog:
PR target/99908
* config/i386/sse.md (<sse4_1_avx2>_pblendvb): Add
splitters for pblendvb of NOT mask register.
gcc/testsuite/ChangeLog:
PR target/99908
* gcc.target/i386/avx2-pr99908.c: New test.
* gcc.target/i386/sse4_1-pr99908.c: New test.
---
gcc/config/i386/sse.md | 29 +++++++++++++++++++
gcc/testsuite/gcc.target/i386/avx2-pr99908.c | 25 ++++++++++++++++
.../gcc.target/i386/sse4_1-pr99908.c | 23 +++++++++++++++
3 files changed, 77 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr99908.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 897cf3eaea9..4ef22b428e4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17735,6 +17735,35 @@ (define_insn "<sse4_1_avx2>_pblendvb"
(set_attr "btver2_decode" "vector,vector,vector")
(set_attr "mode" "<sseinsnmode>")])
+(define_split
+ [(set (match_operand:VI1_AVX2 0 "register_operand")
+ (unspec:VI1_AVX2
+ [(match_operand:VI1_AVX2 1 "vector_operand")
+ (match_operand:VI1_AVX2 2 "register_operand")
+ (not:VI1_AVX2 (match_operand:VI1_AVX2 3 "register_operand"))]
+ UNSPEC_BLENDV))]
+ "TARGET_SSE4_1"
+ [(set (match_dup 0)
+ (unspec:VI1_AVX2
+ [(match_dup 2) (match_dup 1) (match_dup 3)]
+ UNSPEC_BLENDV))])
+
+(define_split
+ [(set (match_operand:VI1_AVX2 0 "register_operand")
+ (unspec:VI1_AVX2
+ [(match_operand:VI1_AVX2 1 "vector_operand")
+ (match_operand:VI1_AVX2 2 "register_operand")
+ (subreg:VI1_AVX2 (not (match_operand 3 "register_operand")) 0)]
+ UNSPEC_BLENDV))]
+ "TARGET_SSE4_1
+ && GET_MODE_CLASS (GET_MODE (operands[3])) == MODE_VECTOR_INT
+ && GET_MODE_SIZE (GET_MODE (operands[3])) == <MODE_SIZE>"
+ [(set (match_dup 0)
+ (unspec:VI1_AVX2
+ [(match_dup 2) (match_dup 1) (match_dup 4)]
+ UNSPEC_BLENDV))]
+ "operands[4] = gen_lowpart (<MODE>mode, operands[3]);")
+
(define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
[(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
(unspec:VI1_AVX2
diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr99908.c b/gcc/testsuite/gcc.target/i386/avx2-pr99908.c
new file mode 100644
index 00000000000..2775f3b50f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-pr99908.c
@@ -0,0 +1,25 @@
+/* PR target/99908 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tvpblendvb\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvpcmpeq" } } */
+/* { dg-final { scan-assembler-not "\tvpandn" } } */
+
+#include <x86intrin.h>
+
+__m256i
+f1 (__m256i a, __m256i b, __m256i mask)
+{
+ return _mm256_blendv_epi8(a, b,
+ _mm256_andnot_si256(mask, _mm256_set1_epi8(255)));
+}
+
+__m256i
+f2 (__v32qi x, __v32qi a, __v32qi b)
+{
+ x ^= (__v32qi) { -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1 };
+ return _mm256_blendv_epi8 ((__m256i) a, (__m256i) b, (__m256i) x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c b/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c
new file mode 100644
index 00000000000..c13e730b220
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c
@@ -0,0 +1,23 @@
+/* PR target/99908 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1 -mno-avx -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpblendvb\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tpcmpeq" } } */
+/* { dg-final { scan-assembler-not "\tpandn" } } */
+
+#include <x86intrin.h>
+
+__m128i
+f1 (__m128i a, __m128i b, __m128i mask)
+{
+ return _mm_blendv_epi8(a, b,
+ _mm_andnot_si128(mask, _mm_set1_epi8(255)));
+}
+
+__m128i
+f2 (__v16qi x, __v16qi a, __v16qi b)
+{
+ x ^= (__v16qi) { -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1 };
+ return _mm_blendv_epi8 ((__m128i) a, (__m128i) b, (__m128i) x);
+}
--
2.18.1
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Hongtao Liu @ 2021-05-12 7:29 UTC (permalink / raw)
To: GCC Patches
ping
--
BR,
Hongtao
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Uros Bizjak @ 2021-05-12 8:36 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches
OK.
Thanks,
Uros.
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Hongtao Liu @ 2021-05-12 11:46 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Wed, May 12, 2021 at 4:36 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> OK.
>
> Thanks,
> Uros.
Thanks for the review.
--
BR,
Hongtao
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Uros Bizjak @ 2021-05-12 12:38 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches
On Wed, May 12, 2021 at 1:42 PM Hongtao Liu <crazylht@gmail.com> wrote:
> Thanks for the review.
OTOH, have you considered ix86_fold_builtin or
ix86_gimple_fold_builtin? These functions are implemented as builtins,
so perhaps the transformation can be more efficiently implemented by
calling these two target functions.
Uros.
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Hongtao Liu @ 2021-05-13 0:43 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Wed, May 12, 2021 at 8:38 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> OTOH, have you considered ix86_fold_builtin or
> ix86_gimple_fold_builtin? These functions are implemented as builtins,
> so perhaps the transformation can be more efficiently implemented by
> calling these two target functions.
Good idea, I'll try that.
>
> Uros.
--
BR,
Hongtao
* Re: [PATCH] i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]
From: Hongtao Liu @ 2021-05-21 2:32 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Thu, May 13, 2021 at 8:43 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 8:38 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > OTOH, have you considered ix86_fold_builtin or
> > ix86_gimple_fold_builtin? These functions are implemented as builtins,
> > so perhaps the transformation can be more efficiently implemented by
> > calling these two target functions.
> Good idea, I'll try that.
I found that folding andn into two gimple statements is not a good
idea: they do not always get recombined into andn at the RTL level,
so some optimizations are lost.
Folding blendv, however, looks clearly worthwhile.
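One way to read the point about blendv: per byte, pblendvb is a select on the sign of the mask, which generic code can express as a single conditional, whereas andn needs a NOT followed by an AND. The sketch below is a scalar model in C with hypothetical helper names; it does not show what the eventual gimple fold looks like, which this thread does not specify.

```c
#include <stdint.h>

/* pandn per byte: NOT of the first operand, then AND with the second,
   i.e. two generic operations.  */
uint8_t andn_byte(uint8_t x, uint8_t y)
{
    return (uint8_t)(~x & y);
}

/* pblendvb per byte, phrased as a test of the mask's top bit.  */
uint8_t blendv_byte(uint8_t a, uint8_t b, uint8_t mask)
{
    return (mask & 0x80) ? b : a;
}

/* The same operation phrased as "mask < 0 ? b : a" on a signed byte.
   The uint8_t -> int8_t conversion is implementation-defined in C for
   values above 127; this sketch assumes the usual two's-complement
   wraparound, which is what GCC does.  */
uint8_t blendv_byte_as_select(uint8_t a, uint8_t b, uint8_t mask)
{
    return ((int8_t)mask < 0) ? b : a;
}

/* Returns 1 if both phrasings agree for every possible mask byte.  */
int select_form_matches_blendv(void)
{
    for (int m = 0; m < 256; m++)
        if (blendv_byte(0x11, 0x22, (uint8_t)m)
            != blendv_byte_as_select(0x11, 0x22, (uint8_t)m))
            return 0;
    return 1;
}
```

select_form_matches_blendv() walks all 256 mask values and returns 1 when the top-bit test and the signed comparison pick the same source each time.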
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao