From: Uros Bizjak
Date: Fri, 13 May 2022 11:03:38 +0200
References: <20220513071602.91413-1-hongtao.liu@intel.com>
Subject: Re: [PATCH] Optimize vpermtiw/b to vpunpcklqdq for certain cases.
To: "Liu, Hongtao"
Cc: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

On Fri, May 13, 2022 at 10:54 AM Liu, Hongtao wrote:
>
> > -----Original Message-----
> > From: Uros Bizjak
> > Sent: Friday, May 13, 2022 4:15 PM
> > To: Liu, Hongtao
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Optimize vpermtiw/b to vpunpcklqdq for certain cases.
> >
> > On Fri, May 13, 2022 at 9:16 AM liuhongt wrote:
> > >
> > > Assembly Optimization like:
> > > -       vmovq   %xmm0, %xmm2
> > > -       vmovdqa .LC0(%rip), %xmm0
> > >         vmovq   %xmm1, %xmm1
> > > -       vpermi2w        %xmm1, %xmm2, %xmm0
> > > +       vmovq   %xmm0, %xmm0
> > > +       vpunpcklqdq     %xmm1, %xmm0, %xmm0
> > >
> > > ...
> > >
> > > -.LC0:
> > > -       .value  0
> > > -       .value  1
> > > -       .value  2
> > > -       .value  3
> > > -       .value  8
> > > -       .value  9
> > > -       .value  10
> > > -       .value  11
> > >
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR target/105033
> > >         * config/i386/sse.md (*vec_concatv4si): Extend to ..
> > >         (*vec_concat<mode>): .. V16QI and V8HImode.
> > >         (*vec_concatv16qi_permt2): New pre_reload define_insn_and_split.
> > >         (*vec_concatv8hi_permt2): Ditto.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/pr105033.c: New test.
> > > ---
> > >  gcc/config/i386/sse.md                   | 62 ++++++++++++++++++++++--
> > >  gcc/testsuite/gcc.target/i386/pr105033.c | 27 +++++++++++
> > >  2 files changed, 84 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105033.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index a63df0d0b1f..2e417e47d20 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -19600,11 +19600,11 @@ (define_insn "*vec_concatv2si"
> > >     (set_attr "type" "sselog,ssemov,sselog,ssemov,mmxcvt,mmxmov")
> > >     (set_attr "mode" "TI,TI,V4SF,SF,DI,DI")])
> > >
> > > -(define_insn "*vec_concatv4si"
> > > -  [(set (match_operand:V4SI 0 "register_operand"       "=x,v,x,x,v")
> > > -        (vec_concat:V4SI
> > > -          (match_operand:V2SI 1 "register_operand"     " 0,v,0,0,v")
> > > -          (match_operand:V2SI 2 "nonimmediate_operand" " x,v,x,m,m")))]
> > > +(define_insn "*vec_concat<mode>"
> > > +  [(set (match_operand:VI124_128 0 "register_operand"       "=x,v,x,x,v")
> > > +        (vec_concat:VI124_128
> > > +          (match_operand:<ssehalfvecmode> 1 "register_operand"     " 0,v,0,0,v")
> > > +          (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" " x,v,x,m,m")))]
> > >    "TARGET_SSE"
> > >    "@
> > >     punpcklqdq\t{%2, %0|%0, %2}
> > > @@ -19617,6 +19617,58 @@ (define_insn "*vec_concatv4si"
> > >     (set_attr "prefix" "orig,maybe_evex,orig,orig,maybe_evex")
> > >     (set_attr "mode" "TI,TI,V4SF,V2SF,V2SF")])
> > >
> > > +(define_insn_and_split "*vec_concatv16qi_permt2"
> > > +  [(set (match_operand:V16QI 0 "register_operand")
> > > +    (unspec:V16QI
> > > +      [(const_vector:V16QI [(const_int 0) (const_int 1)
> > > +                            (const_int 2) (const_int 3)
> > > +                            (const_int 4) (const_int 5)
> > > +                            (const_int 6) (const_int 7)
> > > +                            (const_int 16) (const_int 17)
> > > +                            (const_int 18) (const_int 19)
> > > +                            (const_int 20) (const_int 21)
> > > +                            (const_int 22) (const_int 23)])
> > > +       (match_operand:V16QI 1 "register_operand")
> > > +       (match_operand:V16QI 2 "nonimmediate_operand")]
> > > +      UNSPEC_VPERMT2))]
> > > +  "TARGET_AVX512VL && TARGET_AVX512VBMI"
> >
> > You need "&& ix86_pre_reload_split ()" here, because a pseudo can be
> > generated via force_reg.
> >
> will change.
> > > +  "#"
> > > +  "&& 1"
> > > +  [(set (match_dup 0)
> > > +        (vec_concat:V16QI (match_dup 1) (match_dup 2)))]
> > > +{
> > > +  operands[1] = lowpart_subreg (V8QImode,
> > > +                                force_reg (V16QImode, operands[1]),
> > > +                                V16QImode);
> > > +  if (!MEM_P (operands[2]))
> > > +    operands[2] = force_reg (V16QImode, operands[2]);
> >
> > Are you sure there are no subregs possible in operands[2]? To stay on the
> > safe side, use force_reg unconditionally; it will also force subregs to reg,
> > avoiding failure with the following lowpart_subreg.
> When it's a MEM, there is no need to force_reg.

Ah, I misread this.

Uros.

> >
> > Uros.
> >
> > > +  operands[2] = lowpart_subreg (V8QImode, operands[2], V16QImode);
> > > +})
> > > +
> > > +(define_insn_and_split "*vec_concatv8hi_permt2"
> > > +  [(set (match_operand:V8HI 0 "register_operand")
> > > +    (unspec:V8HI
> > > +      [(const_vector:V8HI [(const_int 0) (const_int 1)
> > > +                           (const_int 2) (const_int 3)
> > > +                           (const_int 8) (const_int 9)
> > > +                           (const_int 10) (const_int 11)])
> > > +       (match_operand:V8HI 1 "register_operand")
> > > +       (match_operand:V8HI 2 "nonimmediate_operand")]
> > > +      UNSPEC_VPERMT2))]
> > > +  "TARGET_AVX512VL && TARGET_AVX512BW"
> > > +  "#"
> > > +  "&& 1"
> > > +  [(set (match_dup 0)
> > > +        (vec_concat:V8HI (match_dup 1) (match_dup 2)))]
> > > +{
> > > +  operands[1] = lowpart_subreg (V4HImode,
> > > +                                force_reg (V8HImode, operands[1]),
> > > +                                V8HImode);
> > > +  if (!MEM_P (operands[2]))
> > > +    operands[2] = force_reg (V8HImode, operands[2]);
> > > +  operands[2] = lowpart_subreg (V4HImode, operands[2], V8HImode);
> > > +})
> > > +
> > >  (define_insn "*vec_concat<mode>_0"
> > >    [(set (match_operand:VI124_128 0 "register_operand"       "=v,x")
> > >          (vec_concat:VI124_128
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr105033.c b/gcc/testsuite/gcc.target/i386/pr105033.c
> > > new file mode 100644
> > > index 00000000000..ab05e3b3bc8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr105033.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=sapphirerapids -O2" } */
> > > +/* { dg-final { scan-assembler-times {vpunpcklqdq[ \t]+} 3 } } */
> > > +/* { dg-final { scan-assembler-not {vpermi2[wb][ \t]+} } } */
> > > +
> > > +typedef _Float16 v8hf __attribute__((vector_size (16)));
> > > +typedef _Float16 v4hf __attribute__((vector_size (8)));
> > > +typedef short v8hi __attribute__((vector_size (16)));
> > > +typedef short v4hi __attribute__((vector_size (8)));
> > > +typedef char v16qi __attribute__((vector_size (16)));
> > > +typedef char v8qi __attribute__((vector_size (8)));
> > > +
> > > +v8hf foo (v4hf a, v4hf b)
> > > +{
> > > +  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
> > > +}
> > > +
> > > +v8hi foo2 (v4hi a, v4hi b)
> > > +{
> > > +  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
> > > +}
> > > +
> > > +v16qi foo3 (v8qi a, v8qi b)
> > > +{
> > > +  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7,
> > > +                                  8, 9, 10, 11, 12, 13, 14, 15);
> > > +}
> > > --
> > > 2.18.1
> > >
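
For anyone following the thread, the equivalence the patch relies on can be
checked with a small scalar model. This is only a sketch under stated
assumptions, not GCC code: every function name below is made up for
illustration, masking is ignored, and the model assumes the usual two-source
word permute semantics on 128-bit vectors (bit 3 of each index selects the
source, bits 0-2 select the element). With the .LC0 indices
{0, 1, 2, 3, 8, 9, 10, 11} the permute takes the low four words of each
source, which is the same result as concatenating the two low 64-bit halves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a two-source word permute on 128-bit
   vectors (8 x 16-bit elements).  Assumption: bit 3 of each index picks
   the source, bits 0-2 pick the element within that source.  */
static void
permute2_words (uint16_t dst[8], const uint16_t idx[8],
                const uint16_t src1[8], const uint16_t src2[8])
{
  for (int i = 0; i < 8; i++)
    {
      const uint16_t *src = (idx[i] & 8) ? src2 : src1;
      dst[i] = src[idx[i] & 7];
    }
}

/* Hypothetical scalar model of punpcklqdq: the low 64 bits of src1
   followed by the low 64 bits of src2.  */
static void
concat_low_halves (uint16_t dst[8], const uint16_t src1[8],
                   const uint16_t src2[8])
{
  memcpy (dst, src1, 4 * sizeof (uint16_t));
  memcpy (dst + 4, src2, 4 * sizeof (uint16_t));
}

/* Returns 1 if both models agree for the .LC0 index vector.  */
static int
lc0_permute_matches_concat (void)
{
  static const uint16_t lc0[8] = { 0, 1, 2, 3, 8, 9, 10, 11 };
  static const uint16_t a[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  static const uint16_t b[8] = { 20, 21, 22, 23, 24, 25, 26, 27 };
  uint16_t via_permute[8], via_concat[8];

  permute2_words (via_permute, lc0, a, b);
  concat_low_halves (via_concat, a, b);
  return memcmp (via_permute, via_concat, sizeof via_permute) == 0;
}

/* Aborts if the two models ever disagree.  */
static void
self_check (void)
{
  assert (lc0_permute_matches_concat ());
}
```

Calling self_check () from a small driver is enough to exercise it. The
V16QI case works the same way with sixteen byte lanes and the indices
0..7, 16..23 from the first pattern.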