From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by sourceware.org (Postfix) with ESMTPS id 05781385829C for ; Mon, 11 Jul 2022 08:02:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 05781385829C Received: by mail-qt1-x832.google.com with SMTP id 23so6404224qtt.3 for ; Mon, 11 Jul 2022 01:02:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=jPBYQ2SdJQhZrtcmc10ei26N5lKCBDGp/KlO+y6SRIE=; b=6RlYPG/j2bEyUrxYfqXdPJxxfwGDIohje8yX4L6VHuZs0UXsHwHfexSovaa2vyEknN apsf6Luln5mR5QK92SKgmSgC+s/io9Ki2wgxqlbOj0QvrWERym3PTdcZSAG0pa8GS/0u mIKuU52ML35HZeVuw/sBeAvyWR67tui3weXwtQYn80BhSaeE+/Oqri1cPHJ4syn1MpDj SlIZ9idaJYFc43fObU4RY2JFJtasWsK/clEwUd/WEvwLBr5O3BF4l0Mx2A7S33xdA9Xx az42Bc/9hevJ+3piu1jZQAFCEM5PdfyDkOLdSZaqtOXP1wm0PRTMXZRRHxRmeDoaNfsJ JMNQ== X-Gm-Message-State: AJIora/5PQNDcf9wFGm3fUsY+KynduU8gPIvz9FfLZPBHAk3k/kXkLps /m3vj2Gj7hXNfSCGlKtaSmcp96F8ByZh0egPgNk= X-Google-Smtp-Source: AGRyM1tP/8PlQXxMcDOW+Cjfp7EN6GFmnygZ6I3XsMErA9zllCYcFh9hkL2jV9T79EmDpESkEPTVjSsorLwtnm8jsj8= X-Received: by 2002:a05:622a:5d4:b0:31a:ade:2086 with SMTP id d20-20020a05622a05d400b0031a0ade2086mr12685021qtb.569.1657526569257; Mon, 11 Jul 2022 01:02:49 -0700 (PDT) MIME-Version: 1.0 References: <20220711011506.103835-1-hongtao.liu@intel.com> In-Reply-To: <20220711011506.103835-1-hongtao.liu@intel.com> From: Uros Bizjak Date: Mon, 11 Jul 2022 10:02:38 +0200 Message-ID: Subject: Re: [PATCH] Allocate general register(memory/immediate) for 16/32/64-bit vector bit_op patterns. To: liuhongt Cc: "gcc-patches@gcc.gnu.org" , "H. J. Lu" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Jul 2022 08:02:53 -0000 On Mon, Jul 11, 2022 at 3:15 AM liuhongt wrote: > > And split it to GPR-version instruction after reload. > > This will enable below optimization for 16/32/64-bit vector bit_op > > - movd (%rdi), %xmm0 > - movd (%rsi), %xmm1 > - pand %xmm1, %xmm0 > - movd %xmm0, (%rdi) > + movl (%rsi), %eax > + andl %eax, (%rdi) > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? The patch will create many interunit moves (xmm <-> gpr) for anything but the most simple logic sequences, because operations with memory/immediate will be forced into GPR registers, while reg/reg operations will remain in XMM registers. I tried to introduce GPR registers to MMX logic insns in the past and observed the above behavior, but perhaps RA evolved in the mean time to handle different register sets better (especially under register pressure). However, I would advise to be careful with this functionality. Perhaps this problem should be attacked in stages. First, please introduce GPR registers to MMX logic instructions (similar to how VI_16_32 mode instructions are handled). After RA effects will be analysed, only then memory/immediate handling should be added. Also, please don't forget to handle ANDNOT insn - TARGET_BMI slightly complicates this part, but this is also solved with VI_16_32 mode instructions. Uros. > > gcc/ChangeLog: > > PR target/106038 > * config/i386/mmx.md (3): Expand > with (clobber (reg:CC flags_reg)) under TARGET_64BIT > (mmx_code>3): Ditto. > (*mmx_3_1): New define_insn, add post_reload > splitter after it. > (*3): New define_insn, also add post_reload > splitter after it. > (mmxinsnmode): New mode attribute. > (VI_16_32_64): New mode iterator. > (*mov_imm): Refactor with mmxinsnmode. > * config/i386/predicates.md > (nonimmediate_or_x86_64_vector_cst): New predicate. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr106038-1.c: New test. > * gcc.target/i386/pr106038-2.c: New test. > * gcc.target/i386/pr106038-3.c: New test. > --- > gcc/config/i386/mmx.md | 131 +++++++++++++++------ > gcc/config/i386/predicates.md | 4 + > gcc/testsuite/gcc.target/i386/pr106038-1.c | 61 ++++++++++ > gcc/testsuite/gcc.target/i386/pr106038-2.c | 35 ++++++ > gcc/testsuite/gcc.target/i386/pr106038-3.c | 17 +++ > 5 files changed, 213 insertions(+), 35 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-3.c > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > index 3294c1e6274..85b06abea27 100644 > --- a/gcc/config/i386/mmx.md > +++ b/gcc/config/i386/mmx.md > @@ -75,6 +75,11 @@ (define_mode_iterator V_16_32_64 > (V8QI "TARGET_64BIT") (V4HI "TARGET_64BIT") (V4HF "TARGET_64BIT") > (V2SI "TARGET_64BIT") (V2SF "TARGET_64BIT")]) > > +(define_mode_iterator VI_16_32_64 > + [V2QI V4QI V2HI > + (V8QI "TARGET_64BIT") (V4HI "TARGET_64BIT") > + (V2SI "TARGET_64BIT")]) > + > ;; V2S* modes > (define_mode_iterator V2FI [V2SF V2SI]) > > @@ -86,6 +91,14 @@ (define_mode_attr mmxvecsize > [(V8QI "b") (V4QI "b") (V2QI "b") > (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) > > +;; Mapping to same size integral mode. > +(define_mode_attr mmxinsnmode > + [(V8QI "DI") (V4QI "SI") (V2QI "HI") > + (V4HI "DI") (V2HI "SI") > + (V2SI "DI") > + (V4HF "DI") (V2HF "SI") > + (V2SF "DI")]) > + > (define_mode_attr mmxdoublemode > [(V8QI "V8HI") (V4HI "V4SI")]) > > @@ -350,22 +363,7 @@ (define_insn_and_split "*mov_imm" > HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], > mode); > operands[1] = GEN_INT (val); > - machine_mode mode; > - switch (GET_MODE_SIZE (mode)) > - { > - case 2: > - mode = HImode; > - break; > - case 4: > - mode = SImode; > - break; > - case 8: > - mode = DImode; > - break; > - default: > - gcc_unreachable (); > - } > - operands[0] = lowpart_subreg (mode, operands[0], mode); > + operands[0] = lowpart_subreg (mode, operands[0], mode); > }) > > ;; For TARGET_64BIT we always round up to 8 bytes. > @@ -2948,14 +2946,28 @@ (define_expand "mmx_3" > (match_operand:MMXMODEI 1 "register_mmxmem_operand") > (match_operand:MMXMODEI 2 "register_mmxmem_operand")))] > "TARGET_MMX || TARGET_MMX_WITH_SSE" > - "ix86_fixup_binary_operands_no_copy (, mode, operands);") > +{ > + ix86_fixup_binary_operands_no_copy (, mode, operands); > + if (TARGET_64BIT) > + { > + ix86_expand_binary_operator (, mode, operands); > + DONE; > + } > +}) > > (define_expand "3" > [(set (match_operand:MMXMODEI 0 "register_operand") > (any_logic:MMXMODEI > (match_operand:MMXMODEI 1 "register_operand") > (match_operand:MMXMODEI 2 "register_operand")))] > - "TARGET_MMX_WITH_SSE") > + "TARGET_MMX_WITH_SSE" > +{ > + if (TARGET_64BIT) > + { > + ix86_expand_binary_operator (, mode, operands); > + DONE; > + } > +}) > > (define_insn "*mmx_3" > [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x,v") > @@ -2974,33 +2986,82 @@ (define_insn "*mmx_3" > (set_attr "type" "mmxadd,sselog,sselog,sselog") > (set_attr "mode" "DI,TI,TI,TI")]) > > -(define_insn "3" > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > - (any_logic:VI_16_32 > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > +(define_insn "*mmx_3_1" > + [(set (match_operand:MMXMODEI 0 "nonimmediate_operand" "=y,x,x,v,rm,r") > + (any_logic:MMXMODEI > + (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,x,v,0,0") > + (match_operand:MMXMODEI 2 "nonimmediate_or_x86_64_vector_cst" "ym,x,x,v,ri,m"))) > + (clobber (reg:CC FLAGS_REG))] > + "TARGET_64BIT > + && (TARGET_MMX || TARGET_SSE2) > + && ix86_binary_operator_ok (, mode, operands)" > + "#" > + [(set_attr "isa" "*,sse2_noavx,avx,avx512vl,x64,x64") > + (set_attr "mmx_isa" "native,*,*,*,*,*") > + (set_attr "type" "mmxadd,sselog,sselog,sselog,alu,alu") > + (set_attr "mode" "DI,TI,TI,TI,DI,DI")]) > + > +(define_split > + [(set (match_operand:MMXMODEI 0 "register_operand") > + (any_logic:MMXMODEI > + (match_operand:MMXMODEI 1 "register_mmxmem_operand") > + (match_operand:MMXMODEI 2 "register_mmxmem_operand"))) > (clobber (reg:CC FLAGS_REG))] > + "TARGET_64BIT > + && (TARGET_MMX || TARGET_SSE2) > + && reload_completed > + && !general_reg_operand (operands[0], mode)" > + [(set (match_dup 0) > + (any_logic: (match_dup 1) (match_dup 2)))]) > + > +(define_expand "3" > + [(set (match_operand:VI_16_32 0 "register_operand") > + (any_logic:VI_16_32 > + (match_operand:VI_16_32 1 "register_operand") > + (match_operand:VI_16_32 2 "register_operand")))] > "" > +{ > + ix86_expand_binary_operator (, mode, operands); > + DONE; > +}) > + > +(define_insn "*3" > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=rm,r,x,x,v") > + (any_logic:VI_16_32 > + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") > + (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_vector_cst" "ri,m,x,x,v"))) > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (, mode, operands)" > "#" > - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") > - (set_attr "type" "alu,sselog,sselog,sselog") > - (set_attr "mode" "SI,TI,TI,TI")]) > + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") > + (set_attr "type" "alu,alu,sselog,sselog,sselog") > + (set_attr "mode" "SI,SI,TI,TI,TI")]) > > (define_split > - [(set (match_operand:VI_16_32 0 "general_reg_operand") > - (any_logic:VI_16_32 > - (match_operand:VI_16_32 1 "general_reg_operand") > - (match_operand:VI_16_32 2 "general_reg_operand"))) > + [(set (match_operand:VI_16_32_64 0 "") > + (any_logic:VI_16_32_64 > + (match_operand:VI_16_32_64 1 "") > + (match_operand:VI_16_32_64 2 ""))) > (clobber (reg:CC FLAGS_REG))] > - "reload_completed" > + "reload_completed > + && !sse_reg_operand (operands[1], mode) > + && !sse_reg_operand (operands[2], mode) > + && !sse_reg_operand (operands[0], mode)" > [(parallel > [(set (match_dup 0) > - (any_logic:SI (match_dup 1) (match_dup 2))) > + (any_logic: (match_dup 1) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > - operands[2] = lowpart_subreg (SImode, operands[2], mode); > - operands[1] = lowpart_subreg (SImode, operands[1], mode); > - operands[0] = lowpart_subreg (SImode, operands[0], mode); > + if (CONSTANT_P (operands[2])) > + { > + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], > + mode); > + operands[2] = GEN_INT (val); > + } > + else > + operands[2] = lowpart_subreg (mode, operands[2], mode); > + operands[1] = lowpart_subreg (mode, operands[1], mode); > + operands[0] = lowpart_subreg (mode, operands[0], mode); > }) > > (define_split > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > index c71c453cceb..62280f58478 100644 > --- a/gcc/config/i386/predicates.md > +++ b/gcc/config/i386/predicates.md > @@ -1205,6 +1205,10 @@ (define_predicate "x86_64_const_vector_operand" > return trunc_int_for_mode (val, SImode) == val; > }) > > +(define_predicate "nonimmediate_or_x86_64_vector_cst" > + (ior (match_operand 0 "nonimmediate_operand") > + (match_operand 0 "x86_64_const_vector_operand"))) > + > ;; Return true when OP is nonimmediate or standard SSE constant. > (define_predicate "nonimmediate_or_sse_const_operand" > (ior (match_operand 0 "nonimmediate_operand") > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-1.c b/gcc/testsuite/gcc.target/i386/pr106038-1.c > new file mode 100644 > index 00000000000..ac5d1990682 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr106038-1.c > @@ -0,0 +1,61 @@ > +/* { dg-do compile } */ > +/* { dg-options "-msse2 -O2" } */ > +/* { dg-final { scan-assembler-not "xmm" } } */ > + > +void > +foo (char* a, char* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > + a[2] &= b[2]; > + a[3] &= b[3]; > +} > + > +void > +foo1 (char* a, char* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > +} > + > +void > +foo2 (char* a, char* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > + a[2] &= b[2]; > + a[3] &= b[3]; > + a[4] &= b[4]; > + a[5] &= b[5]; > + a[6] &= b[6]; > + a[7] &= b[7]; > +} > + > +void > +foo3 (char* a, char* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > + a[2] &= 3; > + a[3] &= 3; > +} > + > +void > +foo4 (char* a, char* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > +} > + > +void > +foo5 (char* a, char* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > + a[2] &= 2; > + a[3] &= 3; > + a[4] &= 4; > + a[5] &= 5; > + a[6] &= 6; > + a[7] &= 7; > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-2.c b/gcc/testsuite/gcc.target/i386/pr106038-2.c > new file mode 100644 > index 00000000000..dce8a536a95 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr106038-2.c > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-options "-msse2 -O2" } */ > +/* { dg-final { scan-assembler-not "xmm" } } */ > + > +void > +foo (short* a, short* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > + a[2] &= b[2]; > + a[3] &= b[3]; > +} > + > +void > +foo1 (short* a, short* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > +} > + > +void > +foo3 (short* a, short* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > + a[2] &= 3; > + a[3] &= 3; > +} > + > +void > +foo4 (short* a, short* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-3.c b/gcc/testsuite/gcc.target/i386/pr106038-3.c > new file mode 100644 > index 00000000000..3c7bd978f36 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr106038-3.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-msse2 -O2" } */ > +/* { dg-final { scan-assembler-not "xmm" } } */ > + > +void > +foo1 (int* a, int* __restrict b) > +{ > + a[0] &= b[0]; > + a[1] &= b[1]; > +} > + > +void > +foo4 (int* a, int* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > +} > -- > 2.18.1 >