From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by sourceware.org (Postfix) with ESMTPS id 93F563858439 for ; Tue, 19 Jul 2022 06:56:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 93F563858439 Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-31e1ecea074so54433787b3.8 for ; Mon, 18 Jul 2022 23:56:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=iMrhdFUCjtOlLdP+0moajn7Ng+sxyORs2AXY5dY5OfA=; b=AqQVpd2q8jcwQoP1vhrZw7fKyjvY2rNZj0M4hwutny7L4zlwGU64SPvvSlkyc0TTz3 4TzURVDXjvbTxRYmfWg5nwxTDifymSyn5AP+ya6X1DNFLjfmGnBkADbDix6k/au1B+6X vCDm9g3BSC/T3UjrWX9AjxtjaU0B7r+Ya8PVy7a+Owhf6+dq56iLDB0La2BC0+JubtI5 wIJxkDYLMKJMn8IvQatGarYkwFZHA/U7POqMJYVEy2MRODsMupDHMa7DYbu+JItBFfoM Ru8UdLE/ax2CPRzbskSz1aCtWU/cwGw/SP2Bhw4U1yRIYJENS5rKD0hVAZwlHkqeqXcC w3Bw== X-Gm-Message-State: AJIora82Q6qRKWxUWtshDZsXheG/gL65w5iAb6IovXQXJOxt8g87CeO6 EQiBoasOGwJrcnsL5dfcomowAuw+T4N8H72hLPA= X-Google-Smtp-Source: AGRyM1v0mqNruXBfOEjZw5BlXL6Iyj0QKhtxfumPZBgqyBNzs3gj4iVX+xRLvQbZlglFNnLIg0kjaDGJrI86QE0z1cc= X-Received: by 2002:a81:f82:0:b0:31c:f1ae:1ed6 with SMTP id 124-20020a810f82000000b0031cf1ae1ed6mr34148489ywp.249.1658213800970; Mon, 18 Jul 2022 23:56:40 -0700 (PDT) MIME-Version: 1.0 References: <20220719060736.18399-1-hongtao.liu@intel.com> In-Reply-To: From: Hongtao Liu Date: Tue, 19 Jul 2022 14:56:29 +0800 Message-ID: Subject: Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative. To: Uros Bizjak Cc: liuhongt , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Jul 2022 06:56:43 -0000 On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches wrote: > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt wrote: > > > > And split it after reload. > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > prepare insn operands. > > Split define_expand with just register_operand, and allow > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > But you will *ease* the job of the above passes if you use > ix86_fixup_binary_operands_no_copy in the expander. for -m32, it will hit ICE in Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, mode=E_V4QImode, operands=0x7fffffffa970) a /gcc/config/i386/i386-expand.cc:1184 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); (gdb) n 1185 gcc_assert (dst == operands[0]); -- here (gdb) the original operands[0], operands[1], operands[2] are below (gdb) p debug_rtx (operands[0]) (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) (const_int -8220 [0xffffffffffffdfe4])) [0 MEM [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) $1 = void (gdb) p debug_rtx (operands[1]) (subreg:V4QI (reg:SI 129) 0) $2 = void (gdb) p debug_rtx (operands[2]) (subreg:V4QI (reg:SI 98 [ _46 ]) 0) $3 = void (gdb) since operands[0] is mem and not equal to operands[1], ix86_fixup_binary_operands will create a pseudo register for dst. and then hit ICE. Is this a bug or assumed? > > Uros. > > > > > > Please use if (!register_operand (operands[2], mode)) instead. > > Changed. > > > > Update patch. > > > > gcc/ChangeLog: > > > > PR target/106038 > > * config/i386/mmx.md (3): New define_expand, it's > > original "3". > > (*3): New define_insn, it's original > > "3" be extended to handle memory and immediate > > operand with ix86_binary_operator_ok. Also adjust define_split > > after it. > > (mmxinsnmode): New mode attribute. > > (*mov_imm): Refactor with mmxinsnmode. > > * config/i386/predicates.md > > (register_or_x86_64_const_vector_operand): New predicate. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr106038-1.c: New test. > > --- > > gcc/config/i386/mmx.md | 71 ++++++++++++---------- > > gcc/config/i386/predicates.md | 4 ++ > > gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 ++++++++ > > 3 files changed, 71 insertions(+), 31 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c > > > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > > index 3294c1e6274..316b83dd3ac 100644 > > --- a/gcc/config/i386/mmx.md > > +++ b/gcc/config/i386/mmx.md > > @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize > > [(V8QI "b") (V4QI "b") (V2QI "b") > > (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) > > > > +;; Mapping to same size integral mode. > > +(define_mode_attr mmxinsnmode > > + [(V8QI "DI") (V4QI "SI") (V2QI "HI") > > + (V4HI "DI") (V2HI "SI") > > + (V2SI "DI") > > + (V4HF "DI") (V2HF "SI") > > + (V2SF "DI")]) > > + > > (define_mode_attr mmxdoublemode > > [(V8QI "V8HI") (V4HI "V4SI")]) > > > > @@ -350,22 +358,7 @@ (define_insn_and_split "*mov_imm" > > HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], > > mode); > > operands[1] = GEN_INT (val); > > - machine_mode mode; > > - switch (GET_MODE_SIZE (mode)) > > - { > > - case 2: > > - mode = HImode; > > - break; > > - case 4: > > - mode = SImode; > > - break; > > - case 8: > > - mode = DImode; > > - break; > > - default: > > - gcc_unreachable (); > > - } > > - operands[0] = lowpart_subreg (mode, operands[0], mode); > > + operands[0] = lowpart_subreg (mode, operands[0], mode); > > }) > > > > ;; For TARGET_64BIT we always round up to 8 bytes. > > @@ -2974,33 +2967,49 @@ (define_insn "*mmx_3" > > (set_attr "type" "mmxadd,sselog,sselog,sselog") > > (set_attr "mode" "DI,TI,TI,TI")]) > > > > -(define_insn "3" > > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > > +(define_expand "3" > > + [(parallel > > + [(set (match_operand:VI_16_32 0 "register_operand") > > + (any_logic:VI_16_32 > > + (match_operand:VI_16_32 1 "register_operand") > > + (match_operand:VI_16_32 2 "register_operand"))) > > + (clobber (reg:CC FLAGS_REG))])] > > + "") > > + > > +(define_insn "*3" > > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") > > (any_logic:VI_16_32 > > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > > + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") > > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v"))) > > (clobber (reg:CC FLAGS_REG))] > > - "" > > + "ix86_binary_operator_ok (, mode, operands)" > > "#" > > - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") > > - (set_attr "type" "alu,sselog,sselog,sselog") > > - (set_attr "mode" "SI,TI,TI,TI")]) > > + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") > > + (set_attr "type" "alu,alu,sselog,sselog,sselog") > > + (set_attr "mode" "SI,SI,TI,TI,TI")]) > > > > (define_split > > - [(set (match_operand:VI_16_32 0 "general_reg_operand") > > + [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand") > > (any_logic:VI_16_32 > > - (match_operand:VI_16_32 1 "general_reg_operand") > > - (match_operand:VI_16_32 2 "general_reg_operand"))) > > + (match_operand:VI_16_32 1 "nonimmediate_gr_operand") > > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand"))) > > (clobber (reg:CC FLAGS_REG))] > > "reload_completed" > > [(parallel > > [(set (match_dup 0) > > - (any_logic:SI (match_dup 1) (match_dup 2))) > > + (any_logic: (match_dup 1) (match_dup 2))) > > (clobber (reg:CC FLAGS_REG))])] > > { > > - operands[2] = lowpart_subreg (SImode, operands[2], mode); > > - operands[1] = lowpart_subreg (SImode, operands[1], mode); > > - operands[0] = lowpart_subreg (SImode, operands[0], mode); > > + if (!register_operand (operands[2], mode)) > > + { > > + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], > > + mode); > > + operands[2] = GEN_INT (val); > > + } > > + else > > + operands[2] = lowpart_subreg (mode, operands[2], mode); > > + operands[1] = lowpart_subreg (mode, operands[1], mode); > > + operands[0] = lowpart_subreg (mode, operands[0], mode); > > }) > > > > (define_split > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > > index c71c453cceb..5f63a7d52f5 100644 > > --- a/gcc/config/i386/predicates.md > > +++ b/gcc/config/i386/predicates.md > > @@ -1205,6 +1205,10 @@ (define_predicate "x86_64_const_vector_operand" > > return trunc_int_for_mode (val, SImode) == val; > > }) > > > > +(define_predicate "register_or_x86_64_const_vector_operand" > > + (ior (match_operand 0 "register_operand") > > + (match_operand 0 "x86_64_const_vector_operand"))) > > + > > ;; Return true when OP is nonimmediate or standard SSE constant. > > (define_predicate "nonimmediate_or_sse_const_operand" > > (ior (match_operand 0 "nonimmediate_operand") > > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-1.c b/gcc/testsuite/gcc.target/i386/pr106038-1.c > > new file mode 100644 > > index 00000000000..bb52385c8a5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr106038-1.c > > @@ -0,0 +1,27 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-msse2 -O2" } */ > > +/* { dg-final { scan-assembler-not "xmm" } } */ > > + > > +void > > +foo3 (char* a, char* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > + a[2] &= 3; > > + a[3] &= 3; > > +} > > + > > +void > > +foo4 (char* a, char* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > +} > > + > > + > > +void > > +foo1 (short* a, short* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > +} > > -- > > 2.18.1 > > -- BR, Hongtao