From: Hongtao Liu
Date: Wed, 20 Jul 2022 14:54:28 +0800
Subject: Re: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.
To: Uros Bizjak
Cc: liuhongt, gcc-patches@gcc.gnu.org

On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak wrote:
>
> On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak wrote:
> >
> > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu wrote:
> > >
> > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak wrote:
> > > >
> > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu wrote:
> > > > >
> > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt wrote:
> > > > > > >
> > > > > > > And split it after reload.
> > > > > > >
> > > > > > > > You will need ix86_binary_operator_ok insn constraint here with
> > > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to
> > > > > > > > prepare insn operands.
> > > > > > > Split define_expand with just register_operand, and allow
> > > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization.
> > > > > >
> > > > > > But you will *ease* the job of the above passes if you use
> > > > > > ix86_fixup_binary_operands_no_copy in the expander.
> > > > > for -m32, it will hit ICE in
> > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR,
> > > > > mode=E_V4QImode, operands=0x7fffffffa970) at
> > > > > /gcc/config/i386/i386-expand.cc:1184
> > > > > 1184      rtx dst = ix86_fixup_binary_operands (code, mode, operands);
> > > > > (gdb) n
> > > > > 1185      gcc_assert (dst == operands[0]); -- here
> > > > > (gdb)
> > > > >
> > > > > the original operands[0], operands[1], operands[2] are below
> > > > > (gdb) p debug_rtx (operands[0])
> > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars)
> > > > >         (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32])
> > > > > $1 = void
> > > > > (gdb) p debug_rtx (operands[1])
> > > > > (subreg:V4QI (reg:SI 129) 0)
> > > > > $2 = void
> > > > > (gdb) p debug_rtx (operands[2])
> > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0)
> > > > > $3 = void
> > > > > (gdb)
> > > > >
> > > > > since operands[0] is mem and not equal to operands[1],
> > > > > ix86_fixup_binary_operands will create a pseudo register for dst, and
> > > > > then hit the ICE.
> > > > > Is this a bug or assumed?
> > > >
> > > > You will need ix86_expand_binary_operator here.
> > > It will swap the memory operand from op1 to op2 and hit an ICE for an unrecognized insn.
> > >
> > > What about this?
> >
> > Still no good... You are using commutative operands, so the predicate
> > of operand 2 should also allow memory. So, the predicate should be
> > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn
> > pattern should look something like *<code><mode>_1, but with
> > added XMM and MMX reg alternatives instead of mask regs.
>
> Alternatively, you can use the UNKNOWN operator to prevent
> canonicalization, but then you should not use a commutative constraint
> in the intermediate insn. I think this is the best solution.

Like this?
-(define_insn "<code><mode>3"
-  [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v")
+(define_expand "<code><mode>3"
+  [(set (match_operand:VI_16_32 0 "nonimmediate_operand")
 	(any_logic:VI_16_32
-	  (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v")
-	  (match_operand:VI_16_32 2 "register_operand" "r,x,x,v")))
-   (clobber (reg:CC FLAGS_REG))]
+	  (match_operand:VI_16_32 1 "nonimmediate_operand")
+	  (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand")))]
   ""
+{
+  rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands);
+  if (MEM_P (operands[2]))
+    operands[2] = force_reg (<MODE>mode, operands[2]);
+  rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode,
+					     operands[1], operands[2]));
+  rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob)));
+  if (dst != operands[0])
+    emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_insn "*<code><mode>3"
+  [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v")
+	(any_logic:VI_16_32
+	  (match_operand:VI_16_32 1 "nonimmediate_operand" "0,0,0,x,v")
+	  (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (UNKNOWN, <MODE>mode, operands)"
   "#"
-  [(set_attr "isa" "*,sse2_noavx,avx,avx512vl")
-   (set_attr "type" "alu,sselog,sselog,sselog")
-   (set_attr "mode" "SI,TI,TI,TI")])
+  [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl")
+   (set_attr "type" "alu,alu,sselog,sselog,sselog")
+   (set_attr "mode" "SI,SI,TI,TI,TI")])

> > Uros.
> > > >
> > > > Uros.
> > >
> > > --
> > > BR,
> > > Hongtao

-- 
BR,
Hongtao