From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by sourceware.org (Postfix) with ESMTPS id 582EA3858C20 for ; Tue, 11 Oct 2022 09:04:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 582EA3858C20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yb1-xb30.google.com with SMTP id e62so15720723yba.6 for ; Tue, 11 Oct 2022 02:04:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=CpXCw1JxuV1+CyoupY8mNFLSjdMlt+j0HhOBi1OhRWA=; b=L17/MNc9uqoEoP/229joUlvLURTmClDi9tVhctWMWIgEyLUUI+1wxNDJeubRXkgAz6 HL/6hcWgTOLnziinkHvN9oomQIheL7FmQwy71I36nx1T2+q+jor4Uw4FslCZE7s02pCA 0yvf8qQwfkzgYC9kyh6sLdHqwf0W6P0lq+L7m+QqqoH2k34v/VjGDyG8DJJT3gH4A51e NmQMp/iE9t7mFACA7noOyyA0TgqR55hM3AyKC7mCKFRJIZIm37S/uZwwNmx0JbJKpYWY e7Cwxly7G5v2djrCrB9XAzLkeJCpMGZPJ3L8bQyddXIKT13ZVXJDKXB2QePFpZ2/vGH0 xBMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=CpXCw1JxuV1+CyoupY8mNFLSjdMlt+j0HhOBi1OhRWA=; b=N3WUbzUOvudOeawzgjVVmAYqNt5jqOwlFCbzdMx7znVxJ1l1iA3esD5YHql4uI0Gft lzdkgkCb5CaallNVGQzblRXIcdve3r88llrkqU84zUs78/UgsKio2ZyrA0Ma4XJdghrw Q9ObG1YVXowHtSOCsnQkqgM3j7TDmNmuPZhRLg5i8TrMQwkJuZn686WOFBvFSVQsw6BZ 4eB1p8jh9xwUZ4WoDYNOsWuYB2ePEGR/T1bHz/B8Lpy/uyv50/WINUYTJ3CH77VTNiPK 8GBDuA7FQQshs8OFO4wQ2SSD3ZubLsMW6rDeBm6xgOHLVfMSm8HwV8pTKECaumpgmgZi Wgsw== X-Gm-Message-State: ACrzQf3c5D6pE3+/cNv0kiB5rEKqBqHkJAVG3pMSnl361bSVmUWCP+OI 7vqP0i9IaMVnR+FTnLb+j+gVPknK9pzby3tvV6U= X-Google-Smtp-Source: AMsMyM6hnSuy3YrDdCEuEUDH4e2etgK8fln20VJ1CkqLDzfuR0OGja3HijPTjxJedAsbG8kkckSyqf0o51CiWB+Pnmg= X-Received: by 2002:a5b:a44:0:b0:6b0:13b:c93b with SMTP id z4-20020a5b0a44000000b006b0013bc93bmr22880163ybq.398.1665479080538; Tue, 11 Oct 2022 02:04:40 -0700 (PDT) MIME-Version: 1.0 References: <20221011080316.1778261-1-hongtao.liu@intel.com> In-Reply-To: <20221011080316.1778261-1-hongtao.liu@intel.com> From: Uros Bizjak Date: Tue, 11 Oct 2022 11:04:29 +0200 Message-ID: Subject: Re: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor". To: liuhongt Cc: gcc-patches@gcc.gnu.org, crazylht@gmail.com, hjl.tools@gmail.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Oct 11, 2022 at 10:03 AM liuhongt wrote: > > For genereal_reg_operand, it will be splitted into xor + not. > For mask_reg_operand, it will be splitted with UNSPEC_MASK_OP just > like what we did for other logic operations. > > The patch will optimize xor+not to kxnor when possible. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.md (*notxor_1): New post_reload > define_insn_and_split. > (*notxorqi_1): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr107093.c: New test. OK with a small fix below. Thanks, Uros. > --- > gcc/config/i386/i386.md | 71 ++++++++++++++++++++++++ > gcc/testsuite/gcc.target/i386/pr107093.c | 38 +++++++++++++ > 2 files changed, 109 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr107093.c > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 1be9b669909..228edba2b40 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -10826,6 +10826,39 @@ (define_insn "*_1" > (set_attr "type" "alu, alu, msklog") > (set_attr "mode" "")]) > > +(define_insn_and_split "*notxor_1" > + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k") > + (not:SWI248 > + (xor:SWI248 > + (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k") > + (match_operand:SWI248 2 "" "r,,k")))) > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (XOR, mode, operands)" > + "#" > + "&& reload_completed" > + [(parallel > + [(set (match_dup 0) > + (xor:SWI248 (match_dup 1) (match_dup 2))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_dup 0) > + (not:SWI248 (match_dup 1)))] (not:SWI248 (match_dup 0)) in the above RTX. > +{ > + if (MASK_REGNO_P (REGNO (operands[0]))) > + { > + emit_insn (gen_kxnor (operands[0], operands[1], operands[2])); > + DONE; > + } > +} > + [(set (attr "isa") > + (cond [(eq_attr "alternative" "2") > + (if_then_else (eq_attr "mode" "SI,DI") > + (const_string "avx512bw") > + (const_string "avx512f")) > + ] > + (const_string "*"))) > + (set_attr "type" "alu, alu, msklog") > + (set_attr "mode" "")]) > + > (define_insn_and_split "*iordi_1_bts" > [(set (match_operand:DI 0 "nonimmediate_operand" "=rm") > (ior:DI > @@ -10959,6 +10992,44 @@ (define_insn "*qi_1" > (symbol_ref "!TARGET_PARTIAL_REG_STALL")] > (symbol_ref "true")))]) > > +(define_insn_and_split "*notxorqi_1" > + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k") > + (not:QI > + (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k") > + (match_operand:QI 2 "general_operand" "qn,m,rn,k")))) > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (XOR, QImode, operands)" > + "#" > + "&& reload_completed" > + [(parallel > + [(set (match_dup 0) > + (xor:QI (match_dup 1) (match_dup 2))) > + (clobber (reg:CC FLAGS_REG))]) > + (set (match_dup 0) > + (not:QI (match_dup 0)))] > +{ > + if (mask_reg_operand (operands[0], QImode)) > + { > + emit_insn (gen_kxnorqi (operands[0], operands[1], operands[2])); > + DONE; > + } > +} > + [(set_attr "isa" "*,*,*,avx512f") > + (set_attr "type" "alu,alu,alu,msklog") > + (set (attr "mode") > + (cond [(eq_attr "alternative" "2") > + (const_string "SI") > + (and (eq_attr "alternative" "3") > + (match_test "!TARGET_AVX512DQ")) > + (const_string "HI") > + ] > + (const_string "QI"))) > + ;; Potential partial reg stall on alternative 2. > + (set (attr "preferred_for_speed") > + (cond [(eq_attr "alternative" "2") > + (symbol_ref "!TARGET_PARTIAL_REG_STALL")] > + (symbol_ref "true")))]) > + > ;; Alternative 1 is needed to work around LRA limitation, see PR82524. > (define_insn_and_split "*_1_slp" > [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+,&")) > diff --git a/gcc/testsuite/gcc.target/i386/pr107093.c b/gcc/testsuite/gcc.target/i386/pr107093.c > new file mode 100644 > index 00000000000..23e30cbac0f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr107093.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512bw -O2 -mavx512vl" } */ > +/* { dg-final { scan-assembler-times {(?n)kxnor[bwqd]} 4 { target { ! ia32 } } } } */ > +/* { dg-final { scan-assembler-times {(?n)kxnor[bwdq]} 3 { target ia32 } } } */ > + > +#include > + > +__m512i > +foo (__m512i a, __m512i b, __m512i c, __m512i d) > +{ > + __mmask32 k1 = _mm512_cmp_epi16_mask (a, b, 1); > + __mmask32 k2 = _mm512_cmp_epi16_mask (c, d, 2); > + return _mm512_mask_mov_epi16 (a, ~(k1 ^ k2), c); > +} > + > +__m512i > +foo1 (__m512i a, __m512i b, __m512i c, __m512i d) > +{ > + __mmask16 k1 = _mm512_cmp_epi32_mask (a, b, 1); > + __mmask16 k2 = _mm512_cmp_epi32_mask (c, d, 2); > + return _mm512_mask_mov_epi32 (a, ~(k1 ^ k2), c); > +} > + > +__m512i > +foo2 (__m512i a, __m512i b, __m512i c, __m512i d) > +{ > + __mmask64 k1 = _mm512_cmp_epi8_mask (a, b, 1); > + __mmask64 k2 = _mm512_cmp_epi8_mask (c, d, 2); > + return _mm512_mask_mov_epi8 (a, ~(k1 ^ k2), c); > +} > + > +__m512i > +foo3 (__m512i a, __m512i b, __m512i c, __m512i d) > +{ > + __mmask8 k1 = _mm512_cmp_epi64_mask (a, b, 1); > + __mmask8 k2 = _mm512_cmp_epi64_mask (c, d, 2); > + return _mm512_mask_mov_epi64 (a, ~(k1 ^ k2), c); > +} > -- > 2.27.0 >