From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-x942.google.com (mail-ua1-x942.google.com [IPv6:2607:f8b0:4864:20::942]) by sourceware.org (Postfix) with ESMTPS id D9CA0386EC18 for ; Thu, 20 Aug 2020 07:23:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D9CA0386EC18 Received: by mail-ua1-x942.google.com with SMTP id x17so286791uao.5 for ; Thu, 20 Aug 2020 00:23:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=yvUtx+XiNd7Cs0mJ7U/4Gn86Aa/3fnVmAgTwzbOTroc=; b=r1Zl7/JVH5wOzq58x97jZJjwdfnGsuo6Jz6bQS9rLZjQCsGeNT1iJmXXQURI53kw+i U/gLW9Yyj0hQhPYpn0++VmRn9lN+Ydb5L6pMl5/RrISb/WNrMKWXOlbUA5q4YMEP5PGW PPC/s7Q+MQbQRsUUCNGFIEYnq9iY9W9pQacaWqqYsvy8YEaGL0Emcffu20LmiCtJj5K7 pjUslNkXuiwsl5yFGkAlR1Qybs6J1OcznYzMbb9s3QrBE5Jqqw4xTZ8DGa9jiN+KND2W 5QFdgKXi9NcZxFYL/w3qj1zatSUrrInVF+G5zvTXfGLYOJLVECOVvUHO8/chgXYEbp8U PS7A== X-Gm-Message-State: AOAM5309bgYPkCnwWd48vprWpML4I7pXdvUYWbJ4t15qSBWIw/yD95Pt AMTmf9mAhe9LlCDAUnTn/gldsQ9wTg1Cra7PSmk= X-Google-Smtp-Source: ABdhPJyUiB5btxhxHVuMf6l7q4uTIDFgVpZJ+ZzCXLabY4VHH6/pB6pNFNBRzQEY5RWJxCnVHzRfxqjdtwsckq2QF9U= X-Received: by 2002:ab0:6797:: with SMTP id v23mr795394uar.35.1597908234313; Thu, 20 Aug 2020 00:23:54 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Hongtao Liu Date: Thu, 20 Aug 2020 15:24:42 +0800 Message-ID: Subject: Re: [PATCH 4/4][PR target/88808]Enable bitwise operator for AVX512 masks. To: Uros Bizjak Cc: GCC Patches , Kirill Yukhin , "H. J. Lu" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Aug 2020 07:23:56 -0000 On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak wrote: > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu wrote: > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak wrote: > > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu wrote: > > > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is not > > > > enabled since there's no corresponding instruction for general > > > > registers. > > > > > > > > gcc/ > > > > PR target/88808 > > > > * config/i386/i386.md: (*movsi_internal): Adjust constraints > > > > for mask registers. > > > > (*movhi_internal): Ditto. > > > > (*movqi_internal): Ditto. > > > > (*anddi_1): Support mask register operations > > > > (*and_1): Ditto. > > > > (*andqi_1): Ditto. > > > > (*andn_1): Ditto. > > > > (*_1): Ditto. > > > > (*qi_1): Ditto. > > > > (*one_cmpl2_1): Ditto. > > > > (*one_cmplsi2_1_zext): Ditto. > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > gcc/testsuite/ > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase. > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > index 74d207c3711..e8ad79d1b0a 100644 > > > --- a/gcc/config/i386/i386.md > > > +++ b/gcc/config/i386/i386.md > > > @@ -2294,7 +2294,7 @@ > > > > > > (define_insn "*movsi_internal" > > > [(set (match_operand:SI 0 "nonimmediate_operand" > > > - "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > + "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k") > > > (match_operand:SI 1 "general_operand" > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > > > "!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > > > I'd rather see *k everywhere, also with *movqi_internal and > > > *movhi_internal patterns. The "*" means that the allocator won't > > > allocate a mask register by default, but it will be used to optimize > > > moves. With the above change, you are risking that during integer > > > register pressure, the register allocator will allocate zero to a mask > > > register, and later "optimize" the move with a direct maskreg-intreg > > > move. > > > > > > The current strategy is that only general registers get allocated for > > > integer modes. Let's keep it this way for now. > > > > > > > Yes, though it would fail gcc.target/i386/avx512dq-pr88465.c and > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to > > move zero into mask register directly. > > Although it would be nice if the register allocator was smart enough, > the current strategy is to introduce peephole2 patterns to fix these > problems, similar to [1]. These peepholes can be introduced in a > follow-up patch. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html > peephole2 added. > > > Otherwise, the patchset LGTM, but please test the suggested changes and repost. > > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they are > > > used to distinguish mask operations, generated from builtins from > > > generic operations, so I'd like to keep them for a while. The drawback > > > is, that they are not combined with other operations, but at the end > > > of the day, this is what the programmer asked for by using builtins. > > > > Agree, I prefer to keep them. > > Thinking some more about the approach, it looks to me that the optimal > solution is a post-reload splitter that would convert "generic" > patterns to mask operations from sse.md. The mask operations don't set > flags, so we can substantially improve post reload scheduling of these > instructions by removing flags clobber. > > So, simply add "#" to relevant alternatives of logic patterns and add > something like: > > --cut here-- > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 41c6dbfa668..ad49bdc7583 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -1470,6 +1470,18 @@ > ] > (const_string "")))]) > > +(define_split > + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand") > + (any_logic:SWI1248_AVX512BW > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand") > + (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand"))) > + (clobber (reg:CC FLAGS_REG))] > + "TARGET_AVX512F && reload_completed" > + [(parallel > + [(set (match_dup 0) > + (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2))) > + (unspec [(const_int 0)] UNSPEC_MASKOP)])]) > + > (define_insn "kandn" > [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k") > (and:SWI1248_AVX512BW > --cut here-- > > and similar for kandn and knot in sse.md. You will have to add > mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md > for example. > > We don't lose anything, because all important transformations, > propagations and simplifications with these patterns happen before > reload. define_splits are added for those bitwise operations. > > Uros. Also add bellow part which will pass gcc.target/i386/bitwise_mask_op-3.c - must go into Q_REGS. */ + must go into Q_REGS or ALL_MASK_REGS. */ if (GET_MODE (x) == QImode && !CONSTANT_P (x)) { if (Q_CLASS_P (regclass)) return regclass; else if (reg_class_subset_p (Q_REGS, regclass)) return Q_REGS; + else if (MASK_CLASS_P (regclass)) + return regclass; else return NO_REGS; Update patch. -- BR, Hongtao