From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by sourceware.org (Postfix) with ESMTPS id 1F1653858418 for ; Sun, 5 Jun 2022 18:57:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1F1653858418 Received: by mail-qk1-x72a.google.com with SMTP id o68so9582762qkf.13 for ; Sun, 05 Jun 2022 11:57:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=8tbp3CKBWXThm0gqHBQKkKwuXcsgslXmQ29UXsqtA0U=; b=wymSPUCO1y+l5XDQlA2fZ3Db+zDyRvWX91SyhBsG0rRxlXemQIvu7ss29o4y3+oBZS cokEeyZLuqj5WDAdIvKF9ihMHc7cPJnbmoysUlf8Rc6bdMIEv1vDB6CvZ0iolZ5xHPWg hdrBPZI/1XUIH3xRql4FH+hIfsOeO43IDpqZ7zg4j5RZTiJraq807yPiybQelTeX04YG 6YKqxJ14fJ7kqu2uxAV6GSjCqE26S23hgjjRZzldFyehcYqyT7fA8XCGnNG+QU/wmCrN 36Quk+NoZV2nYN568frRHTkqECG2SWE8d9J6TSEu+SIBto+0xaz8UrtVUWHTwtuwC3fb Rocw== X-Gm-Message-State: AOAM5336hCgRFfMJo0lfhK9LSd8lMpAeQ/GorcE5ik/XU+HiLQr4Gxof C/gyb965UQHlAKBVj+rjIz09ZP5P87hqg07ariWJRR76JXb1YA== X-Google-Smtp-Source: ABdhPJwAHg0mTfItiA/PslsFKSmy817GIVFcbUe8uTgtR1UsqlWI4cTB8b0klm/iRZLrBBswz7H6oA6iYxiV75kbvpM= X-Received: by 2002:a05:620a:2903:b0:6a0:4d8f:8b88 with SMTP id m3-20020a05620a290300b006a04d8f8b88mr13699697qkp.328.1654455441375; Sun, 05 Jun 2022 11:57:21 -0700 (PDT) MIME-Version: 1.0 References: <0af601d8772f$43ba22f0$cb2e68d0$@nextmovesoftware.com> <009201d878d2$2591ff60$70b5fe20$@nextmovesoftware.com> In-Reply-To: <009201d878d2$2591ff60$70b5fe20$@nextmovesoftware.com> From: Uros Bizjak Date: Sun, 5 Jun 2022 20:57:10 +0200 Message-ID: Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations. To: Roger Sayle Cc: GCC Patches Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Jun 2022 18:57:24 -0000 On Sun, Jun 5, 2022 at 1:48 PM Roger Sayle wro= te: > > > Hi Uros, > Many thanks for your speedy review. This revised patch implements > all three of your recommended improvements; the use of > ix86_binary_operator_ok with code UNKNOWN, the removal of > "n" constraints from const_int_operand predicates, and the use > of force_reg (for input operands, and REG_P for output operands) > to ensure that it's always safe to call gen_lowpart/gen_highpart. > > [If we proceed with the recent proposal to split double word > addition, subtraction and other operations before reload, then > these new add/sub variants should be updated at the same time, > but for now this patch keeps double word patterns consistent]. > > This revised patch has been retested on x86_64-pc-linux-gnu with > make bootstrap and make -k check, both with and without > --target_board=3Dunix{-m32} with no new failures. Ok for mainline? +(define_insn_and_split "*concat3_1" + [(set (match_operand: 0 "register_operand" "=3Dr") + (any_or_plus: + (ashift: (match_operand: 1 "register_operand" "r") + (match_operand: 2 "const_int_operand")) + (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))] + "INTVAL (operands[2]) =3D=3D * BITS_PER_UNIT + && REG_P (operands[0]) + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 4) (match_dup 3)) + (set (match_dup 5) (match_dup 6))] +{ + operands[1] =3D force_reg (mode, operands[1]); + operands[4] =3D gen_lowpart (mode, operands[0]); + operands[5] =3D gen_highpart (mode, operands[0]); + operands[6] =3D gen_lowpart (mode, operands[1]); +}) Hm, but in this particular case (and other) you can use split_double_mode on operands[0], instead of manually splitting REG_P constrained operands, and it will handle everything correctly. Please note that split_double_mode has: split_double_mode (machine_mode mode, rtx operands[], int num, rtx lo_half[], rtx hi_half[]) so with some care you can use: "split_double_mode (mode, &operands[0],1, &operands[4], &operands[5]);= " followed by: operands[6] =3D simplify_gen_subreg (mode, op, mode, 0); The above line is partially what split_double_mode does. This is the approach other pre_reload doubleword splitters take, it looks the safest (otherwise it would break left and right with existing patterns ...), and the most effective to me. Please also get approval for sse.md change from Hongtao, AVX512F stuff is in a separate playground. Uros. > > 2022-06-05 Roger Sayle > Uro=C5=A1 Bizjak > > gcc/ChangeLog > PR target/91681 > * config/i386/i386.md (zero_extendditi2): New define_insn_and_spl= it. > (*add3_doubleword_zext): New define_insn_and_split. > (*sub3_doubleword_zext): New define_insn_and_split. > (*concat3_1): New define_insn_and_split replacing > previous define_split for implementing DST =3D (HI<<32)|LO as > pair of move instructions, setting lopart and hipart. > (*concat3_2): Likewise. > (*concat3_3): Likewise, where HI is zero_extended. > (*concat3_4): Likewise, where HI is zero_extended. > * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec. > (kunpcksi): Likewise, add UNSPEC_MASKOP unspec. > (kunpckdi): Likewise, add UNSPEC_MASKOP unspec. > (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP uns= pec. > (vec_pack_trunc_): Likewise. > > gcc/testsuite/ChangeLog > PR target/91681 > * g++.target/i386/pr91681.C: New test case (from the PR). > * gcc.target/i386/pr91681-1.c: New int128 test case. > * gcc.target/i386/pr91681-2.c: Likewise. > * gcc.target/i386/pr91681-3.c: Likewise, but for ia32. > > > Thanks again, > Roger > -- > > > -----Original Message----- > > From: Uros Bizjak > > Sent: 03 June 2022 11:08 > > To: Roger Sayle > > Cc: GCC Patches > > Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for = more > > optimizations. > > > > On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle > > wrote: > > > > > > > > > Technically, PR target/91681 has already been resolved; we now > > > recognize the highpart multiplication at the tree-level, we no longer > > > use the stack, and we currently generate the same number of > > > instructions as LLVM. However, it is still possible to do better, th= e > > > current x86_64 code to generate a double word addition of a zero exte= nded > > operand, looks like: > > > > > > xorl %r11d, %r11d > > > addq %r10, %rax > > > adcq %r11, %rdx > > > > > > when it's possible (as LLVM does) to use an immediate constant: > > > > > > addq %r10, %rax > > > adcq $0, %rdx > > > > > > To do this, the backend required one or two simple changes, that then > > > themselves required one or two more obscure tweaks. > > > > > > The simple starting point is to define a zero_extendditi2 pattern, fo= r > > > zero extension from DImode to TImode on TARGET_64BIT that is split > > > after reload. Double word (TImode) addition/subtraction is split > > > after reload, so that constrains when things should happen. > > > > > > With zero extension now visible to combine, we add two new > > > define_insn_and_split that add/subtract a zero extended operand in > > > double word mode. These apply to both 32-bit and 64-bit code > > > generation, to produce adc $0 and sbb $0. > > > > > > The first strange tweak is that these new patterns interfere with the > > > optimization that recognizes DW:DI =3D (HI:SI<<32)+LO:SI as a pair of > > > register moves, or more accurately the combine splitter no longer > > > triggers as we're now converting two instructions into two > > > instructions (not three instructions into two instructions). This is > > > easily repaired (and extended to handle TImode) by changing from a > > > pair of define_split (that handle operand commutativity) to a set of > > > four define_insn_and_split (again to handle operand commutativity). > > > > > > The other/final strange tweak that the above splitters now interfere > > > with AVX512's kunpckdq instruction which is defined as identical RTL, > > > DW:DI =3D (HI:SI<<32)|zero_extend(LO:SI). To distinguish this, and a= lso > > > avoid AVX512 mask registers being used by reload to perform SImode > > > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the > > > unpack mask operations, which matches what sse.md does for the other > > > mask specific (logic) operations. > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=3Dunix{-m32}, > > > with no new failures. Ok for mainline? > > > > > > > > > 2022-06-03 Roger Sayle > > > > > > gcc/ChangeLog > > > PR target/91681 > > > * config/i386/i386.md (zero_extendditi2): New define_insn_and= _split. > > > (*add3_doubleword_zext): New define_insn_and_split. > > > (*sub3_doubleword_zext): New define_insn_and_split. > > > (*concat3_1): New define_insn_and_split replacing > > > previous define_split for implementing DST =3D (HI<<32)|LO as > > > pair of move instructions, setting lopart and hipart. > > > (*concat3_2): Likewise. > > > (*concat3_3): Likewise, where HI is zero_extended. > > > (*concat3_4): Likewise, where HI is zero_extended. > > > * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec. > > > (kunpcksi): Likewise, add UNSPEC_MASKOP unspec. > > > (kunpckdi): Likewise, add UNSPEC_MASKOP unspec. > > > (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP > > > unspec. > > > (vec_pack_trunc_): Likewise. > > > > > > gcc/testsuite/ChangeLog > > > PR target/91681 > > > * g++.target/i386/pr91681.C: New test case (from the PR). > > > * gcc.target/i386/pr91681-1.c: New int128 test case. > > > * gcc.target/i386/pr91681-2.c: Likewise. > > > * gcc.target/i386/pr91681-3.c: Likewise, but for ia32. > > > > + "MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1]) && > > + !MEM_P (operands[2]) > > + : !MEM_P (operands[1])" > > > > Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here > > instead? > > > > (UNKNOWN RTX code is used to prevent unwanted optimization with > > commutative operands). > > > > Uros.