From: Uros Bizjak
Date: Mon, 8 Aug 2022 09:22:00 +0200
Subject: Re: [x86 PATCH take #2] Add peephole2 to reduce double word register shuffling
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Richard Biener

On Sun, Aug 7, 2022 at 7:04 PM Roger Sayle wrote:
>
> This is a resubmission of my patch from June to fix some forms of
> inefficient register allocation using an additional peephole2 in i386.md.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596064.html
>
> Since the original, a number of supporting patches/improvements have
> been reviewed and approved, making this peephole even more effective.
> Hence for the simple function:
>
>   __int128 foo(__int128 x, __int128 y) { return x+y; }
>
> mainline GCC on x86_64 with -O2 currently generates:
>
>         movq    %rsi, %rax
>         movq    %rdi, %r8
>         movq    %rax, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rdi, %rdx
>         ret
>
> with this patch we now generate (a three insn improvement):
>
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> Back in June the review of the original patch stalled, as peephole2
> isn't the ideal place to fix this (with which I fully agree), and this
> patch is really just a workaround for a deeper deficiency in reload/lra.
> To address this I've now filed a new enhancement PR in Bugzilla,
> PR rtl-optimization/106518, that describes the underlying issue,
> which might make an interesting (GSoC) project for anyone brave
> (foolhardy) enough to tweak GCC's register allocation.
>
> By comparison, this single peephole can't adversely affect other targets,
> and should the happy day come that it's no longer required, at worst it
> would just become a harmless legacy transform that no longer triggers.
>
> I'm also investigating Uros' suggestion that it may be possible for RTL
> expansion to do a better job expanding the function prologue, but
> ultimately the hard register placement constraints are fixed by the
> target ABI, and poor allocation/assignment of hard registers is the
> responsibility/fault of the register allocation passes.
> It may still be possible to reduce register pressure, though, by avoiding
> the use of SUBREGs (which keep the source and destination double words
> live during shuffling) along the lines of Richard's CONCAT suggestion.
>
> This patch has been retested against mainline using make bootstrap and
> make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-07  Roger Sayle
>
> gcc/ChangeLog
>         PR target/43644
>         PR rtl-optimization/97756
>         PR rtl-optimization/98438
>         * config/i386/i386.md (define_peephole2): Recognize double word
>         swap sequences, and replace them with more efficient idioms,
>         including using xchg when optimizing for size.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         * gcc.target/i386/pr43644.c: New test case.

+;; Replace a double word swap that requires 4 mov insns with a
+;; 3 mov insn implementation (or an xchg when optimizing for size).
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+        (match_operand:DWIH 1 "general_reg_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+        (match_operand:DWIH 3 "general_reg_operand"))
+   (clobber (match_operand:<DWI> 4 "general_reg_operand"))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 1) (match_dup 2))]
+  "REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[3]) == REGNO (operands[4])
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (5, operands[2])"
+  [(parallel [(set (match_dup 1) (match_dup 3))
+              (set (match_dup 3) (match_dup 1))])]

I'm not sure it is correct to remove the clobber here. Some RTL expert
should comment on this change.

+  if (!optimize_insn_for_size_p ())
+    {
+      rtx tmp = REGNO (operands[0]) > REGNO (operands[2]) ? operands[0]
+                                                          : operands[2];

Hm, this is a strange relation, and it is not obvious why it is done
that way. Usually, REGNO (op1) != REGNO (op2) does the trick. At least
a comment should be added here.

Uros.

+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[1], operands[3]);
+      emit_move_insn (operands[3], tmp);
+      DONE;
+    }

>
> Thanks in advance,
> Roger
> --
>
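
For anyone wanting to reproduce the comparison above, here is a minimal
DejaGnu-style sketch of the kind of test this patch adds. The function is
the one quoted in the patch description; the directives and the expected
instruction count are illustrative assumptions, not the actual contents of
gcc.target/i386/pr43644.c.

    /* Sketch of a reduced test for PR target/43644; the dg- directives
       and the expected mov count are assumptions, not the committed
       gcc.target/i386/pr43644.c.  */
    /* { dg-do compile { target { ! ia32 } } } */
    /* { dg-options "-O2" } */

    __int128 foo (__int128 x, __int128 y)
    {
      return x + y;
    }

    /* With the peephole applied, foo should need only the two movq plus
       the addq/adcq/ret shown in the patch description.  */
    /* { dg-final { scan-assembler-times "movq" 2 } } */

Counting movq instructions directly is fragile under unrelated register
allocation changes, so the committed test may well check something more
robust; treat this purely as a sketch.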