From: Uros Bizjak
Date: Mon, 8 Aug 2022 09:22:00 +0200
Subject: Re: [x86 PATCH take #2] Add peephole2 to reduce double word register shuffling
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Richard Biener

On Sun, Aug 7, 2022 at 7:04 PM Roger Sayle wrote:
>
> This is a resubmission of my patch from June to fix some forms of
> inefficient register allocation using an additional peephole2 in i386.md.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596064.html
>
> Since the original, a number of supporting patches/improvements have
> been reviewed and approved, making this peephole even more effective.
> Hence for the simple function:
>
>   __int128 foo(__int128 x, __int128 y) { return x+y; }
>
> mainline GCC on x86_64 with -O2 currently generates:
>
>         movq    %rsi, %rax
>         movq    %rdi, %r8
>         movq    %rax, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rdi, %rdx
>         ret
>
> with this patch we now generate (a three insn improvement):
>
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> Back in June the review of the original patch stalled, as peephole2
> isn't the ideal place to fix this (with which I fully agree), and this
> patch is really just a workaround for a deeper deficiency in reload/lra.
> To address this I've now filed a new enhancement PR in Bugzilla,
> PR rtl-optimization/106518, that describes the underlying issue,
> which might make an interesting (GSoC) project for anyone brave
> (foolhardy) enough to tweak GCC's register allocation.
>
> By comparison, this single peephole can't adversely affect other targets,
> and should the happy day come that it's no longer required, at worst it
> would just become a harmless legacy transform that no longer triggers.
>
> I'm also investigating Uros' suggestion that it may be possible for RTL
> expansion to do a better job expanding the function prologue, but
> ultimately the hard register placement constraints are fixed by the
> target ABI, and poor allocation/assignment of hard registers is the
> responsibility/fault of the register allocation passes.
> It may still be possible to reduce register pressure, though, by avoiding
> the use of SUBREGs (which keep the source and destination double words
> live during shuffling) along the lines of Richard's CONCAT suggestion.
>
> This patch has been retested against mainline using make bootstrap and
> make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-07  Roger Sayle
>
> gcc/ChangeLog
>         PR target/43644
>         PR rtl-optimization/97756
>         PR rtl-optimization/98438
>         * config/i386/i386.md (define_peephole2): Recognize double word
>         swap sequences, and replace them with more efficient idioms,
>         including using xchg when optimizing for size.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         * gcc.target/i386/pr43644.c: New test case.

+;; Replace a double word swap that requires 4 mov insns with a
+;; 3 mov insn implementation (or an xchg when optimizing for size).
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+        (match_operand:DWIH 1 "general_reg_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+        (match_operand:DWIH 3 "general_reg_operand"))
+   (clobber (match_operand:<DWI> 4 "general_reg_operand"))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 1) (match_dup 2))]
+  "REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[3]) == REGNO (operands[4])
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (5, operands[2])"
+  [(parallel [(set (match_dup 1) (match_dup 3))
+              (set (match_dup 3) (match_dup 1))])]

I'm not sure it is correct to remove the clobber here. Some RTL expert
should comment on this change.

+  if (!optimize_insn_for_size_p ())
+    {
+      rtx tmp = REGNO (operands[0]) > REGNO (operands[2]) ? operands[0]
+                                                          : operands[2];

Hm, this is a strange relation, and it is not obvious why it is done
that way. Usually, REGNO (op1) != REGNO (op2) does the trick. At least
a comment should be added here.

Uros.

+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[1], operands[3]);
+      emit_move_insn (operands[3], tmp);
+      DONE;
+    }

>
> Thanks in advance,
> Roger
> --
>
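
For anyone wanting to reproduce the comparison above, here is a minimal
DejaGnu-style sketch of the kind of test this patch adds. The function is
the one quoted in the patch description; the directives and the expected
instruction count are illustrative assumptions, not the actual contents of
gcc.target/i386/pr43644.c.

    /* Sketch of a reduced test for PR target/43644; the dg- directives
       and the expected mov count are assumptions, not the committed
       gcc.target/i386/pr43644.c.  */
    /* { dg-do compile { target { ! ia32 } } } */
    /* { dg-options "-O2" } */

    __int128 foo (__int128 x, __int128 y)
    {
      return x + y;
    }

    /* With the peephole applied, foo should need only the two movq plus
       the addq/adcq/ret shown in the patch description.  */
    /* { dg-final { scan-assembler-times "movq" 2 } } */

Counting movq instructions directly is fragile under unrelated register
allocation changes, so the committed test may well check something more
robust; treat this purely as a sketch.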