From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <roger@nextmovesoftware.com>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id B0BA73858434
 for <gcc-patches@gcc.gnu.org>; Thu,  2 Jun 2022 07:20:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B0BA73858434
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
 Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=EttB6zJo4rvWeSzUxvHIslIQpduUlxe+aJJT5rTXQGk=; b=EbSlaDDUMr26L23YEoFxTFEjt3
 b/xXoaWDWojihiEzeB5GepXlKQPocnYh9Pzh450P7VhlvIKcuqXjc6XL8n6xV815Lcf3b6d82IMrs
 Rf7yjWPRpnZmTDHPTR4hSa3Rlz/3NNiiIZu8LogJOL2n8v1ZmrEnZrRbFtTnd1iVyvkjyxxK2KWU2
 sNuO9xw1GFk+pidxiNlUH+QfFOOQNkdI3e/1XWYsVmzBpHRhFZnvzkzJUv+tPbSAG5TyImSgHA6nD
 ELTs0U4iDih1A+ubEFh+cFjXXceDMtWfDBoKhaIsTKdn7zxn2TPvyVLMUa2sKqm/BeqGb8Mt5Ej9e
 0w+OTplw==;
Received: from host109-154-46-241.range109-154.btcentralplus.com
 ([109.154.46.241]:54842 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1nwf8L-0003sL-4V; Thu, 02 Jun 2022 03:20:41 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [x86 PATCH] Add peephole2 to reduce double word register shuffling.
Date: Thu, 2 Jun 2022 08:20:39 +0100
Message-ID: <032501d87651$43cf0960$cb6d1c20$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_0326_01D87659.A595E260"
X-Mailer: Microsoft Outlook 16.0
Thread-Index: Adh2UIoGktFLlF3xQmmXyBwxff3paw==
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jun 2022 07:20:42 -0000

This is a multipart message in MIME format.

------=_NextPart_000_0326_01D87659.A595E260
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

The simple test case below demonstrates an interesting register
allocation challenge facing x86_64, imposed by ABI requirements
on __int128.

__int128 foo(__int128 x, __int128 y)
{
  return x+y;
}

For which GCC currently generates the unusual sequence:

        movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

The challenge is that the x86_64 ABI requires passing the first __int128,
x, in %rsi:%rdi (highpart in %rsi, lowpart in %rdi), whereas internally
GCC prefers TI mode (double word) integers to be register allocated as
%rdi:%rsi (highpart in %rdi, lowpart in %rsi).  So after reload, we have
four mov instructions, two to move the double word to temporary registers
and then two to move them back.
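
To make the two halves concrete, here is a small C sketch (not part of
the patch) that models a double word value as a pair of 64-bit halves
and adds two such values the way the addq/adcq pair does.  The type
dw_t and the helpers dw_split, dw_join and dw_add are hypothetical
names chosen purely for illustration.

```c
#include <stdint.h>

/* Hypothetical model: a 128-bit value held as two 64-bit halves,
   much as a TImode value occupies two general registers.  */
typedef struct { uint64_t lo, hi; } dw_t;

/* Split a 128-bit value into its lowpart and highpart.  */
static dw_t dw_split(unsigned __int128 v)
{
  dw_t r = { (uint64_t) v, (uint64_t) (v >> 64) };
  return r;
}

/* Reassemble the two halves into a 128-bit value.  */
static unsigned __int128 dw_join(dw_t v)
{
  return ((unsigned __int128) v.hi << 64) | v.lo;
}

/* Models addq on the lowparts followed by adcq on the highparts:
   the carry out of the low addition feeds the high addition.  */
static dw_t dw_add(dw_t x, dw_t y)
{
  dw_t r;
  r.lo = x.lo + y.lo;
  r.hi = x.hi + y.hi + (r.lo < x.lo);
  return r;
}
```

Which physical register holds lo and which holds hi is exactly what
the ABI and the register allocator disagree about.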

This patch adds a peephole2 to spot this register shuffling, and with
-Os generates an xchg instruction, to produce:

        xchgq   %rsi, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rsi, %rax
        adcq    %rdi, %rdx
        ret

or when optimizing for speed, a three mov sequence, using just one of
the temporary registers, which ultimately results in the improved:

        movq    %rdi, %r8
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rsi, %rdx
        ret
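
As a rough illustration of why the three forms are interchangeable,
the following C sketch models the registers as array slots and spells
out each sequence.  The enum indices are arbitrary (they are not GCC's
hard register numbers) and the function names are hypothetical.  It
assumes, as the peephole2 condition checks, that the temporaries are
dead afterwards, so only %rdi and %rsi need to agree.

```c
#include <stdint.h>

/* Illustrative register slots; indices are arbitrary.  */
enum { RAX, RDI, RSI, R8, NREGS };

/* The four-move shape matched by the peephole2: copy both halves
   to temporaries, then copy them back crossed over.  */
static void swap_four_moves(uint64_t r[NREGS])
{
  r[RAX] = r[RSI];
  r[R8]  = r[RDI];
  r[RDI] = r[RAX];
  r[RSI] = r[R8];
}

/* The replacement when optimizing for speed: one temporary.  */
static void swap_three_moves(uint64_t r[NREGS])
{
  r[R8]  = r[RSI];
  r[RSI] = r[RDI];
  r[RDI] = r[R8];
}

/* The replacement when optimizing for size: models xchgq, which
   exchanges both registers without any visible temporary.  */
static void swap_xchg(uint64_t r[NREGS])
{
  uint64_t t = r[RSI];
  r[RSI] = r[RDI];
  r[RDI] = t;
}
```

All three leave %rdi and %rsi exchanged; they differ only in which
scratch registers they touch along the way.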

I've a follow-up patch which improves things further, and with the
output in flux, I'd like to add the new testcase with part 2, once
we're back down to requiring only two movq instructions.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  Ok for mainline?


2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (define_peephole2): Recognize double word
        swap sequences, and replace them with more efficient idioms,
        including using xchg when optimizing for size.


Thanks in advance,
Roger
--


------=_NextPart_000_0326_01D87659.A595E260
Content-Type: text/plain;
	name="patchxg.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="patchxg.txt"

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2b1d65b..f3cf6e2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3016,6 +3016,36 @@
   [(parallel [(set (match_dup 1) (match_dup 2))
 	      (set (match_dup 2) (match_dup 1))])])
 
+;; Replace a double word swap that requires 4 mov insns with a
+;; 3 mov insn implementation (or an xchg when optimizing for size).
+(define_peephole2
+  [(set (match_operand:DWIH 0 "general_reg_operand")
+	(match_operand:DWIH 1 "general_reg_operand"))
+   (set (match_operand:DWIH 2 "general_reg_operand")
+	(match_operand:DWIH 3 "general_reg_operand"))
+   (clobber (match_operand:<DWI> 4 "general_reg_operand"))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 1) (match_dup 2))]
+  "REGNO (operands[0]) != REGNO (operands[3])
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO (operands[3]) == REGNO (operands[4])
+   && peep2_reg_dead_p (4, operands[0])
+   && peep2_reg_dead_p (5, operands[2])"
+  [(parallel [(set (match_dup 1) (match_dup 3))
+	      (set (match_dup 3) (match_dup 1))])]
+{
+  if (!optimize_insn_for_size_p ())
+    {
+      rtx tmp = REGNO (operands[0]) > REGNO (operands[2]) ? operands[0]
+							  : operands[2];
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[1], operands[3]);
+      emit_move_insn (operands[3], tmp);
+      DONE;
+    }
+})
+
 (define_expand "movstrict<mode>"
   [(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
 	(match_operand:SWI12 1 "general_operand"))]

------=_NextPart_000_0326_01D87659.A595E260--