From: Richard Biener
Date: Thu, 2 Jun 2022 11:30:20 +0200
Subject: Re: [x86 PATCH] Add peephole2 to reduce double word register shuffling.
To: Roger Sayle, "Vladimir N. Makarov", Uros Bizjak
Cc: GCC Patches

On Thu, Jun 2, 2022 at 9:21 AM Roger Sayle wrote:
>
> The simple test case below demonstrates an interesting register
> allocation challenge facing x86_64, imposed by ABI requirements
> on __int128.
>
> __int128 foo(__int128 x, __int128 y)
> {
>   return x+y;
> }
>
> For which GCC currently generates the unusual sequence:
>
>         movq    %rsi, %rax
>         movq    %rdi, %r8
>         movq    %rax, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rdi, %rdx
>         ret
>
> The challenge is that the x86_64 ABI requires passing the first __int128,
> x, in %rsi:%rdi (highpart in %rsi, lowpart in %rdi), whereas internally
> GCC prefers TImode (double word) integers to be register allocated as
> %rdi:%rsi (highpart in %rdi, lowpart in %rsi).

Do you know if this is a hard limitation?  I guess reg:TI 2 will cover
hardreg 2 and 3 and the overlap is always implicit adjacent hardregs?
I suspect that in other places we prefer the current hardreg ordering,
so altering it to make it match the __int128 register passing convention
is not an option.  Alternatively, TImode ops should be split before RA,
and for register passing (concat:TI ...) could be allowed?

Fixing up after the fact is of course possible, but it looks awkward
that there's no good way for the RA and the backend to communicate
better here?
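
Concretely, a sketch of that overlap, assuming the hard register
numbering from i386.h (where %rsi is hard register 4 and %rdi is 5);
the notes below are illustrative only, not from any patch:

    ;; (reg:TI 4) spans hard registers 4 and 5; on this little-endian
    ;; target the lowpart lives in the lower-numbered register:
    ;;   lowpart  -> hard reg 4 (%rsi)
    ;;   highpart -> hard reg 5 (%rdi)
    ;;
    ;; The psABI assigns the incoming __int128 the other way around:
    ;;   lowpart  -> hard reg 5 (%rdi)
    ;;   highpart -> hard reg 4 (%rsi)
    ;;
    ;; No single (reg:TI n) can describe that assignment, which is what
    ;; a hypothetical (concat:TI ...) pairing (reg:DI 5) and (reg:DI 4)
    ;; would have to express for the RA to see the incoming value
    ;; un-shuffled.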
> So after reload, we have
> four mov instructions, two to move the double word to temporary registers
> and then two to move them back.
>
> This patch adds a peephole2 to spot this register shuffling, and with
> -Os generates an xchg instruction, to produce:
>
>         xchgq   %rsi, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rsi, %rax
>         adcq    %rdi, %rdx
>         ret
>
> or when optimizing for speed, a three mov sequence, using just one of
> the temporary registers, which ultimately results in the improved:
>
>         movq    %rdi, %r8
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rsi, %rdx
>         ret
>
> I've a follow-up patch which improves things further, and with the
> output in flux, I'd like to add the new testcase with part 2, once
> we're back down to requiring only two movq instructions.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-02  Roger Sayle
>
> gcc/ChangeLog
> * config/i386/i386.md (define_peephole2): Recognize double word
>   swap sequences, and replace them with more efficient idioms,
>   including using xchg when optimizing for size.
>
> Thanks in advance,
> Roger
> --
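
For readers following along, a rough sketch of the general shape such
a peephole2 takes in i386.md.  This is illustrative only and not the
submitted patch; the predicates and the exact insn sequence the real
patch matches may differ:

    ;; Sketch: collapse "tmp = a; a = b; b = tmp" into a register swap
    ;; when optimizing for size and tmp is dead afterwards.
    (define_peephole2
      [(set (match_operand:DI 0 "general_reg_operand")
            (match_operand:DI 1 "general_reg_operand"))
       (set (match_dup 1)
            (match_operand:DI 2 "general_reg_operand"))
       (set (match_dup 2)
            (match_dup 0))]
      "TARGET_64BIT && optimize_insn_for_size_p ()
       && peep2_reg_dead_p (3, operands[0])"
      [(parallel [(set (match_dup 1) (match_dup 2))
                  (set (match_dup 2) (match_dup 1))])])

    ;; The replacement parallel (two simultaneous sets) is the shape the
    ;; i386 backend already uses for its xchg insn pattern, so the
    ;; peephole only has to emit it; no new insn definition is needed.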