From: Uros Bizjak
Date: Sun, 31 Jul 2022 19:31:55 +0200
Subject: Re: [x86_64 PATCH] Add rotl64ti2_doubleword pattern to i386.md
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org
In-Reply-To: <035101d8a311$ea4f0b90$beed22b0$@nextmovesoftware.com>

On Fri, Jul 29, 2022 at 8:10 AM Roger Sayle wrote:
>
>
> This patch adds rot[lr]64ti2_doubleword patterns to the x86_64 backend,
> to move splitting of 128-bit TImode rotates by 64 bits after reload,
> matching what we now do for 64-bit DImode rotations by 32 bits with -m32.
>
> In theory, changing when this rotation is split should have little
> influence on code generation, but in practice "reload" sometimes
> decides to make use of the increased flexibility to reduce the number
> of registers used, and the code size, by using xchg.
>
> For example:
> __int128 x;
> __int128 y;
> __int128 a;
> __int128 b;
>
> void foo()
> {
>   unsigned __int128 t = x;
>   t ^= a;
>   t = (t<<64) | (t>>64);
>   t ^= b;
>   y = t;
> }
>
> Before:
>         movq    x(%rip), %rsi
>         movq    x+8(%rip), %rdi
>         xorq    a(%rip), %rsi
>         xorq    a+8(%rip), %rdi
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
> After:
>         movq    x(%rip), %rax
>         movq    x+8(%rip), %rdx
>         xorq    a(%rip), %rax
>         xorq    a+8(%rip), %rdx
>         xchgq   %rdx, %rax
>         xorq    b(%rip), %rax
>         xorq    b+8(%rip), %rdx
>         movq    %rax, y(%rip)
>         movq    %rdx, y+8(%rip)
>         ret
>
> On some modern architectures this is a small win; on some older
> architectures this is a small loss.  The decision which code to
> generate is made in "reload", and could probably be tweaked by
> register preferencing.  The much bigger win is that (eventually) all
> TImode shifts and rotates by constants will become potential
> candidates for TImode STV.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2022-07-29  Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386.md (define_expand ti3): For
>         rotations by 64 bits use new rot[lr]64ti2_doubleword pattern.
>         (rot[lr]64ti2_doubleword): New post-reload splitter.

OK.

Thanks,
Uros.
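
As a side illustration of the quoted example: rotating a 128-bit value by
64 bits is the same as swapping its two 64-bit halves, which is why the
"After" code can implement the rotate with a single xchgq on the register
pair.  The sketch below checks that equivalence in C.  It is not part of
the patch; the names u128, make_u128, rotate64, swap_halves and
rotate_matches_swap are made up for illustration, and it assumes a
compiler that provides the __int128 extension (as the quoted test case
does).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang __int128 extension, as in the quoted foo(). */
typedef unsigned __int128 u128;

/* Hypothetical helper: build a 128-bit value from two 64-bit halves. */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* Rotate by 64 bits, written the same way as in the quoted example. */
u128 rotate64(u128 t)
{
    return (t << 64) | (t >> 64);
}

/* The same operation expressed as a swap of the two 64-bit halves. */
u128 swap_halves(u128 t)
{
    uint64_t lo = (uint64_t)t;
    uint64_t hi = (uint64_t)(t >> 64);
    return make_u128(lo, hi);
}

/* Returns 1 when both formulations agree for the given halves. */
int rotate_matches_swap(uint64_t hi, uint64_t lo)
{
    u128 t = make_u128(hi, lo);
    return rotate64(t) == swap_halves(t);
}

/* Small self-check that can be called from a main() of your own. */
void check_rotate64(void)
{
    assert(rotate_matches_swap(0x0123456789abcdefULL, 0xfedcba9876543210ULL));
    assert(rotate_matches_swap(0, UINT64_MAX));
}
```

Calling check_rotate64() from a test program exercises both forms.  Whether
the compiler actually emits xchgq for rotate64 depends on the register
allocation decisions described in the quoted message.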