From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.
Date: Fri, 6 Oct 2023 14:58:02 +0100
Message-ID: <004601d9f85d$20923500$61b69f00$@nextmovesoftware.com>

This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
functions to implement doubleword right shifts by 1 bit, using a shift
of the highpart that sets the carry flag followed by a rotate-carry-right
(RCR) instruction on the lowpart.

Conceptually this is similar to the recent left shift patch, but with two
complicating factors.  The first is that although the RCR sequence is
shorter, and is a ~3x performance improvement on AMD, my micro-benchmarking
shows it to be ~10% slower on Intel.  Hence this patch also introduces a
new X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this
is the first time a "rotate-right-through-carry" and a right shift that
sets the carry flag from the least significant bit have been modelled in
GCC RTL (on a MODE_CC target).  For this I've used the i386 back-end's
UNSPEC_CC_NE, which seems appropriate.
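For illustration only (this is not part of the patch, and the names dw_t,
dw_lshr1 and dw_ashr1 are hypothetical), the intended semantics of the
two-instruction sequence can be sketched in portable C, assuming a
doubleword built from two 64-bit halves:

```c
#include <stdint.h>

/* Hypothetical doubleword type: two 64-bit halves of a 128-bit value. */
typedef struct { uint64_t lo, hi; } dw_t;

/* Logical shift right by one.  SHR on the high part leaves its old
   bit 0 in the carry flag; RCR on the low part then rotates that
   carry in as the new bit 63. */
static dw_t dw_lshr1(dw_t x)
{
  unsigned carry = (unsigned)(x.hi & 1);          /* CF after shr $1, hi */
  dw_t r;
  r.hi = x.hi >> 1;                               /* shr: bit 63 becomes 0 */
  r.lo = (x.lo >> 1) | ((uint64_t)carry << 63);   /* rcr $1, lo */
  return r;
}

/* Arithmetic shift right by one.  Identical except that SAR keeps
   the sign bit of the high part. */
static dw_t dw_ashr1(dw_t x)
{
  unsigned carry = (unsigned)(x.hi & 1);          /* CF after sar $1, hi */
  dw_t r;
  r.hi = (x.hi >> 1) | (x.hi & 0x8000000000000000ULL);  /* sar: keep sign */
  r.lo = (x.lo >> 1) | ((uint64_t)carry << 63);   /* rcr $1, lo */
  return r;
}
```

The only state passed between the two steps is the single carry bit, which
is why the RTL has to describe the flag result of the first shift
explicitly.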
Finally, rcrsi2 and rcrdi2 are separate define_insns so that we can use
their generator functions.

For the pair of functions:

    unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
    __int128 bar(__int128 x) { return x >> 1; }

with -O2 -march=znver4 we previously generated:

    foo:    movq    %rdi, %rax
            movq    %rsi, %rdx
            shrdq   $1, %rsi, %rax
            shrq    %rdx
            ret
    bar:    movq    %rdi, %rax
            movq    %rsi, %rdx
            shrdq   $1, %rsi, %rax
            sarq    %rdx
            ret

with this patch we now generate:

    foo:    movq    %rsi, %rdx
            movq    %rdi, %rax
            shrq    %rdx
            rcrq    %rax
            ret
    bar:    movq    %rsi, %rdx
            movq    %rdi, %rax
            sarq    %rdx
            rcrq    %rax
            ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  To provide additional testing, I've also
bootstrapped and regression tested a version of this patch where the
RCR is always generated (independent of the -march target), again with
no regressions.  Ok for mainline?


2023-10-06  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
        one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
        or -Oz.
        (ix86_split_lshr): Likewise, split shifts by one bit into
        lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
        * config/i386/i386.h (TARGET_USE_RCR): New backend macro.
        * config/i386/i386.md (rcrsi2): New define_insn for rcrl.
        (rcrdi2): New define_insn for rcrq.
        (3_carry): New define_insn for right shifts that set the carry
        flag from the least significant bit, modelled using UNSPEC_CC_NE.
        * config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
        controlling use of rcr 1 vs. shrd, which is significantly faster on
        AMD processors.

gcc/testsuite/ChangeLog
        * gcc.target/i386/rcr-1.c: New 64-bit test case.
        * gcc.target/i386/rcr-2.c: New 32-bit test case.


Thanks in advance,
Roger
--