From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Cc: "'Uros Bizjak'"
Subject: RE: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.
Date: Thu, 5 Oct 2023 12:45:00 +0100
Message-ID: <00e201d9f781$603fdc20$20bf9460$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_00E3_01D9F789.C2044420"

This is a multipart message in MIME format.

------=_NextPart_000_00E3_01D9F789.C2044420
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Doh!  ENOPATCH.

> -----Original Message-----
> From: Roger Sayle
> Sent: 05 October 2023 12:44
> To: 'gcc-patches@gcc.gnu.org'
> Cc: 'Uros Bizjak'
> Subject: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.
>
>
> This patch tweaks the i386 back-end's ix86_split_ashl to implement doubleword
> left shifts by 1 bit, using an add followed by an add-with-carry (i.e. a doubleword
> x+x) instead of using the x86's shld instruction.
> The replacement sequence both requires fewer bytes and is faster on both Intel
> and AMD architectures (from Agner Fog's latency tables and confirmed by my
> own microbenchmarking).
>
> For the test case:
> __int128 foo(__int128 x) { return x << 1; }
>
> with -O2 we previously generated:
>
> foo:    movq    %rdi, %rax
>         movq    %rsi, %rdx
>         shldq   $1, %rdi, %rdx
>         addq    %rdi, %rax
>         ret
>
> with this patch we now generate:
>
> foo:    movq    %rdi, %rax
>         movq    %rsi, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no new
> failures.  Ok for mainline?
>
>
> 2023-10-05  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
>         one into add<mode>3_cc_overflow_1 followed by add<mode>3_carry.
>         * config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from
>         "*add<mode>3_cc_overflow_1" to provide generator function.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/ashldi3-2.c: New 32-bit test case.
>         * gcc.target/i386/ashlti3-3.c: New 64-bit test case.
>
>
> Thanks in advance,
> Roger
> --

------=_NextPart_000_00E3_01D9F789.C2044420
Content-Type: text/plain; name="patchrr.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchrr.txt"

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27..09e41c8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6342,6 +6342,18 @@ ix86_split_ashl (rtx *operands, rtx scratch, machine_mode mode)
           if (count > half_width)
             ix86_expand_ashl_const (high[0], count - half_width, mode);
         }
+      else if (count == 1)
+        {
+          if (!rtx_equal_p (operands[0], operands[1]))
+            emit_move_insn (operands[0], operands[1]);
+          rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+          rtx x4 = gen_rtx_LTU (mode, x3, const0_rtx);
+          half_mode = mode == DImode ? SImode : DImode;
+          emit_insn (gen_add3_cc_overflow_1 (half_mode, low[0],
+                                             low[0], low[0]));
+          emit_insn (gen_add3_carry (half_mode, high[0], high[0], high[0],
+                                     x3, x4));
+        }
       else
         {
           gen_shld = mode == DImode ? gen_x86_shld : gen_x86_64_shld;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e..6a5bc16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8864,7 +8864,7 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*add<mode>3_cc_overflow_1"
+(define_insn "@add<mode>3_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
         (compare:CCC
             (plus:SWI
diff --git a/gcc/testsuite/gcc.target/i386/ashldi3-2.c b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
new file mode 100644
index 0000000..053389d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashldi3-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -mno-stv" } */
+
+long long foo(long long x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcl" } } */
+/* { dg-final { scan-assembler-not "shldl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ashlti3-3.c b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
new file mode 100644
index 0000000..4f14ca0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashlti3-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x)
+{
+  return x << 1;
+}
+
+/* { dg-final { scan-assembler "adcq" } } */
+/* { dg-final { scan-assembler-not "shldq" } } */

------=_NextPart_000_00E3_01D9F789.C2044420--
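
The identity the patch relies on (a doubleword shift left by one bit is a
doubleword x+x, done as add on the low half and add-with-carry on the high
half) can be checked with a small C sketch.  This is not part of the patch.
It models a DImode value on a 32-bit target as two 32-bit halves; the type
dword_t and the functions shl1_add_adc, shl1_reference and check_shl1 are
hypothetical names introduced only for this illustration.  The carry is
recomputed with an unsigned less-than comparison, which is meant to mirror
the LTU test on the carry flag in the patch, although the real code works
on RTL rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a doubleword: a 64-bit value held as two
   32-bit halves, as on a 32-bit target.  */
typedef struct { uint32_t lo, hi; } dword_t;

/* Shift left by one bit, written as x + x on the halves.  */
dword_t shl1_add_adc (dword_t x)
{
  dword_t r;
  r.lo = x.lo + x.lo;             /* add: wraps modulo 2^32 */
  uint32_t carry = r.lo < x.lo;   /* carry out of the low add (unsigned <) */
  r.hi = x.hi + x.hi + carry;     /* adc: high half plus carry in */
  return r;
}

/* Reference result: the same shift done on the whole 64-bit value.  */
dword_t shl1_reference (dword_t x)
{
  uint64_t v = ((uint64_t) x.hi << 32) | x.lo;
  v <<= 1;
  dword_t r = { (uint32_t) v, (uint32_t) (v >> 32) };
  return r;
}

/* Assert that both forms agree for the value with the given halves.  */
void check_shl1 (uint32_t lo, uint32_t hi)
{
  dword_t x = { lo, hi };
  dword_t a = shl1_add_adc (x);
  dword_t b = shl1_reference (x);
  assert (a.lo == b.lo && a.hi == b.hi);
}
```

Calling check_shl1 (0x80000000u, 1u) exercises the case where the low half
carries into the high half; both forms give low = 0 and high = 3.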