From: "Roger Sayle"
To: "'Uros Bizjak'"
References: <00cd01d9f76b$3db62990$b9227cb0$@nextmovesoftware.com>
Subject: RE: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.
Date: Thu, 5 Oct 2023 13:19:30 +0100
Message-ID: <013101d9f786$31e77130$95b65390$@nextmovesoftware.com>

Hi Uros,
Very many thanks for the speedy reviews.

Uros Bizjak wrote:
> On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote:
> >
> >
> > This patch avoids long lea instructions for performing x<<2 and x<<3
> > by splitting them into shorter sal and move (or xchg instructions).
> > Because this increases the number of instructions, but reduces the
> > total size, its suitable for -Oz (but not -Os).
> >
> > The impact can be seen in the new test case:
> >
> > int foo(int x) { return x<<2; }
> > int bar(int x) { return x<<3; }
> > long long fool(long long x) { return x<<2; }
> > long long barl(long long x) { return x<<3; }
> >
> > where with -O2 we generate:
> >
> > foo:    lea    0x0(,%rdi,4),%eax   // 7 bytes
> >         retq
> > bar:    lea    0x0(,%rdi,8),%eax   // 7 bytes
> >         retq
> > fool:   lea    0x0(,%rdi,4),%rax   // 8 bytes
> >         retq
> > barl:   lea    0x0(,%rdi,8),%rax   // 8 bytes
> >         retq
> >
> > and with -Oz we now generate:
> >
> > foo:    xchg   %eax,%edi           // 1 byte
> >         shl    $0x2,%eax           // 3 bytes
> >         retq
> > bar:    xchg   %eax,%edi           // 1 byte
> >         shl    $0x3,%eax           // 3 bytes
> >         retq
> > fool:   xchg   %rax,%rdi           // 2 bytes
> >         shl    $0x2,%rax           // 4 bytes
> >         retq
> > barl:   xchg   %rax,%rdi           // 2 bytes
> >         shl    $0x3,%rax           // 4 bytes
> >         retq
> >
> > Over the entirety of the CSiBE code size benchmark this saves 1347
> > bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32.
> > Conveniently, there's already a backend function in i386.cc for
> > deciding whether to split an lea into its component instructions,
> > ix86_avoid_lea_for_addr, all that's required is an additional clause
> > checking for -Oz (i.e. optimize_size > 1).
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board='unix{-m32}'
> > with no new failures.  Additional testing was performed by repeating
> > these steps after removing the "optimize_size > 1" condition, so that
> > suitable lea instructions were always split [-Oz is not heavily
> > tested, so this invoked the new code during the bootstrap and
> > regression testing], again with no regressions.  Ok for mainline?
> >
> >
> > 2023-10-05  Roger Sayle
> >
> > gcc/ChangeLog
> >         * config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used
> >         to perform left shifts into shorter instructions with -Oz.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/lea-2.c: New test case.
> >
>
> OK, but ...
>
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
>
> Is there a reason to avoid 32-bit targets? I'd expect that the optimization also
> triggers on x86_32 for 32bit integers.

Good catch.  You're 100% correct; because the test case just checks that an
LEA is not used, and not for the specific sequence of shift instructions used
instead, this test also passes with --target_board='unix{-m32}'.  I'll remove
the target clause from the dg-do compile directive.

> +/* { dg-options "-Oz" } */
> +int foo(int x) { return x<<2; }
> +int bar(int x) { return x<<3; }
> +long long fool(long long x) { return x<<2; }
> +long long barl(long long x) { return x<<3; }
> +/* { dg-final { scan-assembler-not "lea\[lq\]" } } */

Thanks again.
Roger
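
For reference, a sketch of how gcc.target/i386/lea-2.c might read once the target clause is dropped from the dg-do directive. This assumes the other six lines stay exactly as in the hunk quoted above; the version that is actually committed may differ.

```c
/* Sketch only: dg-do no longer restricts the test to non-ia32 targets,
   so it also runs with --target_board='unix{-m32}'.  */
/* { dg-do compile } */
/* { dg-options "-Oz" } */
int foo(int x) { return x<<2; }
int bar(int x) { return x<<3; }
long long fool(long long x) { return x<<2; }
long long barl(long long x) { return x<<3; }
/* The test only checks that no leal/leaq appears in the output; it does
   not check which shift sequence replaces it.  */
/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
```

Because the scan only rejects lea and does not match a particular xchg/shl sequence, the same file should serve both the 64-bit and 32-bit runs.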