From: "Roger Sayle"
To: "'Kewen.Lin'"
Cc: , "'Segher Boessenkool'" , "'David Edelsohn'"
Subject: [PATCH/RFC] combine_completed global variable.
Date: Thu, 7 Jul 2022 20:40:38 +0100

Hi Kewen (and Segher),

Many thanks for stress testing my patch to improve multiplication by integer constants on rs6000 by using the rldimi instruction.
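The rldimi trick rests on a simple identity: when two values have no set bits in common, adding them cannot produce carries, so a + b equals a | b, and an OR with a shifted copy can be done as a rotate-and-insert. The C sketch below checks this for the testcase from the original rs6000 patch, (x & 42) * 0x2001, where 0x2001 == (1 << 13) + 1. The helper names (bits_disjoint, mul_shift_ior, check_identity and so on) are hypothetical and only illustrate the reasoning; they are not part of the rs6000 backend, and bits_disjoint merely mimics the kind of nonzero_bits test the pattern relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: given masks of the bits that may be nonzero in
   each operand, return true if no bit can be set in both.  This only
   mimics the kind of nonzero_bits test the rs6000 pattern relies on.  */
static bool
bits_disjoint (unsigned int nz_a, unsigned int nz_b)
{
  return (nz_a & nz_b) == 0;
}

/* The testcase, written as a multiply.  */
static int
mul_original (int x)
{
  int t = x & 42;
  return t * 0x2001;
}

/* 0x2001 == (1 << 13) + 1, so the multiply is a shift and an add,
   like the slwi/add sequence GCC currently generates.  */
static int
mul_shift_add (int x)
{
  int t = x & 42;
  return (t << 13) + t;
}

/* t can only have bits in 0x2a set and t << 13 only bits in 0x2a << 13,
   so the add never carries and can be an OR, i.e. a bit insert such as
   the rlwimi in the patched output.  */
static int
mul_shift_ior (int x)
{
  int t = x & 42;
  return (t << 13) | t;
}

/* Check that all three forms agree for every x in [lo, hi].  */
static void
check_identity (int lo, int hi)
{
  assert (bits_disjoint (42u, 42u << 13));
  for (int x = lo; x <= hi; x++)
    {
      assert (mul_original (x) == mul_shift_add (x));
      assert (mul_original (x) == mul_shift_ior (x));
    }
}
```

For example, check_identity (-100000, 100000) passes, whereas a mask pair such as 42 and 42 << 2 overlaps, so the OR form would not be valid there.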
Although I've not been able to reproduce your ICE (using gcc135 on the compile farm), I completely agree with Segher's analysis that the Achilles heel of my approach/patch is that there's currently no way for the backend/recog to know that we're in a pass after combine.

Rather than give up on this optimization (and a similar one for i386.md, where test;sete can be replaced by xor $1 when combine knows that nonzero_bits is 1, but loses that information afterwards), I thought I'd post this "strawman" proposal to add a combine_completed global variable, matching the reload_completed and regstack_completed global variables already used (to track progress) by the middle-end.

I was wondering if I could ask you to test the attached patch in combination with my previous rs6000.md patch (with the obvious change of reload_completed to combine_completed), to confirm that it fixes the problems you were seeing.

Segher/Richard, would this sort of patch be considered acceptable? Or is there a better approach/solution?

2022-07-07  Roger Sayle

gcc/ChangeLog
	* combine.cc (combine_completed): New global variable.
	(rest_of_handle_combine): Set combine_completed after pass.
	* final.cc (rest_of_clean_state): Reset combine_completed.
	* rtl.h (combine_completed): Prototype here.

Many thanks in advance,
Roger
--

> -----Original Message-----
> From: Kewen.Lin
> Sent: 27 June 2022 10:04
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org; Segher Boessenkool; David Edelsohn
> Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rldimi.
>
> Hi Roger,
>
> on 2022/6/27 04:56, Roger Sayle wrote:
> >
> > This patch tweaks the code generated on POWER for integer multiplications
> > by a constant, by making use of rldimi instructions.  Much like x86's
> > lea instruction, rldimi can be used to implement a shift and add pair
> > in some circumstances.
> > For rldimi this is when the shifted operand
> > is known to have no bits in common with the added operand.
> >
> > Hence for the new testcase below:
> >
> > int foo(int x)
> > {
> >   int t = x & 42;
> >   return t * 0x2001;
> > }
> >
> > when compiled with -O2, GCC currently generates:
> >
> >         andi. 3,3,0x2a
> >         slwi 9,3,13
> >         add 3,9,3
> >         extsw 3,3
> >         blr
> >
> > with this patch, we now generate:
> >
> >         andi. 3,3,0x2a
> >         rlwimi 3,3,13,0,31-13
> >         extsw 3,3
> >         blr
> >
> > It turns out this optimization already exists in the form of a combine
> > splitter in rs6000.md, but the constraints on combine splitters,
> > requiring three or four input instructions (and generating one or two
> > output instructions), mean it doesn't get applied as often as it could.
> > This patch converts the define_split into a define_insn_and_split to
> > catch more cases (such as the one above).
> >
> > The one bit that's tricky/controversial is the use of RTL's
> > nonzero_bits, which is accurate during the combine pass when this
> > pattern is first recognized, but not as advanced (not kept up to
> > date) when this pattern is eventually split.  To support this,
> > I've used a "|| reload_completed" idiom.  Does this approach seem
> > reasonable? [I've another patch for x86 that uses the same idiom].
> >
>
> I tested this patch on powerpc64-linux-gnu, it caused the below ICE against test
> case gcc/testsuite/gcc.c-torture/compile/pr93098.c.
>
> gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’:
> gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable insn:
> (insn 104 32 34 2 (set (reg:SI 185 [+4 ])
>         (ior:SI (and:SI (reg:SI 200 [+4 ])
>                 (const_int 4294967295 [0xffffffff]))
>             (ashift:SI (reg:SI 140)
>                 (const_int 32 [0x20])))) "gcc/testsuite/gcc.c-torture/compile/pr93098.c":6:11 -1
>      (nil))
> during RTL pass: subreg3
> dump file: pr93098.c.291r.subreg3
> gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: in extract_insn, at recog.cc:2791
> 0x101f664b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
>         gcc/rtl-error.cc:108
> 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
>         gcc/rtl-error.cc:116
> 0x10ae427f extract_insn(rtx_insn*)
>         gcc/recog.cc:2791
> 0x11b239bb decompose_multiword_subregs
>         gcc/lower-subreg.cc:1555
> 0x11b25013 execute
>         gcc/lower-subreg.cc:1818
>
> The above trace shows we fail to recog the pattern again due to the inaccurate
> nonzero_bits information, as you pointed out above.
>
> There was another patch [1] which wasn't on trunk but touched this same
> define_split; not sure if that can help or if we can follow the similar idea.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html
>
> BR,
> Kewen

Attachment: patchcc.txt

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a5fabf3..9c87bf9 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -128,6 +128,10 @@ static rtx i2mod_old_rhs;
 /* When I2MOD is nonnull, this is a copy of the new right hand side.  */
 
 static rtx i2mod_new_rhs;
+
+/* Nonzero after end of combine pass.  */
+
+int combine_completed = 0;
 
 struct reg_stat_type {
   /* Record last point of death of (hard or pseudo) register n.  */
@@ -14991,6 +14995,7 @@ rest_of_handle_combine (void)
     }
 
   regstat_free_n_sets_and_refs ();
+  combine_completed = 1;
   return 0;
 }
 
diff --git a/gcc/final.cc b/gcc/final.cc
index 0352786..0fc2695 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -4513,6 +4513,7 @@ rest_of_clean_state (void)
   flag_rerun_cse_after_global_opts = 0;
   reload_completed = 0;
   epilogue_completed = 0;
+  combine_completed = 0;
 #ifdef STACK_REGS
   regstack_completed = 0;
 #endif
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 488016b..3bb92bd 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4105,6 +4105,9 @@ extern int reload_in_progress;
 /* Set to 1 while in lra.  */
 extern int lra_in_progress;
 
+/* Nonzero after end of combine pass.  */
+extern int combine_completed;
+
 /* This macro indicates whether you may create a new
    pseudo-register.  */
 
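To make the pass-ordering problem concrete, here is a toy C model (not GCC code) of the two insn conditions discussed: the original "|| reload_completed" idiom and the proposed "|| combine_completed" one. It assumes, as a deliberate simplification, that nonzero_bits is exact while combine runs and reports every bit as possibly set afterwards; the real analysis merely becomes less precise. model_nonzero_bits, operands_disjoint, cond_reload_idiom, cond_combine_idiom and walk_pipeline are hypothetical names, and the file-local combine_completed and reload_completed only mirror the GCC globals of the same name. Under those assumptions, the reload_completed form rejects the insn in a pass that runs between combine and reload (the ICE above was reported during subreg3), while the combine_completed form keeps it recognizable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's pass-progress globals (model only).  */
static int combine_completed = 0;
static int reload_completed = 0;

/* Hypothetical model of nonzero_bits: exact while combine runs, fully
   conservative (every bit possibly set) once combine has finished.  */
static unsigned int
model_nonzero_bits (unsigned int exact_mask)
{
  return combine_completed ? ~0u : exact_mask;
}

/* The "operands share no nonzero bits" test the IOR pattern needs.  */
static bool
operands_disjoint (unsigned int nz_a, unsigned int nz_b)
{
  return (model_nonzero_bits (nz_a) & model_nonzero_bits (nz_b)) == 0;
}

/* Insn condition using the original "|| reload_completed" idiom.  */
static bool
cond_reload_idiom (unsigned int nz_a, unsigned int nz_b)
{
  return operands_disjoint (nz_a, nz_b) || reload_completed;
}

/* Insn condition using the proposed "|| combine_completed" idiom.  */
static bool
cond_combine_idiom (unsigned int nz_a, unsigned int nz_b)
{
  return operands_disjoint (nz_a, nz_b) || combine_completed;
}

/* Flag updates mirroring the patch: set at the end of
   rest_of_handle_combine, cleared again in rest_of_clean_state.  */
static void end_of_combine (void) { combine_completed = 1; }
static void end_of_reload (void) { reload_completed = 1; }
static void clean_state (void) { combine_completed = 0; reload_completed = 0; }

/* Walk a simplified pipeline for one function, asserting which
   condition still accepts the insn at each stage.  */
static void
walk_pipeline (void)
{
  const unsigned int nz_t = 42u;              /* t = x & 42 */
  const unsigned int nz_shifted = 42u << 13;  /* t << 13 */

  clean_state ();
  /* During combine: both conditions accept the insn.  */
  assert (cond_reload_idiom (nz_t, nz_shifted));
  assert (cond_combine_idiom (nz_t, nz_shifted));

  end_of_combine ();
  /* After combine but before reload: only the new idiom accepts it;
     the old one corresponds to the "unrecognizable insn" ICE.  */
  assert (!cond_reload_idiom (nz_t, nz_shifted));
  assert (cond_combine_idiom (nz_t, nz_shifted));

  end_of_reload ();
  /* After reload: the old idiom accepts it again.  */
  assert (cond_reload_idiom (nz_t, nz_shifted));

  clean_state ();
}
```

Note the design assumption this encodes: once combine_completed is set, the condition trusts that the insn was only formed while the nonzero_bits test held, which is exactly the guarantee the proposal relies on.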