From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by sourceware.org (Postfix) with ESMTPS id AEF7B3858C2C for ; Fri, 8 Jul 2022 06:51:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AEF7B3858C2C Received: by mail-qt1-x836.google.com with SMTP id g14so26062204qto.9 for ; Thu, 07 Jul 2022 23:51:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=6DeZXDq3UNdyWZTenU3Pzx1ecMXWtYjUVXNuoG8Dbnk=; b=tKgB06FO391UH6QK8AistX1EH7l9G0jZF3aYEh9GA/AB7+67c+V5nv/8WxzpfejTM2 cpbdDXR6tHjcc1zpn2sE7WRHH2G/33HnMz8yOb2WsBXDw83sWfFNH3FrfCmjsoVIzdLN fU/U3uObS7X8wrhi/klpglJ8qr8x7otA6pKmk0LQvjffwUI47XUOlAqVYIVssBvhonWM lR7i6NH+PkFDCtb+O+IP6oj7AEwTsL2j7pSX731lnsC1/BKhmc/BqB5f4Oz05fF9dtjM oObLMia5KGNHZeOEYutWRvIsPiACHpoMeqpZS49lRqCKZnR3hzu5N7dr3bjgelcsI/fR kXxA== X-Gm-Message-State: AJIora+5Nlv8uhIw4rP6pL2QjyHWQZMqzXzWhh85Nf5OAMedyXxDFU0R gPwtv9THXCTLPCVScSohUklNLy6JdrvGydCLS/o= X-Google-Smtp-Source: AGRyM1uLpV074mv4vgkojirVjd0ILpc1eCL0iZSm6qVE9mI4RoiIeiUZyAPQbLFnYiJS0iqNUUUAEF4uL+IlIBk45dw= X-Received: by 2002:a05:622a:c7:b0:31d:47d3:526 with SMTP id p7-20020a05622a00c700b0031d47d30526mr1583598qtw.581.1657263089529; Thu, 07 Jul 2022 23:51:29 -0700 (PDT) MIME-Version: 1.0 References: <001a01d89239$713109e0$53931da0$@nextmovesoftware.com> In-Reply-To: <001a01d89239$713109e0$53931da0$@nextmovesoftware.com> From: Richard Biener Date: Fri, 8 Jul 2022 08:51:18 +0200 Message-ID: Subject: Re: [PATCH/RFC] combine_completed global variable. To: Roger Sayle Cc: "Kewen.Lin" , GCC Patches , David Edelsohn , Segher Boessenkool Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2022 06:51:32 -0000 On Thu, Jul 7, 2022 at 9:41 PM Roger Sayle wro= te: > > > Hi Kewen (and Segher), > Many thanks for stress testing my patch to improve multiplication > by integer constants on rs6000 by using the rldmi instruction. > Although I've not been able to reproduce your ICE (using gcc135 > on the compile farm), I completely agree with Segher's analysis > that the Achilles heel with my approach/patch is that there's > currently no way for the backend/recog to know that we're in a > pass after combine. > > Rather than give up on this optimization (and a similar one for > I386.md where test;sete can be replaced by xor $1 when combine > knows that nonzero_bits is 1, but loses that information afterwards), > I thought I'd post this "strawman" proposal to add a combine_completed > global variable, matching the reload_completed and regstack_completed > global variables already used (to track progress) by the middle-end. > > I was wondering if I could ask you could test the attached patch > in combination with my previous rs6000.md patch (with the obvious > change of reload_completed to combine_completed) to confirm > that it fixes the problems you were seeing. > > Segher/Richard, would this sort of patch be considered acceptable? > Or is there a better approach/solution? Just to chime in - on the GIMPLE world we've transitioned to using cfun->curr_properties for this (PROP_gimple_lvec, etc.) so PROP_rtl_after_{combine,reload,regstack} would make this state per function and avoids global variables (well, in exchange for accessing cfun in the machine patterns of course). The new variable/state needs some documentation. A more functional state like can_create_pseudo_p () would be more useful than 'combine_completed', but I haven't second-guessed what exactly is no longer available after combine (some nonzero bits? but aren't those only available _during_ combine?) > > 2022-07-07 Roger Sayle > > gcc/ChangeLog > * combine.cc (combine_completed): New global variable. > (rest_of_handle_combine): Set combine_completed after pass. > * final.cc (rest_of_clean_state): Reset combine_completed. > * rtl.h (combine_completed): Prototype here. > > > Many thanks in advance, > Roger > -- > > > -----Original Message----- > > From: Kewen.Lin > > Sent: 27 June 2022 10:04 > > To: Roger Sayle > > Cc: gcc-patches@gcc.gnu.org; Segher Boessenkool > > ; David Edelsohn > > Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rld= imi. > > > > Hi Roger, > > > > on 2022/6/27 04:56, Roger Sayle wrote: > > > > > > > > > This patch tweaks the code generated on POWER for integer > > > multiplications > > > > > > by a constant, by making use of rldimi instructions. Much like x86's > > > > > > lea instruction, rldimi can be used to implement a shift and add pair > > > > > > in some circumstances. For rldimi this is when the shifted operand > > > > > > is known to have no bits in common with the added operand. > > > > > > > > > > > > Hence for the new testcase below: > > > > > > > > > > > > int foo(int x) > > > > > > { > > > > > > int t =3D x & 42; > > > > > > return t * 0x2001; > > > > > > } > > > > > > > > > > > > when compiled with -O2, GCC currently generates: > > > > > > > > > > > > andi. 3,3,0x2a > > > > > > slwi 9,3,13 > > > > > > add 3,9,3 > > > > > > extsw 3,3 > > > > > > blr > > > > > > > > > > > > with this patch, we now generate: > > > > > > > > > > > > andi. 3,3,0x2a > > > > > > rlwimi 3,3,13,0,31-13 > > > > > > extsw 3,3 > > > > > > blr > > > > > > > > > > > > It turns out this optimization already exists in the form of a combin= e > > > > > > splitter in rs6000.md, but the constraints on combine splitters, > > > > > > requiring three of four input instructions (and generating one or two > > > > > > output instructions) mean it doesn't get applied as often as it could= . > > > > > > This patch converts the define_split into a define_insn_and_split to > > > > > > catch more cases (such as the one above). > > > > > > > > > > > > The one bit that's tricky/controversial is the use of RTL's > > > > > > nonzero_bits which is accurate during the combine pass when this > > > > > > pattern is first recognized, but not as advanced (not kept up to > > > > > > date) when this pattern is eventually split. To support this, > > > > > > I've used a "|| reload_completed" idiom. Does this approach seem > > > > > > reasonable? [I've another patch of x86 that uses the same idiom]. > > > > > > > > > > I tested this patch on powerpc64-linux-gnu, it caused the below ICE aga= inst test > > case gcc/testsuite/gcc.c-torture/compile/pr93098.c. > > > > gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function =E2=80=98foo= =E2=80=99: > > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizab= le insn: > > (insn 104 32 34 2 (set (reg:SI 185 [+4 ]) > > (ior:SI (and:SI (reg:SI 200 [+4 ]) > > (const_int 4294967295 [0xffffffff])) > > (ashift:SI (reg:SI 140) > > (const_int 32 [0x20])))) "gcc/testsuite/gcc.c- > > torture/compile/pr93098.c":6:11 -1 > > (nil)) > > during RTL pass: subreg3 > > dump file: pr93098.c.291r.subreg3 > > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler e= rror: in > > extract_insn, at recog.cc:2791 0x101f664b _fatal_insn(char const*, rtx_= def > > const*, char const*, int, char const*) > > gcc/rtl-error.cc:108 > > 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char > > const*) > > gcc/rtl-error.cc:116 > > 0x10ae427f extract_insn(rtx_insn*) > > gcc/recog.cc:2791 > > 0x11b239bb decompose_multiword_subregs > > gcc/lower-subreg.cc:1555 > > 0x11b25013 execute > > gcc/lower-subreg.cc:1818 > > > > The above trace shows we fails to recog the pattern again due to the in= accurate > > nonzero_bits information as you pointed out above. > > > > There was another patch [1] which wasn't on trunk but touched this same > > define_split, not sure if that can help or we can follow the similar id= ea. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html > > > > BR, > > Kewen