public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: "Kewen.Lin" <linkw@linux.ibm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	 David Edelsohn <dje.gcc@gmail.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [PATCH/RFC] combine_completed global variable.
Date: Fri, 8 Jul 2022 08:51:18 +0200	[thread overview]
Message-ID: <CAFiYyc1C-=TDbtiXZWcKNhdXhpqM59kCQ8smGN2v7bWd66h7Fw@mail.gmail.com> (raw)
In-Reply-To: <001a01d89239$713109e0$53931da0$@nextmovesoftware.com>

On Thu, Jul 7, 2022 at 9:41 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Kewen (and Segher),
> Many thanks for stress testing my patch to improve multiplication
> by integer constants on rs6000 by using the rldmi instruction.
> Although I've not been able to reproduce your ICE (using gcc135
> on the compile farm), I completely agree with Segher's analysis
> that the Achilles heel with my approach/patch is that there's
> currently no way for the backend/recog to know that we're in a
> pass after combine.
>
> Rather than give up on this optimization (and a similar one for
> I386.md where test;sete can be replaced by xor $1 when combine
> knows that nonzero_bits is 1, but loses that information afterwards),
> I thought I'd post this "strawman" proposal to add a combine_completed
> global variable, matching the reload_completed and regstack_completed
> global variables already used (to track progress) by the middle-end.
>
> I was wondering if I could ask you could test the attached patch
> in combination with my previous rs6000.md patch (with the obvious
> change of reload_completed to combine_completed) to confirm
> that it fixes the problems you were seeing.
>
> Segher/Richard, would this sort of patch be considered acceptable?
> Or is there a better approach/solution?

Just to chime in - on the GIMPLE world we've transitioned to
using cfun->curr_properties for this (PROP_gimple_lvec, etc.)
so PROP_rtl_after_{combine,reload,regstack} would make this
state per function and avoids global variables (well, in exchange
for accessing cfun in the machine patterns of course).

The new variable/state needs some documentation.

A more functional state like can_create_pseudo_p () would be
more useful than 'combine_completed', but I haven't second-guessed
what exactly is no longer available after combine (some nonzero bits?
but aren't those only available _during_ combine?)

>
> 2022-07-07  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * combine.cc (combine_completed): New global variable.
>         (rest_of_handle_combine): Set combine_completed after pass.
>         * final.cc (rest_of_clean_state): Reset combine_completed.
>         * rtl.h (combine_completed): Prototype here.
>
>
> Many thanks in advance,
> Roger
> --
>
> > -----Original Message-----
> > From: Kewen.Lin <linkw@linux.ibm.com>
> > Sent: 27 June 2022 10:04
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org; Segher Boessenkool
> > <segher@kernel.crashing.org>; David Edelsohn <dje.gcc@gmail.com>
> > Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rldimi.
> >
> > Hi Roger,
> >
> > on 2022/6/27 04:56, Roger Sayle wrote:
> > >
> > >
> > > This patch tweaks the code generated on POWER for integer
> > > multiplications
> > >
> > > by a constant, by making use of rldimi instructions.  Much like x86's
> > >
> > > lea instruction, rldimi can be used to implement a shift and add pair
> > >
> > > in some circumstances.  For rldimi this is when the shifted operand
> > >
> > > is known to have no bits in common with the added operand.
> > >
> > >
> > >
> > > Hence for the new testcase below:
> > >
> > >
> > >
> > > int foo(int x)
> > >
> > > {
> > >
> > >   int t = x & 42;
> > >
> > >   return t * 0x2001;
> > >
> > > }
> > >
> > >
> > >
> > > when compiled with -O2, GCC currently generates:
> > >
> > >
> > >
> > >         andi. 3,3,0x2a
> > >
> > >         slwi 9,3,13
> > >
> > >         add 3,9,3
> > >
> > >         extsw 3,3
> > >
> > >         blr
> > >
> > >
> > >
> > > with this patch, we now generate:
> > >
> > >
> > >
> > >         andi. 3,3,0x2a
> > >
> > >         rlwimi 3,3,13,0,31-13
> > >
> > >         extsw 3,3
> > >
> > >         blr
> > >
> > >
> > >
> > > It turns out this optimization already exists in the form of a combine
> > >
> > > splitter in rs6000.md, but the constraints on combine splitters,
> > >
> > > requiring three of four input instructions (and generating one or two
> > >
> > > output instructions) mean it doesn't get applied as often as it could.
> > >
> > > This patch converts the define_split into a define_insn_and_split to
> > >
> > > catch more cases (such as the one above).
> > >
> > >
> > >
> > > The one bit that's tricky/controversial is the use of RTL's
> > >
> > > nonzero_bits which is accurate during the combine pass when this
> > >
> > > pattern is first recognized, but not as advanced (not kept up to
> > >
> > > date) when this pattern is eventually split.  To support this,
> > >
> > > I've used a "|| reload_completed" idiom.  Does this approach seem
> > >
> > > reasonable? [I've another patch of x86 that uses the same idiom].
> > >
> > >
> >
> > I tested this patch on powerpc64-linux-gnu, it caused the below ICE against test
> > case gcc/testsuite/gcc.c-torture/compile/pr93098.c.
> >
> > gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’:
> > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable insn:
> > (insn 104 32 34 2 (set (reg:SI 185 [+4 ])
> >         (ior:SI (and:SI (reg:SI 200 [+4 ])
> >                 (const_int 4294967295 [0xffffffff]))
> >             (ashift:SI (reg:SI 140)
> >                 (const_int 32 [0x20])))) "gcc/testsuite/gcc.c-
> > torture/compile/pr93098.c":6:11 -1
> >      (nil))
> > during RTL pass: subreg3
> > dump file: pr93098.c.291r.subreg3
> > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: in
> > extract_insn, at recog.cc:2791 0x101f664b _fatal_insn(char const*, rtx_def
> > const*, char const*, int, char const*)
> >         gcc/rtl-error.cc:108
> > 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char
> > const*)
> >         gcc/rtl-error.cc:116
> > 0x10ae427f extract_insn(rtx_insn*)
> >         gcc/recog.cc:2791
> > 0x11b239bb decompose_multiword_subregs
> >         gcc/lower-subreg.cc:1555
> > 0x11b25013 execute
> >         gcc/lower-subreg.cc:1818
> >
> > The above trace shows we fails to recog the pattern again due to the inaccurate
> > nonzero_bits information as you pointed out above.
> >
> > There was another patch [1] which wasn't on trunk but touched this same
> > define_split, not sure if that can help or we can follow the similar idea.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html
> >
> > BR,
> > Kewen

  reply	other threads:[~2022-07-08  6:51 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-07 19:40 Roger Sayle
2022-07-08  6:51 ` Richard Biener [this message]
2022-07-08  7:21 ` Kewen.Lin
2022-07-08 20:23 ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc1C-=TDbtiXZWcKNhdXhpqM59kCQ8smGN2v7bWd66h7Fw@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=linkw@linux.ibm.com \
    --cc=roger@nextmovesoftware.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).