RE: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Thomas Schwinge'" <tschwinge@baylibre.com>,
	"'Richard Sandiford'" <richard.sandiford@arm.com>
Cc: <jlaw@ventanamicro.com>, <rdapp.gcc@gmail.com>,
	<gcc-patches@gcc.gnu.org>, "'Tom de Vries'" <tdevries@suse.de>
Subject: RE: nvptx vs. [PATCH] Add a late-combine pass [PR106594]
Date: Fri, 28 Jun 2024 07:07:50 +0100	[thread overview]
Message-ID: <007a01dac921$82d0d9c0$88728d40$@nextmovesoftware.com> (raw)
In-Reply-To: <87jzia2ict.fsf@euler.schwinge.ddns.net>


Hi Thomas,

There are two things I think I can contribute to this discussion.  The first is that I have
a patch (from a year or two ago) for adding rtx_costs to the nvptx backend that I will
respin, which will provide more backend control over combine-like pass decisions.

The second is in response to your comment below; nvptx is currently GCC's only
target.no_register_allocation target, and I think this provides an opportunity to
tweak/change these semantics.  Currently "no_register_allocation" turns off a 
whole bunch of post-reload passes (including things like peephole2 and predication
that would be extremely useful on nvptx).  My thoughts were that it would be nice,
if no_register_allocation turned off just/only the ira/reload passes, where the gate
function would instead simply set reload_completed.

One major advantage of this approach is that it would minimize the divergence
in code paths, between nvptx and all of GCC's other backends, which would hopefully
help with things like "late-combine" that may implicitly (and perhaps unintentionally)
require/expect certain passes to be run later.

Again I've investigated a patch/change or two towards the above approach, but its 
slightly more complicated than I'd anticipated, due to things like late nvptx-specific
passes assuming they can call gen_reg_rtx, after other targets have no_new_pseudos.
There's nothing that's impossible, but it would take a concerted effort by target and
middle-end maintainers to align a shallower "no_register_allocation".

I'm curious what folks think.

Best regards,
Roger
--

> -----Original Message-----
> From: Thomas Schwinge <tschwinge@baylibre.com>
> Sent: 27 June 2024 22:20
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: jlaw@ventanamicro.com; rdapp.gcc@gmail.com; gcc-patches@gcc.gnu.org;
> Tom de Vries <tdevries@suse.de>; Roger Sayle <roger@nextmovesoftware.com>
> Subject: Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]
> 
> Hi!
> 
> On 2024-06-27T22:27:21+0200, I wrote:
> > On 2024-06-27T18:49:17+0200, I wrote:
> >> On 2023-10-24T19:49:10+0100, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >>> This patch adds a combine pass that runs late in the pipeline.
> >
> > [After sending, I realized I replied to a previous thread of this
> > work.]
> >
> >> I've beek looking a bit through recent nvptx target code generation
> >> changes for GCC target libraries, and thought I'd also share here my
> >> findings for the "late-combine" changes in isolation, for nvptx target.
> >>
> >> First the unexpected thing:
> >
> > So much for "unexpected thing" -- next level of unexpected here...
> > Appreciated if anyone feels like helping me find my way through this,
> > but I totally understand if you've got other things to do.
> 
> OK, I found something already.  (Unexpectedly quickly...)  ;-)
> 
> >> there are a few cases where we now see unused registers get declared,
> >> for example (random) in 'nvptx-none/newlib/libc/libm_a-s_modf.o:modf'
> 
> I've now looked into the former one ('tmp-libm_a-s_modf.i.xz' is attached), to
> avoid...
> 
> > I first looked into a simpler case: newlib 'libc/locale/lnumeric.c'.
> 
> >     ../../../source-gcc/newlib/libc/locale/lnumeric.c:88:10: warning: ‘ret’ is used
> uninitialized [-Wuninitialized]
> >        88 |   return ret;
> >           |          ^~~
> >     ../../../source-gcc/newlib/libc/locale/lnumeric.c:48:7: note: ‘ret’ was
> declared here
> >        48 |   int ret;
> >           |       ^~~
> >
> > Uh.  Given nothing else is going on in that function, I suppose '%r22'
> > relates to the uninitialized 'ret' -- and given undefined behavior,
> > GCC of course is fine to emit an unused 'reg' in that case...
> 
> ... the undefined behavior here.
> 
> But in fact, for both cases, the unexpected difference goes away if after
> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run as part of
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's all not active for
> nvptx target given '!reload_completed', given nvptx is
> 'targetm.no_register_allocation'.  Maybe we need to enable a few more passes,
> or is there anything in 'pass_late_combine' to change, so that we don't run into
> this?  Does it inadvertently mark registers live or something like that?
> 
> The following makes these two cases work, but evidently needs a lot more
> analysis: a lot of other passes are enabled that may be anything between
> beneficial and harmful for 'targetm.no_register_allocation'/nvptx.
> 
>     --- gcc/passes.cc
>     +++ gcc/passes.cc
>     @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
>      class pass_postreload : public rtl_opt_pass
>      {
>      public:
>        pass_postreload (gcc::context *ctxt)
>          : rtl_opt_pass (pass_data_postreload, ctxt)
>        {}
> 
>        /* opt_pass methods: */
>     -  bool gate (function *) final override { return reload_completed; }
>     +  bool gate (function *) final override { return reload_completed ||
> targetm.no_register_allocation; }
>     --- gcc/regcprop.cc
>     +++ gcc/regcprop.cc
>     @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
>      public:
>        pass_cprop_hardreg (gcc::context *ctxt)
>          : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
>        {}
> 
>        /* opt_pass methods: */
>        bool gate (function *) final override
>          {
>     -      return (optimize > 0 && (flag_cprop_registers));
>     +      return (optimize > 0 && flag_cprop_registers &&
> !targetm.no_register_allocation);
>          }
> 
> 
> Grüße
>  Thomas
> 
> 
> > But: should we expect '-fno-late-combine-instructions' vs.
> > '-flate-combine-instructions' to behave in the same way?  (After all,
> > '%r22' remains unused also with '-flate-combine-instructions', and
> > doesn't need to be emitted.)  This could, of course, also be a nvptx
> > back end issue?
> >
> > I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
> > is attached if you'd like to reproduce this with your own nvptx target
> > 'cc1':
> >
> >     $ [...]/configure --target=nvptx-none --enable-languages=c
> >     $ make -j12 all-gcc
> >     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase
> > tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin
> > -o tmp-libc_a-lnumeric.s -fdump-rtl-all #
> > -fno-late-combine-instructions
> >
> >
> > Grüße
> >  Thomas
>

     prev parent reply	other threads:[~2024-06-28  6:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-24 18:49 Richard Sandiford
2023-11-30 14:10 ` Ping: " Richard Sandiford
2023-12-11 15:23 ` Richard Sandiford
2023-12-11 16:18   ` Robin Dapp
2023-12-30 15:35 ` Ping^3: " Richard Sandiford
2024-01-01  3:11   ` YunQiang Su
2024-01-05 10:10     ` YunQiang Su
2023-12-30 18:13 ` Segher Boessenkool
2024-01-02  9:47   ` Richard Sandiford
2024-06-24 19:37     ` Segher Boessenkool
2024-06-25 10:31       ` Richard Biener
2024-06-25 17:22         ` YunQiang Su
2024-01-03  4:20 ` Jeff Law
2024-01-05 17:35   ` Richard Sandiford
2024-01-08  5:03     ` Jeff Law
2024-01-08 11:52       ` Richard Sandiford
2024-01-08 16:14         ` Jeff Law
2024-01-08 16:59           ` Richard Sandiford
2024-01-08 17:10             ` Jeff Law
2024-01-08 19:11               ` Richard Sandiford
2024-01-08 21:42                 ` Jeff Law
2024-01-10 13:01     ` Richard Sandiford
2024-01-10 13:35       ` Richard Biener
2024-01-10 16:27         ` Jeff Law
2024-01-10 16:40       ` Jeff Law
2024-06-21  4:50 ` Hongtao Liu
2024-06-27 16:49 ` nvptx vs. " Thomas Schwinge
2024-06-27 20:27   ` Thomas Schwinge
2024-06-27 21:20     ` Thomas Schwinge
2024-06-27 22:41       ` Thomas Schwinge
2024-06-28 14:01         ` Richard Sandiford
2024-06-28 16:48           ` Richard Sandiford
2024-07-01 11:55             ` Thomas Schwinge
2024-07-01 11:55         ` WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' (was: nvptx vs. [PATCH] Add a late-combine pass [PR106594]) Thomas Schwinge
2024-06-28  6:07       ` Roger Sayle [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='007a01dac921$82d0d9c0$88728d40$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@ventanamicro.com \
    --cc=rdapp.gcc@gmail.com \
    --cc=richard.sandiford@arm.com \
    --cc=tdevries@suse.de \
    --cc=tschwinge@baylibre.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).