public inbox for gcc@gcc.gnu.org
* How do I stop gcc from loading data into registers when that's not needed?
@ 2018-05-18 18:03 Paul Koning
  2018-05-18 18:07 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Koning @ 2018-05-18 18:03 UTC (permalink / raw)
  To: GCC

Gents,

In some targets, like pdp11 and vax, most instructions can reference data in memory directly.

So when I have "if (x < y) ..." I would expect something like this:

	cmpw  x, y
	bgeq  1f
	...

What I actually see, with -O2 and/or -Os, is:

	movw  x, r0
	movw  y, r1
	cmpw  r0, r1
	bgeq  1f
	...

which is both longer and slower.  I can't tell why this happens, or how to stop it.  The machine description has "general_operand" so it doesn't seem to be the place that forces things into registers.

	paul


* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-18 18:03 How do I stop gcc from loading data into registers when that's not needed? Paul Koning
@ 2018-05-18 18:07 ` Richard Biener
       [not found]   ` <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2018-05-18 18:07 UTC (permalink / raw)
  To: gcc, Paul Koning, GCC

On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
>Gents,
>
>In some targets, like pdp11 and vax, most instructions can reference
>data in memory directly.
>
>So when I have "if (x < y) ..." I would expect something like this:
>
>	cmpw  x, y
>	bgeq  1f
>	...
>
>What I actually see, with -O2 and/or -Os, is:
>
>	movw  x, r0
>	movw  y, r1
>	cmpw  r0, r1
>	bgeq  1f
>	...
>
>which is both longer and slower.  I can't tell why this happens, or how
>to stop it.  The machine description has "general_operand" so it
>doesn't seem to be the place that forces things into registers.

I would expect combine to merge the load and the arithmetic, so it may be the target costing that prevents that from succeeding.

Richard. 

>	paul


* Re: How do I stop gcc from loading data into registers when that's not needed?
       [not found]   ` <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>
@ 2018-05-22  8:49     ` Richard Biener
  2018-05-22 19:26       ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2018-05-22  8:49 UTC (permalink / raw)
  To: paulkoning, GCC Development

On Tue, May 22, 2018 at 2:19 AM Paul Koning <paulkoning@comcast.net> wrote:



> > On May 18, 2018, at 2:07 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
> >> Gents,
> >>
> >> In some targets, like pdp11 and vax, most instructions can reference
> >> data in memory directly.
> >>
> >> So when I have "if (x < y) ..." I would expect something like this:
> >>
> >>      cmpw  x, y
> >>      bgeq  1f
> >>      ...
> >>
> >> What I actually see, with -O2 and/or -Os, is:
> >>
> >>      movw  x, r0
> >>      movw  y, r1
> >>      cmpw  r0, r1
> >>      bgeq  1f
> >>      ...
> >>
> >> which is both longer and slower.  I can't tell why this happens, or how
> >> to stop it.  The machine description has "general_operand" so it
> >> doesn't seem to be the place that forces things into registers.
> >
> > I would expect combine to merge the load and arithmetic and thus it is
> > eventually the target costing that makes that not succeed.
> >
> > Richard.

> Thanks Richard.  I am not having a whole lot of luck figuring out where
> precisely I need to adjust or how to make the adjustment.  I'm doing the
> adjusting on the pdp11 port right now.  That has a TARGET_RTX_COSTS hook
> which looks fairly plausible.  It doesn't currently have a
> TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST.  Is
> it likely that I need some or all of those to get this working better?  If
> yes, any hints you can offer where to start?

Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
that combine at least tries to merge the memory ops into the instruction?
Maybe it is only RA / reload that will try.  You can look at how it works
for x86, for example whether there are already memory ops in the stmts
during RTL expansion, which I think can happen when the load has a single
use (and then check whether that works for pdp11).

Richard.

>          paul


* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-22  8:49     ` Richard Biener
@ 2018-05-22 19:26       ` Segher Boessenkool
  2018-05-23  0:50         ` Paul Koning
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2018-05-22 19:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: paulkoning, GCC Development

On Tue, May 22, 2018 at 10:49:35AM +0200, Richard Biener wrote:
> On Tue, May 22, 2018 at 2:19 AM Paul Koning <paulkoning@comcast.net> wrote:
> > > On May 18, 2018, at 2:07 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
> > >> In some targets, like pdp11 and vax, most instructions can reference
> > >> data in memory directly.
> > >>
> > >> So when I have "if (x < y) ..." I would expect something like this:
> > >>
> > >>      cmpw  x, y
> > >>      bgeq  1f
> > >>      ...
> > >>
> > >> What I actually see, with -O2 and/or -Os, is:
> > >>
> > >>      movw  x, r0
> > >>      movw  y, r1
> > >>      cmpw  r0, r1
> > >>      bgeq  1f
> > >>      ...
> > >>
> > >> which is both longer and slower.  I can't tell why this happens, or how
> > >> to stop it.  The machine description has "general_operand" so it
> > >> doesn't seem to be the place that forces things into registers.
> > >
> > > I would expect combine to merge the load and arithmetic and thus it is
> > > eventually the target costing that makes that not succeed.
> 
> > Thanks Richard.  I am not having a whole lot of luck figuring out where
> > precisely I need to adjust or how to make the adjustment.  I'm doing the
> > adjusting on the pdp11 port right now.  That has a TARGET_RTX_COSTS hook
> > which looks fairly plausible.  It doesn't currently have a
> > TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST.  Is
> > it likely that I need some or all of those to get this working better?  If
> > yes, any hints you can offer where to start?

-fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
file.  Does combine try this combination?  If so, it will tell you what
the resulting costs are.  If not, why does it not try it?

> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
> that combine at least tries to merge the memory ops into the instruction?

It should, it's a simple reg dependency.  In many cases it will even do
it if it is not single-use (via a 3->2 combination).


Segher
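
A small stand-alone test case makes the dump easier to read.  The sketch
below is only an assumption about what such a file could look like: the
thread does not show how x, y and z are declared, so 16-bit globals are
assumed here, and the function name cmp_test is made up for illustration.

```c
/* Hypothetical test file; the declarations are an assumption, since the
   thread does not show them.  To get the combine dump, compile with
   something like
       <your-pdp11-cross-gcc> -O2 -S -fdump-rtl-combine-all cmp_test.c
   where the compiler name is a placeholder for whatever you built.  */

short x, y, z;          /* assumed 16-bit globals */

void cmp_test(void)
{
    if (x < y)          /* the compare under discussion */
        z = 1;
    else
        z = 9;
}
```

The dump should appear next to the source as a file whose name ends in
".combine"; that is where combine reports the combinations it tried and
the costs it computed for them.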


* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-22 19:26       ` Segher Boessenkool
@ 2018-05-23  0:50         ` Paul Koning
  2018-05-23  9:47           ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Koning @ 2018-05-23  0:50 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Development



> On May 22, 2018, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> 
> -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
> file.  Does combine try this combination?  If so, it will tell you what
> the resulting costs are.  If not, why does it not try it?
> 
>> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
>> that combine at least tries to merge the memory ops into the instruction?
> 
> It should, it's a simple reg dependency.  In many cases it will even do
> it if it is not single-use (via a 3->2 combination).

I examined what gcc does with two simple functions:

void c2(void)
{
    if (x < y)
        z = 1;
    else if (x != y)
        z = 42;
    else
        z = 9;
}

void c3(void)
{
    if (x < y)
        z = 1;
    else
        z = 9;
}

Two things popped out.

1. The original RTL (from the expand phase) has a memory->register move for x and y in c2, but it doesn't for c3 (it simply generates a memory/memory compare there).  What triggers the different choice in that phase?

2. The reported costs for the various insns are
	r22:HI=['x']		6
	cmp(r22:HI,r23:HI)	4
	cmp(['x'],['y'])	16
   so the added cost for the memory argument in the cmp is 6 -- the same as the whole cost for the mov.  That certainly explains the behavior.  It isn't what I want it to be.  Which target hook(s) are involved in these numbers?  I don't see them in my rtx_costs hook.

	paul


* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-23  0:50         ` Paul Koning
@ 2018-05-23  9:47           ` Richard Biener
  2018-05-24  0:33             ` Paul Koning
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2018-05-23  9:47 UTC (permalink / raw)
  To: paulkoning; +Cc: Segher Boessenkool, GCC Development

On Wed, May 23, 2018 at 2:50 AM Paul Koning <paulkoning@comcast.net> wrote:



> > On May 22, 2018, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> >
> > -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
> > file.  Does combine try this combination?  If so, it will tell you what
> > the resulting costs are.  If not, why does it not try it?
> >
> >> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
> >> that combine at least tries to merge the memory ops into the instruction?
> >
> > It should, it's a simple reg dependency.  In many cases it will even do
> > it if it is not single-use (via a 3->2 combination).

> I examined what gcc does with two simple functions:

> void c2(void)
> {
>      if (x < y)
>          z = 1;
>      else if (x != y)
>          z = 42;
>      else
>          z = 9;
> }

> void c3(void)
> {
>      if (x < y)
>          z = 1;
>      else
>          z = 9;
> }

> Two things popped out.

> 1. The original RTL (from the expand phase) has a memory->register move
> for x and y in c2, but it doesn't for c3 (it simply generates a
> memory/memory compare there).  What triggers the different choice in that
> phase?

> 2. The reported costs for the various insns are
>          r22:HI=['x']            6
>          cmp(r22:HI,r23:HI)      4
>          cmp(['x'],['y'])        16
>     so the added cost for the memory argument in the cmp is 6 -- the same
>     as the whole cost for the mov.  That certainly explains the behavior.  It
>     isn't what I want it to be.  Which target hook(s) are involved in these
>     numbers?  I don't see them in my rtx_costs hook.

The rtx_costs hook.  I think the costs above make sense.  There's also a
new insn_cost hook, but you would have to check whether combine uses it.
Otherwise address_cost might be involved.

Richard.

>          paul


* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-23  9:47           ` Richard Biener
@ 2018-05-24  0:33             ` Paul Koning
  2018-05-24 18:25               ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Koning @ 2018-05-24  0:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: Segher Boessenkool, GCC Development



> On May 23, 2018, at 5:46 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> ...
> 
>> 2. The reported costs for the various insns are
>>         r22:HI=['x']            6
>>         cmp(r22:HI,r23:HI)      4
>>         cmp(['x'],['y'])        16
>>    so the added cost for the memory argument in the cmp is 6 -- the same
>> as the whole cost for the mov.  That certainly explains the behavior.  It
>> isn't what I want it to be.  Which target hook(s) are involved in these
>> numbers?  I don't see them in my rtx_costs hook.
> 
> The rtx_cost hook.   I think the costs above make sense.  There's also a
> new insn_cost hook but you have to dig whether combine uses that.
> Otherwise address_cost might be involved.

Thanks.  For a pdp11, those costs aren't right because mov and cmp and 
a memory reference each have about the same cost.  So 8, 4, 12 would be 
closer.  But the real question for me at this point is where to find
the knobs that adjust these choices.

The various cost hooks have me confused, and the GCCINT manual is not
really enlightening.  There is rtx_costs, insn_cost, and addr_cost.
It sort of feels like insn_cost and addr_cost together would provide 
roughly the same information that rtx_costs gives.  In the existing 
platforms, I see rtx_costs everywhere, addr_cost in a fair number of
targets, and insn_cost in just one (rs6000).  Can someone explain the
interaction of these various cost hooks, and what happens if you define
various combinations of the three?

	paul
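
To make the arithmetic behind the two sets of numbers explicit, here is a
rough sketch.  It assumes the choice comes down to comparing the summed
cost of the load/load/compare sequence with the cost of the single
memory-to-memory compare; the real decision logic in combine may differ.
The struct and function names are invented for this illustration.

```c
/* Hypothetical cost table for the three insn shapes in the dump.  */
struct cost_table {
    int load;       /* reg = [mem]        */
    int cmp_reg;    /* cmp(reg, reg)      */
    int cmp_mem;    /* cmp([mem], [mem])  */
};

/* Total for: load x; load y; cmp reg,reg  */
int cost_via_registers(const struct cost_table *t)
{
    return 2 * t->load + t->cmp_reg;
}

/* Total for the single insn: cmp [x],[y]  */
int cost_direct(const struct cost_table *t)
{
    return t->cmp_mem;
}

/* Positive when the memory-to-memory compare is the cheaper form.  */
int saving_from_direct(const struct cost_table *t)
{
    return cost_via_registers(t) - cost_direct(t);
}

/* The costs reported in the dump, and the ones proposed for pdp11.  */
const struct cost_table reported = { 6, 4, 16 };
const struct cost_table proposed = { 8, 4, 12 };
```

With the reported costs the two forms tie (6 + 6 + 4 against 16), so
nothing favors the direct compare.  With 8, 4, 12 the direct compare
comes out 8 units cheaper (8 + 8 + 4 against 12).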



* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-24  0:33             ` Paul Koning
@ 2018-05-24 18:25               ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2018-05-24 18:25 UTC (permalink / raw)
  To: Paul Koning; +Cc: Richard Biener, GCC Development

On Wed, May 23, 2018 at 08:33:13PM -0400, Paul Koning wrote:
> > On May 23, 2018, at 5:46 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >> 2. The reported costs for the various insns are
> >>         r22:HI=['x']            6
> >>         cmp(r22:HI,r23:HI)      4
> >>         cmp(['x'],['y'])        16
> >>    so the added cost for the memory argument in the cmp is 6 -- the same
> >> as the whole cost for the mov.  That certainly explains the behavior.  It
> >> isn't what I want it to be.  Which target hook(s) are involved in these
> >> numbers?  I don't see them in my rtx_costs hook.
> > 
> > The rtx_cost hook.   I think the costs above make sense.  There's also a
> > new insn_cost hook but you have to dig whether combine uses that.
> > Otherwise address_cost might be involved.
> 
> Thanks.  For a pdp11, those costs aren't right because mov and cmp and 
> a memory reference each have about the same cost.  So 8, 4, 12 would be 
> closer.  But the real question for me at this point is where to find
> the knobs that adjust these choices.
> 
> The various cost hooks have me confused, and the GCCINT manual is not
> really enlightening.  There is rtx_costs, insn_cost, and addr_cost.
> It sort of feels like insn_cost and addr_cost together would provide 
> roughly the same information that rtx_costs gives.  In the existing 
> platforms, I see rtx_costs everywhere, addr_cost in a fair number of
> targets, and insn_cost in just one (rs6000).  Can someone explain the
> interaction of these various cost hooks, and what happens if you define
> various combinations of the three?

rtx_costs computes the cost for any rtx (an insn, a set, a set source,
any random piece of one).  set_src_cost, set_rtx_cost, etc. are helper
functions that use that.

Those functions do not work for parallels.  Also, costs are not additive
like this simplified model assumes.  Also, more complex backends tend
to miss many cases in their rtx_costs function.

Many passes that want costs want to know the cost of a full insn.  Like
combine.  That's why I created insn_cost: it solves all of the above
problems.

I'll hopefully make most passes use insn_cost for GCC 9.  All of the very
easy ones already do.


Segher
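
The difference between costing pieces of an rtx and costing a whole insn
can be sketched with a toy model.  Nothing below is GCC code: the toy_*
types and functions are invented for illustration, and the idea that the
reported 6/4/16 come from a per-node sum (memory reference 6, compare 4)
is only a guess that happens to fit the numbers in this thread.  The
whole-insn function charges 4 for the instruction plus 4 per memory
reference, following the remark that mov, cmp and a memory reference
each cost about the same on pdp11.

```c
#include <stddef.h>

/* Toy stand-ins for RTL; none of these names exist in GCC.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_COMPARE, TOY_SET };

struct toy_rtx {
    enum toy_code code;
    const struct toy_rtx *op0;  /* NULL for leaves */
    const struct toy_rtx *op1;
};

/* Per-node costing: each node adds its own cost to its operands'.  */
int toy_rtx_cost(const struct toy_rtx *x)
{
    int own = 0;
    if (x == NULL)
        return 0;
    if (x->code == TOY_MEM)
        own = 6;
    else if (x->code == TOY_COMPARE)
        own = 4;
    return own + toy_rtx_cost(x->op0) + toy_rtx_cost(x->op1);
}

/* Count memory references anywhere in the expression.  */
int toy_count_mems(const struct toy_rtx *x)
{
    if (x == NULL)
        return 0;
    return (x->code == TOY_MEM)
           + toy_count_mems(x->op0) + toy_count_mems(x->op1);
}

/* Whole-insn costing: 4 for the insn plus 4 per memory reference.  */
int toy_insn_cost(const struct toy_rtx *insn)
{
    return 4 * (1 + toy_count_mems(insn));
}

/* The three insn shapes from the dump.  */
const struct toy_rtx reg_a = { TOY_REG, NULL, NULL };
const struct toy_rtx reg_b = { TOY_REG, NULL, NULL };
const struct toy_rtx mem_x = { TOY_MEM, NULL, NULL };
const struct toy_rtx mem_y = { TOY_MEM, NULL, NULL };
const struct toy_rtx load_x   = { TOY_SET, &reg_a, &mem_x };
const struct toy_rtx cmp_regs = { TOY_COMPARE, &reg_a, &reg_b };
const struct toy_rtx cmp_mems = { TOY_COMPARE, &mem_x, &mem_y };
```

In this model toy_rtx_cost gives 6, 4 and 16 for the load, the register
compare and the memory compare, while toy_insn_cost gives 8, 4 and 12.
The point of the sketch is only that a function handed the whole insn
can price it as a unit instead of relying on a sum over its parts.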

