public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Just what are rtx costs?
@ 2011-08-17 14:52 Richard Sandiford
  2011-08-17 17:29 ` Georg-Johann Lay
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Richard Sandiford @ 2011-08-17 14:52 UTC (permalink / raw)
  To: gcc

I've been working on some patches to make insn_rtx_cost take account
of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly beginning
to realise that I don't understand what rtx costs are supposed to represent.

AIUI the rules have historically been:

  1) Registers have zero cost.

  2) Constants have a cost relative to that of registers.  By extension,
     constants have zero cost if they are as cheap as a register.

  3) With an outer code of SET, actual operations have the cost
     of the associated instruction.  E.g. the cost of a PLUS
     is the cost of an addition instruction.

  4) With other outer codes, actual operations have the cost
     of the combined instruction, if available, or the cost of
     a separate instruction otherwise.  E.g. the cost of a NEG
     inside an AND might be zero on targets that support BIC-like
     instructions, and COSTS_N_INSNS (1) on most others.

Memories seem a bit more vague.  I think most ports handle them
like (2), but I'm not sure whether all RISCy ports do.

I'd wrongly assumed that rtx costs were only used for rvalues.
However, several places actually take the rtx cost of a SET.
Specifically:

	* auto-inc-dec.c (attempt_change)
	* cfgloopanal.c (seq_cost)
	* loop-invariant.c (create_new_invariant)
	* postreload.c (move2add_use_add2_insn, move2add_use_add3_insn)
	(reload_cse_move2add)

(What confused me is that these places still pass "SET" as the outer code.
"INSN" would be more accurate.)

By default, the cost of a SET is:

    COSTS_N_INSNS (1)
    + rtx_cost (SET_DEST (x), SET, speed)
    + rtx_cost (SET_SRC (x), SET, speed)

But it seems like there's some double-counting for complex SET_SRCs here.
The cost of the instruction is already included in the SET_SRC, so
adding COSTS_N_INSNS (1) is overkill.  COSTS_N_INSNS (1) does seem
necessary for register, constant, subreg and (possibly) memory
SET_SRCs though.

Some targets' cost hooks do handle SETs.  Others have a default case
that assumes any unhandled operation is expensive, so even register sets
end up getting this treatment.  (I notice ARM is inconsistent: Thumb-1
handles SETs explicitly, but the default ARM and Thumb-2 costs assume
that SET itself has a cost of COSTS_N_INSNS (4).)

insn_rtx_cost instead uses:

  cost = set_src_cost (SET_SRC (set), speed);
  return cost > 0 ? cost : COSTS_N_INSNS (1);

This ignores SET_DEST (the problem I'm trying to fix).  It also means
that constants that are slightly more expensive than a register --
somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming
cheaper than registers.

So it doesn't look like we're consistent here, and that some rejigging
might be needed.  As others have said, it would be nice if costs could
be extracted from the .md file, but that's more work than I have time
for now.  Is it worth changing the costs anyway?

One approach I'm trying is to make sure that every target that doesn't
explicitly handle SET does nothing with it.  (Targets that do handle
SET remain unchanged.)  Then, if we see a SET whose SET_SRC is a
register, constant, memory or subreg, we give it cost:

    COSTS_N_INSNS (1)
    + rtx_cost (SET_DEST (x), SET, speed)
    + rtx_cost (SET_SRC (x), SET, speed)

as now.  In other cases we give it a cost of:

    rtx_cost (SET_DEST (x), SET, speed)
    + rtx_cost (SET_SRC (x), SET, speed)

But that hardly seems clean either.  Perhaps we should instead make
the SET_SRC always include the cost of the SET, even for registers,
constants and the like.  Thoughts?

(The patches I've been working on also tell rtx_costs -- and the target
hook -- whether the operand is an lvalue.  That means that stores can
have a different cost from loads.)

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-17 14:52 Just what are rtx costs? Richard Sandiford
@ 2011-08-17 17:29 ` Georg-Johann Lay
  2011-08-19 12:25   ` Richard Sandiford
  2011-08-17 18:56 ` Paolo Bonzini
  2011-08-18  0:55 ` Hans-Peter Nilsson
  2 siblings, 1 reply; 16+ messages in thread
From: Georg-Johann Lay @ 2011-08-17 17:29 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Richard Sandiford schrieb:
> I've been working on some patches to make insn_rtx_cost take account
> of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly beginning
> to realise that I don't understand what rtx costs are supposed to represent.
> 
> AIUI the rules have historically been:
> 
>   1) Registers have zero cost.
> 
>   2) Constants have a cost relative to that of registers.  By extension,
>      constants have zero cost if they are as cheap as a register.
> 
>   3) With an outer code of SET, actual operations have the cost
>      of the associated instruction.  E.g. the cost of a PLUS
>      is the cost of an addition instruction.
> 
>   4) With other outer codes, actual operations have the cost
>      of the combined instruction, if available, or the cost of
>      a separate instruction otherwise.  E.g. the cost of a NEG
>      inside an AND might be zero on targets that support BIC-like
>      instructions, and COSTS_N_INSNS (1) on most others.
> 
> [...]
> 
> But that hardly seems clean either.  Perhaps we should instead make
> the SET_SRC always include the cost of the SET, even for registers,
> constants and the like.  Thoughts?

IMO a clean approach would be to query the costs of a whole insn (resp. 
it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already 
indicates that the costs are compared to *insn* costs i.e. cost of the 
whole pattern (modulo clobbers).

E.g. the cost of a CONST_INT is meaningless if you don't know what to do 
with the constant. (set (reg:QI) (const_int 0)) might have different 
cost than (set (reg:SI) (const_int 0)).  Similar applies for an addition 
if you don't know if it's for arithmetic or address offset computation 
inside a MEM.

To distinguish a proper SET from a jump you have to analyze the RTX or 
guess that it's a jump because the rtx has VOIDmode.  All that is 
information which is readily available; many parts in the compiler even 
strip the SET and just pass the SET_DEST down to the backend and thus 
hide information the backend needs.

A backend mostly thinks in terms of insns, and passing the pattern down 
to the backend (e.g. in insn combine which has the newly synthesized 
insn handy, anyway) would enable the backend to use recog and insn 
attributes to get the insn cost.

I use that in a port that has complex ISA and bunch of combine patterns: 
Each such pattern has two attributes "size" and "speed" that tell the 
rtx costs of that insn with respect to size or speed optimization. That 
approach is much more legible compared to explicitely writing down 
XEXP-orgy.

The difference between "size" and "length" is that "length" tells the 
size (or at least the maximum possible size) in bytes of an insns 
whereas "size" tells the estimated value over all alternatives.

Look as the rtx_costs of AVR beckend: it's horrible code that looks like 
hand-written insn-recog.c! It's annoying to write, hard to track and 
duplicates code (which is already there in insn-recog, insn-extraxt, as 
RTL in md, etc.) Using cost attributes would make is possible to 
describe the costs in the place they are generated: that's the insn 
which is responsible for it -- and not as hundreds of cryptic C lines...

Besides that, it apprears to be unclear what a cost of 0 means: Some 
parts seem to treat is as "cost nothing", others as "don't know".

> (The patches I've been working on also tell rtx_costs -- and the target
> hook -- whether the operand is an lvalue.  That means that stores can
> have a different cost from loads.)

I think IRA needed better way to tell cost besides the !, ? and *.
For example, it is not possible at the moment to tell that
    (set (reg:SI 2) (mem:SI (reg:SI 0)))
could be cheaper than
    (set (reg:SI 2) (mem:SI (reg:SI 1)))
if, e.g. R0 is an implicit register but R1 is not.

Johann

> Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-17 14:52 Just what are rtx costs? Richard Sandiford
  2011-08-17 17:29 ` Georg-Johann Lay
@ 2011-08-17 18:56 ` Paolo Bonzini
  2011-08-18  7:58   ` Richard Sandiford
  2011-08-18  0:55 ` Hans-Peter Nilsson
  2 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2011-08-17 18:56 UTC (permalink / raw)
  To: gcc, richard.sandiford

On 08/17/2011 07:52 AM, Richard Sandiford wrote:
>    cost = rtx_cost (SET_SRC (set), SET, speed);
>    return cost>  0 ? cost : COSTS_N_INSNS (1);
>
> This ignores SET_DEST (the problem I'm trying to fix).  It also means
> that constants that are slightly more expensive than a register --
> somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming
> cheaper than registers.

This can be fixed by doing

   return cost >= COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);

> One approach I'm trying is to make sure that every target that doesn't
> explicitly handle SET does nothing with it.  (Targets that do handle
> SET remain unchanged.)  Then, if we see a SET whose SET_SRC is a
> register, constant, memory or subreg, we give it cost:
>
>      COSTS_N_INSNS (1)
>      + rtx_cost (SET_DEST (x), SET, speed)
>      + rtx_cost (SET_SRC (x), SET, speed)
>
> as now.  In other cases we give it a cost of:
>
>      rtx_cost (SET_DEST (x), SET, speed)
>      + rtx_cost (SET_SRC (x), SET, speed)
>
> But that hardly seems clean either.  Perhaps we should instead make
> the SET_SRC always include the cost of the SET, even for registers,
> constants and the like.  Thoughts?

Similarly, this becomes

   dest_cost = rtx_cost (SET_DEST (x), SET, speed);
   src_cost = MAX (rtx_cost (SET_SRC (x), SET, speed),
                   COSTS_N_INSNS (1));
   return dest_cost + src_cost;

How does this look?

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-17 14:52 Just what are rtx costs? Richard Sandiford
  2011-08-17 17:29 ` Georg-Johann Lay
  2011-08-17 18:56 ` Paolo Bonzini
@ 2011-08-18  0:55 ` Hans-Peter Nilsson
  2011-08-18  8:14   ` Richard Sandiford
  2011-08-19 14:17   ` Richard Sandiford
  2 siblings, 2 replies; 16+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-18  0:55 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

On Wed, 17 Aug 2011, Richard Sandiford wrote:

> It also means
> that constants that are slightly more expensive than a register --
> somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming
> cheaper than registers.

Yes, perhaps some scale factor has to be applied to get
reasonable cost granularity in an improved cost model for the
job...  Some constants *are* more expensive (extra words and/or
extra cycles), yet preferable to registers for one (or maybe two
or...) insns.  You don't want to find that all insns except
constant-loads suddenly use register arguments and no
port-specific metric way to tweak it.  Mentioned for the record.

> By default, the cost of a SET is:
>
>     COSTS_N_INSNS (1)
>     + rtx_cost (SET_DEST (x), SET, speed)
>     + rtx_cost (SET_SRC (x), SET, speed)

'k.

> But it seems like there's some double-counting for complex SET_SRCs here.

Bugs.

> As others have said, it would be nice if costs could
> be extracted from the .md file, but that's more work than I have time
> for now.

Yes, let's start by getting the semantics settled and
implemented first.

>  Is it worth changing the costs anyway?

IMHO, it's worth making it consistent, linear, TRT...

> But that hardly seems clean either.  Perhaps we should instead make
> the SET_SRC always include the cost of the SET, even for registers,
> constants and the like.  Thoughts?

Seems more of maintaining a wart than an improvement for
consistency.

I don't think you can get into trouble for trying to improve
this area for consistency: between releases a port already
usually arbitrarily loses on some type of codes and costs have
to be re-tweaked, unless performance is closely tracked.
Aiming for traceability can only help (like, "read the added doc
blurb on how to define the port RTX costs instead of gdb
stepping and code inspection").

brgds, H-P

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-17 18:56 ` Paolo Bonzini
@ 2011-08-18  7:58   ` Richard Sandiford
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2011-08-18  7:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc

Thanks for the feedback.

Paolo Bonzini <bonzini@gnu.org> writes:
> On 08/17/2011 07:52 AM, Richard Sandiford wrote:
>>    cost = rtx_cost (SET_SRC (set), SET, speed);
>>    return cost>  0 ? cost : COSTS_N_INSNS (1);
>>
>> This ignores SET_DEST (the problem I'm trying to fix).  It also means
>> that constants that are slightly more expensive than a register --
>> somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming
>> cheaper than registers.
>
> This can be fixed by doing
>
>    return cost >= COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);

Is that really a fix though?  Those sorts of constant are supposed to be
more expensive than a normal register move, not the same cost.

To put it another way: for real operations like PLUS, you have full
control.  If something is slightly more expensive than a single move,
but not as expensive as two moves, you can return a cost in the
range [COSTS_N_INSNS (1), COSTS_N_INSNS (2)].  But constants are
generally given relative to the cost of a register, so the corresponding
range would be [0, COSTS_N_INSNS (1)] instead.

>> One approach I'm trying is to make sure that every target that doesn't
>> explicitly handle SET does nothing with it.  (Targets that do handle
>> SET remain unchanged.)  Then, if we see a SET whose SET_SRC is a
>> register, constant, memory or subreg, we give it cost:
>>
>>      COSTS_N_INSNS (1)
>>      + rtx_cost (SET_DEST (x), SET, speed)
>>      + rtx_cost (SET_SRC (x), SET, speed)
>>
>> as now.  In other cases we give it a cost of:
>>
>>      rtx_cost (SET_DEST (x), SET, speed)
>>      + rtx_cost (SET_SRC (x), SET, speed)
>>
>> But that hardly seems clean either.  Perhaps we should instead make
>> the SET_SRC always include the cost of the SET, even for registers,
>> constants and the like.  Thoughts?
>
> Similarly, this becomes
>
>    dest_cost = rtx_cost (SET_DEST (x), SET, speed);
>    src_cost = MAX (rtx_cost (SET_SRC (x), SET, speed),
>                    COSTS_N_INSNS (1));
>    return dest_cost + src_cost;
>
> How does this look?

This has the same problem as above.  There's a second problem though:
what about register stores?  Say a load is as expensive as a store
(often true, when optimising for size).  The formula above would give
the cost of a load as being:

   cost(REG) + MAX(cost(MEM), CNI1) == cost(MEM)

whereas the cost of a store would be:

   cost(MEM) + MAX(cost(REG), CNI1) == cost(MEM) + CNI1

On RISCy machines you could try to compensate by making the cost of a
MEM smaller for lvalues, but that doesn't seem very clean.  It would
also skew the costs of MEM-to-MEM moves on CISCy machines, where the
MAX would have no effect.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-18  0:55 ` Hans-Peter Nilsson
@ 2011-08-18  8:14   ` Richard Sandiford
  2011-08-19 14:17   ` Richard Sandiford
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2011-08-18  8:14 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc

Hans-Peter Nilsson <hp@bitrange.com> writes:
> On Wed, 17 Aug 2011, Richard Sandiford wrote:
>> It also means
>> that constants that are slightly more expensive than a register --
>> somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming
>> cheaper than registers.
>
> Yes, perhaps some scale factor has to be applied to get
> reasonable cost granularity in an improved cost model for the
> job...  Some constants *are* more expensive (extra words and/or
> extra cycles), yet preferable to registers for one (or maybe two
> or...) insns.  You don't want to find that all insns except
> constant-loads suddenly use register arguments and no
> port-specific metric way to tweak it.  Mentioned for the record.

I was hoping that if the costs of SETs were "fixed", then that kind of
situation should work without any new scale factors.  E.g. two constant
moves of equal execution frequency that cost COSTS_N_INSNS (1) + 1
(total COSTS_N_INSNS (2) + 2) shouldn't be CSEd (COSTS_N_INSNS (3) + 1)
unless the single constant move can go in a less frequently-executed block.

> I don't think you can get into trouble for trying to improve
> this area for consistency: between releases a port already
> usually arbitrarily loses on some type of codes and costs have
> to be re-tweaked, unless performance is closely tracked.

Yeah, probably true.  Certainly a good excuse for me to use. :-)

> Aiming for traceability can only help (like, "read the added doc
> blurb on how to define the port RTX costs instead of gdb
> stepping and code inspection").

Yeah, that'd be nice.  In practice there's probably always going to be a
bit of experimentation needed.  E.g. getting the cost of multiplicaton
correct wrt shifts and adds can be tricky on a superscalar target.
(At least, it was when I last tried it too many years ago.)

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-17 17:29 ` Georg-Johann Lay
@ 2011-08-19 12:25   ` Richard Sandiford
  2011-08-21 17:03     ` Georg-Johann Lay
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2011-08-19 12:25 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <avr@gjlay.de> writes:
> Richard Sandiford schrieb:
>> I've been working on some patches to make insn_rtx_cost take account
>> of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly beginning
>> to realise that I don't understand what rtx costs are supposed to represent.
>> 
>> AIUI the rules have historically been:
>> 
>>   1) Registers have zero cost.
>> 
>>   2) Constants have a cost relative to that of registers.  By extension,
>>      constants have zero cost if they are as cheap as a register.
>> 
>>   3) With an outer code of SET, actual operations have the cost
>>      of the associated instruction.  E.g. the cost of a PLUS
>>      is the cost of an addition instruction.
>> 
>>   4) With other outer codes, actual operations have the cost
>>      of the combined instruction, if available, or the cost of
>>      a separate instruction otherwise.  E.g. the cost of a NEG
>>      inside an AND might be zero on targets that support BIC-like
>>      instructions, and COSTS_N_INSNS (1) on most others.
>> 
>> [...]
>> 
>> But that hardly seems clean either.  Perhaps we should instead make
>> the SET_SRC always include the cost of the SET, even for registers,
>> constants and the like.  Thoughts?
>
> IMO a clean approach would be to query the costs of a whole insn (resp. 
> it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already 
> indicates that the costs are compared to *insn* costs i.e. cost of the 
> whole pattern (modulo clobbers).

The problem is that we sometimes want the cost of something that cannot
be done using a single instruction.  E.g. some CONST_INTs take several
instructions to create on MIPS.  In this case the costs are really
measuring the cost of an emit_move_insn sequence, not a single insn.

I suppose we could use emit_move_insn to create a temporary sequence
and sum the cost of each individual instruction.  But that's potentially
expensive.

Also, any change along these lines is similar to the "tie costs to
.md patterns" thing that I mentioned at the end of the message.
I don't really have time to work on anything so invasive, so the
question is really whether we can sensibly change the costs within
the current framework.

> E.g. the cost of a CONST_INT is meaningless if you don't know what to do 
> with the constant. (set (reg:QI) (const_int 0)) might have different 
> cost than (set (reg:SI) (const_int 0)).

That's the old modeless-CONST_INT problem though.  I agree the mode
makes a difference, but I think it's a separate issue.

If we did pass complete instructions, each caller would have to provide
a SET whose SET_DEST is an arbitrary psuedo register of the required mode.
In all likelihood, we'd define a new utility function do this for us.

So in the modeless-CONST_INT world, we would need to pass the mode
to that utility function.  But if we're prepared to pass such extra
information around, we could simply do it by passing the mode directly
to rtx_cost.

I think the cost of a full rvalue (rather than a complete SET),
should be based on the assumption that the destination is a register.
So I don't the backend is missing any information in that respect.
I.e. the problem doesn't seem to be lack of information, but rather
lack of consistency (between registers, constants, operators,
and so on).

> Similar applies for an addition if you don't know if it's for
> arithmetic or address offset computation inside a MEM.

Yeah, that'd be bad, but I don't think we take costs at that level.
Only a few places take the cost of anything "smaller" than a complete
rvalue, and they either take the cost of a condition or take the cost of
an operand to a "top-level" operator.  The target then has the option of
stopping the recursion whenever it likes.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-18  0:55 ` Hans-Peter Nilsson
  2011-08-18  8:14   ` Richard Sandiford
@ 2011-08-19 14:17   ` Richard Sandiford
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2011-08-19 14:17 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc

Hans-Peter Nilsson <hp@bitrange.com> writes:
>> But that hardly seems clean either.  Perhaps we should instead make
>> the SET_SRC always include the cost of the SET, even for registers,
>> constants and the like.  Thoughts?
>
> Seems more of maintaining a wart than an improvement for
> consistency.

So, to enumerate a few options:

1) Make the cost of a SET source X the same as the cost of (set (psuedo) X).

   The problem with this is that it seems inconsistent with recursive
   calls.  The cost of a REG in a SET would become COSTS_N_INSNS (1),
   but the cost of a REG in a PLUS should still be 0 on most targets.
   (The REG has no cost over and above the PLUS itself.)

   Similarly, constants in a PLUS should have a cost of 0 if additions
   by that constant are as cheap as additions by a register.  Other
   constants should have a higher cost.  But option (1) would mean that
   constants in a SET that are as cheap as registers in a SET should
   have cost COSTS_N_INSNS (1).  More expensive constants would have a
   higher cost.

2) Make the cost of a SET source X relative to the cost of a register
   source.

   This too is inconsistent.  If an addition is as cheap as a move,
   the cost of PLUS with an outer code of SET should be 0.
   But COSTS_N_INSNS (1) still seems the best default for other
   outer codes, especially if the addition has to be done using
   a separate instruction.

   Also, having to subtract COSTS_N_INSNS (1) from the "natural"
   cost would be a bit error-prone.  E.g. if you want the cost
   of MULT to be the number of cycles it takes, you'd need to use
   COSTS_N_INSNS (cycles - 1) rather than COSTS_N_INSNS (cycles).

3) Stick to the current approach in the target hook (at least AIUI)
   and fix up the result to suit whateer interface the middle end is using.

   This is really a generalisation of the idea in my original message
   (the one I said wasn't very clean).  But perhaps it can be justified
   by saying that the target hook gives the cost of a _term_.  That is,
   the costs provided by the target for terms are relative to the cost
   of a register.  If the supplied X isn't a term, or isn't a valid term
   for the outer code, then the hook returns the cost of forcing X into
   a register.  This seems to be approximately what we have now, and is
   probably the easiest thing for the target to provide.

   So when taking the cost of:

       (plus A B)

   with an outer code of SET, the caller is asking rtlanal.c:rtx_cost
   for the cost of:

       (set tmp-reg (plus A B))

   And by passing (plus A B) to the target hook, rtx_cost in turn is
   asking for the cost of:

       (set tmp-reg2 (plus A B))
       (set tmp-reg tmp-reg2)

   rtx_cost then has the responsibility of "optimising" this sequence into:

       (set tmp-reg (plus A B))

   by deducting COSTS_N_INSNS (1) as appropriate.

   One side-effect of this is that the target should check for fused
   operations such as MULT-in-PLUS when it sees the PLUS expression.
   It shouldn't adjust the cost of MULT down a bit if outer_code is PLUS.
   But I think that's generally true anyway.  When you see a MULT with
   an outer code of PLUS, you don't know if that PLUS was a SET_SRC or not.
   You do know when you see the PLUS.

   Another side-effect is that, if a MEM appears in a context where MEMs
   are not allowed, the target should give it a cost of COSTS_N_INSNS (1)
   higher than otherwise, to account for a separate load instruction.
   But perhaps it should be doing that anyway.

Bleh.  Maybe there just isn't a good answer here.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-19 12:25   ` Richard Sandiford
@ 2011-08-21 17:03     ` Georg-Johann Lay
  2011-08-22  8:20       ` Richard Sandiford
  2011-08-22 23:23       ` Peter Bigot
  0 siblings, 2 replies; 16+ messages in thread
From: Georg-Johann Lay @ 2011-08-21 17:03 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Richard Sandiford schrieb:
> Georg-Johann Lay <avr@gjlay.de> writes:
> 
>>Richard Sandiford schrieb:
>>
>>>I've been working on some patches to make insn_rtx_cost take account
>>>of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly beginning
>>>to realise that I don't understand what rtx costs are supposed to represent.
>>>
>>>AIUI the rules have historically been:
>>>
>>>  1) Registers have zero cost.
>>>
>>>  2) Constants have a cost relative to that of registers.  By extension,
>>>     constants have zero cost if they are as cheap as a register.
>>>
>>>  3) With an outer code of SET, actual operations have the cost
>>>     of the associated instruction.  E.g. the cost of a PLUS
>>>     is the cost of an addition instruction.
>>>
>>>  4) With other outer codes, actual operations have the cost
>>>     of the combined instruction, if available, or the cost of
>>>     a separate instruction otherwise.  E.g. the cost of a NEG
>>>     inside an AND might be zero on targets that support BIC-like
>>>     instructions, and COSTS_N_INSNS (1) on most others.
>>>
>>>[...]
>>>
>>>But that hardly seems clean either.  Perhaps we should instead make
>>>the SET_SRC always include the cost of the SET, even for registers,
>>>constants and the like.  Thoughts?
>>
>>IMO a clean approach would be to query the costs of a whole insn (resp. 
>>it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already 
>>indicates that the costs are compared to *insn* costs i.e. cost of the 
>>whole pattern (modulo clobbers).
> 
> The problem is that we sometimes want the cost of something that cannot
> be done using a single instruction.  E.g. some CONST_INTs take several
> instructions to create on MIPS.  In this case the costs are really
> measuring the cost of an emit_move_insn sequence, not a single insn.
> 
> I suppose we could use emit_move_insn to create a temporary sequence
> and sum the cost of each individual instruction.  But that's potentially
> expensive.

No, that complexity is not needed.  For (set (reg) (const_int)) the BE 
can just return the cost of the expanded sequence because it knows how 
it will be expanded and how much it will cost.  There's no need to 
really expand the sequence.

That's the way, e.g. AVR backend works: Shifts/mul/div must be expanded 
because the hardware does not support them natively.  The rtx_cost for 
such an expression (which are always interpreted as RHS of a (set (reg) 
...)) are the sum over the costs of all insns the expander will produce.

> Also, any change along these lines is similar to the "tie costs to
> .md patterns" thing that I mentioned at the end of the message.
> I don't really have time to work on anything so invasive, so the
> question is really whether we can sensibly change the costs within
> the current framework.
> 
>>E.g. the cost of a CONST_INT is meaningless if you don't know what to do 
>>with the constant. (set (reg:QI) (const_int 0)) might have different 
>>cost than (set (reg:SI) (const_int 0)).
> 
> That's the old modeless-CONST_INT problem though.  I agree the mode
> makes a difference, but I think it's a separate issue.

It's the same bottom line: If you see the insn, you know the cost.

> If we did pass complete instructions, each caller would have to provide
> a SET whose SET_DEST is an arbitrary psuedo register of the required mode.
> In all likelihood, we'd define a new utility function do this for us.
> 
> So in the modeless-CONST_INT world, we would need to pass the mode
> to that utility function.  But if we're prepared to pass such extra
> information around, we could simply do it by passing the mode directly
> to rtx_cost.
> 
> I think the cost of a full rvalue (rather than a complete SET),
> should be based on the assumption that the destination is a register.
> So I don't the backend is missing any information in that respect.
> I.e. the problem doesn't seem to be lack of information, but rather
> lack of consistency (between registers, constants, operators,
> and so on).
> 
>>Similar applies for an addition if you don't know if it's for
>>arithmetic or address offset computation inside a MEM.
> 
> Yeah, that'd be bad, but I don't think we take costs at that level.
> Only a few places take the cost of anything "smaller" than a complete
> rvalue, and they either take the cost of a condition or take the cost of
> an operand to a "top-level" operator.  The target then has the option of
> stopping the recursion whenever it likes.
> 
> Richard

Johann

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-21 17:03     ` Georg-Johann Lay
@ 2011-08-22  8:20       ` Richard Sandiford
  2011-08-22  9:59         ` Richard Guenther
  2011-08-22 12:12         ` Georg-Johann Lay
  2011-08-22 23:23       ` Peter Bigot
  1 sibling, 2 replies; 16+ messages in thread
From: Richard Sandiford @ 2011-08-22  8:20 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <avr@gjlay.de> writes:
>>>IMO a clean approach would be to query the costs of a whole insn (resp. 
>>>it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already 
>>>indicates that the costs are compared to *insn* costs i.e. cost of the 
>>>whole pattern (modulo clobbers).
>> 
>> The problem is that we sometimes want the cost of something that cannot
>> be done using a single instruction.  E.g. some CONST_INTs take several
>> instructions to create on MIPS.  In this case the costs are really
>> measuring the cost of an emit_move_insn sequence, not a single insn.
>> 
>> I suppose we could use emit_move_insn to create a temporary sequence
>> and sum the cost of each individual instruction.  But that's potentially
>> expensive.
>
> No, that complexity is not needed.  For (set (reg) (const_int)) the BE 
> can just return the cost of the expanded sequence because it knows how 
> it will be expanded and how much it will cost.  There's no need to 
> really expand the sequence.

Sorry, I'd misunderstood your suggestion.  I thought you were suggesting
that the rtx costs functions should only be presented with SETs that are
valid instructions.  I hadn't realised that you were still allowing these
SETs to be arbitrary ones that have been cooked up by the optimisers.

So are you saying that we should remove the recursive nature of the
rtx_cost/targetm.rtx_costs interface, and have the backend handle any
recursion itself?  I.e. targetm.rtx_costs only ever sees a complete
(but perhaps invalid) instruction pattern?  Or would you still keep
the current recursion?

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-22  8:20       ` Richard Sandiford
@ 2011-08-22  9:59         ` Richard Guenther
  2011-08-22 13:08           ` Joern Rennecke
  2011-08-22 12:12         ` Georg-Johann Lay
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2011-08-22  9:59 UTC (permalink / raw)
  To: Georg-Johann Lay, gcc, richard.sandiford

On Mon, Aug 22, 2011 at 10:19 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Georg-Johann Lay <avr@gjlay.de> writes:
>>>>IMO a clean approach would be to query the costs of a whole insn (resp.
>>>>it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already
>>>>indicates that the costs are compared to *insn* costs i.e. cost of the
>>>>whole pattern (modulo clobbers).
>>>
>>> The problem is that we sometimes want the cost of something that cannot
>>> be done using a single instruction.  E.g. some CONST_INTs take several
>>> instructions to create on MIPS.  In this case the costs are really
>>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>>
>>> I suppose we could use emit_move_insn to create a temporary sequence
>>> and sum the cost of each individual instruction.  But that's potentially
>>> expensive.
>>
>> No, that complexity is not needed.  For (set (reg) (const_int)) the BE
>> can just return the cost of the expanded sequence because it knows how
>> it will be expanded and how much it will cost.  There's no need to
>> really expand the sequence.
>
> Sorry, I'd misunderstood your suggestion.  I thought you were suggesting
> that the rtx costs functions should only be presented with SETs that are
> valid instructions.  I hadn't realised that you were still allowing these
> SETs to be arbitrary ones that have been cooked up by the optimisers.
>
> So are you saying that we should remove the recursive nature of the
> rtx_cost/targetm.rtx_costs interface, and have the backend handle any
> recursion itself?  I.e. targetm.rtx_costs only ever sees a complete
> (but perhaps invalid) instruction pattern?  Or would you still keep
> the current recursion?

I would say yes to that - kill the recursion.

Richard.

> Richard
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-22  8:20       ` Richard Sandiford
  2011-08-22  9:59         ` Richard Guenther
@ 2011-08-22 12:12         ` Georg-Johann Lay
  1 sibling, 0 replies; 16+ messages in thread
From: Georg-Johann Lay @ 2011-08-22 12:12 UTC (permalink / raw)
  To: Georg-Johann Lay, gcc, richard.sandiford

Richard Sandiford wrote:
> Georg-Johann Lay <avr@gjlay.de> writes:
>>>> IMO a clean approach would be to query the costs of a whole insn (resp. 
>>>> it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already 
>>>> indicates that the costs are compared to *insn* costs i.e. cost of the 
>>>> whole pattern (modulo clobbers).
>>> The problem is that we sometimes want the cost of something that cannot
>>> be done using a single instruction.  E.g. some CONST_INTs take several
>>> instructions to create on MIPS.  In this case the costs are really
>>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>>
>>> I suppose we could use emit_move_insn to create a temporary sequence
>>> and sum the cost of each individual instruction.  But that's potentially
>>> expensive.
>> No, that complexity is not needed.  For (set (reg) (const_int)) the BE 
>> can just return the cost of the expanded sequence because it knows how 
>> it will be expanded and how much it will cost.  There's no need to 
>> really expand the sequence.
> 
> Sorry, I'd misunderstood your suggestion.  I thought you were suggesting
> that the rtx costs functions should only be presented with SETs that are
> valid instructions.  I hadn't realised that you were still allowing these
> SETs to be arbitrary ones that have been cooked up by the optimisers.

RTX costs only make sense if the rtx eventually results in insns.
This can basically happen in two ways:

* expander which transforms insn-like expression to a sequence of
  insns.  Example is x << y in some backend that cannot do it natively
  and expand it into loop.  Similar example is X + big_const which
  cannot be handled by target, i.e. insn predicate denies it.

* cooking up new insns like in insn combine.  It only makes sense to
  query costs for insns that actually match, i.e. pass recog or
  recog_for_combine or whatever.

> So are you saying that we should remove the recursive nature of the
> rtx_cost/targetm.rtx_costs interface, and have the backend handle any
> recursion itself?  I.e. targetm.rtx_costs only ever sees a complete
> (but perhaps invalid) instruction pattern?  Or would you still keep
> the current recursion?

I don't see benefit of recursion because every step removes information.

E.g. in the example you gave in
   http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01264.html
which is cost of shifts like
   x << ?
the operand number does not help because you need the second operand
to determine the cost: shift by constant offset in general has different
cost than shift by variable.  Thus, only the complete RTX makes sense for
cost computation.

In general it's not possible to separate the cost function
because
   cost (f(a,b)) != cost (f) + cost (a,0) + cost (b,1)
resp. you cannot represent costs in that orthogonal way and such an ansatz
must fail.

There are also cases where costs are paradoxical, i.e. a more complex
expression has lower costs than a simpler one. Example is bit extraction
which might be cheaper than shifting the mask plus oring/anding.

BTW, avr BE does recursion inside rtx_costs which is bad idea, imo.
But that's up to the target.

Johann

> Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-22  9:59         ` Richard Guenther
@ 2011-08-22 13:08           ` Joern Rennecke
  2011-08-22 15:53             ` David Edelsohn
  0 siblings, 1 reply; 16+ messages in thread
From: Joern Rennecke @ 2011-08-22 13:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Georg-Johann Lay, gcc, richard.sandiford

Quoting Richard Guenther <richard.guenther@gmail.com>:

>> So are you saying that we should remove the recursive nature of the
>> rtx_cost/targetm.rtx_costs interface, and have the backend handle any
>> recursion itself?  I.e. targetm.rtx_costs only ever sees a complete
>> (but perhaps invalid) instruction pattern?  Or would you still keep
>> the current recursion?
>
> I would say yes to that - kill the recursion.

But the recursion is already optional.  If you don't want to use recursion
for your port, just make the rtx_costs hook return true.
There is no need to break ports that are OK to use the recursion in rtlanal.c
partially or in whole.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-22 13:08           ` Joern Rennecke
@ 2011-08-22 15:53             ` David Edelsohn
  0 siblings, 0 replies; 16+ messages in thread
From: David Edelsohn @ 2011-08-22 15:53 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Richard Guenther, Georg-Johann Lay, gcc, richard.sandiford

On Mon, Aug 22, 2011 at 9:08 AM, Joern Rennecke <amylaar@spamcop.net> wrote:
> Quoting Richard Guenther <richard.guenther@gmail.com>:
>
>>> So are you saying that we should remove the recursive nature of the
>>> rtx_cost/targetm.rtx_costs interface, and have the backend handle any
>>> recursion itself?  I.e. targetm.rtx_costs only ever sees a complete
>>> (but perhaps invalid) instruction pattern?  Or would you still keep
>>> the current recursion?
>>
>> I would say yes to that - kill the recursion.
>
> But the recursion is already optional.  If you don't want to use recursion
> for your port, just make the rtx_costs hook return true.
> There is no need to break ports that are OK to use the recursion in
> rtlanal.c
> partially or in whole.

Exactly.  I don't understand the disagreement about recursion.  For
instance, the rs6000 port explicitly returns true or false for
rtx_costs as necessary for its computation.  If a port wants to
compute rtx_costs without recursion, it already has that control.

Thanks, David

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-21 17:03     ` Georg-Johann Lay
  2011-08-22  8:20       ` Richard Sandiford
@ 2011-08-22 23:23       ` Peter Bigot
  2011-08-26 14:41         ` Georg-Johann Lay
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Bigot @ 2011-08-22 23:23 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Richard Sandiford, gcc

On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>
> Richard Sandiford schrieb:
>>
>> Georg-Johann Lay <avr@gjlay.de> writes:
>>
>>> Richard Sandiford schrieb:
>>>
>>>> I've been working on some patches to make insn_rtx_cost take account
>>>> of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly beginning
>>>> to realise that I don't understand what rtx costs are supposed to represent.
>>>>
>>>> AIUI the rules have historically been:
>>>>
>>>>  1) Registers have zero cost.
>>>>
>>>>  2) Constants have a cost relative to that of registers.  By extension,
>>>>    constants have zero cost if they are as cheap as a register.
>>>>
>>>>  3) With an outer code of SET, actual operations have the cost
>>>>    of the associated instruction.  E.g. the cost of a PLUS
>>>>    is the cost of an addition instruction.
>>>>
>>>>  4) With other outer codes, actual operations have the cost
>>>>    of the combined instruction, if available, or the cost of
>>>>    a separate instruction otherwise.  E.g. the cost of a NEG
>>>>    inside an AND might be zero on targets that support BIC-like
>>>>    instructions, and COSTS_N_INSNS (1) on most others.
>>>>
>>>> [...]
>>>>
>>>> But that hardly seems clean either.  Perhaps we should instead make
>>>> the SET_SRC always include the cost of the SET, even for registers,
>>>> constants and the like.  Thoughts?
>>>
>>> IMO a clean approach would be to query the costs of a whole insn (resp. it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS already indicates that the costs are compared to *insn* costs i.e. cost of the whole pattern (modulo clobbers).
>>
>> The problem is that we sometimes want the cost of something that cannot
>> be done using a single instruction.  E.g. some CONST_INTs take several
>> instructions to create on MIPS.  In this case the costs are really
>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>
>> I suppose we could use emit_move_insn to create a temporary sequence
>> and sum the cost of each individual instruction.  But that's potentially
>> expensive.
>
> No, that complexity is not needed.  For (set (reg) (const_int)) the BE can just return the cost of the expanded sequence because it knows how it will be expanded and how much it will cost.  There's no need to really expand the sequence.
>
> That's the way, e.g. AVR backend works: Shifts/mul/div must be expanded because the hardware does not support them natively.  The rtx_cost for such an expression (which are always interpreted as RHS of a (set (reg) ...)) are the sum over the costs of all insns the expander will produce.

One of my problems with this approach is that the logic that's put
into an expander definition preparation statement (or, in the case of
AVR, the function invoked by the insn output statement) gets
replicated abstractly in rtx_costs: both places have long switch
statements on operand mode and const shift value to determine the
instructions that get emitted (in the former) or how many of them
there are (in the latter).  How likely is it the two are kept
consistent over the years?

I'm working on the (not yet pushed upstream) back-end for the TI
MSP430, which has some historical relationship to AVR from about a
decade ago, and the answer to that question is "not very likely".
I've changed the msp430 back-end so that instead of putting all that
logic in the output statement for the insn, it goes into a preparation
statement for a standard expander.  This way the individual insns that
result in (say) a constant shift of 8 bits using xor and bswap are
available for the optimizer and register allocator to improve.

This works pretty well, but still leaves me with problems when it
comes to computing RTX costs, because there seems to be some strength
reduction optimization for multiplication that's asking for the costs
to shift each integer type by 1 to 15 bits, when in fact no such insn
should ever be produced if real code was being generated.  I think
this is an example of the case Richard's describing.

If, in rtx_costs, I could detect an unexpected insn, deduce the
correct expander function, call it, then recurse on the sequence it
generated, I'd get the right answer---though I'd infinitely prefer not
to be asked to calculate the cost of an unexpected insn.  Doing this
expansion would probably be very expensive, though, and with the side
effects that are part of emit_insn I don't know how to safely call
things that invoke it when what gets emitted isn't part of the actual
stream.

>>
>> Also, any change along these lines is similar to the "tie costs to
>> .md patterns" thing that I mentioned at the end of the message.
>> I don't really have time to work on anything so invasive, so the
>> question is really whether we can sensibly change the costs within
>> the current framework.
>>
>>> E.g. the cost of a CONST_INT is meaningless if you don't know what to do with the constant. (set (reg:QI) (const_int 0)) might have different cost than (set (reg:SI) (const_int 0)).
>>
>> That's the old modeless-CONST_INT problem though.  I agree the mode
>> makes a difference, but I think it's a separate issue.
>
> It's the same bottom line: If you see the insn, you know the cost.

But can you know it without duplicating all the effort that goes into
expanding it?

>> If we did pass complete instructions, each caller would have to provide
>> a SET whose SET_DEST is an arbitrary psuedo register of the required mode.
>> In all likelihood, we'd define a new utility function do this for us.
>>
>> So in the modeless-CONST_INT world, we would need to pass the mode
>> to that utility function.  But if we're prepared to pass such extra
>> information around, we could simply do it by passing the mode directly
>> to rtx_cost.
>>
>> I think the cost of a full rvalue (rather than a complete SET),
>> should be based on the assumption that the destination is a register.
>> So I don't the backend is missing any information in that respect.
>> I.e. the problem doesn't seem to be lack of information, but rather
>> lack of consistency (between registers, constants, operators,
>> and so on).
>>
>>> Similar applies for an addition if you don't know if it's for
>>> arithmetic or address offset computation inside a MEM.
>>
>> Yeah, that'd be bad, but I don't think we take costs at that level.
>> Only a few places take the cost of anything "smaller" than a complete
>> rvalue, and they either take the cost of a condition or take the cost of
>> an operand to a "top-level" operator.  The target then has the option of
>> stopping the recursion whenever it likes.
>>
>> Richard
>
> Johann

Peter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Just what are rtx costs?
  2011-08-22 23:23       ` Peter Bigot
@ 2011-08-26 14:41         ` Georg-Johann Lay
  0 siblings, 0 replies; 16+ messages in thread
From: Georg-Johann Lay @ 2011-08-26 14:41 UTC (permalink / raw)
  To: Peter Bigot; +Cc: Richard Sandiford, gcc

Peter Bigot wrote:
> On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> Richard Sandiford schrieb:
>>> Georg-Johann Lay <avr@gjlay.de> writes:
>>> 
>>>> Richard Sandiford schrieb:
>>>> 
>>>>> I've been working on some patches to make insn_rtx_cost take account
>>>>>  of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly
>>>>> beginning to realise that I don't understand what rtx costs are
>>>>> supposed to represent.
>>>>> 
>>>>> AIUI the rules have historically been:
>>>>> 
>>>>> 1) Registers have zero cost.
>>>>> 
>>>>> 2) Constants have a cost relative to that of registers.  By
>>>>> extension, constants have zero cost if they are as cheap as a
>>>>> register.
>>>>> 
>>>>> 3) With an outer code of SET, actual operations have the cost of the
>>>>> associated instruction.  E.g. the cost of a PLUS is the cost of an
>>>>> addition instruction.
>>>>> 
>>>>> 4) With other outer codes, actual operations have the cost of the
>>>>> combined instruction, if available, or the cost of a separate
>>>>> instruction otherwise.  E.g. the cost of a NEG inside an AND might
>>>>> be zero on targets that support BIC-like instructions, and
>>>>> COSTS_N_INSNS (1) on most others.
>>>>> 
>>>>> [...]
>>>>> 
>>>>> But that hardly seems clean either.  Perhaps we should instead make 
>>>>> the SET_SRC always include the cost of the SET, even for registers, 
>>>>> constants and the like.  Thoughts?
>>>> IMO a clean approach would be to query the costs of a whole insn
>>>> (resp. it's pattern) rather than the cost of an RTX.  COSTS_N_INSNS
>>>> already indicates that the costs are compared to *insn* costs i.e.
>>>> cost of the whole pattern (modulo clobbers).
>>> The problem is that we sometimes want the cost of something that cannot 
>>> be done using a single instruction.  E.g. some CONST_INTs take several 
>>> instructions to create on MIPS.  In this case the costs are really 
>>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>> 
>>> I suppose we could use emit_move_insn to create a temporary sequence and
>>> sum the cost of each individual instruction.  But that's potentially 
>>> expensive.
>> No, that complexity is not needed.  For (set (reg) (const_int)) the BE can
>> just return the cost of the expanded sequence because it knows how it will
>> be expanded and how much it will cost.  There's no need to really expand
>> the sequence.
>> 
>> That's the way, e.g. AVR backend works: Shifts/mul/div must be expanded
>> because the hardware does not support them natively.  The rtx_cost for
>> such an expression (which are always interpreted as RHS of a (set (reg)
>> ...)) are the sum over the costs of all insns the expander will produce.
> 
> One of my problems with this approach is that the logic that's put into an
> expander definition preparation statement (or, in the case of AVR, the
> function invoked by the insn output statement) gets replicated abstractly in
> rtx_costs: both places have long switch statements on operand mode and const
> shift value to determine the instructions that get emitted (in the former)
> or how many of them there are (in the latter).  How likely is it the two are
> kept consistent over the years?

Yes, it's hard and uncomfortable to have the information duplicated in several
places.  But recursing into the RTXes won't solve that problem because the
costs cannot be separated into a sum in general.  In the case where an
expression is lowered down from tree to rtl and the expander has to weight
several approaches against each other (like to shift or to multiply), it's not
even possible to get the costs from insn attribute because there are no such
insn.  Getting costs from attributes -- which avoids keeping information in
several places -- works fine from expand to reload.  But for costs of insns
sequences that are to be expanded I don't know how to avoid that duplication.

> I'm working on the (not yet pushed upstream) back-end for the TI MSP430,
> which has some historical relationship to AVR from about a decade ago, and
> the answer to that question is "not very likely". I've changed the msp430
> back-end so that instead of putting all that logic in the output statement
> for the insn, it goes into a preparation statement for a standard expander.
> This way the individual insns that result in (say) a constant shift of 8
> bits using xor and bswap are available for the optimizer and register
> allocator to improve.

Don't know if that works fine with AVR. For shifts it's a bit similar to MSP
because it can just shift by 1.  However, I observed in many situations that
splitting too early leads to bad code, mainly when there are SUBREGs all over
the place which register allocation does handle optimally, i.e. you will see
moves all over the place.  Some patterns are split late because that depends on
register allocation or rtl peephole (clobber available).

> This works pretty well, but still leaves me with problems when it comes to
> computing RTX costs, because there seems to be some strength reduction
> optimization for multiplication that's asking for the costs to shift each
> integer type by 1 to 15 bits, when in fact no such insn should ever be
> produced if real code was being generated.  I think this is an example of
> the case Richard's describing.
> 
> If, in rtx_costs, I could detect an unexpected insn, deduce the correct
> expander function, call it, then recurse on the sequence it generated, I'd
> get the right answer---though I'd infinitely prefer not to be asked to
> calculate the cost of an unexpected insn.  Doing this expansion would
> probably be very expensive, though, and with the side effects that are part
> of emit_insn I don't know how to safely call things that invoke it when what
> gets emitted isn't part of the actual stream.

Agree, but if there is no insn/expander for a, say, specific arithmetic it
might still be worth to get it's cost to determine if expanding to some insn
sequence is better than library call.

>>> Also, any change along these lines is similar to the "tie costs to .md
>>> patterns" thing that I mentioned at the end of the message. I don't
>>> really have time to work on anything so invasive, so the question is
>>> really whether we can sensibly change the costs within the current
>>> framework.

What I did in a port to get the costs for insns is all inside rtx_costs and
does not rely on any extension in the compiler.

FYI, rtx_costs works as follows:

1) Handle basic arithmetic like unary and binary operations and BLKmode stuff.

2) Query insn attributes:

/* Some rtx singletons frequently used in rtx-cost computation */

static GTY(()) rtx fake_insn = NULL_RTX;
static GTY(()) rtx fake_jump_insn = NULL_RTX;
static GTY(()) rtx fake_pattern = NULL_RTX;
static GTY(()) rtx fake_jump_pattern = NULL_RTX;

  ...

  /* Section 2:
     Query the machine description for rtx costs.  We find it too tedious to
     write all or most of the patterns down twice: once as fancy rtl and a
     second time as even more braindead and hard-to-maintain XEXP-orgy.
     We build a fake insn and look for insn attribute "ticks" resp. "space".
 */

  if (SET == outer_code
      && mode != BLKmode)
    {
      int num_clobbers;
      rtx pattern, dest;
      rtx insn;
      int icode = -1;

      /* Some passes like if-convert-after-reload call for rtx costs after
         reload_completed.  We have no idea how the set-dest looks like, GCC
         developers once more make a mistery around information which is
         actually present. We return 'unknown', i.e. 0 in that case. */

      if (reload_completed)
        {
          *total = 0;
          return true;
        }

      /* Set up an insn to be recognized */

      if (NULL_RTX == fake_insn)
        {
          /* Doh!  We've got no wrapping insn yet.  Cook one up.  */

          fake_insn = make_insn_raw (NULL_RTX);
          fake_jump_insn = make_jump_insn_raw (NULL_RTX);
          fake_pattern = gen_rtx_SET (VOIDmode,
                                      gen_rtx_REG (SImode, 99999),
                                      gen_rtx_REG (SImode, 99999));
          fake_jump_pattern = gen_rtx_SET (VOIDmode, pc_rtx,
                                           gen_rtx_REG (SImode, 99999));
        }

      if (VOIDmode == mode)
        {
          /* This is for a conditional jump */
          dest = pc_rtx;
          insn = fake_jump_insn;
        }
      else
        {
          /* This is for ordinary insn */
          dest = gen_rtx_REG (mode, 99999);
          insn = fake_insn;
        }

      pattern = gen_rtx_SET (VOIDmode, dest, x);
      PATTERN (insn) = pattern;

      /* Open gate for COST_INSNs */

      have_cost_insns = 1;

      if (icode = recog (pattern, insn, &num_clobbers), icode != -1)
        {
          int cost = (speed) ? get_attr_ticks (insn) : get_attr_space (insn);

          *total = cost;
        }

      /* Close gate for COST_INSNs */

      have_cost_insns = 0;

     /* ??? We set the pattern to a valid rtx construct because we observed ggc
        aborting for complex programs due to invalid set_dest in the pattern
        which originated in x.  Up to now, the following fix works...  */

      PATTERN (fake_insn) = fake_pattern;
      PATTERN (fake_jump_insn) = fake_jump_pattern;

The target has quite a number of combine patterns and they all are handled by
this few lines of code :-)

Playing around, I added some "costs insns".  These insns' condition is
have_cost_insn.  That way I can write arbitrary RTXes and attach costs to them
which is better to read than bulk of XEXP/GET_CoDE/GET_MODE etc.

>>>> E.g. the cost of a CONST_INT is meaningless if you don't know what to
>>>> do with the constant. (set (reg:QI) (const_int 0)) might have
>>>> different cost than (set (reg:SI) (const_int 0)).
>>> That's the old modeless-CONST_INT problem though.  I agree the mode 
>>> makes a difference, but I think it's a separate issue.
>> It's the same bottom line: If you see the insn, you know the cost.
> 
> But can you know it without duplicating all the effort that goes into 
> expanding it?

For many cases the costs are not really known and you have to provide a good
estimated value; in particular if costs depend on register allocation. I have
that case on the target: multiply-add and loading constants highly depend ond
register allocation.  Even knowing the mode for CONST_INTs would not help
because they all have word_mode.

>>> If we did pass complete instructions, each caller would have to provide 
>>> a SET whose SET_DEST is an arbitrary psuedo register of the required
>>> mode. In all likelihood, we'd define a new utility function do this for
>>> us.
>>> 
>>> So in the modeless-CONST_INT world, we would need to pass the mode to
>>> that utility function.  But if we're prepared to pass such extra 
>>> information around, we could simply do it by passing the mode directly 
>>> to rtx_cost.
>>> 
>>> I think the cost of a full rvalue (rather than a complete SET), should
>>> be based on the assumption that the destination is a register. So I
>>> don't the backend is missing any information in that respect. I.e. the
>>> problem doesn't seem to be lack of information, but rather lack of
>>> consistency (between registers, constants, operators, and so on).
>>> 
>>>> Similar applies for an addition if you don't know if it's for 
>>>> arithmetic or address offset computation inside a MEM.
>>> Yeah, that'd be bad, but I don't think we take costs at that level. Only
>>> a few places take the cost of anything "smaller" than a complete rvalue,
>>> and they either take the cost of a condition or take the cost of an
>>> operand to a "top-level" operator.  The target then has the option of 
>>> stopping the recursion whenever it likes.
>>> 
>>> Richard
>> Johann
> 
> Peter

Johann

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-08-26 14:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-17 14:52 Just what are rtx costs? Richard Sandiford
2011-08-17 17:29 ` Georg-Johann Lay
2011-08-19 12:25   ` Richard Sandiford
2011-08-21 17:03     ` Georg-Johann Lay
2011-08-22  8:20       ` Richard Sandiford
2011-08-22  9:59         ` Richard Guenther
2011-08-22 13:08           ` Joern Rennecke
2011-08-22 15:53             ` David Edelsohn
2011-08-22 12:12         ` Georg-Johann Lay
2011-08-22 23:23       ` Peter Bigot
2011-08-26 14:41         ` Georg-Johann Lay
2011-08-17 18:56 ` Paolo Bonzini
2011-08-18  7:58   ` Richard Sandiford
2011-08-18  0:55 ` Hans-Peter Nilsson
2011-08-18  8:14   ` Richard Sandiford
2011-08-19 14:17   ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).