Improve addsi3 for CONST

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Improve addsi3 for CONST_INT
       [not found] ` <CADOs=zbMzc1LyUS3ZjD=zbV9dHUuamC4R+YVb-mS=1R99p1BtQ@mail.gmail.com>
@ 2011-07-08 16:45   ` Georg-Johann Lay
  2011-07-08 21:22     ` Joern Rennecke
  2011-07-10 16:57     ` Denis Chertykov
  0 siblings, 2 replies; 3+ messages in thread
From: Georg-Johann Lay @ 2011-07-08 16:45 UTC (permalink / raw)
  To: gcc; +Cc: Denis Chertykov

Denis Chertykov wrote:
> 2011/7/7 Georg-Johann Lay:
>> Hi Denis.
> 
> I think that it's a good question to discuss inside gcc mailing list.
> May be somebody more qualified person give a better suggestion than me.

...bringing this over to gcc mailing list

>> I think about improving addsi3 insn when a const_int is added.
>> The hard case is when the register to add the value is not in "d".
>> In the current implementation the constant simply gets reloaded into
>> some register (provided it is not an "easy" const like 1 or -1).
>>
>> That's a waste because
>>
>> * a 32-bit register gets initialized with the const, but a 8-bit
>>  d-register would be sufficient.
>>
>> * Instead of setting an intermediate register, the value could be
>>  added byte-by-byte if one d-reg was available.
>>
>> A bit unsure about how to approach that, I'd ask you what you think
>> about it.  I can think of three approaches:
>>
>> 1) Use a peep2 on sequence like
>>    *reload_insi
>>    *addsi3
>>   resp.
>>    *movsi
>>    *addsi3
>>
>> 2) Add new insn with (clobber (match_scratch:QI "=&d"))
>>
>> 3) Allow all const_int in *addsi3. If peep2 comes up with
>>   a d-clobber, that's fine. Else have to cook up a clobber
>>   by saving a d-reg to tmp_reg similar to movsi implementation.
>>
>> Approach (1) has the advantage that it is "neutral" with respect
>> to reloading.  However, reload will always allocate a SI register,
>> even if peep2 comes up with a scratch.
> 
> But, CSE will always have a chance to optimize it.

You mean CSE or reload-CSE?

If same constant is used multiple times, CSE can arrange for that and
combine can can compose a addsi+clobber. combine will only insert the
constant if it can reduce the number of insns, i.e. if two insns need
the same constant it won't do the replacement.

>> Approach (2) has the advantage that it has just "pressure 1 on d-regs"
>> whilst (1) and the current implementation have "pressure 4 or even
>> r-regs".
> 
> IMHO it's good idea to check.
> 
>> I am in favor of approach (3): It's easier pattern like (1) and it
>> does not put pressure on regs at all.  Also (3) makes no additional
>> work to support SF or if there were fixed-point mode.
> 
> It's a fake.
> Generally, all fake methods is a wrong way. (Look at fake addressing)

Yes, it's a fake.  Bot note that movsi "r,i" alternative is also a
fake, and maybe you remember my failed approach to use secondary
reloads for that:
   http://gcc.gnu.org/ml/gcc/2011-03/msg00290.html

A word on fake addressing (PR46278):

You were right with that.
At the time you wrote the AVR port, there was nothing like
MODE_CODE_BASE_REG_CLASS or REGNO_MODE_CODE_OK_FOR_BASE_P.
And even today it's not possible to specify different costs for
different addressing modes (ADDRESS_COST won't do it because it wors
on anatomy of an address and not on the address reg(s) invented).
If there was a way to tell the costs to IRA/reload and these passes
cared for the costs, fake addressing would be no problem!

There are architectures where
  LOAD R, [A]
resp.
  LOAD R, [B]
do exactly the same but have different costs because A can be used
implicitly and thus yields shorter code.  There's no way to tell it to
the allocator.  All you can do is define reg-alloc-order and hope for
the best.

> I think that I WAS WRONG while I decide to create __zero_reg__ and
> __tmp_reg__. (It was derived from my low qualify at start of EGCS
> hacking)

I thought about that topic already more than once.
I have no idea how it could be done better.

OPTIMIZE_MODE_SWITCHING looks rempting but it runs prior to reload,
and many uses of ZERO or TMP generated in reload or peep2.
We would need a pass with similar abilities running after peep2.

To test in ISR if ZERO or TMP is needed, insn attributes could help.
But the stack layout would have to be changed ad-hoc, new BE-pass was
needed and it's error-prone IMHO.

Introducing new ZERO_REG like R2 would change the ABI and ISR
pro/epilogue would increase even more because some insns
change/restore ZERO.

>> Or do you think the improvement is not needed?
> 
> I vote for 2, but I can be wrong.
> You know the best way ;-) Try all methods and compare results.
> (profile optimizing yourself ;-)
> 
>> Thanks for your recommendation.
> 
> I want to give you a recomendation: please, cleanup the port from
> using immediate numbers as register numbers. (in your patches you
> frequently use it)

I intend to do more backend cleanup, the elfos patch just was the first.

> Please, use constants !
> I forgot to point to them in your last patches.
> e.i.
> 
>               /* We have no clobber reg but need one.  Cook one up.
>                  That's cheaper than loading from constant pool.  */
> 
>               cooked_clobber_p = true;
>               clobber_reg = gen_rtx_REG (QImode, 31);
> 
> 
> Use REG_Z + 1 instead of 31.

Already committed:
   http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00634.html

> Few years ago I have an idea to use register renumbering to allow
> using r26,r27,r30,r31 as one SImode register.
> 
> 
> Denis.

Sounds crazy. You mean make X fixed and to have a soft-X after Z like so?

r26/27: X fix
r28/29: Y
r30/31: Z
r32/33: soft-X

That would increase pressure on XYZ even more. I don't think GCC's
register allocator is ready for that.

Johann

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Improve addsi3 for CONST_INT
  2011-07-08 16:45   ` Improve addsi3 for CONST_INT Georg-Johann Lay
@ 2011-07-08 21:22     ` Joern Rennecke
  2011-07-10 16:57     ` Denis Chertykov
  1 sibling, 0 replies; 3+ messages in thread
From: Joern Rennecke @ 2011-07-08 21:22 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc, Denis Chertykov

Quoting Georg-Johann Lay <avr@gjlay.de>:

> OPTIMIZE_MODE_SWITCHING looks rempting but it runs prior to reload,
> and many uses of ZERO or TMP generated in reload or peep2.
> We would need a pass with similar abilities running after peep2.

Actually, optimize_mode_switching used to run after reload, and its
EMIT_HARD_REG_SET interface still bears witness to that - I.e. it
passes you a HARD_REG_SET of live registers.
You could just try if it still works after reload.
You can register additional instances of a pass with a register_pass somewhere
from your port or a plugin.
In fact, I'm currently working on a port with interrelated switchable
entities where I need a total of three instances of pass_mode_switching,
plus two helper passes, to get the kind of optimized mode switching I
set out to get.

Although I'm not quite sure what bearing that has on different hard register
address costs.  I would think a set of memory constraints and
differently-costed alternatives in instruction alternatives would expose the
cost differences to the register allocator and reload.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Improve addsi3 for CONST_INT
  2011-07-08 16:45   ` Improve addsi3 for CONST_INT Georg-Johann Lay
  2011-07-08 21:22     ` Joern Rennecke
@ 2011-07-10 16:57     ` Denis Chertykov
  1 sibling, 0 replies; 3+ messages in thread
From: Denis Chertykov @ 2011-07-10 16:57 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc

2011/7/8 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov wrote:
>> 2011/7/7 Georg-Johann Lay:
>>> Hi Denis.
>>
>> I think that it's a good question to discuss inside gcc mailing list.
>> May be somebody more qualified person give a better suggestion than me.
>
> ...bringing this over to gcc mailing list
>
>>> I think about improving addsi3 insn when a const_int is added.
>>> The hard case is when the register to add the value is not in "d".
>>> In the current implementation the constant simply gets reloaded into
>>> some register (provided it is not an "easy" const like 1 or -1).
>>>
>>> That's a waste because
>>>
>>> * a 32-bit register gets initialized with the const, but a 8-bit
>>>  d-register would be sufficient.
>>>
>>> * Instead of setting an intermediate register, the value could be
>>>  added byte-by-byte if one d-reg was available.
>>>
>>> A bit unsure about how to approach that, I'd ask you what you think
>>> about it.  I can think of three approaches:
>>>
>>> 1) Use a peep2 on sequence like
>>>    *reload_insi
>>>    *addsi3
>>>   resp.
>>>    *movsi
>>>    *addsi3
>>>
>>> 2) Add new insn with (clobber (match_scratch:QI "=&d"))
>>>
>>> 3) Allow all const_int in *addsi3. If peep2 comes up with
>>>   a d-clobber, that's fine. Else have to cook up a clobber
>>>   by saving a d-reg to tmp_reg similar to movsi implementation.
>>>
>>> Approach (1) has the advantage that it is "neutral" with respect
>>> to reloading.  However, reload will always allocate a SI register,
>>> even if peep2 comes up with a scratch.
>>
>> But, CSE will always have a chance to optimize it.
>
> You mean CSE or reload-CSE?

As I remember reload himself can optimize such cases, also postreload CSE.

>
> If same constant is used multiple times, CSE can arrange for that and
> combine can can compose a addsi+clobber. combine will only insert the
> constant if it can reduce the number of insns, i.e. if two insns need
> the same constant it won't do the replacement.
>
>>> Approach (2) has the advantage that it has just "pressure 1 on d-regs"
>>> whilst (1) and the current implementation have "pressure 4 or even
>>> r-regs".
>>
>> IMHO it's good idea to check.
>>
>>> I am in favor of approach (3): It's easier pattern like (1) and it
>>> does not put pressure on regs at all.  Also (3) makes no additional
>>> work to support SF or if there were fixed-point mode.
>>
>> It's a fake.
>> Generally, all fake methods is a wrong way. (Look at fake addressing)
>
> Yes, it's a fake.  Bot note that movsi "r,i" alternative is also a
> fake, and maybe you remember my failed approach to use secondary
> reloads for that:
>   http://gcc.gnu.org/ml/gcc/2011-03/msg00290.html
>
> A word on fake addressing (PR46278):
>
> You were right with that.
> At the time you wrote the AVR port, there was nothing like
> MODE_CODE_BASE_REG_CLASS or REGNO_MODE_CODE_OK_FOR_BASE_P.
> And even today it's not possible to specify different costs for
> different addressing modes (ADDRESS_COST won't do it because it wors
> on anatomy of an address and not on the address reg(s) invented).
> If there was a way to tell the costs to IRA/reload and these passes
> cared for the costs, fake addressing would be no problem!
>
> There are architectures where
>  LOAD R, [A]
> resp.
>  LOAD R, [B]
> do exactly the same but have different costs because A can be used
> implicitly and thus yields shorter code.  There's no way to tell it to
> the allocator.  All you can do is define reg-alloc-order and hope for
> the best.
>
>> I think that I WAS WRONG while I decide to create __zero_reg__ and
>> __tmp_reg__. (It was derived from my low qualify at start of EGCS
>> hacking)
>
> I thought about that topic already more than once.
> I have no idea how it could be done better.
>
> OPTIMIZE_MODE_SWITCHING looks rempting but it runs prior to reload,
> and many uses of ZERO or TMP generated in reload or peep2.
> We would need a pass with similar abilities running after peep2.
>
> To test in ISR if ZERO or TMP is needed, insn attributes could help.
> But the stack layout would have to be changed ad-hoc, new BE-pass was
> needed and it's error-prone IMHO.
>
> Introducing new ZERO_REG like R2 would change the ABI and ISR
> pro/epilogue would increase even more because some insns
> change/restore ZERO.
>
>
>>> Or do you think the improvement is not needed?
>>
>> I vote for 2, but I can be wrong.
>> You know the best way ;-) Try all methods and compare results.
>> (profile optimizing yourself ;-)
>>
>>> Thanks for your recommendation.
>>
>> I want to give you a recomendation: please, cleanup the port from
>> using immediate numbers as register numbers. (in your patches you
>> frequently use it)
>
> I intend to do more backend cleanup, the elfos patch just was the first.
>
>> Please, use constants !
>> I forgot to point to them in your last patches.
>> e.i.
>>
>>               /* We have no clobber reg but need one.  Cook one up.
>>                  That's cheaper than loading from constant pool.  */
>>
>>               cooked_clobber_p = true;
>>               clobber_reg = gen_rtx_REG (QImode, 31);
>>
>>
>> Use REG_Z + 1 instead of 31.
>
> Already committed:
>   http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00634.html
>
>> Few years ago I have an idea to use register renumbering to allow
>> using r26,r27,r30,r31 as one SImode register.
>>
>>
>> Denis.
>
> Sounds crazy. You mean make X fixed and to have a soft-X after Z like so?
>
> r26/27: X fix
> r28/29: Y
> r30/31: Z
> r32/33: soft-X

No, no.
I have used Y for frame pointer because of `icall' and `ijmp'
Register allocator can't allocate SImode register to X,Z because Y
between of them.
(It's because of poor AVR architecture)
Just imagine that the Z register in r28 and r29 and Y in r30 and r31.
(only for GCC)
GCC will use r28,r29 as Z but AVR port will print r28 as r30 and r29 as r31,
r30 as r28 and r31 as r29.

>
> That would increase pressure on XYZ even more. I don't think GCC's
> register allocator is ready for that.

It will enable to allocate SImode registers to r26,r27,r30,r31 it's
definitely a win.


Denis.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-07-10 16:57 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4E15CADB.80906@gjlay.de>
     [not found] ` <CADOs=zbMzc1LyUS3ZjD=zbV9dHUuamC4R+YVb-mS=1R99p1BtQ@mail.gmail.com>
2011-07-08 16:45   ` Improve addsi3 for CONST_INT Georg-Johann Lay
2011-07-08 21:22     ` Joern Rennecke
2011-07-10 16:57     ` Denis Chertykov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).