public inbox for gcc-help@gcc.gnu.org
* Question on peephole2 optimizer
From: Henri Cloetens @ 2020-02-03 14:49 UTC
  To: gcc-help

Hello all,

I have a question on the peephole2 optimizer.

- My target has a "load double" instruction:
    - It does an indexed load of a 64-bit operand into two 32-bit registers.
    - The requirement is that the registers are adjacent
      (Ri and Ri+1), and that the offset for the second load is 4 bytes more
      than for the first load.

- I cannot find a way to describe this in gcc. I tried
   "load_multiple", and this is OK, but gcc only calls that for stack
   pushing.
   I tried the vector facility, but this does not work either.

- I tried to write a peephole2 optimizer, and this works out OK, it
   manages to recognize the sequence, ... but the peephole2 optimizer is
   run AFTER register allocation, and the optimization needs to be done
   BEFORE, as there are constraints on the 2 registers, Ri and Ri+1.
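
For illustration, the pairing constraint described above (adjacent
registers Ri and Ri+1, second offset 4 bytes higher) can be written as a
small predicate over hard register numbers. This is only a sketch in
plain C, not GCC internals: the struct `word_load` and the function
`can_pair_loads` are hypothetical names, and the extra check that the
first load does not overwrite the base register is an added assumption
about what a safe fusion would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one 32-bit load: dest_reg <- mem[base_reg + offset]. */
struct word_load {
    int dest_reg;   /* hard register number of the destination */
    int base_reg;   /* register holding the base address */
    long offset;    /* byte offset added to the base */
};

/* Return true if `lo` followed by `hi` could be fused into one
   "load double": same base register, destinations Ri and Ri+1, and the
   second offset exactly 4 bytes above the first.
   Assumption: the first load must not overwrite the base register,
   because the second load would then read from a different address. */
bool can_pair_loads(const struct word_load *lo, const struct word_load *hi)
{
    assert(lo != NULL && hi != NULL);

    return lo->base_reg == hi->base_reg
        && hi->dest_reg == lo->dest_reg + 1
        && hi->offset == lo->offset + 4
        && lo->dest_reg != lo->base_reg;
}
```

With loads r4 <- [r10 + 8] and r5 <- [r10 + 12] the predicate holds; if
the second destination is r7 instead, it fails. After register
allocation nothing can be done about that any more, which is the
limitation described above.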

Any suggestions? Is there any way to run peephole2 BEFORE register
allocation?

Best Regards,

Henri.


* Re: Question on peephole2 optimizer
From: Jeff Law @ 2020-02-03 15:50 UTC
  To: Henri Cloetens, gcc-help

On Mon, 2020-02-03 at 15:50 +0100, Henri Cloetens wrote:
> Any suggestions? Is there any way to run peephole2 BEFORE register
> allocation?
I suggest looking at ldp/stp support in the aarch64 backend.

jeff


* Re: Question on peephole2 optimizer
From: Richard Earnshaw (lists) @ 2020-02-03 17:10 UTC
  To: law, Henri Cloetens, gcc-help

On 03/02/2020 15:49, Jeff Law wrote:
> I suggest looking at ldp/stp support in the aarch64 backend.
> 
> jeff
> 

Closer would be the ldrd/strd support when generating code for Arm (not
Thumb); that has a similar restriction on register pairs being adjacent.

In summary: it's hard, and GCC's infrastructure does not support it
particularly well.

R.


* Re: Question on peephole2 optimizer
From: Henri Cloetens @ 2020-02-04  9:43 UTC
  To: Richard Earnshaw (lists), law, gcc-help

Hello Richard, Jeff,

I checked both.
- The aarch64 backend uses the load double only for stacking operations.
   This I already have: gcc provides this functionality via the
   load_multiple construct. If you define it, gcc will use it for stack
   and unstack operations.
- The ARM backend has a peephole2 optimizer. The problem is that it runs
   after register allocation, so if the register allocation would need to
   change for the optimization to apply, the pattern fails. I tried that
   and got it working, but ... I don't like it.
- I found another way, which would work in theory:
   a. Add it to the "movsi"
     1. Make a "define_expand" of the movsi, which does the following:
       a. For the 'normal' case, it calls a define_insn "movsi_internal".
       b. It maintains a per-function history of past calls to itself.
       c. For every call to movsi, it looks in the history for a 'partner'
           with which it can create a "load double".
       d. If it finds one, it walks back through the insn list and checks
           whether the replacement is appropriate. This mainly means no
           in-between jumps or labels, no in-between modification of the
           address register, and not too far back.
       e. If the check succeeds, it replaces the /previous/ movsi with
           the load double.
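
The backward scan in steps c and d can be sketched roughly as follows.
This is a simplified model in plain C, not GCC's RTL API: `struct insn`,
`enum insn_kind`, `find_partner` and the limit `MAX_LOOKBACK` are all
hypothetical names chosen for illustration. It only looks for an earlier
load at the offset 4 bytes below the current one, and it implements only
the checks named in the list above; a real implementation would have to
consider more, for example intervening stores that may alias the loaded
memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds; not GCC's RTL codes. */
enum insn_kind { INSN_LOAD, INSN_OTHER, INSN_JUMP, INSN_LABEL };

struct insn {
    enum insn_kind kind;
    int dest_reg;   /* register written, or -1 if none */
    int base_reg;   /* for INSN_LOAD: address register, else -1 */
    long offset;    /* for INSN_LOAD: byte offset from the base */
};

#define MAX_LOOKBACK 8  /* assumed limit for "not too far back" */

/* Search backwards from insns[cur], which must be a load, for an earlier
   load from the same base register at offset - 4.
   Returns the index of that partner, or -1 if none qualifies. */
int find_partner(const struct insn *insns, size_t cur)
{
    assert(insns != NULL && insns[cur].kind == INSN_LOAD);
    const struct insn *hi = &insns[cur];

    for (size_t dist = 1; dist <= MAX_LOOKBACK && dist <= cur; dist++) {
        const struct insn *p = &insns[cur - dist];

        /* Jumps and labels end the straight-line region. */
        if (p->kind == INSN_JUMP || p->kind == INSN_LABEL)
            return -1;

        /* Candidate partner: same base, offset 4 bytes lower, and it
           does not itself overwrite the address register. */
        if (p->kind == INSN_LOAD
            && p->base_reg == hi->base_reg
            && p->offset + 4 == hi->offset
            && p->dest_reg != p->base_reg)
            return (int)(cur - dist);

        /* Any other write to the address register blocks the pairing. */
        if (p->dest_reg == hi->base_reg)
            return -1;
    }
    return -1;
}
```

Registers are treated as plain numbers here; before register allocation
they would be pseudos, so no adjacency test is made on the destinations.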

For now, I will park this, and do as in aarch64. I might try it later.

Best Regards,

Henri.


* Re: Question on peephole2 optimizer
From: Jeff Law @ 2020-02-04 17:09 UTC
  To: Henri Cloetens, Richard Earnshaw (lists), gcc-help

On Tue, 2020-02-04 at 10:44 +0100, Henri Cloetens wrote:
> For now, I will park this, and do as in aarch64. I might try it later. 
Trying to do this before register allocation isn't going to work the
way you want. But, well, good luck.

jeff