Question on peephole2 optimizer

public inbox for gcc-help@gcc.gnu.org

* Question on peephole2 optimizer
@ 2020-02-03 14:49 Henri Cloetens
  2020-02-03 15:50 ` Jeff Law
  0 siblings, 1 reply; 5+ messages in thread
From: Henri Cloetens @ 2020-02-03 14:49 UTC (permalink / raw)
  To: gcc-help

Hello all,

I have a question on the peephole2 optimizer.

- My target has a "load double" instruction:
    - It does an indexed load of a 64-bit operand into two 32-bit registers.
    - The requirement is that the registers are adjacent
      (Ri and Ri+1), and that the offset for the second load is 4 bytes more
      than for the first load.

- I cannot find a way to describe this in gcc. I tried
   "load_multiple", and this is OK, but gcc only uses that for stack
   pushing.
   I tried the vector facility, but this does not work either.

- I tried to write a peephole2 optimizer, and this works out OK: it
   manages to recognize the sequence. However, the peephole2 optimizer is
   run AFTER register allocation, and the optimization needs to be done
   BEFORE, as there are constraints on the 2 registers, Ri and Ri+1.
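
The pairing rule described above can be sketched in C as a small predicate.
This is only an illustration, not GCC code: the struct load32 type, its
fields, and the can_pair_loads name are hypothetical, and the extra check
that the first load does not overwrite the base register is an assumption
added here, not something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one 32-bit indexed load:
   dest_reg <- memory[base_reg + offset].  Not a GCC data structure. */
struct load32 {
    int  dest_reg;  /* destination register number (Ri) */
    int  base_reg;  /* address (base) register number   */
    long offset;    /* byte offset from the base        */
};

/* Return true if `lo` followed by `hi` satisfies the "load double"
   constraints: adjacent destination registers (Ri, Ri+1), the same
   base register, and a second offset exactly 4 bytes larger. */
bool can_pair_loads(const struct load32 *lo, const struct load32 *hi)
{
    assert(lo != NULL && hi != NULL);

    /* Assumption: if the first load overwrites the base register, the
       second load would use a different address, so refuse to pair. */
    if (lo->dest_reg == lo->base_reg)
        return false;

    return hi->dest_reg == lo->dest_reg + 1
        && hi->base_reg == lo->base_reg
        && hi->offset   == lo->offset + 4;
}
```

A check like this only succeeds after register allocation if the allocator
happened to pick Ri and Ri+1, which is the limitation described above.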

Any suggestions? Is there any way to run peephole2 BEFORE register
allocation?

Best Regards,

Henri.


* Re: Question on peephole2 optimizer
  2020-02-03 14:49 Question on peephole2 optimizer Henri Cloetens
@ 2020-02-03 15:50 ` Jeff Law
  2020-02-03 17:10   ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Law @ 2020-02-03 15:50 UTC (permalink / raw)
  To: Henri Cloetens, gcc-help

On Mon, 2020-02-03 at 15:50 +0100, Henri Cloetens wrote:
> Hello all,
> 
> I have a question on the peephole2 optimizer.
> 
> - My target has a "load double" instruction:
>     - It does an indexed load of a 64-bit operand into two 32-bit registers.
>     - The requirement is that the registers are adjacent
>       (Ri and Ri+1), and that the offset for the second load is 4 bytes more
>       than for the first load.
> 
> - I cannot find a way to describe this in gcc. I tried
>    "load_multiple", and this is OK, but gcc only uses that for stack
>    pushing.
>    I tried the vector facility, but this does not work either.
> 
> - I tried to write a peephole2 optimizer, and this works out OK: it
>    manages to recognize the sequence. However, the peephole2 optimizer is
>    run AFTER register allocation, and the optimization needs to be done
>    BEFORE, as there are constraints on the 2 registers, Ri and Ri+1.
> 
> Any suggestions? Is there any way to run peephole2 BEFORE register
> allocation?
I suggest looking at ldp/stp support in the aarch64 backend.

jeff


* Re: Question on peephole2 optimizer
  2020-02-03 15:50 ` Jeff Law
@ 2020-02-03 17:10   ` Richard Earnshaw (lists)
  2020-02-04  9:43     ` Henri Cloetens
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Earnshaw (lists) @ 2020-02-03 17:10 UTC (permalink / raw)
  To: law, Henri Cloetens, gcc-help

On 03/02/2020 15:49, Jeff Law wrote:
> On Mon, 2020-02-03 at 15:50 +0100, Henri Cloetens wrote:
>> Hello all,
>>
>> I have a question on the peephole2 optimizer.
>>
>> - My target has a "load double" instruction:
>>      - It does an indexed load of a 64-bit operand into two 32-bit registers.
>>      - The requirement is that the registers are adjacent
>>        (Ri and Ri+1), and that the offset for the second load is 4 bytes more
>>        than for the first load.
>>
>> - I cannot find a way to describe this in gcc. I tried
>>     "load_multiple", and this is OK, but gcc only uses that for stack
>>     pushing.
>>     I tried the vector facility, but this does not work either.
>>
>> - I tried to write a peephole2 optimizer, and this works out OK: it
>>     manages to recognize the sequence. However, the peephole2 optimizer is
>>     run AFTER register allocation, and the optimization needs to be done
>>     BEFORE, as there are constraints on the 2 registers, Ri and Ri+1.
>>
>> Any suggestions? Is there any way to run peephole2 BEFORE register
>> allocation?
> I suggest looking at ldp/stp support in the aarch64 backend.
> 
> jeff
> 

Closer would be the ldrd/strd support when generating code for Arm (not
Thumb); that has a similar restriction on register pairs being adjacent.

In summary: it's hard, and GCC's infrastructure does not support it
particularly well.

R.


* Re: Question on peephole2 optimizer
  2020-02-03 17:10   ` Richard Earnshaw (lists)
@ 2020-02-04  9:43     ` Henri Cloetens
  2020-02-04 17:09       ` Jeff Law
  0 siblings, 1 reply; 5+ messages in thread
From: Henri Cloetens @ 2020-02-04  9:43 UTC (permalink / raw)
  To: Richard Earnshaw (lists), law, gcc-help

Hello Richard, Jeff,

I checked both.
- The aarch64 backend uses the load double only for stacking operations.
  This, I have. This functionality is provided by gcc via the
  load_multiple construct.
  If you define it, gcc will use it for stack and unstack operations.
- The ARM backend has a peephole2 optimizer. This has the problem that it
  is run after register allocation, and if the register allocation needs
  to change for the optimization to be done, the pattern fails. I tried
  that, and I got it working, but ... I don't like it.
- I found another way, which would work in theory:
  a. Add it to the "movsi"
    1. Make a "define_expand" of the movsi, which does the following:
      a. For the 'normal' case, it calls a define_insn "movsi_internal".
      b. It maintains a per-function history of past calls to itself.
      c. For every call to movsi, it looks in the history for a 'partner'
         with which it can create a "load double".
      d. If it finds one, it walks back through the insn list and checks
         whether the replacement is appropriate. This mainly means no
         in-between jumps and labels, no in-between modification of the
         address register, and not too far back.
      e. If the checking is successful, it replaces the /previous/
         movsi with the load double.
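
Steps c and d above can be sketched as a backward scan over a simplified
instruction list. This is a stand-alone illustration in C and does not use
GCC's RTL API: struct insn, enum insn_kind, find_load_partner and the
max_distance parameter are all hypothetical names. It checks only the
conditions listed in step d; a real implementation would presumably need
further checks, for example for intervening stores that might alias the
loaded memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (not GCC RTL). */
enum insn_kind { INSN_LOAD, INSN_LABEL, INSN_JUMP, INSN_OTHER };

struct insn {
    enum insn_kind kind;
    int  written_reg;  /* register set by this insn, or -1 if none */
    int  base_reg;     /* address register (INSN_LOAD only)        */
    long offset;       /* byte offset from base (INSN_LOAD only)   */
};

/* Walk back from the load at index `cur`, looking for an earlier load
   from the same base register at offset - 4.  Returns the partner's
   index, or -1 if the scan hits a label or jump, sees the address
   register modified, or goes further back than `max_distance`. */
int find_load_partner(const struct insn *insns, size_t cur,
                      size_t max_distance)
{
    assert(insns != NULL && insns[cur].kind == INSN_LOAD);
    const struct insn *second = &insns[cur];

    for (size_t dist = 1; dist <= max_distance && dist <= cur; dist++) {
        const struct insn *prev = &insns[cur - dist];

        /* No in-between jumps or labels. */
        if (prev->kind == INSN_LABEL || prev->kind == INSN_JUMP)
            return -1;

        /* No in-between modification of the address register
           (this also rejects a candidate that overwrites its base). */
        if (prev->written_reg == second->base_reg)
            return -1;

        if (prev->kind == INSN_LOAD
            && prev->base_reg == second->base_reg
            && prev->offset == second->offset - 4)
            return (int)(cur - dist);
    }
    return -1;  /* not found within max_distance */
}
```

The sketch only handles the case where the earlier load is the low half of
the pair; the mirrored case (earlier load at offset + 4) would need a second
comparison.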

For now, I will park this, and do as in aarch64. I might try it later.

Best Regards,

Henri.



* Re: Question on peephole2 optimizer
  2020-02-04  9:43     ` Henri Cloetens
@ 2020-02-04 17:09       ` Jeff Law
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Law @ 2020-02-04 17:09 UTC (permalink / raw)
  To: Henri Cloetens, Richard Earnshaw (lists), gcc-help

On Tue, 2020-02-04 at 10:44 +0100, Henri Cloetens wrote:
> Hello Richard, Jeff,
> 
> I checked both. 
> - The aarch64 backend uses the load double only for stacking operations.
>   This, I have. This functionality is provided by gcc via the load_multiple construct.
>   If you define it, gcc will use it for stack and unstack operations.
> - The ARM backend has a peephole2 optimizer. This has the problem that it is run after
>   register allocation, and if the register allocation needs to change for the optimization
>   to be done, the pattern fails. I tried that, and I got it working, but ... I don't like it.
> - I found another way, which would work in theory:
>   a. Add it to the "movsi"
>     1. Make a "define_expand" of the movsi, which does the following:
>       a. For the 'normal' case, it calls a define_insn "movsi_internal".
>       b. It maintains a per-function history of past calls to itself.
>       c. For every call to movsi, it looks in the history for a 'partner'
>           with which it can create a "load double".
>       d. If it finds one, it walks back through the insn list and checks
>           whether the replacement is appropriate. This mainly means no in-between
>           jumps and labels, no in-between modification of the address register,
>           and not too far back.
>       e. If the checking is successful, it replaces the previous movsi with the load double.
> 
> For now, I will park this, and do as in aarch64. I might try it later. 
Trying to do this before register allocation isn't going to work the
way you want. But, well, good luck.

jeff
> 

