public inbox for gcc-help@gcc.gnu.org
* Describe instructions with same reg in def and use or mutiple defs and attach write latency
@ 2022-01-27  1:20 Reshabh K Sharma
  2022-01-28 17:39 ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Reshabh K Sharma @ 2022-01-27  1:20 UTC (permalink / raw)
  To: gcc-help

Hello everyone,

I am trying to implement a post-address-update load instruction in our
downstream RISC-V backend. I want to attach write latency information to a
register that is also used as an input. For example, for
rd = new_load rs1 rs2, I want to attach separate write latency information
to both rd and rs1.

I am unable to find how to describe instructions that have an operand as
both def and use, and later attach write latency information for the
instruction scheduler to work properly.

It will also be very helpful if you can point me to the implementation of
similar instructions in other backends, for example, LBZU in PowerPC, ARM's
LWD post/pre address update versions and ARM's neon simd load with update.

Many thanks,
Reshabh


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-01-27  1:20 Describe instructions with same reg in def and use or mutiple defs and attach write latency Reshabh K Sharma
@ 2022-01-28 17:39 ` Jeff Law
  2022-01-28 18:21   ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2022-01-28 17:39 UTC (permalink / raw)
  To: Reshabh K Sharma, gcc-help



On 1/26/2022 6:20 PM, Reshabh K Sharma via Gcc-help wrote:
> Hello everyone,
>
> I am trying to implement a post address update load instruction in our
> downstream riscv backend. I want to attach write latency information to a
> use register. For example, rd = new_load rs1 rs2, I want to attach separate
> write latency information to both rd and rs1.
>
> I am unable to find how to describe instructions that have an operand as
> both def and use, and later attach write latency information for the
> instruction scheduler to work properly.
>
> It will also be very helpful if you can point me to the implementation of
> similar instructions in other backends, for example, LBZU in PowerPC, ARM's
> LWD post/pre address update versions and ARM's neon simd load with update.
I'm not sure the scheduler can model different latencies for the 
multiple outputs.  If anyone knows for sure, it would be Vlad.

It may not matter in practice though.  I'd hazard a guess these things 
hang out in the reorder buffer until both outputs are ready and only 
then will it move into the retirement queue.

jeff


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-01-28 17:39 ` Jeff Law
@ 2022-01-28 18:21   ` Segher Boessenkool
  2022-02-04  1:06     ` Reshabh K Sharma
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2022-01-28 18:21 UTC (permalink / raw)
  To: Jeff Law; +Cc: Reshabh K Sharma, gcc-help

On Fri, Jan 28, 2022 at 10:39:54AM -0700, Jeff Law via Gcc-help wrote:
> On 1/26/2022 6:20 PM, Reshabh K Sharma via Gcc-help wrote:
> >I am trying to implement a post address update load instruction in our
> >downstream riscv backend. I want to attach write latency information to a
> >use register. For example, rd = new_load rs1 rs2, I want to attach separate
> >write latency information to both rd and rs1.
> >
> >I am unable to find how to describe instructions that have an operand as
> >both def and use, and later attach write latency information for the
> >instruction scheduler to work properly.
> >
> >It will also be very helpful if you can point me to the implementation of
> >similar instructions in other backends, for example, LBZU in PowerPC, ARM's
> >LWD post/pre address update versions and ARM's neon simd load with update.
> I'm not sure the scheduler can model different latencies for the 
> multiple outputs.  If anyone knows for sure, it would be Vlad.

You can use TARGET_SCHED_ADJUST_COST?

> It may not matter in practice though.  I'd hazard a guess these things 
> hang out in the reorder buffer until both outputs are ready and only 
> then will it move into the retirement queue.

The GCC scheduling description says when results are ready, not when the
instructions (can) finish or complete (aka retire).  I do agree this
case doesn't matter so much, cases where it does matter will have their
dependency chains broken much earlier :-)


Segher
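
[Editorial sketch, not part of the original message.] The idea of giving each
output of one instruction its own latency can be illustrated with a small
self-contained toy model. This is not GCC code: `ToyInsn`, `OutputLatency`,
`adjust_cost` and the register numbers are all hypothetical. It only mimics
the role a hook like TARGET_SCHED_ADJUST_COST plays, namely taking a
producer/consumer pair plus a default cost and returning the cost that fits
the result the consumer actually waits for.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for an instruction; NOT GCC's rtx_insn.
struct ToyInsn {
  std::vector<int> defs;  // register numbers written
  std::vector<int> uses;  // register numbers read
};

// Hypothetical "result ready" latency for one output of a producer.
struct OutputLatency {
  int reg;     // register written by the producer
  int cycles;  // cycles until that register's new value is available
};

static bool reads(const ToyInsn &insn, int reg) {
  return std::find(insn.uses.begin(), insn.uses.end(), reg) != insn.uses.end();
}

// Cost of the dependency from a producer to `consumer`: the largest latency
// among the producer outputs that the consumer reads. If the consumer reads
// none of them, the default cost is returned unchanged.
int adjust_cost(const ToyInsn &consumer,
                const std::vector<OutputLatency> &producer_outputs,
                int default_cost) {
  assert(default_cost >= 0);
  int cost = -1;
  for (const OutputLatency &out : producer_outputs)
    if (reads(consumer, out.reg))
      cost = std::max(cost, out.cycles);
  return cost >= 0 ? cost : default_cost;
}
```

With a post-update load described as two outputs (loaded value slow, updated
address fast), a following add that reads only the address register gets the
short latency, while a consumer of the loaded value keeps the long one.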


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-01-28 18:21   ` Segher Boessenkool
@ 2022-02-04  1:06     ` Reshabh K Sharma
  2022-02-04  1:31       ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Reshabh K Sharma @ 2022-02-04  1:06 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-help

On Fri, Jan 28, 2022 at 10:23 AM Segher Boessenkool <
segher@kernel.crashing.org> wrote:

> On Fri, Jan 28, 2022 at 10:39:54AM -0700, Jeff Law via Gcc-help wrote:
> > On 1/26/2022 6:20 PM, Reshabh K Sharma via Gcc-help wrote:
> > >I am trying to implement a post address update load instruction in our
> > >downstream riscv backend. I want to attach write latency information to a
> > >use register. For example, rd = new_load rs1 rs2, I want to attach separate
> > >write latency information to both rd and rs1.
> > >
> > >I am unable to find how to describe instructions that have an operand as
> > >both def and use, and later attach write latency information for the
> > >instruction scheduler to work properly.
> > >
> > >It will also be very helpful if you can point me to the implementation of
> > >similar instructions in other backends, for example, LBZU in PowerPC, ARM's
> > >LWD post/pre address update versions and ARM's neon simd load with update.
> > I'm not sure the scheduler can model different latencies for the
> > multiple outputs.  If anyone knows for sure, it would be Vlad.
>
> You can use TARGET_SCHED_ADJUST_COST?
>

Thank you so much! I think target_sched_adjust_cost will do.
Suppose the two rtx_insns
x = exp_load addr offset and
y = add addr z
are the input arguments to target_sched_adjust_cost.

How do I check that a given rtx_insn is exp_load? (How do we check whether an
rtx_insn is of type exp_load, add, or any other target-specific instruction?)
And how do I check whether there is a read-after-read dependency on the addr
operand and not on the offset?

> > It may not matter in practice though.  I'd hazard a guess these things
> > hang out in the reorder buffer until both outputs are ready and only
> > then will it move into the retirement queue.
>
> The GCC scheduling description says when results are ready, not when the
> instructions (can) finish or complete (aka retire).  I do agree this
> case doesn't matter so much, cases where it does matter will have their
> dependency chains broken much earlier :-)
>
>
> Segher


Thanks again for the help!

Reshabh


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-02-04  1:06     ` Reshabh K Sharma
@ 2022-02-04  1:31       ` Segher Boessenkool
  2022-02-16 19:41         ` Reshabh K Sharma
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2022-02-04  1:31 UTC (permalink / raw)
  To: Reshabh K Sharma; +Cc: Jeff Law, gcc-help

Hi!

On Thu, Feb 03, 2022 at 05:06:23PM -0800, Reshabh K Sharma wrote:
> On Fri, Jan 28, 2022 at 10:23 AM Segher Boessenkool <
> segher@kernel.crashing.org> wrote:
> > You can use TARGET_SCHED_ADJUST_COST?
> 
> Thank you so much! I think target_sched_adjust_cost will do.
> Given two rtx_insn,
> x = exp_load addr offset and
> y = add addr z,
> these two instructions are the input arguments to target_sched_adjust_cost,
> 
> how do I check that given rtx_insn is exp_load? (how do we check if
> rtx_insn is of type exp_load, add or any other target specific instruction?)
> and how do I check if there is a read after read dependency for addr
> operand and not the offset.

"type" is just an insn attribute, so you would use
  if (get_attr_type (insn) == TYPE_EXP_LOAD)
or similar.


Segher
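
[Editorial sketch, not part of the original message.] The two checks asked
about above (is the producer an exp_load, and is the dependency on the address
operand rather than the offset) can be illustrated with a self-contained toy
model. Everything here is hypothetical: `ToyInsn`, `ToyType`, `make_exp_load`
and `depends_on_updated_address` are illustration names, not GCC's. In a real
backend the type comes from get_attr_type as shown in the message, and
register overlap would be tested on RTL, for example with a helper such as
reg_overlap_mentioned_p.

```cpp
#include <cassert>

// Toy model only; TYPE_EXP_LOAD echoes the name used in the thread.
enum ToyType { TYPE_ADD, TYPE_EXP_LOAD };

struct ToyInsn {
  ToyType type;
  int dest;  // register written with the result
  int src1;  // first source; for TYPE_EXP_LOAD the address register,
             // which the instruction also updates
  int src2;  // second source; for TYPE_EXP_LOAD the offset register
};

// Build a toy exp_load. Assumption of this sketch: the loaded value and the
// updated address go to different registers.
ToyInsn make_exp_load(int dest, int addr, int offset) {
  assert(dest != addr);
  return ToyInsn{TYPE_EXP_LOAD, dest, addr, offset};
}

// True when `producer` is an exp_load and `consumer` reads the address
// register that the producer updates. A consumer that only reads the
// offset register does not count.
bool depends_on_updated_address(const ToyInsn &producer,
                                const ToyInsn &consumer) {
  if (producer.type != TYPE_EXP_LOAD)
    return false;
  const int addr = producer.src1;
  return consumer.src1 == addr || consumer.src2 == addr;
}
```

The type test comes first so that ordinary instructions fall through
untouched; only then does the sketch look at which operand is shared.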


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-02-04  1:31       ` Segher Boessenkool
@ 2022-02-16 19:41         ` Reshabh K Sharma
  2022-02-17 18:48           ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Reshabh K Sharma @ 2022-02-16 19:41 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-help

On Thu, Feb 3, 2022 at 5:33 PM Segher Boessenkool <
segher@kernel.crashing.org> wrote:

> Hi!
>
> On Thu, Feb 03, 2022 at 05:06:23PM -0800, Reshabh K Sharma wrote:
> > On Fri, Jan 28, 2022 at 10:23 AM Segher Boessenkool <
> > segher@kernel.crashing.org> wrote:
> > > You can use TARGET_SCHED_ADJUST_COST?
> >
> > Thank you so much! I think target_sched_adjust_cost will do.
> > Given two rtx_insn,
> > x = exp_load addr offset and
> > y = add addr z,
> > these two instructions are the input arguments to target_sched_adjust_cost,
> >
> > how do I check that given rtx_insn is exp_load? (how do we check if
> > rtx_insn is of type exp_load, add or any other target specific instruction?)
> > and how do I check if there is a read after read dependency for addr
> > operand and not the offset.
>
> "type" is just an insn attribute, so you would use
>   if (get_attr_type (insn) == TYPE_EXP_LOAD)
> or similar.
>

Thank you so much!

Initially I added an instruction to binutils, inside opcodes/riscv-opc.c, as

  {"flwr",      0, INSN_CLASS_I,   "D,s,t",  MATCH_FLWR, MASK_FLWR,
  match_opcode, INSN_DREF|INSN_4_BYTE },

for a custom instruction, flwr rd, rs1, rs2.

I wanted to add a scheduling cost to rd and rs. It was suggested that I use
TARGET_SCHED_ADJUST_COST, but there I need to check whether the instruction
is FLWR, and as suggested I tried using get_attr_type. I realized that I
first need to set the type before I can use get_attr_type. I could not find
any place to set the attribute other than define_insn. This custom
instruction is only going to be used in inline asm for now, so there is no
equivalent RTL that I could lower into it. Since I could not find any other
way to set the attribute, I decided to add a pattern (hoping it would be
unmatchable) where I could add the attribute. So inside riscv.md I added a
define_insn for flwr and used set_attr, but I am not able to find any
instruction with that type attribute in TARGET_SCHED_ADJUST_COST.

1. Does the inline asm compilation flow go through
TARGET_SCHED_ADJUST_COST?
2. Is there a better way to do this? / Am I missing something?


> Segher
>

Many thanks,
Reshabh


* Re: Describe instructions with same reg in def and use or mutiple defs and attach write latency
  2022-02-16 19:41         ` Reshabh K Sharma
@ 2022-02-17 18:48           ` Segher Boessenkool
  0 siblings, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2022-02-17 18:48 UTC (permalink / raw)
  To: Reshabh K Sharma; +Cc: Jeff Law, gcc-help

On Wed, Feb 16, 2022 at 11:41:37AM -0800, Reshabh K Sharma wrote:
> I wanted to add scheduling cost to rd and rs, I was suggested to use
> TARGET_SCHED_ADJUST_COST but there I need to check if the instruction is
> FLWR and as suggested I tried using get_attr_type. I realized that first I
> need to set the type then use get_attr_type. I also couldn't find any other
> place to set the attribute other than define_insn but this custom
> instruction was just going to be used in inline asm right now so there was
> no equivalent rtl from which I can lower into this but since I was not able
> to find any other way to set the attribute, I decided to add a pattern
> (hoping it to be unmatchable) where I could add the attribute, so inside
> riscv.md I added the define_insn for flwr and used set_attr but I'm not
> able to find any instruction in the TARGET_SCHED_ADJUST_COST for the
> specific type attr.
> 
> 1. Does inline asm compilation flow goes through the
> TARGET_SCHED_ADJUST_COST?
> 2. Is there a better way to do this? / Am I missing something?

GCC does not look at the assembler template to try to figure out what
instructions are in there.  This is by design.  So, the compiler can
never know much about how it will schedule on the real machine (it does
estimate (pessimistically) how big the resulting machine code will be,
so that any branches can reach their target).
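
[Editorial sketch, not part of the original message.] A pessimistic size
estimate of this kind can be pictured as "count the statements in the
template and assume each one is as large as the largest instruction". The toy
function below does exactly that and nothing more. It is not GCC's
implementation: `estimate_asm_bytes` is a hypothetical name, and treating
newline and ';' as the only statement separators is an assumption of the
sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Upper-bound guess for the size of an inline asm template without looking
// at what the instructions are: one statement, plus one more per separator,
// each assumed to take `max_insn_bytes`.
std::size_t estimate_asm_bytes(const std::string &asm_template,
                               std::size_t max_insn_bytes) {
  assert(max_insn_bytes > 0);
  std::size_t statements = 1;  // even an empty template counts as one
  for (char c : asm_template)
    if (c == '\n' || c == ';')
      ++statements;
  return statements * max_insn_bytes;
}
```

Overestimating is harmless for branch reach, which is why a count-based bound
is enough there, but it says nothing about latency, so it cannot help the
scheduler.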

Making a define_insn for your insn is exactly the right plan.  You do
not have to hope it does not accidentally match anything else: if you
make it an unspec, it can never match anything but itself (nothing with
a different "index" number).


Segher
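
[Editorial sketch, not part of the original message.] The point about unspec
"index" numbers can be shown with a toy matcher. This is not how GCC's
recognizer is implemented; `ToyPattern`, `matches` and the constants
`UNSPEC_FLWR` and `UNSPEC_OTHER` are hypothetical names chosen for the
illustration. The sketch only captures the rule stated above: an unspec
template matches a candidate only when both are unspecs carrying the same
index.

```cpp
#include <cassert>

// Toy expression codes; NOT GCC's rtx codes.
enum ToyCode { CODE_PLUS, CODE_MEM, CODE_UNSPEC };

// Hypothetical unspec index numbers.
constexpr int UNSPEC_FLWR = 100;
constexpr int UNSPEC_OTHER = 101;

struct ToyPattern {
  ToyCode code;
  int unspec_index;  // meaningful only when code == CODE_UNSPEC
};

// A template matches a candidate when the codes agree and, for unspecs,
// the index numbers agree as well.
bool matches(const ToyPattern &tmpl, const ToyPattern &candidate) {
  assert(tmpl.code != CODE_UNSPEC || tmpl.unspec_index >= 0);
  if (tmpl.code != candidate.code)
    return false;
  if (tmpl.code == CODE_UNSPEC)
    return tmpl.unspec_index == candidate.unspec_index;
  return true;
}
```

Because ordinary arithmetic or memory expressions never carry the unspec
code, a pattern built around its own index cannot be picked up by accident.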

