A problem with one instruction multiple latencies and pipelines

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* A problem with one instruction multiple latencies and pipelines
@ 2020-09-07  6:08 Qian, Jianhua
  2020-09-07  7:40 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-07  6:08 UTC (permalink / raw)
  To: gcc

Hi

I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling.
How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter.

For example, the ADD (shifted register) instruction in a64fx

Instruction            Option                             Latency    Pipeline
ADD (shifted register)  <amount> = 0                     1          EX* | EAG*
                      <amount> = [1-4] && <shift>=LSL  1+1       (EXA + EXA) | (EXB + EXB)
                                                         2+1       (EXA + EXA) | (EXB + EXB)

In aarch64.md ADD (shifted register) instruction is defined as following．
 (define_insn "*add_<shift>_<mode>"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                  (match_operand:GPI 3 "register_operand" "r")))]
  ""
  "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
  [(set_attr "type" "alu_shift_imm")]
)

It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). 
What should I do?

Regards
Qian




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-07  6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua
@ 2020-09-07  7:40 ` Richard Biener
  2020-09-07  8:45   ` Qian, Jianhua
  2020-09-07 20:20 ` Richard Sandiford
  2020-09-11 13:30 ` Richard Earnshaw
  2 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2020-09-07  7:40 UTC (permalink / raw)
  To: Qian, Jianhua; +Cc: gcc

On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote:
>
> Hi
>
> I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling.
> How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter.
>
> For example, the ADD (shifted register) instruction in a64fx
>
> Instruction            Option                             Latency    Pipeline
> ADD (shifted register)  <amount> = 0                     1          EX* | EAG*
>                       <amount> = [1-4] && <shift>=LSL  1+1       (EXA + EXA) | (EXB + EXB)
>                                                          2+1       (EXA + EXA) | (EXB + EXB)
>
> In aarch64.md ADD (shifted register) instruction is defined as following．
>  (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>   ""
>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>   [(set_attr "type" "alu_shift_imm")]
> )
>
> It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register).
> What should I do?

Just a guess - I'm not very familiar with the pipeline modeling, you
probably need to
expose two alternatives so you can assign a different type to the second one.

Other than that modeling the more restrictive (or permissive?) variant
might work good enough in practice.
a64fx is probably out-of-order anyway.

Richard.

> Regards
> Qian
>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: A problem with one instruction multiple latencies and pipelines
  2020-09-07  7:40 ` Richard Biener
@ 2020-09-07  8:45   ` Qian, Jianhua
  2020-09-07 11:58     ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-07  8:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc

Hi Richard

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, September 7, 2020 3:41 PM
> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>
> Cc: gcc@gcc.gnu.org
> Subject: Re: A problem with one instruction multiple latencies and pipelines
> 
> On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote:
> >
> > Hi
> >
> > I'm adding a new machine model. I have a problem when writing the
> "define_insn_reservation" for instruction scheduling.
> > How to write the "define_insn_reservation" for one instruction that there are
> different latencies and pipelines according to parameter.
> >
> > For example, the ADD (shifted register) instruction in a64fx
> >
> > Instruction            Option                             Latency
> Pipeline
> > ADD (shifted register)  <amount> = 0                     1          EX*
> | EAG*
> >                       <amount> = [1-4] && <shift>=LSL  1+1
> (EXA + EXA) | (EXB + EXB)
> >                                                          2+1       (EXA
> + EXA) | (EXB + EXB)
> >
> > In aarch64.md ADD (shifted register) instruction is defined as following．
> >  (define_insn "*add_<shift>_<mode>"
> >   [(set (match_operand:GPI 0 "register_operand" "=r")
> >         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand"
> "r")
> >                               (match_operand:QI 2
> "aarch64_shift_imm_<mode>" "n"))
> >                   (match_operand:GPI 3 "register_operand" "r")))]
> >   ""
> >   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
> >   [(set_attr "type" "alu_shift_imm")]
> > )
> >
> > It could not be distinguished by the type "alu_shift_imm" when writing
> "define_insn_reservation" for ADD (shifted register).
> > What should I do?
> 
> Just a guess - I'm not very familiar with the pipeline modeling, you probably
> need to expose two alternatives so you can assign a different type to the second
> one.
I'm considering such method, 
but if I do that I'm afraid it has side effects on other machine models of aarch64 series.
Some instructions' definition will be changed in aarch64.md file.

> Other than that modeling the more restrictive (or permissive?) variant might
> work good enough in practice.
Is your mean that an approximate modeling is good enough?

> a64fx is probably out-of-order anyway.
Yes, a64fx is an out-of-order architecture.

Regards
Qian

> 
> Richard.
> 
> > Regards
> > Qian
> >
> >
> >
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-07  8:45   ` Qian, Jianhua
@ 2020-09-07 11:58     ` Richard Biener
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2020-09-07 11:58 UTC (permalink / raw)
  To: Qian, Jianhua; +Cc: gcc

On Mon, Sep 7, 2020 at 10:46 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote:
>
> Hi Richard
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Monday, September 7, 2020 3:41 PM
> > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: A problem with one instruction multiple latencies and pipelines
> >
> > On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote:
> > >
> > > Hi
> > >
> > > I'm adding a new machine model. I have a problem when writing the
> > "define_insn_reservation" for instruction scheduling.
> > > How to write the "define_insn_reservation" for one instruction that there are
> > different latencies and pipelines according to parameter.
> > >
> > > For example, the ADD (shifted register) instruction in a64fx
> > >
> > > Instruction            Option                             Latency
> > Pipeline
> > > ADD (shifted register)  <amount> = 0                     1          EX*
> > | EAG*
> > >                       <amount> = [1-4] && <shift>=LSL  1+1
> > (EXA + EXA) | (EXB + EXB)
> > >                                                          2+1       (EXA
> > + EXA) | (EXB + EXB)
> > >
> > > In aarch64.md ADD (shifted register) instruction is defined as following．
> > >  (define_insn "*add_<shift>_<mode>"
> > >   [(set (match_operand:GPI 0 "register_operand" "=r")
> > >         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand"
> > "r")
> > >                               (match_operand:QI 2
> > "aarch64_shift_imm_<mode>" "n"))
> > >                   (match_operand:GPI 3 "register_operand" "r")))]
> > >   ""
> > >   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
> > >   [(set_attr "type" "alu_shift_imm")]
> > > )
> > >
> > > It could not be distinguished by the type "alu_shift_imm" when writing
> > "define_insn_reservation" for ADD (shifted register).
> > > What should I do?
> >
> > Just a guess - I'm not very familiar with the pipeline modeling, you probably
> > need to expose two alternatives so you can assign a different type to the second
> > one.
> I'm considering such method,
> but if I do that I'm afraid it has side effects on other machine models of aarch64 series.
> Some instructions' definition will be changed in aarch64.md file.
>
> > Other than that modeling the more restrictive (or permissive?) variant might
> > work good enough in practice.
> Is your mean that an approximate modeling is good enough?

Yes.

> > a64fx is probably out-of-order anyway.
> Yes, a64fx is an out-of-order architecture.
>
> Regards
> Qian
>
> >
> > Richard.
> >
> > > Regards
> > > Qian
> > >
> > >
> > >
> >
>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-07  6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua
  2020-09-07  7:40 ` Richard Biener
@ 2020-09-07 20:20 ` Richard Sandiford
  2020-09-08  5:34   ` Qian, Jianhua
  2020-09-09 21:22   ` Segher Boessenkool
  2020-09-11 13:30 ` Richard Earnshaw
  2 siblings, 2 replies; 20+ messages in thread
From: Richard Sandiford @ 2020-09-07 20:20 UTC (permalink / raw)
  To: Qian, Jianhua; +Cc: gcc

"Qian, Jianhua" <qianjh@cn.fujitsu.com> writes:
> Hi
>
> I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling.
> How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter.
>
> For example, the ADD (shifted register) instruction in a64fx
>
> Instruction            Option                             Latency    Pipeline
> ADD (shifted register)  <amount> = 0                     1          EX* | EAG*
>                       <amount> = [1-4] && <shift>=LSL  1+1       (EXA + EXA) | (EXB + EXB)
>                                                          2+1       (EXA + EXA) | (EXB + EXB)
>
> In aarch64.md ADD (shifted register) instruction is defined as following．
>  (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>   ""
>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>   [(set_attr "type" "alu_shift_imm")]
> )
>
> It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). 
> What should I do?

This is just personal opinion, but in general (from the point of view
of a new port, or a new subport like SVE), I think the best approach
to handling the "type" attribute is to start with the coarsest
classification that makes sense, then split these relatively coarse
types up whenever there's a specific need.

When taking that approach, it's OK (and perhaps even a good sign)
for an existing type to sometimes be too coarse for a new CPU.

So thanks for asking about this, and please don't feel constrained
by the existing "type" classification.  IMO we should split existing
types wherever that makes sense for new CPUs.

In a situation like this, I guess it would make sense to treat
alu_shift_imm as a conservative general classification and add
alu_shift_imm1to4 (say) as a new type.

I'd recommend a two-stage approach, with each stage as a separate
patch for ease of review:

(1) Add the new type to the definition of the "type" attribute and run:

      sed -i s/alu_shift_imm/alu_shift_imm1to4,alu_shift_imm/g

    on all the config/arm and config/aarch64 tuning descriptions.

    This is a purely mechanical replacement and could be tested in
    the normal way (i.e. there would be no need to do special testing
    to make sure that existing descriptions are unaffected: the change
    is mechanical enough not to need that).

(2) Make the define_insns use the new type where appropriate.
    Like Richard says, this would be handled by splitting the single
    define_insn alternative above into two separate alternatives.
    It's also possible to something like:

      (set (attr "type")
           (if_then_else (match_operand ...)
                         (const_string "...")
                         (const_string "...")))

    which localises the change to the attribute definition.

    If we end up needing that construct several times, it would also
    be possible to add a new attribute, e.g.:

      (define_attr "shift_imm_type" "none,alu_shift_op2")

    and then change the default value of the type attribute from:

      (const_string "untyped")

    to:

      (cond [(eq_attr "shift_imm_type" "alu_shift_op2")
             (if_then_else (match_operand 2 ...)
                           (const_string "...")
                           (const_string "..."))]
            (const_string "untyped"))

    define_insns like the above could then use:

      (set_attr "shift_imm_type" "alu_shift_op2")

    instead of setting "type" directly.  That's definitely more
    complicated, but it can help to reduce cut-&-paste.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: A problem with one instruction multiple latencies and pipelines
  2020-09-07 20:20 ` Richard Sandiford
@ 2020-09-08  5:34   ` Qian, Jianhua
  2020-09-09 21:22   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-08  5:34 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Hi Richard

Thanks a lot for your advises and detailed comments.

We will discuss which instructions need to be accurately classified, 
and estimate the workload.

Regards
Qian

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, September 8, 2020 4:21 AM
> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>
> Cc: gcc@gcc.gnu.org
> Subject: Re: A problem with one instruction multiple latencies and pipelines
> 
> "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes:
> > Hi
> >
> > I'm adding a new machine model. I have a problem when writing the
> "define_insn_reservation" for instruction scheduling.
> > How to write the "define_insn_reservation" for one instruction that there are
> different latencies and pipelines according to parameter.
> >
> > For example, the ADD (shifted register) instruction in a64fx
> >
> > Instruction            Option                             Latency
> Pipeline
> > ADD (shifted register)  <amount> = 0                     1          EX*
> | EAG*
> >                       <amount> = [1-4] && <shift>=LSL  1+1
> (EXA + EXA) | (EXB + EXB)
> >                                                          2+1       (EXA
> + EXA) | (EXB + EXB)
> >
> > In aarch64.md ADD (shifted register) instruction is defined as following．
> >  (define_insn "*add_<shift>_<mode>"
> >   [(set (match_operand:GPI 0 "register_operand" "=r")
> >         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand"
> "r")
> >                               (match_operand:QI 2
> "aarch64_shift_imm_<mode>" "n"))
> >                   (match_operand:GPI 3 "register_operand" "r")))]
> >   ""
> >   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
> >   [(set_attr "type" "alu_shift_imm")]
> > )
> >
> > It could not be distinguished by the type "alu_shift_imm" when writing
> "define_insn_reservation" for ADD (shifted register).
> > What should I do?
> 
> This is just personal opinion, but in general (from the point of view of a new port,
> or a new subport like SVE), I think the best approach to handling the "type"
> attribute is to start with the coarsest classification that makes sense, then split
> these relatively coarse types up whenever there's a specific need.
> 
> When taking that approach, it's OK (and perhaps even a good sign) for an
> existing type to sometimes be too coarse for a new CPU.
> 
> So thanks for asking about this, and please don't feel constrained by the
> existing "type" classification.  IMO we should split existing types wherever
> that makes sense for new CPUs.
> 
> In a situation like this, I guess it would make sense to treat alu_shift_imm as a
> conservative general classification and add
> alu_shift_imm1to4 (say) as a new type.
> 
> I'd recommend a two-stage approach, with each stage as a separate patch for
> ease of review:
> 
> (1) Add the new type to the definition of the "type" attribute and run:
> 
>       sed -i s/alu_shift_imm/alu_shift_imm1to4,alu_shift_imm/g
> 
>     on all the config/arm and config/aarch64 tuning descriptions.
> 
>     This is a purely mechanical replacement and could be tested in
>     the normal way (i.e. there would be no need to do special testing
>     to make sure that existing descriptions are unaffected: the change
>     is mechanical enough not to need that).
> 
> (2) Make the define_insns use the new type where appropriate.
>     Like Richard says, this would be handled by splitting the single
>     define_insn alternative above into two separate alternatives.
>     It's also possible to something like:
> 
>       (set (attr "type")
>            (if_then_else (match_operand ...)
>                          (const_string "...")
>                          (const_string "...")))
> 
>     which localises the change to the attribute definition.
> 
>     If we end up needing that construct several times, it would also
>     be possible to add a new attribute, e.g.:
> 
>       (define_attr "shift_imm_type" "none,alu_shift_op2")
> 
>     and then change the default value of the type attribute from:
> 
>       (const_string "untyped")
> 
>     to:
> 
>       (cond [(eq_attr "shift_imm_type" "alu_shift_op2")
>              (if_then_else (match_operand 2 ...)
>                            (const_string "...")
>                            (const_string "..."))]
>             (const_string "untyped"))
> 
>     define_insns like the above could then use:
> 
>       (set_attr "shift_imm_type" "alu_shift_op2")
> 
>     instead of setting "type" directly.  That's definitely more
>     complicated, but it can help to reduce cut-&-paste.
> 
> Thanks,
> Richard
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-07 20:20 ` Richard Sandiford
  2020-09-08  5:34   ` Qian, Jianhua
@ 2020-09-09 21:22   ` Segher Boessenkool
  2020-09-10  5:01     ` Qian, Jianhua
  2020-09-10 10:04     ` Richard Sandiford
  1 sibling, 2 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-09-09 21:22 UTC (permalink / raw)
  To: Qian, Jianhua, gcc, richard.sandiford

Hi!

On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote:
> This is just personal opinion, but in general (from the point of view
> of a new port, or a new subport like SVE), I think the best approach
> to handling the "type" attribute is to start with the coarsest
> classification that makes sense, then split these relatively coarse
> types up whenever there's a specific need.

Agreed.

> When taking that approach, it's OK (and perhaps even a good sign)
> for an existing type to sometimes be too coarse for a new CPU.
> 
> So thanks for asking about this, and please don't feel constrained
> by the existing "type" classification.  IMO we should split existing
> types wherever that makes sense for new CPUs.

You can also use some other attributes to classify instructions, you
don't have to put it all in one "type" attribute.  This can of course be
done later, at a time when it is clearer what a good design will be.
Sometimes it is obvious from the start though :-)

(This primarily makes the pipeline descriptions much simpler, but also
custom scheduling code and the like.  If one core has two types of "A"
insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
handle both "Aa" and "Ab" instead of just "A").

Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: A problem with one instruction multiple latencies and pipelines
  2020-09-09 21:22   ` Segher Boessenkool
@ 2020-09-10  5:01     ` Qian, Jianhua
  2020-09-10 10:04     ` Richard Sandiford
  1 sibling, 0 replies; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-10  5:01 UTC (permalink / raw)
  To: Segher Boessenkool, gcc, richard.sandiford

Hi Segher

> You can also use some other attributes to classify instructions, you don't have
> to put it all in one "type" attribute.  This can of course be done later, at a time
> when it is clearer what a good design will be.
> Sometimes it is obvious from the start though :-)

Thank you for your advice. It is also a good idea.
Considering other cores(existing and future),
I think it is better to keep the type attribute unchanging,
and add other attributes to classify instructions.

Regards
Qian

> -----Original Message-----
> From: Segher Boessenkool <segher@kernel.crashing.org>
> Sent: Thursday, September 10, 2020 5:23 AM
> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org;
> richard.sandiford@arm.com
> Subject: Re: A problem with one instruction multiple latencies and pipelines
> 
> Hi!
> 
> On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote:
> > This is just personal opinion, but in general (from the point of view
> > of a new port, or a new subport like SVE), I think the best approach
> > to handling the "type" attribute is to start with the coarsest
> > classification that makes sense, then split these relatively coarse
> > types up whenever there's a specific need.
> 
> Agreed.
> 
> > When taking that approach, it's OK (and perhaps even a good sign) for
> > an existing type to sometimes be too coarse for a new CPU.
> >
> > So thanks for asking about this, and please don't feel constrained by
> > the existing "type" classification.  IMO we should split existing
> > types wherever that makes sense for new CPUs.
> 
> You can also use some other attributes to classify instructions, you don't have
> to put it all in one "type" attribute.  This can of course be done later, at a time
> when it is clearer what a good design will be.
> Sometimes it is obvious from the start though :-)
> 
> (This primarily makes the pipeline descriptions much simpler, but also custom
> scheduling code and the like.  If one core has two types of "A"
> insn, say "Aa" and "Ab", it isn't nice if all other cores now have to handle both
> "Aa" and "Ab" instead of just "A").
> 
> 
> Segher
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-09 21:22   ` Segher Boessenkool
  2020-09-10  5:01     ` Qian, Jianhua
@ 2020-09-10 10:04     ` Richard Sandiford
  2020-09-10 23:00       ` Segher Boessenkool
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-09-10 10:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote:
>> This is just personal opinion, but in general (from the point of view
>> of a new port, or a new subport like SVE), I think the best approach
>> to handling the "type" attribute is to start with the coarsest
>> classification that makes sense, then split these relatively coarse
>> types up whenever there's a specific need.
>
> Agreed.
>
>> When taking that approach, it's OK (and perhaps even a good sign)
>> for an existing type to sometimes be too coarse for a new CPU.
>> 
>> So thanks for asking about this, and please don't feel constrained
>> by the existing "type" classification.  IMO we should split existing
>> types wherever that makes sense for new CPUs.
>
> You can also use some other attributes to classify instructions, you
> don't have to put it all in one "type" attribute.  This can of course be
> done later, at a time when it is clearer what a good design will be.
> Sometimes it is obvious from the start though :-)
>
> (This primarily makes the pipeline descriptions much simpler, but also
> custom scheduling code and the like.  If one core has two types of "A"
> insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
> handle both "Aa" and "Ab" instead of just "A").

I guess it makes the description of other cores more verbose, but it
doesn't IMO make them more complicated.  It's still just one test
against one attribute.  And updating existing descriptions can be
done automatically by sed.

The difficulty with splitting into subattributes is that the tests in
the cores that *do* care then become more complicated.  E.g. you have
to do:

   (ior (and (eq_attr "type" "foo")
             (eq_attr "foo_subtype" "foo1"))
        (eq_attr "type" "...others.."))

and:

   (ior (and (eq_attr "type" "foo")
             (eq_attr "foo_subtype" "!foo1"))
        (eq_attr "type" "...others.."))

instead of just:

   (eq_attr "type" "...")

in both cases.

It's not too bad when it's just one subtype (like above), but it could
easily get out of hand.

Perhaps in this case there's an argument in favour of having a separate
attribute due to combinatorial explosion.  E.g. we have:

- alu_shift_imm
- alus_shift_imm
- logic_shift_imm
- logics_shift_imm

so having one attribute that describes the shift in all four cases
would perhaps be better than splitting each of them into two.
Arguments in the other direction:

- Once we have a separate attribute, it isn't clear where the line
  should be drawn.  Alternatives include:

  (1) keeping the new attribute specific to shift immediates only

  (2) also having a “register” value, and folding:
      alu_shift_imm, alu_shift_reg -> alu_shift

  (3) also having a “none” value, and doing away with the *_shift
      type variants altogether:
      alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg

  I think we should have a clear reason for whichever we pick,
  otherwise we could be creating technical debt for later.

- That approach is the opposite of what we did for the neon_* attributes:
  every type has a q variant, rather than the size being a separate
  attribute.

FWIW, we shouldn't assume that this distinction is specific to a64fx. :-)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-10 10:04     ` Richard Sandiford
@ 2020-09-10 23:00       ` Segher Boessenkool
  2020-09-11  7:44         ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-09-10 23:00 UTC (permalink / raw)
  To: Qian, Jianhua, gcc, richard.sandiford

On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > You can also use some other attributes to classify instructions, you
> > don't have to put it all in one "type" attribute.  This can of course be
> > done later, at a time when it is clearer what a good design will be.
> > Sometimes it is obvious from the start though :-)
> >
> > (This primarily makes the pipeline descriptions much simpler, but also
> > custom scheduling code and the like.  If one core has two types of "A"
> > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
> > handle both "Aa" and "Ab" instead of just "A").
> 
> I guess it makes the description of other cores more verbose, but it
> doesn't IMO make them more complicated.  It's still just one test
> against one attribute.  And updating existing descriptions can be
> done automatically by sed.

Consider cores that do not care about the "subtype" at all: when using
just "type", all cores have to test for "foo,foo_subtype", while with
a separate attribute they can just test for "foo".

A typical example in rs6000 is the "sign_extend" attribute for load
instructions.  All cores before power4 do not care at all about it.
(Load and store insns get into combinatorial explosion land as well,
as you discuss below, with "update" and "indexed" forms).

> The difficulty with splitting into subattributes is that the tests in
> the cores that *do* care then become more complicated.  E.g. you have
> to do:
> 
>    (ior (and (eq_attr "type" "foo")
>              (eq_attr "foo_subtype" "foo1"))
>         (eq_attr "type" "...others.."))
> 
> and:
> 
>    (ior (and (eq_attr "type" "foo")
>              (eq_attr "foo_subtype" "!foo1"))
>         (eq_attr "type" "...others.."))
> 
> instead of just:
> 
>    (eq_attr "type" "...")
> 
> in both cases.

Yes.  It is a trade-off.

> It's not too bad when it's just one subtype (like above), but it could
> easily get out of hand.
> 
> Perhaps in this case there's an argument in favour of having a separate
> attribute due to combinatorial explosion.  E.g. we have:
> 
> - alu_shift_imm
> - alus_shift_imm
> - logic_shift_imm
> - logics_shift_imm
> 
> so having one attribute that describes the shift in all four cases
> would perhaps be better than splitting each of them into two.

Yeas, exactly.  And for rs6000 we *did* have many more types before, and
very frequently one was missed (in a scheduling description usually).
Especially common was when some new type attribute was added, not all
existing cpu descriptions were updated correctly.  Maybe this is the
strongest argument "for" actually :-)

> Arguments in the other direction:
> 
> - Once we have a separate attribute, it isn't clear where the line
>   should be drawn.

Good taste ;-)

>  Alternatives include:
> 
>   (1) keeping the new attribute specific to shift immediates only
> 
>   (2) also having a “register” value, and folding:
>       alu_shift_imm, alu_shift_reg -> alu_shift
> 
>   (3) also having a “none” value, and doing away with the *_shift
>       type variants altogether:
>       alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg
> 
>   I think we should have a clear reason for whichever we pick,
>   otherwise we could be creating technical debt for later.

Yes.

One example of what we do:

;; Is this instruction using operands[2] as shift amount, and can that be a
;; register?
;; This is used for shift insns.
(define_attr "maybe_var_shift" "no,yes" (const_string "no"))

;; Is this instruction using a shift amount from a register?
;; This is used for shift insns.
(define_attr "var_shift" "no,yes"
  (if_then_else (and (eq_attr "type" "shift")
                     (eq_attr "maybe_var_shift" "yes"))
                (if_then_else (match_operand 2 "gpc_reg_operand")
                              (const_string "yes")
                              (const_string "no"))
                (const_string "no")))

define_insns only use maybe_var_shift.  Only the power6 and e*500* cpu
descriptions ever look at var_shift.  (Directly.  There is some other
scheduling code that looks at it too -- and all but the power6 ones seem
to be incorrect!  That is all a direct translation of "type-only" code
there was before...)

> - That approach is the opposite of what we did for the neon_* attributes:
>   every type has a q variant, rather than the size being a separate
>   attribute.
> 
> FWIW, we shouldn't assume that this distinction is specific to a64fx. :-)

Yeah, absolutely.  This classifies instructions, of course it is very
strongly influenced by what scheduling descriptions want, but it helps
a lot if you describe more general characteristics :-)


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-10 23:00       ` Segher Boessenkool
@ 2020-09-11  7:44         ` Richard Sandiford
  2020-09-11 13:58           ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-09-11  7:44 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > You can also use some other attributes to classify instructions, you
>> > don't have to put it all in one "type" attribute.  This can of course be
>> > done later, at a time when it is clearer what a good design will be.
>> > Sometimes it is obvious from the start though :-)
>> >
>> > (This primarily makes the pipeline descriptions much simpler, but also
>> > custom scheduling code and the like.  If one core has two types of "A"
>> > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
>> > handle both "Aa" and "Ab" instead of just "A").
>> 
>> I guess it makes the description of other cores more verbose, but it
>> doesn't IMO make them more complicated.  It's still just one test
>> against one attribute.  And updating existing descriptions can be
>> done automatically by sed.
>
> Consider cores that do not care about the "subtype" at all: when using
> just "type", all cores have to test for "foo,foo_subtype", while with
> a separate attribute they can just test for "foo".

Right.  But that was exactly the case I was considering with the sed
comment above :-)

I don't see adding a new attribute value to a set_attr as extra
complication.  It's just adding a value to a set.  To me extra
complication is adding (ior …)s, (and …)s, etc.

> A typical example in rs6000 is the "sign_extend" attribute for load
> instructions.  All cores before power4 do not care at all about it.
> (Load and store insns get into combinatorial explosion land as well,
> as you discuss below, with "update" and "indexed" forms).
>
>> The difficulty with splitting into subattributes is that the tests in
>> the cores that *do* care then become more complicated.  E.g. you have
>> to do:
>> 
>>    (ior (and (eq_attr "type" "foo")
>>              (eq_attr "foo_subtype" "foo1"))
>>         (eq_attr "type" "...others.."))
>> 
>> and:
>> 
>>    (ior (and (eq_attr "type" "foo")
>>              (eq_attr "foo_subtype" "!foo1"))
>>         (eq_attr "type" "...others.."))
>> 
>> instead of just:
>> 
>>    (eq_attr "type" "...")
>> 
>> in both cases.
>
> Yes.  It is a trade-off.
>
>> It's not too bad when it's just one subtype (like above), but it could
>> easily get out of hand.
>> 
>> Perhaps in this case there's an argument in favour of having a separate
>> attribute due to combinatorial explosion.  E.g. we have:
>> 
>> - alu_shift_imm
>> - alus_shift_imm
>> - logic_shift_imm
>> - logics_shift_imm
>> 
>> so having one attribute that describes the shift in all four cases
>> would perhaps be better than splitting each of them into two.
>
> Yeas, exactly.  And for rs6000 we *did* have many more types before, and
> very frequently one was missed (in a scheduling description usually).
> Especially common was when some new type attribute was added, not all
> existing cpu descriptions were updated correctly.  Maybe this is the
> strongest argument "for" actually :-)

The update shouldn't be done by hand.  The regularity of the syntax
means that an appropriate sed really is good enough (perhaps with
formatting fixes on top).  For extra robustness, you can replace the
old attribute value with two new ones, so that any missed case triggers
a build error.

What I'm really wary of with taking the line above is that it feeds
into the attitude that existing scheduling descriptions should become
fossilised at the point they're added: noone working on a different
core should change the declarations in the file later in case they
miss something.  I don't think that's what you're saying, but it
could feed into that general attitude.

>> Arguments in the other direction:
>> 
>> - Once we have a separate attribute, it isn't clear where the line
>>   should be drawn.
>
> Good taste ;-)
>
>>  Alternatives include:
>> 
>>   (1) keeping the new attribute specific to shift immediates only
>> 
>>   (2) also having a “register” value, and folding:
>>       alu_shift_imm, alu_shift_reg -> alu_shift
>> 
>>   (3) also having a “none” value, and doing away with the *_shift
>>       type variants altogether:
>>       alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg
>> 
>>   I think we should have a clear reason for whichever we pick,
>>   otherwise we could be creating technical debt for later.
>
> Yes.
>
> One example of what we do:
>
> ;; Is this instruction using operands[2] as shift amount, and can that be a
> ;; register?
> ;; This is used for shift insns.
> (define_attr "maybe_var_shift" "no,yes" (const_string "no"))
>
> ;; Is this instruction using a shift amount from a register?
> ;; This is used for shift insns.
> (define_attr "var_shift" "no,yes"
>   (if_then_else (and (eq_attr "type" "shift")
>                      (eq_attr "maybe_var_shift" "yes"))
>                 (if_then_else (match_operand 2 "gpc_reg_operand")
>                               (const_string "yes")
>                               (const_string "no"))
>                 (const_string "no")))
>
> define_insns only use maybe_var_shift.  Only the power6 and e*500* cpu
> descriptions ever look at var_shift.  (Directly.  There is some other
> scheduling code that looks at it too -- and all but the power6 ones seem
> to be incorrect!  That is all a direct translation of "type-only" code
> there was before...)

Right.  This is similar to the:

      (define_attr "shift_imm_type" "none,alu_shift_op2")

thing I suggested upthread.  I think it's better for the define_insn
attribute to specify the operand number directly, since there might
well be cases where the shift isn't operand 2.

Anyway, I think we're in violent agreement on how this works, we're just
taking different conclusions from it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-11  7:44         ` Richard Sandiford
@ 2020-09-11 13:58           ` Segher Boessenkool
  2020-09-14  5:41             ` Qian, Jianhua
  0 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-09-11 13:58 UTC (permalink / raw)
  To: Qian, Jianhua, gcc, richard.sandiford

Hi!

On Fri, Sep 11, 2020 at 08:44:54AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > Consider cores that do not care about the "subtype" at all: when using
> > just "type", all cores have to test for "foo,foo_subtype", while with
> > a separate attribute they can just test for "foo".
> 
> Right.  But that was exactly the case I was considering with the sed
> comment above :-)
> 
> I don't see adding a new attribute value to a set_attr as extra
> complication.  It's just adding a value to a set.  To me extra
> complication is adding (ior …)s, (and …)s, etc.

Combinatorial explosion is real.

(ior)s etc. are easier to read, and importantly it is much more obvious
if any case is missed with them.

Here is the very first diff in d839f53b7dfd, where I started this for
rs6000:

 (define_insn_reservation "ppc403-load" 2
-  (and (eq_attr "type" "load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,\
-                       load_l,store_c,sync")
+  (and (eq_attr "type" "load,load_l,store_c,sync")
        (eq_attr "cpu" "ppc403,ppc405"))
   "iu_40x")

We started off with 77 insn types.  Now we have 63 (it was 57 before, we
added some more).

> > Yeas, exactly.  And for rs6000 we *did* have many more types before, and
> > very frequently one was missed (in a scheduling description usually).
> > Especially common was when some new type attribute was added, not all
> > existing cpu descriptions were updated correctly.  Maybe this is the
> > strongest argument "for" actually :-)
> 
> The update shouldn't be done by hand.

No, it very much had to, because the existing code missed many cases and
was inconsistent in others.

> What I'm really wary of with taking the line above is that it feeds
> into the attitude that existing scheduling descriptions should become
> fossilised at the point they're added: noone working on a different
> core should change the declarations in the file later in case they
> miss something.  I don't think that's what you're saying, but it
> could feed into that general attitude.

It is very much not what I am saying.  I *am* saying that if people
trying to improve one CPU's modelling have to edit over 20 models for
CPUs that they do not care about, mistakes happen.  Such churn is
conceptually wrong as well, of course (the new types are just the same
as the old types on most older CPUs!)

> > Yes.
> >
> > One example of what we do:
> >
> > ;; Is this instruction using operands[2] as shift amount, and can that be a
> > ;; register?
> > ;; This is used for shift insns.
> > (define_attr "maybe_var_shift" "no,yes" (const_string "no"))
> >
> > ;; Is this instruction using a shift amount from a register?
> > ;; This is used for shift insns.
> > (define_attr "var_shift" "no,yes"
> >   (if_then_else (and (eq_attr "type" "shift")
> >                      (eq_attr "maybe_var_shift" "yes"))
> >                 (if_then_else (match_operand 2 "gpc_reg_operand")
> >                               (const_string "yes")
> >                               (const_string "no"))
> >                 (const_string "no")))
> >
> > define_insns only use maybe_var_shift.  Only the power6 and e*500* cpu
> > descriptions ever look at var_shift.  (Directly.  There is some other
> > scheduling code that looks at it too -- and all but the power6 ones seem
> > to be incorrect!  That is all a direct translation of "type-only" code
> > there was before...)
> 
> Right.  This is similar to the:
> 
>       (define_attr "shift_imm_type" "none,alu_shift_op2")
> 
> thing I suggested upthread.  I think it's better for the define_insn
> attribute to specify the operand number directly, since there might
> well be cases where the shift isn't operand 2.

Not in this case; all such insns always have it as op 2 in the machine
instruction, and RTL almost always uses the same order.

> Anyway, I think we're in violent agreement on how this works, we're just
> taking different conclusions from it.

Yeah :-)  And not even so different: I think you agree it is a
trade-off, you just see it tilt more to one side, while I see it go to
the other side often.

We still have over fifty instruction types.  We (hopefully :-) ) use the
"extra attribute" method only where that helps more than it hurts.  It
is just another tool in the toolbox.


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: A problem with one instruction multiple latencies and pipelines
  2020-09-11 13:58           ` Segher Boessenkool
@ 2020-09-14  5:41             ` Qian, Jianhua
  2020-09-14  9:55               ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-14  5:41 UTC (permalink / raw)
  To: Segher Boessenkool, gcc, richard.sandiford

Hi Richard and Segher

I don't know if I exactly understood your discussion.
If I misunderstood, please let me know.

I am trying to test these two cases.
Case 1. keep the TYPE attribute unchanged, add new attributes
    It works well as below.
    (define_attr "shift_imm_value" "shift_imm_none,shift_imm_0,shift_imm_1to4,shift_imm_ge5" (const_string "shift_imm_none"))

    (define_insn "*add_<shift>_<mode>"
      [(set (match_operand:GPI 0 "register_operand" "=r")
            (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                                  (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                      (match_operand:GPI 3 "register_operand" "r")))]
      ""
      "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
      [(set_attr "type" "alu_shift_imm")
       (set (attr "shift_imm_value")
            (cond [(eq (symbol_ref "XINT(operands[2],0)") (const_int 0)) (const_string "shift_imm_0")
                   (and (ge (symbol_ref "XINT(operands[2],0)") (const_int 1)) (le (symbol_ref "XINT(operands[2],0)") (const_int 4))) (const_string "shift_imm_1to4")]
                   (const_string "shift_imm_ge5")))]
    )

    (define_insn_reservation "a64fx_alu_shift1to4" 2
      (and (eq_attr "tune" "a64fx")
           (eq_attr "type" "alu_shift_imm")
           (eq_attr "shift_imm_value" "shift_imm_1to4"))
      "(a64fx_exa,a64fx_exa)|(a64fx_exb,a64fx_exb)")


Case2. expand the TYPE attribute
    I replaced the alu_shift_imm to alu_shift_imm, alu_shift_imm1to4, and I got the error message " bad number of entries in SET_ATTR_ALTERNATIVE, was 2 expected 1"
    (define_attr "type"
      ...
      alu_shift_imm,\
      alu_shift_imm1to4,\
      ...

    (define_insn "*add_<shift>_<mode>"
      [(set (match_operand:GPI 0 "register_operand" "=r")
            (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                                  (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                      (match_operand:GPI 3 "register_operand" "r")))]
      ""
      "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
      [(set_attr "type" "alu_shift_imm,alu_shift_imm1to4")]
    )

    It means that one "type" value should be matched by one operand constraint pattern. So this will raise two problems.
      1. One instruction should be classified to multiple patterns
         - define multiple operand constraints, such as "(match_operand:QI 2 "aarch64_shift_imm_<mode>" "0, 1to4, ...")"
         - or, use "cond" at set_attr just like Case1.
      2. Whatever you do with the first problem, the type attribute cannot be set with "alu_shift_imm"
        and "alu_shift_imm1to4" at the same time by one operand constraint pattern.
            - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected.
            - If we want to add new attribute to resolve this problem, why not use the Case1 directly?

> It is very much not what I am saying.  I *am* saying that if people trying to
> improve one CPU's modelling have to edit over 20 models for CPUs that they do
> not care about, mistakes happen. 

That's what I'm worried about.

Regards
Qian

> -----Original Message-----
> From: Segher Boessenkool <segher@kernel.crashing.org>
> Sent: Friday, September 11, 2020 9:59 PM
> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org;
> richard.sandiford@arm.com
> Subject: Re: A problem with one instruction multiple latencies and pipelines
> 
> Hi!
> 
> On Fri, Sep 11, 2020 at 08:44:54AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool <segher@kernel.crashing.org> writes:
> > > Consider cores that do not care about the "subtype" at all: when
> > > using just "type", all cores have to test for "foo,foo_subtype",
> > > while with a separate attribute they can just test for "foo".
> >
> > Right.  But that was exactly the case I was considering with the sed
> > comment above :-)
> >
> > I don't see adding a new attribute value to a set_attr as extra
> > complication.  It's just adding a value to a set.  To me extra
> > complication is adding (ior …)s, (and …)s, etc.
> 
> Combinatorial explosion is real.
> 
> (ior)s etc. are easier to read, and importantly it is much more obvious if any case
> is missed with them.
> 
> Here is the very first diff in d839f53b7dfd, where I started this for
> rs6000:
> 
>  (define_insn_reservation "ppc403-load" 2
> -  (and (eq_attr "type"
> "load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,\
> -                       load_l,store_c,sync")
> +  (and (eq_attr "type" "load,load_l,store_c,sync")
>         (eq_attr "cpu" "ppc403,ppc405"))
>    "iu_40x")
> 
> We started off with 77 insn types.  Now we have 63 (it was 57 before, we added
> some more).
> 
> > > Yeas, exactly.  And for rs6000 we *did* have many more types before,
> > > and very frequently one was missed (in a scheduling description usually).
> > > Especially common was when some new type attribute was added, not
> > > all existing cpu descriptions were updated correctly.  Maybe this is
> > > the strongest argument "for" actually :-)
> >
> > The update shouldn't be done by hand.
> 
> No, it very much had to, because the existing code missed many cases and was
> inconsistent in others.
> 
> > What I'm really wary of with taking the line above is that it feeds
> > into the attitude that existing scheduling descriptions should become
> > fossilised at the point they're added: noone working on a different
> > core should change the declarations in the file later in case they
> > miss something.  I don't think that's what you're saying, but it could
> > feed into that general attitude.
> 
> It is very much not what I am saying.  I *am* saying that if people trying to
> improve one CPU's modelling have to edit over 20 models for CPUs that they do
> not care about, mistakes happen.  Such churn is conceptually wrong as well,
> of course (the new types are just the same as the old types on most older
> CPUs!)
> 
> > > Yes.
> > >
> > > One example of what we do:
> > >
> > > ;; Is this instruction using operands[2] as shift amount, and can
> > > that be a ;; register?
> > > ;; This is used for shift insns.
> > > (define_attr "maybe_var_shift" "no,yes" (const_string "no"))
> > >
> > > ;; Is this instruction using a shift amount from a register?
> > > ;; This is used for shift insns.
> > > (define_attr "var_shift" "no,yes"
> > >   (if_then_else (and (eq_attr "type" "shift")
> > >                      (eq_attr "maybe_var_shift" "yes"))
> > >                 (if_then_else (match_operand 2 "gpc_reg_operand")
> > >                               (const_string "yes")
> > >                               (const_string "no"))
> > >                 (const_string "no")))
> > >
> > > define_insns only use maybe_var_shift.  Only the power6 and e*500*
> > > cpu descriptions ever look at var_shift.  (Directly.  There is some
> > > other scheduling code that looks at it too -- and all but the power6
> > > ones seem to be incorrect!  That is all a direct translation of
> > > "type-only" code there was before...)
> >
> > Right.  This is similar to the:
> >
> >       (define_attr "shift_imm_type" "none,alu_shift_op2")
> >
> > thing I suggested upthread.  I think it's better for the define_insn
> > attribute to specify the operand number directly, since there might
> > well be cases where the shift isn't operand 2.
> 
> Not in this case; all such insns always have it as op 2 in the machine
> instruction, and RTL almost always uses the same order.
> 
> > Anyway, I think we're in violent agreement on how this works, we're
> > just taking different conclusions from it.
> 
> Yeah :-)  And not even so different: I think you agree it is a trade-off, you just
> see it tilt more to one side, while I see it go to the other side often.
> 
> We still have over fifty instruction types.  We (hopefully :-) ) use the "extra
> attribute" method only where that helps more than it hurts.  It is just another
> tool in the toolbox.
> 
> 
> Segher
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-14  5:41             ` Qian, Jianhua
@ 2020-09-14  9:55               ` Richard Sandiford
  2020-09-14 18:41                 ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-09-14  9:55 UTC (permalink / raw)
  To: Qian, Jianhua; +Cc: Segher Boessenkool, gcc

"Qian, Jianhua" <qianjh@cn.fujitsu.com> writes:
> Hi Richard and Segher
>
> I don't know if I exactly understood your discussion.
> If I misunderstood, please let me know.
>
> I am trying to test these two cases.
> Case 1. keep the TYPE attribute unchanged, add new attributes
>     It works well as below.
>     (define_attr "shift_imm_value" "shift_imm_none,shift_imm_0,shift_imm_1to4,shift_imm_ge5" (const_string "shift_imm_none"))

I'm not sure the shift_imm_0 case is useful here FWIW.  alu_shift_imm
should only be used with nonzero shift amounts.

Also, the associated C++ enums are automatically prefixed with the name
of the attribute, so I think the values should just be 1to4, ge5, etc.,
with no prefix.

>     (define_insn "*add_<shift>_<mode>"
>       [(set (match_operand:GPI 0 "register_operand" "=r")
>             (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                                   (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                       (match_operand:GPI 3 "register_operand" "r")))]
>       ""
>       "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>       [(set_attr "type" "alu_shift_imm")
>        (set (attr "shift_imm_value")
>             (cond [(eq (symbol_ref "XINT(operands[2],0)") (const_int 0)) (const_string "shift_imm_0")
>                    (and (ge (symbol_ref "XINT(operands[2],0)") (const_int 1)) (le (symbol_ref "XINT(operands[2],0)") (const_int 4))) (const_string "shift_imm_1to4")]
>                    (const_string "shift_imm_ge5")))]
>     )

I'll come back to this below.

>
>     (define_insn_reservation "a64fx_alu_shift1to4" 2
>       (and (eq_attr "tune" "a64fx")
>            (eq_attr "type" "alu_shift_imm")
>            (eq_attr "shift_imm_value" "shift_imm_1to4"))
>       "(a64fx_exa,a64fx_exa)|(a64fx_exb,a64fx_exb)")
>
>
> Case2. expand the TYPE attribute
>     I replaced the alu_shift_imm to alu_shift_imm, alu_shift_imm1to4, and I got the error message " bad number of entries in SET_ATTR_ALTERNATIVE, was 2 expected 1"
>     (define_attr "type"
>       ...
>       alu_shift_imm,\
>       alu_shift_imm1to4,\
>       ...
>
>     (define_insn "*add_<shift>_<mode>"
>       [(set (match_operand:GPI 0 "register_operand" "=r")
>             (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                                   (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                       (match_operand:GPI 3 "register_operand" "r")))]
>       ""
>       "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>       [(set_attr "type" "alu_shift_imm,alu_shift_imm1to4")]
>     )
>
>     It means that one "type" value should be matched by one operand constraint pattern. So this will raise two problems.
>       1. One instruction should be classified to multiple patterns
>          - define multiple operand constraints, such as "(match_operand:QI 2 "aarch64_shift_imm_<mode>" "0, 1to4, ...")"
>          - or, use "cond" at set_attr just like Case1.
>       2. Whatever you do with the first problem, the type attribute cannot be set with "alu_shift_imm"
>         and "alu_shift_imm1to4" at the same time by one operand constraint pattern.

Right.  Something needs to tell the compiler how to distinguish between
the two cases.

>             - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected.
>             - If we want to add new attribute to resolve this problem, why not use the Case1 directly?

It's a trade-off.  At the moment the rule for arm and aarch64 is simple:
the "type" attribute specifies the complete scheduling type.  If we
instead change policy and divide the information among several attributes,
those attributes will have increasingly complex relationships with one
another.  It will become easier to specify a meaningless combination
by mistake, or to forget to specify one of the relevant attributes.

I agree it's not so bad with just one extra attribute.  The question
is more: if we change the policy, what would things look like if there
were more attributes of a similar nature?

On the other side, lumping everything into one attributes gives a
long list of values.  I think the difference of opinion is that Segher
finds this argument more compelling while I find the other argument
more compelling ;-)

That said, there are really two issues here:

(1) how should the define_insn describe the instruction?
(2) how should scheduling descriptions test for it?

The same answer for (1) can support multiple answers for (2).

For (1), I think we should do what I mentioned in my first reply
and add something like (different name from last time):

  ;; config/arm/types.md
  (define_attr "autodetect_type" "none,alu_shift_op2")

This attribute would in principle be a home for any case in which
the scheduling type needs to be derived from other information
(in this case using the contents of operand 2 to infer the alu_shift
kind).  So going back to what I said above: we would be changing policy
so that define_insns can use one of *two* attributes to specify the
scheduling type:

- they can use "type" to specify an exact scheduling type directly
- they can use "autodetect_type" to specify a method for deriving the
  scheduling type from other information

The *add_<shift>_<mode> define_insn above would then use:

  [(set_attr "autodetect_type" "alu_shift_op2")]

The advantage of this IMO is that we still only have two attributes,
even if we need to autodetect other cases in future.  It also means
that there is only one copy of the condition to pick between (say)
1to4 and ge5.

This approach to (1) supports both approaches to (2).  If the
CPU models continue to use a single type attribute, we can use
autodetect_type to choose an appropriate default, using the method
I mentioned in my first reply:

  ;; config/arm/types.md
  (define_attr "type" "…long list…"
    (cond [(eq_attr "autodetect_type" "alu_shift_op2")
           (if_then_else (match_operand 2 "const_1_to_4_operand")
                         (const_string "alu_shift_imm_1to4")
                         (const_string "alu_shift_imm_ge5"))]
          (const_string "untyped")))

where:

- const_1_to_4_operand is a new predicate.  It could be defined in a
  new config/arm/common.md file that is shared by both arm and aarch64.

- alu_shift_imm is replaced by two new attributes: alu_shift_imm_1to4
  and alu_shift_imm_ge5.

(In principle, this condition could also handle alu_shift_reg,
if that proved to be useful later.)

If we instead use multiple attributes, the definitions would be:

(define_attr "type" "…"
  (cond [(eq_attr "autodetect_type" "alu_shift_op2")
         (const_string "alu_shift_imm")]
        (const_string "untyped")))

(define_attr "shift_imm_value" "none,1to4,ge5"
  (cond [(eq_attr "autodetect_type" "alu_shift_op2")
         (if_then_else (match_operand 2 "const_1_to_4_operand")
                       (const_string "1to4")
                       (const_string "ge5"))]
        (const_string "none"))

I still prefer the single type attribute, for the reasons above.

FWIW, this script updates the CPU descriptions so that every test
for alu_shift_imm is replaced by a test for the two new attributes:

find gcc/config/arm gcc/config/aarch64 -name '*.md' |
  while read arg; do
    case $arg in
      */arm.md | */aarch64.md | */types.md | */thumb1.md | */thumb2.md | \
      */arm-fixed.md)
	;;
      *)
	sed -i s/alu_shift_imm/alu_shift_imm_1to4,alu_shift_imm_ge5/g $arg
	;;
    esac
  done

(not idempotent :-))

Then you'd need to update the non-CPU files that were excluded by the
case statement above.

Although this looks/sounds complicated, the advantage is that everything
remains up-to-date.  If we instead added a second attribute and only
defined it for instructions like *add_<shift>_<mode>, other instructions
(including config/arm instructions) would still have type alu_shift_imm
but would have a shift_imm_value of "none".

Sorry for the long message.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-14  9:55               ` Richard Sandiford
@ 2020-09-14 18:41                 ` Segher Boessenkool
  2020-09-14 19:35                   ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-09-14 18:41 UTC (permalink / raw)
  To: Qian, Jianhua, gcc, richard.sandiford

Hi!

On Mon, Sep 14, 2020 at 10:55:35AM +0100, Richard Sandiford wrote:
> "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes:
> >             - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected.
> >             - If we want to add new attribute to resolve this problem, why not use the Case1 directly?
> 
> It's a trade-off.  At the moment the rule for arm and aarch64 is simple:
> the "type" attribute specifies the complete scheduling type.  If we
> instead change policy and divide the information among several attributes,
> those attributes will have increasingly complex relationships with one
> another.  It will become easier to specify a meaningless combination
> by mistake, or to forget to specify one of the relevant attributes.

I found exactly the opposite, when I did this to rs6000: this is much
*less* error-prone.

> I agree it's not so bad with just one extra attribute.  The question
> is more: if we change the policy, what would things look like if there
> were more attributes of a similar nature?

You often can do new attributes that apply to quite a few different
instructions.  Like, as an extreme example, we have
  ;; What data size does this instruction work on?
  ;; This is used for insert, mul and others as necessary.
  (define_attr "size" "8,16,32,64,128" (const_string "32"))

But there also is
  ;; Is this instruction record form ("dot", signed compare to 0, writing CR0)?
  ;; This is used for add, logical, shift, exts, mul.
  (define_attr "dot" "no,yes" (const_string "no"))

and a few more.  But just that: a few.  It isn't good to have very many
extra attributes, in the same way that very many insn types makes things
harder than needed.

> On the other side, lumping everything into one attributes gives a
> long list of values.  I think the difference of opinion is that Segher
> finds this argument more compelling while I find the other argument
> more compelling ;-)

It's a trade-off, we agree on that :-)  And I'd be conservative here, we
agree on that as well.

> Although this looks/sounds complicated, the advantage is that everything
> remains up-to-date.  If we instead added a second attribute and only
> defined it for instructions like *add_<shift>_<mode>, other instructions
> (including config/arm instructions) would still have type alu_shift_imm
> but would have a shift_imm_value of "none".

I would make an attribute for the mode (or data size really), and one
that says the insn uses shifted data (since many arm insns have that,
just like record form on PowerPC is everywhere).  And you can have that
one filled "by magic" usually!

Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-14 18:41                 ` Segher Boessenkool
@ 2020-09-14 19:35                   ` Richard Sandiford
  2020-09-14 22:14                     ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-09-14 19:35 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
>> Although this looks/sounds complicated, the advantage is that everything
>> remains up-to-date.  If we instead added a second attribute and only
>> defined it for instructions like *add_<shift>_<mode>, other instructions
>> (including config/arm instructions) would still have type alu_shift_imm
>> but would have a shift_imm_value of "none".
>
> I would make an attribute for the mode (or data size really), and one
> that says the insn uses shifted data (since many arm insns have that,
> just like record form on PowerPC is everywhere).  And you can have that
> one filled "by magic" usually!

I don't think this is really answering my point above though.
What I meant is: we currently have several instructions in config/arm
and config/aarch64 that have type alu_shift_imm.  If we add some new
on-the-side attributes A, but only update *some* of the alu_shift_imm
instructions to define A (either directly or indirectly), then the other
alu_shift_imm instructions will have the default values for A.
This probably isn't the intended effect.  Ideally, every alu_shift_imm
instruction would specify correct attribute values for A (specifically,
to indicate whether the shift value is in [1, 4] or not).

In contrast, one advantage of replacing the existing alu_shift_imm type
with two new types is that any existing reference to the old type will
cause a build failure.  So keeping everything in a single type
attributes gives us static type checking that the information for each
(former) alu_shift_imm instruction is complete.  Similarly for any other
type that needs to be split in the same way.

I realise this won't convince you, and I'm not trying to. :-)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-14 19:35                   ` Richard Sandiford
@ 2020-09-14 22:14                     ` Segher Boessenkool
  0 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-09-14 22:14 UTC (permalink / raw)
  To: Qian, Jianhua, gcc, richard.sandiford

On Mon, Sep 14, 2020 at 08:35:44PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >> Although this looks/sounds complicated, the advantage is that everything
> >> remains up-to-date.  If we instead added a second attribute and only
> >> defined it for instructions like *add_<shift>_<mode>, other instructions
> >> (including config/arm instructions) would still have type alu_shift_imm
> >> but would have a shift_imm_value of "none".
> >
> > I would make an attribute for the mode (or data size really), and one
> > that says the insn uses shifted data (since many arm insns have that,
> > just like record form on PowerPC is everywhere).  And you can have that
> > one filled "by magic" usually!
> 
> I don't think this is really answering my point above though.
> What I meant is: we currently have several instructions in config/arm
> and config/aarch64 that have type alu_shift_imm.  If we add some new
> on-the-side attributes A, but only update *some* of the alu_shift_imm
> instructions to define A (either directly or indirectly), then the other
> alu_shift_imm instructions will have the default values for A.
> This probably isn't the intended effect.  Ideally, every alu_shift_imm
> instruction would specify correct attribute values for A (specifically,
> to indicate whether the shift value is in [1, 4] or not).
> 
> In contrast, one advantage of replacing the existing alu_shift_imm type
> with two new types is that any existing reference to the old type will
> cause a build failure.  So keeping everything in a single type
> attributes gives us static type checking that the information for each
> (former) alu_shift_imm instruction is complete.  Similarly for any other
> type that needs to be split in the same way.

Removing the old types causes build failures as well if you forget some
cases.

> I realise this won't convince you, and I'm not trying to. :-)

And you don't have to :-)  I am telling you about a scheme that worked
really well for us; I am not trying to convince you to do that in your
own target, that decision is all your own.


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-07  6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua
  2020-09-07  7:40 ` Richard Biener
  2020-09-07 20:20 ` Richard Sandiford
@ 2020-09-11 13:30 ` Richard Earnshaw
  2020-09-14  2:53   ` Qian, Jianhua
  2 siblings, 1 reply; 20+ messages in thread
From: Richard Earnshaw @ 2020-09-11 13:30 UTC (permalink / raw)
  To: Qian, Jianhua, gcc

On 07/09/2020 07:08, Qian, Jianhua wrote:
> Hi
> 
> I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling.
> How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter.
> 
> For example, the ADD (shifted register) instruction in a64fx
> 
> Instruction            Option                             Latency    Pipeline
> ADD (shifted register)  <amount> = 0                     1          EX* | EAG*
>                       <amount> = [1-4] && <shift>=LSL  1+1       (EXA + EXA) | (EXB + EXB)
>                                                          2+1       (EXA + EXA) | (EXB + EXB)
> 

A shift by immediate zero isn't a shift, so should never use this RTL
pattern.  We can ignore that case.

> In aarch64.md ADD (shifted register) instruction is defined as following．
>  (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>   ""
>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>   [(set_attr "type" "alu_shift_imm")]
> )

You might consider using a define_bypass to adjust the cost - the
matcher rule takes a producer and consumer RTL - you don't care about
the consumer, but you can use the bypass to reduce the cost if the
producer uses an immediate in the 'low latency' range.  This would avoid
having to make a load of whole-sale changes to the main parts of the
machine description.


> 
> It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). 
> What should I do?
> 
> Regards
> Qian
> 
> 
> 

R.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: A problem with one instruction multiple latencies and pipelines
  2020-09-11 13:30 ` Richard Earnshaw
@ 2020-09-14  2:53   ` Qian, Jianhua
  2020-09-14  9:08     ` Richard Earnshaw
  0 siblings, 1 reply; 20+ messages in thread
From: Qian, Jianhua @ 2020-09-14  2:53 UTC (permalink / raw)
  To: Richard Earnshaw, gcc

> -----Original Message-----
> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
> Sent: Friday, September 11, 2020 9:30 PM
> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org
> Subject: Re: A problem with one instruction multiple latencies and pipelines
> 
> On 07/09/2020 07:08, Qian, Jianhua wrote:
> > Hi
> >
> > I'm adding a new machine model. I have a problem when writing the
> "define_insn_reservation" for instruction scheduling.
> > How to write the "define_insn_reservation" for one instruction that there are
> different latencies and pipelines according to parameter.
> >
> > For example, the ADD (shifted register) instruction in a64fx
> >
> > Instruction            Option                             Latency
> Pipeline
> > ADD (shifted register)  <amount> = 0                     1          EX*
> | EAG*
> >                       <amount> = [1-4] && <shift>=LSL  1+1
> (EXA + EXA) | (EXB + EXB)
> >                                                          2+1       (EXA
> + EXA) | (EXB + EXB)
> >
> 
> A shift by immediate zero isn't a shift, so should never use this RTL pattern.
> We can ignore that case.
> 
> > In aarch64.md ADD (shifted register) instruction is defined as following．
> >  (define_insn "*add_<shift>_<mode>"
> >   [(set (match_operand:GPI 0 "register_operand" "=r")
> >         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand"
> "r")
> >                               (match_operand:QI 2
> "aarch64_shift_imm_<mode>" "n"))
> >                   (match_operand:GPI 3 "register_operand" "r")))]
> >   ""
> >   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
> >   [(set_attr "type" "alu_shift_imm")]
> > )
> 
> You might consider using a define_bypass to adjust the cost - the matcher rule
> takes a producer and consumer RTL - you don't care about the consumer, but
> you can use the bypass to reduce the cost if the producer uses an immediate in
> the 'low latency' range.  This would avoid having to make a load of whole-sale
> changes to the main parts of the machine description.

Thanks for your comment.
But I think the define_bypass can only change the latency for special instruction.
Pipeline also could be changed by define_bypass?

Regards
Qian

> >
> > It could not be distinguished by the type "alu_shift_imm" when writing
> "define_insn_reservation" for ADD (shifted register).
> > What should I do?
> >
> > Regards
> > Qian
> >
> >
> >
> 
> R.
> 




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: A problem with one instruction multiple latencies and pipelines
  2020-09-14  2:53   ` Qian, Jianhua
@ 2020-09-14  9:08     ` Richard Earnshaw
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Earnshaw @ 2020-09-14  9:08 UTC (permalink / raw)
  To: Qian, Jianhua, gcc

On 14/09/2020 03:53, Qian, Jianhua wrote:
>> -----Original Message-----
>> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>
>> Sent: Friday, September 11, 2020 9:30 PM
>> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org
>> Subject: Re: A problem with one instruction multiple latencies and pipelines
>>
>> On 07/09/2020 07:08, Qian, Jianhua wrote:
>>> Hi
>>>
>>> I'm adding a new machine model. I have a problem when writing the
>> "define_insn_reservation" for instruction scheduling.
>>> How to write the "define_insn_reservation" for one instruction that there are
>> different latencies and pipelines according to parameter.
>>>
>>> For example, the ADD (shifted register) instruction in a64fx
>>>
>>> Instruction            Option                             Latency
>> Pipeline
>>> ADD (shifted register)  <amount> = 0                     1          EX*
>> | EAG*
>>>                       <amount> = [1-4] && <shift>=LSL  1+1
>> (EXA + EXA) | (EXB + EXB)
>>>                                                          2+1       (EXA
>> + EXA) | (EXB + EXB)
>>>
>>
>> A shift by immediate zero isn't a shift, so should never use this RTL pattern.
>> We can ignore that case.
>>
>>> In aarch64.md ADD (shifted register) instruction is defined as following．
>>>  (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand"
>> "r")
>>>                               (match_operand:QI 2
>> "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>   ""
>>>   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
>>>   [(set_attr "type" "alu_shift_imm")]
>>> )
>>
>> You might consider using a define_bypass to adjust the cost - the matcher rule
>> takes a producer and consumer RTL - you don't care about the consumer, but
>> you can use the bypass to reduce the cost if the producer uses an immediate in
>> the 'low latency' range.  This would avoid having to make a load of whole-sale
>> changes to the main parts of the machine description.
> 
> Thanks for your comment.
> But I think the define_bypass can only change the latency for special instruction.
> Pipeline also could be changed by define_bypass?
> 

Possibly, but if this is part of the out-of-order units of the pipe, I
really don't think it will matter.  In fact, I'm not even convinced that
trying to model the out-of-order stages is worthwhile - let the CPU
handle that: any long-latency instruction, such as a memory access that
misses the L1 cache will completely mess up the compiler's understanding
of the pipeline state anyway.

What I think is more important is to get a good model for the in-order
bits at the front of the pipe accurately modelled so that you can
maximize the throughput of those stages.  Try to get a mix of
instructions so that a single issue unit in the core doesn't get clogged
up and block further decode.

R.

> Regards
> Qian
> 
>>>
>>> It could not be distinguished by the type "alu_shift_imm" when writing
>> "define_insn_reservation" for ADD (shifted register).
>>> What should I do?
>>>
>>> Regards
>>> Qian
>>>
>>>
>>>
>>
>> R.
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-09-14 22:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07  6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua
2020-09-07  7:40 ` Richard Biener
2020-09-07  8:45   ` Qian, Jianhua
2020-09-07 11:58     ` Richard Biener
2020-09-07 20:20 ` Richard Sandiford
2020-09-08  5:34   ` Qian, Jianhua
2020-09-09 21:22   ` Segher Boessenkool
2020-09-10  5:01     ` Qian, Jianhua
2020-09-10 10:04     ` Richard Sandiford
2020-09-10 23:00       ` Segher Boessenkool
2020-09-11  7:44         ` Richard Sandiford
2020-09-11 13:58           ` Segher Boessenkool
2020-09-14  5:41             ` Qian, Jianhua
2020-09-14  9:55               ` Richard Sandiford
2020-09-14 18:41                 ` Segher Boessenkool
2020-09-14 19:35                   ` Richard Sandiford
2020-09-14 22:14                     ` Segher Boessenkool
2020-09-11 13:30 ` Richard Earnshaw
2020-09-14  2:53   ` Qian, Jianhua
2020-09-14  9:08     ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).