* A problem with one instruction multiple latencies and pipelines @ 2020-09-07 6:08 Qian, Jianhua 2020-09-07 7:40 ` Richard Biener ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Qian, Jianhua @ 2020-09-07 6:08 UTC (permalink / raw) To: gcc Hi I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling. How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter. For example, the ADD (shifted register) instruction in a64fx Instruction Option Latency Pipeline ADD (shifted register) <amount> = 0 1 EX* | EAG* <amount> = [1-4] && <shift>=LSL 1+1 (EXA + EXA) | (EXB + EXB) 2+1 (EXA + EXA) | (EXB + EXB) In aarch64.md ADD (shifted register) instruction is defined as following. (define_insn "*add_<shift>_<mode>" [(set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) (match_operand:GPI 3 "register_operand" "r")))] "" "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" [(set_attr "type" "alu_shift_imm")] ) It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). What should I do? Regards Qian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-07 6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua @ 2020-09-07 7:40 ` Richard Biener 2020-09-07 8:45 ` Qian, Jianhua 2020-09-07 20:20 ` Richard Sandiford 2020-09-11 13:30 ` Richard Earnshaw 2 siblings, 1 reply; 20+ messages in thread From: Richard Biener @ 2020-09-07 7:40 UTC (permalink / raw) To: Qian, Jianhua; +Cc: gcc On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote: > > Hi > > I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling. > How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter. > > For example, the ADD (shifted register) instruction in a64fx > > Instruction Option Latency Pipeline > ADD (shifted register) <amount> = 0 1 EX* | EAG* > <amount> = [1-4] && <shift>=LSL 1+1 (EXA + EXA) | (EXB + EXB) > 2+1 (EXA + EXA) | (EXB + EXB) > > In aarch64.md ADD (shifted register) instruction is defined as following. > (define_insn "*add_<shift>_<mode>" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > "" > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > [(set_attr "type" "alu_shift_imm")] > ) > > It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). > What should I do? Just a guess - I'm not very familiar with the pipeline modeling, you probably need to expose two alternatives so you can assign a different type to the second one. Other than that modeling the more restrictive (or permissive?) variant might work good enough in practice. a64fx is probably out-of-order anyway. Richard. > Regards > Qian > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: A problem with one instruction multiple latencies and pipelines 2020-09-07 7:40 ` Richard Biener @ 2020-09-07 8:45 ` Qian, Jianhua 2020-09-07 11:58 ` Richard Biener 0 siblings, 1 reply; 20+ messages in thread From: Qian, Jianhua @ 2020-09-07 8:45 UTC (permalink / raw) To: Richard Biener; +Cc: gcc Hi Richard > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Monday, September 7, 2020 3:41 PM > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com> > Cc: gcc@gcc.gnu.org > Subject: Re: A problem with one instruction multiple latencies and pipelines > > On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote: > > > > Hi > > > > I'm adding a new machine model. I have a problem when writing the > "define_insn_reservation" for instruction scheduling. > > How to write the "define_insn_reservation" for one instruction that there are > different latencies and pipelines according to parameter. > > > > For example, the ADD (shifted register) instruction in a64fx > > > > Instruction Option Latency > Pipeline > > ADD (shifted register) <amount> = 0 1 EX* > | EAG* > > <amount> = [1-4] && <shift>=LSL 1+1 > (EXA + EXA) | (EXB + EXB) > > 2+1 (EXA > + EXA) | (EXB + EXB) > > > > In aarch64.md ADD (shifted register) instruction is defined as following. > > (define_insn "*add_<shift>_<mode>" > > [(set (match_operand:GPI 0 "register_operand" "=r") > > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" > "r") > > (match_operand:QI 2 > "aarch64_shift_imm_<mode>" "n")) > > (match_operand:GPI 3 "register_operand" "r")))] > > "" > > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > > [(set_attr "type" "alu_shift_imm")] > > ) > > > > It could not be distinguished by the type "alu_shift_imm" when writing > "define_insn_reservation" for ADD (shifted register). > > What should I do? > > Just a guess - I'm not very familiar with the pipeline modeling, you probably > need to expose two alternatives so you can assign a different type to the second > one. I'm considering such method, but if I do that I'm afraid it has side effects on other machine models of aarch64 series. Some instructions' definition will be changed in aarch64.md file. > Other than that modeling the more restrictive (or permissive?) variant might > work good enough in practice. Is your mean that an approximate modeling is good enough? > a64fx is probably out-of-order anyway. Yes, a64fx is an out-of-order architecture. Regards Qian > > Richard. > > > Regards > > Qian > > > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-07 8:45 ` Qian, Jianhua @ 2020-09-07 11:58 ` Richard Biener 0 siblings, 0 replies; 20+ messages in thread From: Richard Biener @ 2020-09-07 11:58 UTC (permalink / raw) To: Qian, Jianhua; +Cc: gcc On Mon, Sep 7, 2020 at 10:46 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote: > > Hi Richard > > > -----Original Message----- > > From: Richard Biener <richard.guenther@gmail.com> > > Sent: Monday, September 7, 2020 3:41 PM > > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com> > > Cc: gcc@gcc.gnu.org > > Subject: Re: A problem with one instruction multiple latencies and pipelines > > > > On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua <qianjh@cn.fujitsu.com> wrote: > > > > > > Hi > > > > > > I'm adding a new machine model. I have a problem when writing the > > "define_insn_reservation" for instruction scheduling. > > > How to write the "define_insn_reservation" for one instruction that there are > > different latencies and pipelines according to parameter. > > > > > > For example, the ADD (shifted register) instruction in a64fx > > > > > > Instruction Option Latency > > Pipeline > > > ADD (shifted register) <amount> = 0 1 EX* > > | EAG* > > > <amount> = [1-4] && <shift>=LSL 1+1 > > (EXA + EXA) | (EXB + EXB) > > > 2+1 (EXA > > + EXA) | (EXB + EXB) > > > > > > In aarch64.md ADD (shifted register) instruction is defined as following. > > > (define_insn "*add_<shift>_<mode>" > > > [(set (match_operand:GPI 0 "register_operand" "=r") > > > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" > > "r") > > > (match_operand:QI 2 > > "aarch64_shift_imm_<mode>" "n")) > > > (match_operand:GPI 3 "register_operand" "r")))] > > > "" > > > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > > > [(set_attr "type" "alu_shift_imm")] > > > ) > > > > > > It could not be distinguished by the type "alu_shift_imm" when writing > > "define_insn_reservation" for ADD (shifted register). > > > What should I do? > > > > Just a guess - I'm not very familiar with the pipeline modeling, you probably > > need to expose two alternatives so you can assign a different type to the second > > one. > I'm considering such method, > but if I do that I'm afraid it has side effects on other machine models of aarch64 series. > Some instructions' definition will be changed in aarch64.md file. > > > Other than that modeling the more restrictive (or permissive?) variant might > > work good enough in practice. > Is your mean that an approximate modeling is good enough? Yes. > > a64fx is probably out-of-order anyway. > Yes, a64fx is an out-of-order architecture. > > Regards > Qian > > > > > Richard. > > > > > Regards > > > Qian > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-07 6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua 2020-09-07 7:40 ` Richard Biener @ 2020-09-07 20:20 ` Richard Sandiford 2020-09-08 5:34 ` Qian, Jianhua 2020-09-09 21:22 ` Segher Boessenkool 2020-09-11 13:30 ` Richard Earnshaw 2 siblings, 2 replies; 20+ messages in thread From: Richard Sandiford @ 2020-09-07 20:20 UTC (permalink / raw) To: Qian, Jianhua; +Cc: gcc "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes: > Hi > > I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling. > How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter. > > For example, the ADD (shifted register) instruction in a64fx > > Instruction Option Latency Pipeline > ADD (shifted register) <amount> = 0 1 EX* | EAG* > <amount> = [1-4] && <shift>=LSL 1+1 (EXA + EXA) | (EXB + EXB) > 2+1 (EXA + EXA) | (EXB + EXB) > > In aarch64.md ADD (shifted register) instruction is defined as following. > (define_insn "*add_<shift>_<mode>" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > "" > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > [(set_attr "type" "alu_shift_imm")] > ) > > It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). > What should I do? This is just personal opinion, but in general (from the point of view of a new port, or a new subport like SVE), I think the best approach to handling the "type" attribute is to start with the coarsest classification that makes sense, then split these relatively coarse types up whenever there's a specific need. When taking that approach, it's OK (and perhaps even a good sign) for an existing type to sometimes be too coarse for a new CPU. So thanks for asking about this, and please don't feel constrained by the existing "type" classification. IMO we should split existing types wherever that makes sense for new CPUs. In a situation like this, I guess it would make sense to treat alu_shift_imm as a conservative general classification and add alu_shift_imm1to4 (say) as a new type. I'd recommend a two-stage approach, with each stage as a separate patch for ease of review: (1) Add the new type to the definition of the "type" attribute and run: sed -i s/alu_shift_imm/alu_shift_imm1to4,alu_shift_imm/g on all the config/arm and config/aarch64 tuning descriptions. This is a purely mechanical replacement and could be tested in the normal way (i.e. there would be no need to do special testing to make sure that existing descriptions are unaffected: the change is mechanical enough not to need that). (2) Make the define_insns use the new type where appropriate. Like Richard says, this would be handled by splitting the single define_insn alternative above into two separate alternatives. It's also possible to something like: (set (attr "type") (if_then_else (match_operand ...) (const_string "...") (const_string "..."))) which localises the change to the attribute definition. If we end up needing that construct several times, it would also be possible to add a new attribute, e.g.: (define_attr "shift_imm_type" "none,alu_shift_op2") and then change the default value of the type attribute from: (const_string "untyped") to: (cond [(eq_attr "shift_imm_type" "alu_shift_op2") (if_then_else (match_operand 2 ...) (const_string "...") (const_string "..."))] (const_string "untyped")) define_insns like the above could then use: (set_attr "shift_imm_type" "alu_shift_op2") instead of setting "type" directly. That's definitely more complicated, but it can help to reduce cut-&-paste. Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: A problem with one instruction multiple latencies and pipelines 2020-09-07 20:20 ` Richard Sandiford @ 2020-09-08 5:34 ` Qian, Jianhua 2020-09-09 21:22 ` Segher Boessenkool 1 sibling, 0 replies; 20+ messages in thread From: Qian, Jianhua @ 2020-09-08 5:34 UTC (permalink / raw) To: Richard Sandiford; +Cc: gcc Hi Richard Thanks a lot for your advises and detailed comments. We will discuss which instructions need to be accurately classified, and estimate the workload. Regards Qian > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Tuesday, September 8, 2020 4:21 AM > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com> > Cc: gcc@gcc.gnu.org > Subject: Re: A problem with one instruction multiple latencies and pipelines > > "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes: > > Hi > > > > I'm adding a new machine model. I have a problem when writing the > "define_insn_reservation" for instruction scheduling. > > How to write the "define_insn_reservation" for one instruction that there are > different latencies and pipelines according to parameter. > > > > For example, the ADD (shifted register) instruction in a64fx > > > > Instruction Option Latency > Pipeline > > ADD (shifted register) <amount> = 0 1 EX* > | EAG* > > <amount> = [1-4] && <shift>=LSL 1+1 > (EXA + EXA) | (EXB + EXB) > > 2+1 (EXA > + EXA) | (EXB + EXB) > > > > In aarch64.md ADD (shifted register) instruction is defined as following. > > (define_insn "*add_<shift>_<mode>" > > [(set (match_operand:GPI 0 "register_operand" "=r") > > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" > "r") > > (match_operand:QI 2 > "aarch64_shift_imm_<mode>" "n")) > > (match_operand:GPI 3 "register_operand" "r")))] > > "" > > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > > [(set_attr "type" "alu_shift_imm")] > > ) > > > > It could not be distinguished by the type "alu_shift_imm" when writing > "define_insn_reservation" for ADD (shifted register). > > What should I do? > > This is just personal opinion, but in general (from the point of view of a new port, > or a new subport like SVE), I think the best approach to handling the "type" > attribute is to start with the coarsest classification that makes sense, then split > these relatively coarse types up whenever there's a specific need. > > When taking that approach, it's OK (and perhaps even a good sign) for an > existing type to sometimes be too coarse for a new CPU. > > So thanks for asking about this, and please don't feel constrained by the > existing "type" classification. IMO we should split existing types wherever > that makes sense for new CPUs. > > In a situation like this, I guess it would make sense to treat alu_shift_imm as a > conservative general classification and add > alu_shift_imm1to4 (say) as a new type. > > I'd recommend a two-stage approach, with each stage as a separate patch for > ease of review: > > (1) Add the new type to the definition of the "type" attribute and run: > > sed -i s/alu_shift_imm/alu_shift_imm1to4,alu_shift_imm/g > > on all the config/arm and config/aarch64 tuning descriptions. > > This is a purely mechanical replacement and could be tested in > the normal way (i.e. there would be no need to do special testing > to make sure that existing descriptions are unaffected: the change > is mechanical enough not to need that). > > (2) Make the define_insns use the new type where appropriate. > Like Richard says, this would be handled by splitting the single > define_insn alternative above into two separate alternatives. > It's also possible to something like: > > (set (attr "type") > (if_then_else (match_operand ...) > (const_string "...") > (const_string "..."))) > > which localises the change to the attribute definition. > > If we end up needing that construct several times, it would also > be possible to add a new attribute, e.g.: > > (define_attr "shift_imm_type" "none,alu_shift_op2") > > and then change the default value of the type attribute from: > > (const_string "untyped") > > to: > > (cond [(eq_attr "shift_imm_type" "alu_shift_op2") > (if_then_else (match_operand 2 ...) > (const_string "...") > (const_string "..."))] > (const_string "untyped")) > > define_insns like the above could then use: > > (set_attr "shift_imm_type" "alu_shift_op2") > > instead of setting "type" directly. That's definitely more > complicated, but it can help to reduce cut-&-paste. > > Thanks, > Richard > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-07 20:20 ` Richard Sandiford 2020-09-08 5:34 ` Qian, Jianhua @ 2020-09-09 21:22 ` Segher Boessenkool 2020-09-10 5:01 ` Qian, Jianhua 2020-09-10 10:04 ` Richard Sandiford 1 sibling, 2 replies; 20+ messages in thread From: Segher Boessenkool @ 2020-09-09 21:22 UTC (permalink / raw) To: Qian, Jianhua, gcc, richard.sandiford Hi! On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote: > This is just personal opinion, but in general (from the point of view > of a new port, or a new subport like SVE), I think the best approach > to handling the "type" attribute is to start with the coarsest > classification that makes sense, then split these relatively coarse > types up whenever there's a specific need. Agreed. > When taking that approach, it's OK (and perhaps even a good sign) > for an existing type to sometimes be too coarse for a new CPU. > > So thanks for asking about this, and please don't feel constrained > by the existing "type" classification. IMO we should split existing > types wherever that makes sense for new CPUs. You can also use some other attributes to classify instructions, you don't have to put it all in one "type" attribute. This can of course be done later, at a time when it is clearer what a good design will be. Sometimes it is obvious from the start though :-) (This primarily makes the pipeline descriptions much simpler, but also custom scheduling code and the like. If one core has two types of "A" insn, say "Aa" and "Ab", it isn't nice if all other cores now have to handle both "Aa" and "Ab" instead of just "A"). Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: A problem with one instruction multiple latencies and pipelines 2020-09-09 21:22 ` Segher Boessenkool @ 2020-09-10 5:01 ` Qian, Jianhua 2020-09-10 10:04 ` Richard Sandiford 1 sibling, 0 replies; 20+ messages in thread From: Qian, Jianhua @ 2020-09-10 5:01 UTC (permalink / raw) To: Segher Boessenkool, gcc, richard.sandiford Hi Segher > You can also use some other attributes to classify instructions, you don't have > to put it all in one "type" attribute. This can of course be done later, at a time > when it is clearer what a good design will be. > Sometimes it is obvious from the start though :-) Thank you for your advice. It is also a good idea. Considering other cores(existing and future), I think it is better to keep the type attribute unchanging, and add other attributes to classify instructions. Regards Qian > -----Original Message----- > From: Segher Boessenkool <segher@kernel.crashing.org> > Sent: Thursday, September 10, 2020 5:23 AM > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org; > richard.sandiford@arm.com > Subject: Re: A problem with one instruction multiple latencies and pipelines > > Hi! > > On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote: > > This is just personal opinion, but in general (from the point of view > > of a new port, or a new subport like SVE), I think the best approach > > to handling the "type" attribute is to start with the coarsest > > classification that makes sense, then split these relatively coarse > > types up whenever there's a specific need. > > Agreed. > > > When taking that approach, it's OK (and perhaps even a good sign) for > > an existing type to sometimes be too coarse for a new CPU. > > > > So thanks for asking about this, and please don't feel constrained by > > the existing "type" classification. IMO we should split existing > > types wherever that makes sense for new CPUs. > > You can also use some other attributes to classify instructions, you don't have > to put it all in one "type" attribute. This can of course be done later, at a time > when it is clearer what a good design will be. > Sometimes it is obvious from the start though :-) > > (This primarily makes the pipeline descriptions much simpler, but also custom > scheduling code and the like. If one core has two types of "A" > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to handle both > "Aa" and "Ab" instead of just "A"). > > > Segher > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-09 21:22 ` Segher Boessenkool 2020-09-10 5:01 ` Qian, Jianhua @ 2020-09-10 10:04 ` Richard Sandiford 2020-09-10 23:00 ` Segher Boessenkool 1 sibling, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-09-10 10:04 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote: >> This is just personal opinion, but in general (from the point of view >> of a new port, or a new subport like SVE), I think the best approach >> to handling the "type" attribute is to start with the coarsest >> classification that makes sense, then split these relatively coarse >> types up whenever there's a specific need. > > Agreed. > >> When taking that approach, it's OK (and perhaps even a good sign) >> for an existing type to sometimes be too coarse for a new CPU. >> >> So thanks for asking about this, and please don't feel constrained >> by the existing "type" classification. IMO we should split existing >> types wherever that makes sense for new CPUs. > > You can also use some other attributes to classify instructions, you > don't have to put it all in one "type" attribute. This can of course be > done later, at a time when it is clearer what a good design will be. > Sometimes it is obvious from the start though :-) > > (This primarily makes the pipeline descriptions much simpler, but also > custom scheduling code and the like. If one core has two types of "A" > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to > handle both "Aa" and "Ab" instead of just "A"). I guess it makes the description of other cores more verbose, but it doesn't IMO make them more complicated. It's still just one test against one attribute. And updating existing descriptions can be done automatically by sed. The difficulty with splitting into subattributes is that the tests in the cores that *do* care then become more complicated. E.g. you have to do: (ior (and (eq_attr "type" "foo") (eq_attr "foo_subtype" "foo1")) (eq_attr "type" "...others..")) and: (ior (and (eq_attr "type" "foo") (eq_attr "foo_subtype" "!foo1")) (eq_attr "type" "...others..")) instead of just: (eq_attr "type" "...") in both cases. It's not too bad when it's just one subtype (like above), but it could easily get out of hand. Perhaps in this case there's an argument in favour of having a separate attribute due to combinatorial explosion. E.g. we have: - alu_shift_imm - alus_shift_imm - logic_shift_imm - logics_shift_imm so having one attribute that describes the shift in all four cases would perhaps be better than splitting each of them into two. Arguments in the other direction: - Once we have a separate attribute, it isn't clear where the line should be drawn. Alternatives include: (1) keeping the new attribute specific to shift immediates only (2) also having a “register” value, and folding: alu_shift_imm, alu_shift_reg -> alu_shift (3) also having a “none” value, and doing away with the *_shift type variants altogether: alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg I think we should have a clear reason for whichever we pick, otherwise we could be creating technical debt for later. - That approach is the opposite of what we did for the neon_* attributes: every type has a q variant, rather than the size being a separate attribute. FWIW, we shouldn't assume that this distinction is specific to a64fx. :-) Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-10 10:04 ` Richard Sandiford @ 2020-09-10 23:00 ` Segher Boessenkool 2020-09-11 7:44 ` Richard Sandiford 0 siblings, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-09-10 23:00 UTC (permalink / raw) To: Qian, Jianhua, gcc, richard.sandiford On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > You can also use some other attributes to classify instructions, you > > don't have to put it all in one "type" attribute. This can of course be > > done later, at a time when it is clearer what a good design will be. > > Sometimes it is obvious from the start though :-) > > > > (This primarily makes the pipeline descriptions much simpler, but also > > custom scheduling code and the like. If one core has two types of "A" > > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to > > handle both "Aa" and "Ab" instead of just "A"). > > I guess it makes the description of other cores more verbose, but it > doesn't IMO make them more complicated. It's still just one test > against one attribute. And updating existing descriptions can be > done automatically by sed. Consider cores that do not care about the "subtype" at all: when using just "type", all cores have to test for "foo,foo_subtype", while with a separate attribute they can just test for "foo". A typical example in rs6000 is the "sign_extend" attribute for load instructions. All cores before power4 do not care at all about it. (Load and store insns get into combinatorial explosion land as well, as you discuss below, with "update" and "indexed" forms). > The difficulty with splitting into subattributes is that the tests in > the cores that *do* care then become more complicated. E.g. you have > to do: > > (ior (and (eq_attr "type" "foo") > (eq_attr "foo_subtype" "foo1")) > (eq_attr "type" "...others..")) > > and: > > (ior (and (eq_attr "type" "foo") > (eq_attr "foo_subtype" "!foo1")) > (eq_attr "type" "...others..")) > > instead of just: > > (eq_attr "type" "...") > > in both cases. Yes. It is a trade-off. > It's not too bad when it's just one subtype (like above), but it could > easily get out of hand. > > Perhaps in this case there's an argument in favour of having a separate > attribute due to combinatorial explosion. E.g. we have: > > - alu_shift_imm > - alus_shift_imm > - logic_shift_imm > - logics_shift_imm > > so having one attribute that describes the shift in all four cases > would perhaps be better than splitting each of them into two. Yeas, exactly. And for rs6000 we *did* have many more types before, and very frequently one was missed (in a scheduling description usually). Especially common was when some new type attribute was added, not all existing cpu descriptions were updated correctly. Maybe this is the strongest argument "for" actually :-) > Arguments in the other direction: > > - Once we have a separate attribute, it isn't clear where the line > should be drawn. Good taste ;-) > Alternatives include: > > (1) keeping the new attribute specific to shift immediates only > > (2) also having a “register” value, and folding: > alu_shift_imm, alu_shift_reg -> alu_shift > > (3) also having a “none” value, and doing away with the *_shift > type variants altogether: > alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg > > I think we should have a clear reason for whichever we pick, > otherwise we could be creating technical debt for later. Yes. One example of what we do: ;; Is this instruction using operands[2] as shift amount, and can that be a ;; register? ;; This is used for shift insns. (define_attr "maybe_var_shift" "no,yes" (const_string "no")) ;; Is this instruction using a shift amount from a register? ;; This is used for shift insns. (define_attr "var_shift" "no,yes" (if_then_else (and (eq_attr "type" "shift") (eq_attr "maybe_var_shift" "yes")) (if_then_else (match_operand 2 "gpc_reg_operand") (const_string "yes") (const_string "no")) (const_string "no"))) define_insns only use maybe_var_shift. Only the power6 and e*500* cpu descriptions ever look at var_shift. (Directly. There is some other scheduling code that looks at it too -- and all but the power6 ones seem to be incorrect! That is all a direct translation of "type-only" code there was before...) > - That approach is the opposite of what we did for the neon_* attributes: > every type has a q variant, rather than the size being a separate > attribute. > > FWIW, we shouldn't assume that this distinction is specific to a64fx. :-) Yeah, absolutely. This classifies instructions, of course it is very strongly influenced by what scheduling descriptions want, but it helps a lot if you describe more general characteristics :-) Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-10 23:00 ` Segher Boessenkool @ 2020-09-11 7:44 ` Richard Sandiford 2020-09-11 13:58 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-09-11 7:44 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc Segher Boessenkool <segher@kernel.crashing.org> writes: > On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote: >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> > You can also use some other attributes to classify instructions, you >> > don't have to put it all in one "type" attribute. This can of course be >> > done later, at a time when it is clearer what a good design will be. >> > Sometimes it is obvious from the start though :-) >> > >> > (This primarily makes the pipeline descriptions much simpler, but also >> > custom scheduling code and the like. If one core has two types of "A" >> > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to >> > handle both "Aa" and "Ab" instead of just "A"). >> >> I guess it makes the description of other cores more verbose, but it >> doesn't IMO make them more complicated. It's still just one test >> against one attribute. And updating existing descriptions can be >> done automatically by sed. > > Consider cores that do not care about the "subtype" at all: when using > just "type", all cores have to test for "foo,foo_subtype", while with > a separate attribute they can just test for "foo". Right. But that was exactly the case I was considering with the sed comment above :-) I don't see adding a new attribute value to a set_attr as extra complication. It's just adding a value to a set. To me extra complication is adding (ior …)s, (and …)s, etc. > A typical example in rs6000 is the "sign_extend" attribute for load > instructions. All cores before power4 do not care at all about it. > (Load and store insns get into combinatorial explosion land as well, > as you discuss below, with "update" and "indexed" forms). > >> The difficulty with splitting into subattributes is that the tests in >> the cores that *do* care then become more complicated. E.g. you have >> to do: >> >> (ior (and (eq_attr "type" "foo") >> (eq_attr "foo_subtype" "foo1")) >> (eq_attr "type" "...others..")) >> >> and: >> >> (ior (and (eq_attr "type" "foo") >> (eq_attr "foo_subtype" "!foo1")) >> (eq_attr "type" "...others..")) >> >> instead of just: >> >> (eq_attr "type" "...") >> >> in both cases. > > Yes. It is a trade-off. > >> It's not too bad when it's just one subtype (like above), but it could >> easily get out of hand. >> >> Perhaps in this case there's an argument in favour of having a separate >> attribute due to combinatorial explosion. E.g. we have: >> >> - alu_shift_imm >> - alus_shift_imm >> - logic_shift_imm >> - logics_shift_imm >> >> so having one attribute that describes the shift in all four cases >> would perhaps be better than splitting each of them into two. > > Yeas, exactly. And for rs6000 we *did* have many more types before, and > very frequently one was missed (in a scheduling description usually). > Especially common was when some new type attribute was added, not all > existing cpu descriptions were updated correctly. Maybe this is the > strongest argument "for" actually :-) The update shouldn't be done by hand. The regularity of the syntax means that an appropriate sed really is good enough (perhaps with formatting fixes on top). For extra robustness, you can replace the old attribute value with two new ones, so that any missed case triggers a build error. What I'm really wary of with taking the line above is that it feeds into the attitude that existing scheduling descriptions should become fossilised at the point they're added: noone working on a different core should change the declarations in the file later in case they miss something. I don't think that's what you're saying, but it could feed into that general attitude. >> Arguments in the other direction: >> >> - Once we have a separate attribute, it isn't clear where the line >> should be drawn. > > Good taste ;-) > >> Alternatives include: >> >> (1) keeping the new attribute specific to shift immediates only >> >> (2) also having a “register” value, and folding: >> alu_shift_imm, alu_shift_reg -> alu_shift >> >> (3) also having a “none” value, and doing away with the *_shift >> type variants altogether: >> alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg >> >> I think we should have a clear reason for whichever we pick, >> otherwise we could be creating technical debt for later. > > Yes. > > One example of what we do: > > ;; Is this instruction using operands[2] as shift amount, and can that be a > ;; register? > ;; This is used for shift insns. > (define_attr "maybe_var_shift" "no,yes" (const_string "no")) > > ;; Is this instruction using a shift amount from a register? > ;; This is used for shift insns. > (define_attr "var_shift" "no,yes" > (if_then_else (and (eq_attr "type" "shift") > (eq_attr "maybe_var_shift" "yes")) > (if_then_else (match_operand 2 "gpc_reg_operand") > (const_string "yes") > (const_string "no")) > (const_string "no"))) > > define_insns only use maybe_var_shift. Only the power6 and e*500* cpu > descriptions ever look at var_shift. (Directly. There is some other > scheduling code that looks at it too -- and all but the power6 ones seem > to be incorrect! That is all a direct translation of "type-only" code > there was before...) Right. This is similar to the: (define_attr "shift_imm_type" "none,alu_shift_op2") thing I suggested upthread. I think it's better for the define_insn attribute to specify the operand number directly, since there might well be cases where the shift isn't operand 2. Anyway, I think we're in violent agreement on how this works, we're just taking different conclusions from it. Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-11 7:44 ` Richard Sandiford @ 2020-09-11 13:58 ` Segher Boessenkool 2020-09-14 5:41 ` Qian, Jianhua 0 siblings, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-09-11 13:58 UTC (permalink / raw) To: Qian, Jianhua, gcc, richard.sandiford Hi! On Fri, Sep 11, 2020 at 08:44:54AM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > Consider cores that do not care about the "subtype" at all: when using > > just "type", all cores have to test for "foo,foo_subtype", while with > > a separate attribute they can just test for "foo". > > Right. But that was exactly the case I was considering with the sed > comment above :-) > > I don't see adding a new attribute value to a set_attr as extra > complication. It's just adding a value to a set. To me extra > complication is adding (ior …)s, (and …)s, etc. Combinatorial explosion is real. (ior)s etc. are easier to read, and importantly it is much more obvious if any case is missed with them. Here is the very first diff in d839f53b7dfd, where I started this for rs6000: (define_insn_reservation "ppc403-load" 2 - (and (eq_attr "type" "load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,\ - load_l,store_c,sync") + (and (eq_attr "type" "load,load_l,store_c,sync") (eq_attr "cpu" "ppc403,ppc405")) "iu_40x") We started off with 77 insn types. Now we have 63 (it was 57 before, we added some more). > > Yeas, exactly. And for rs6000 we *did* have many more types before, and > > very frequently one was missed (in a scheduling description usually). > > Especially common was when some new type attribute was added, not all > > existing cpu descriptions were updated correctly. Maybe this is the > > strongest argument "for" actually :-) > > The update shouldn't be done by hand. No, it very much had to, because the existing code missed many cases and was inconsistent in others. > What I'm really wary of with taking the line above is that it feeds > into the attitude that existing scheduling descriptions should become > fossilised at the point they're added: noone working on a different > core should change the declarations in the file later in case they > miss something. I don't think that's what you're saying, but it > could feed into that general attitude. It is very much not what I am saying. I *am* saying that if people trying to improve one CPU's modelling have to edit over 20 models for CPUs that they do not care about, mistakes happen. Such churn is conceptually wrong as well, of course (the new types are just the same as the old types on most older CPUs!) > > Yes. > > > > One example of what we do: > > > > ;; Is this instruction using operands[2] as shift amount, and can that be a > > ;; register? > > ;; This is used for shift insns. > > (define_attr "maybe_var_shift" "no,yes" (const_string "no")) > > > > ;; Is this instruction using a shift amount from a register? > > ;; This is used for shift insns. > > (define_attr "var_shift" "no,yes" > > (if_then_else (and (eq_attr "type" "shift") > > (eq_attr "maybe_var_shift" "yes")) > > (if_then_else (match_operand 2 "gpc_reg_operand") > > (const_string "yes") > > (const_string "no")) > > (const_string "no"))) > > > > define_insns only use maybe_var_shift. Only the power6 and e*500* cpu > > descriptions ever look at var_shift. (Directly. There is some other > > scheduling code that looks at it too -- and all but the power6 ones seem > > to be incorrect! That is all a direct translation of "type-only" code > > there was before...) > > Right. This is similar to the: > > (define_attr "shift_imm_type" "none,alu_shift_op2") > > thing I suggested upthread. I think it's better for the define_insn > attribute to specify the operand number directly, since there might > well be cases where the shift isn't operand 2. Not in this case; all such insns always have it as op 2 in the machine instruction, and RTL almost always uses the same order. > Anyway, I think we're in violent agreement on how this works, we're just > taking different conclusions from it. Yeah :-) And not even so different: I think you agree it is a trade-off, you just see it tilt more to one side, while I see it go to the other side often. We still have over fifty instruction types. We (hopefully :-) ) use the "extra attribute" method only where that helps more than it hurts. It is just another tool in the toolbox. Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: A problem with one instruction multiple latencies and pipelines 2020-09-11 13:58 ` Segher Boessenkool @ 2020-09-14 5:41 ` Qian, Jianhua 2020-09-14 9:55 ` Richard Sandiford 0 siblings, 1 reply; 20+ messages in thread From: Qian, Jianhua @ 2020-09-14 5:41 UTC (permalink / raw) To: Segher Boessenkool, gcc, richard.sandiford Hi Richard and Segher I don't know if I exactly understood your discussion. If I misunderstood, please let me know. I am trying to test these two cases. Case 1. keep the TYPE attribute unchanged, add new attributes It works well as below. (define_attr "shift_imm_value" "shift_imm_none,shift_imm_0,shift_imm_1to4,shift_imm_ge5" (const_string "shift_imm_none")) (define_insn "*add_<shift>_<mode>" [(set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) (match_operand:GPI 3 "register_operand" "r")))] "" "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" [(set_attr "type" "alu_shift_imm") (set (attr "shift_imm_value") (cond [(eq (symbol_ref "XINT(operands[2],0)") (const_int 0)) (const_string "shift_imm_0") (and (ge (symbol_ref "XINT(operands[2],0)") (const_int 1)) (le (symbol_ref "XINT(operands[2],0)") (const_int 4))) (const_string "shift_imm_1to4")] (const_string "shift_imm_ge5")))] ) (define_insn_reservation "a64fx_alu_shift1to4" 2 (and (eq_attr "tune" "a64fx") (eq_attr "type" "alu_shift_imm") (eq_attr "shift_imm_value" "shift_imm_1to4")) "(a64fx_exa,a64fx_exa)|(a64fx_exb,a64fx_exb)") Case2. expand the TYPE attribute I replaced the alu_shift_imm to alu_shift_imm, alu_shift_imm1to4, and I got the error message " bad number of entries in SET_ATTR_ALTERNATIVE, was 2 expected 1" (define_attr "type" ... alu_shift_imm,\ alu_shift_imm1to4,\ ... (define_insn "*add_<shift>_<mode>" [(set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) (match_operand:GPI 3 "register_operand" "r")))] "" "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" [(set_attr "type" "alu_shift_imm,alu_shift_imm1to4")] ) It means that one "type" value should be matched by one operand constraint pattern. So this will raise two problems. 1. One instruction should be classified to multiple patterns - define multiple operand constraints, such as "(match_operand:QI 2 "aarch64_shift_imm_<mode>" "0, 1to4, ...")" - or, use "cond" at set_attr just like Case1. 2. Whatever you do with the first problem, the type attribute cannot be set with "alu_shift_imm" and "alu_shift_imm1to4" at the same time by one operand constraint pattern. - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected. - If we want to add new attribute to resolve this problem, why not use the Case1 directly? > It is very much not what I am saying. I *am* saying that if people trying to > improve one CPU's modelling have to edit over 20 models for CPUs that they do > not care about, mistakes happen. That's what I'm worried about. Regards Qian > -----Original Message----- > From: Segher Boessenkool <segher@kernel.crashing.org> > Sent: Friday, September 11, 2020 9:59 PM > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org; > richard.sandiford@arm.com > Subject: Re: A problem with one instruction multiple latencies and pipelines > > Hi! > > On Fri, Sep 11, 2020 at 08:44:54AM +0100, Richard Sandiford wrote: > > Segher Boessenkool <segher@kernel.crashing.org> writes: > > > Consider cores that do not care about the "subtype" at all: when > > > using just "type", all cores have to test for "foo,foo_subtype", > > > while with a separate attribute they can just test for "foo". > > > > Right. But that was exactly the case I was considering with the sed > > comment above :-) > > > > I don't see adding a new attribute value to a set_attr as extra > > complication. It's just adding a value to a set. To me extra > > complication is adding (ior …)s, (and …)s, etc. > > Combinatorial explosion is real. > > (ior)s etc. are easier to read, and importantly it is much more obvious if any case > is missed with them. > > Here is the very first diff in d839f53b7dfd, where I started this for > rs6000: > > (define_insn_reservation "ppc403-load" 2 > - (and (eq_attr "type" > "load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,\ > - load_l,store_c,sync") > + (and (eq_attr "type" "load,load_l,store_c,sync") > (eq_attr "cpu" "ppc403,ppc405")) > "iu_40x") > > We started off with 77 insn types. Now we have 63 (it was 57 before, we added > some more). > > > > Yeas, exactly. And for rs6000 we *did* have many more types before, > > > and very frequently one was missed (in a scheduling description usually). > > > Especially common was when some new type attribute was added, not > > > all existing cpu descriptions were updated correctly. Maybe this is > > > the strongest argument "for" actually :-) > > > > The update shouldn't be done by hand. > > No, it very much had to, because the existing code missed many cases and was > inconsistent in others. > > > What I'm really wary of with taking the line above is that it feeds > > into the attitude that existing scheduling descriptions should become > > fossilised at the point they're added: noone working on a different > > core should change the declarations in the file later in case they > > miss something. I don't think that's what you're saying, but it could > > feed into that general attitude. > > It is very much not what I am saying. I *am* saying that if people trying to > improve one CPU's modelling have to edit over 20 models for CPUs that they do > not care about, mistakes happen. Such churn is conceptually wrong as well, > of course (the new types are just the same as the old types on most older > CPUs!) > > > > Yes. > > > > > > One example of what we do: > > > > > > ;; Is this instruction using operands[2] as shift amount, and can > > > that be a ;; register? > > > ;; This is used for shift insns. > > > (define_attr "maybe_var_shift" "no,yes" (const_string "no")) > > > > > > ;; Is this instruction using a shift amount from a register? > > > ;; This is used for shift insns. > > > (define_attr "var_shift" "no,yes" > > > (if_then_else (and (eq_attr "type" "shift") > > > (eq_attr "maybe_var_shift" "yes")) > > > (if_then_else (match_operand 2 "gpc_reg_operand") > > > (const_string "yes") > > > (const_string "no")) > > > (const_string "no"))) > > > > > > define_insns only use maybe_var_shift. Only the power6 and e*500* > > > cpu descriptions ever look at var_shift. (Directly. There is some > > > other scheduling code that looks at it too -- and all but the power6 > > > ones seem to be incorrect! That is all a direct translation of > > > "type-only" code there was before...) > > > > Right. This is similar to the: > > > > (define_attr "shift_imm_type" "none,alu_shift_op2") > > > > thing I suggested upthread. I think it's better for the define_insn > > attribute to specify the operand number directly, since there might > > well be cases where the shift isn't operand 2. > > Not in this case; all such insns always have it as op 2 in the machine > instruction, and RTL almost always uses the same order. > > > Anyway, I think we're in violent agreement on how this works, we're > > just taking different conclusions from it. > > Yeah :-) And not even so different: I think you agree it is a trade-off, you just > see it tilt more to one side, while I see it go to the other side often. > > We still have over fifty instruction types. We (hopefully :-) ) use the "extra > attribute" method only where that helps more than it hurts. It is just another > tool in the toolbox. > > > Segher > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-14 5:41 ` Qian, Jianhua @ 2020-09-14 9:55 ` Richard Sandiford 2020-09-14 18:41 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-09-14 9:55 UTC (permalink / raw) To: Qian, Jianhua; +Cc: Segher Boessenkool, gcc "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes: > Hi Richard and Segher > > I don't know if I exactly understood your discussion. > If I misunderstood, please let me know. > > I am trying to test these two cases. > Case 1. keep the TYPE attribute unchanged, add new attributes > It works well as below. > (define_attr "shift_imm_value" "shift_imm_none,shift_imm_0,shift_imm_1to4,shift_imm_ge5" (const_string "shift_imm_none")) I'm not sure the shift_imm_0 case is useful here FWIW. alu_shift_imm should only be used with nonzero shift amounts. Also, the associated C++ enums are automatically prefixed with the name of the attribute, so I think the values should just be 1to4, ge5, etc., with no prefix. > (define_insn "*add_<shift>_<mode>" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > "" > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > [(set_attr "type" "alu_shift_imm") > (set (attr "shift_imm_value") > (cond [(eq (symbol_ref "XINT(operands[2],0)") (const_int 0)) (const_string "shift_imm_0") > (and (ge (symbol_ref "XINT(operands[2],0)") (const_int 1)) (le (symbol_ref "XINT(operands[2],0)") (const_int 4))) (const_string "shift_imm_1to4")] > (const_string "shift_imm_ge5")))] > ) I'll come back to this below. > > (define_insn_reservation "a64fx_alu_shift1to4" 2 > (and (eq_attr "tune" "a64fx") > (eq_attr "type" "alu_shift_imm") > (eq_attr "shift_imm_value" "shift_imm_1to4")) > "(a64fx_exa,a64fx_exa)|(a64fx_exb,a64fx_exb)") > > > Case2. expand the TYPE attribute > I replaced the alu_shift_imm to alu_shift_imm, alu_shift_imm1to4, and I got the error message " bad number of entries in SET_ATTR_ALTERNATIVE, was 2 expected 1" > (define_attr "type" > ... > alu_shift_imm,\ > alu_shift_imm1to4,\ > ... > > (define_insn "*add_<shift>_<mode>" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > "" > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > [(set_attr "type" "alu_shift_imm,alu_shift_imm1to4")] > ) > > It means that one "type" value should be matched by one operand constraint pattern. So this will raise two problems. > 1. One instruction should be classified to multiple patterns > - define multiple operand constraints, such as "(match_operand:QI 2 "aarch64_shift_imm_<mode>" "0, 1to4, ...")" > - or, use "cond" at set_attr just like Case1. > 2. Whatever you do with the first problem, the type attribute cannot be set with "alu_shift_imm" > and "alu_shift_imm1to4" at the same time by one operand constraint pattern. Right. Something needs to tell the compiler how to distinguish between the two cases. > - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected. > - If we want to add new attribute to resolve this problem, why not use the Case1 directly? It's a trade-off. At the moment the rule for arm and aarch64 is simple: the "type" attribute specifies the complete scheduling type. If we instead change policy and divide the information among several attributes, those attributes will have increasingly complex relationships with one another. It will become easier to specify a meaningless combination by mistake, or to forget to specify one of the relevant attributes. I agree it's not so bad with just one extra attribute. The question is more: if we change the policy, what would things look like if there were more attributes of a similar nature? On the other side, lumping everything into one attributes gives a long list of values. I think the difference of opinion is that Segher finds this argument more compelling while I find the other argument more compelling ;-) That said, there are really two issues here: (1) how should the define_insn describe the instruction? (2) how should scheduling descriptions test for it? The same answer for (1) can support multiple answers for (2). For (1), I think we should do what I mentioned in my first reply and add something like (different name from last time): ;; config/arm/types.md (define_attr "autodetect_type" "none,alu_shift_op2") This attribute would in principle be a home for any case in which the scheduling type needs to be derived from other information (in this case using the contents of operand 2 to infer the alu_shift kind). So going back to what I said above: we would be changing policy so that define_insns can use one of *two* attributes to specify the scheduling type: - they can use "type" to specify an exact scheduling type directly - they can use "autodetect_type" to specify a method for deriving the scheduling type from other information The *add_<shift>_<mode> define_insn above would then use: [(set_attr "autodetect_type" "alu_shift_op2")] The advantage of this IMO is that we still only have two attributes, even if we need to autodetect other cases in future. It also means that there is only one copy of the condition to pick between (say) 1to4 and ge5. This approach to (1) supports both approaches to (2). If the CPU models continue to use a single type attribute, we can use autodetect_type to choose an appropriate default, using the method I mentioned in my first reply: ;; config/arm/types.md (define_attr "type" "…long list…" (cond [(eq_attr "autodetect_type" "alu_shift_op2") (if_then_else (match_operand 2 "const_1_to_4_operand") (const_string "alu_shift_imm_1to4") (const_string "alu_shift_imm_ge5"))] (const_string "untyped"))) where: - const_1_to_4_operand is a new predicate. It could be defined in a new config/arm/common.md file that is shared by both arm and aarch64. - alu_shift_imm is replaced by two new attributes: alu_shift_imm_1to4 and alu_shift_imm_ge5. (In principle, this condition could also handle alu_shift_reg, if that proved to be useful later.) If we instead use multiple attributes, the definitions would be: (define_attr "type" "…" (cond [(eq_attr "autodetect_type" "alu_shift_op2") (const_string "alu_shift_imm")] (const_string "untyped"))) (define_attr "shift_imm_value" "none,1to4,ge5" (cond [(eq_attr "autodetect_type" "alu_shift_op2") (if_then_else (match_operand 2 "const_1_to_4_operand") (const_string "1to4") (const_string "ge5"))] (const_string "none")) I still prefer the single type attribute, for the reasons above. FWIW, this script updates the CPU descriptions so that every test for alu_shift_imm is replaced by a test for the two new attributes: find gcc/config/arm gcc/config/aarch64 -name '*.md' | while read arg; do case $arg in */arm.md | */aarch64.md | */types.md | */thumb1.md | */thumb2.md | \ */arm-fixed.md) ;; *) sed -i s/alu_shift_imm/alu_shift_imm_1to4,alu_shift_imm_ge5/g $arg ;; esac done (not idempotent :-)) Then you'd need to update the non-CPU files that were excluded by the case statement above. Although this looks/sounds complicated, the advantage is that everything remains up-to-date. If we instead added a second attribute and only defined it for instructions like *add_<shift>_<mode>, other instructions (including config/arm instructions) would still have type alu_shift_imm but would have a shift_imm_value of "none". Sorry for the long message. Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-14 9:55 ` Richard Sandiford @ 2020-09-14 18:41 ` Segher Boessenkool 2020-09-14 19:35 ` Richard Sandiford 0 siblings, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-09-14 18:41 UTC (permalink / raw) To: Qian, Jianhua, gcc, richard.sandiford Hi! On Mon, Sep 14, 2020 at 10:55:35AM +0100, Richard Sandiford wrote: > "Qian, Jianhua" <qianjh@cn.fujitsu.com> writes: > > - If we cannot resolve it, the existing CPUs' descriptions need to be changed. This is not what I expected. > > - If we want to add new attribute to resolve this problem, why not use the Case1 directly? > > It's a trade-off. At the moment the rule for arm and aarch64 is simple: > the "type" attribute specifies the complete scheduling type. If we > instead change policy and divide the information among several attributes, > those attributes will have increasingly complex relationships with one > another. It will become easier to specify a meaningless combination > by mistake, or to forget to specify one of the relevant attributes. I found exactly the opposite, when I did this to rs6000: this is much *less* error-prone. > I agree it's not so bad with just one extra attribute. The question > is more: if we change the policy, what would things look like if there > were more attributes of a similar nature? You often can do new attributes that apply to quite a few different instructions. Like, as an extreme example, we have ;; What data size does this instruction work on? ;; This is used for insert, mul and others as necessary. (define_attr "size" "8,16,32,64,128" (const_string "32")) But there also is ;; Is this instruction record form ("dot", signed compare to 0, writing CR0)? ;; This is used for add, logical, shift, exts, mul. (define_attr "dot" "no,yes" (const_string "no")) and a few more. But just that: a few. It isn't good to have very many extra attributes, in the same way that very many insn types makes things harder than needed. > On the other side, lumping everything into one attributes gives a > long list of values. I think the difference of opinion is that Segher > finds this argument more compelling while I find the other argument > more compelling ;-) It's a trade-off, we agree on that :-) And I'd be conservative here, we agree on that as well. > Although this looks/sounds complicated, the advantage is that everything > remains up-to-date. If we instead added a second attribute and only > defined it for instructions like *add_<shift>_<mode>, other instructions > (including config/arm instructions) would still have type alu_shift_imm > but would have a shift_imm_value of "none". I would make an attribute for the mode (or data size really), and one that says the insn uses shifted data (since many arm insns have that, just like record form on PowerPC is everywhere). And you can have that one filled "by magic" usually! Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-14 18:41 ` Segher Boessenkool @ 2020-09-14 19:35 ` Richard Sandiford 2020-09-14 22:14 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-09-14 19:35 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Qian, Jianhua, gcc Segher Boessenkool <segher@kernel.crashing.org> writes: >> Although this looks/sounds complicated, the advantage is that everything >> remains up-to-date. If we instead added a second attribute and only >> defined it for instructions like *add_<shift>_<mode>, other instructions >> (including config/arm instructions) would still have type alu_shift_imm >> but would have a shift_imm_value of "none". > > I would make an attribute for the mode (or data size really), and one > that says the insn uses shifted data (since many arm insns have that, > just like record form on PowerPC is everywhere). And you can have that > one filled "by magic" usually! I don't think this is really answering my point above though. What I meant is: we currently have several instructions in config/arm and config/aarch64 that have type alu_shift_imm. If we add some new on-the-side attributes A, but only update *some* of the alu_shift_imm instructions to define A (either directly or indirectly), then the other alu_shift_imm instructions will have the default values for A. This probably isn't the intended effect. Ideally, every alu_shift_imm instruction would specify correct attribute values for A (specifically, to indicate whether the shift value is in [1, 4] or not). In contrast, one advantage of replacing the existing alu_shift_imm type with two new types is that any existing reference to the old type will cause a build failure. So keeping everything in a single type attributes gives us static type checking that the information for each (former) alu_shift_imm instruction is complete. Similarly for any other type that needs to be split in the same way. I realise this won't convince you, and I'm not trying to. :-) Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-14 19:35 ` Richard Sandiford @ 2020-09-14 22:14 ` Segher Boessenkool 0 siblings, 0 replies; 20+ messages in thread From: Segher Boessenkool @ 2020-09-14 22:14 UTC (permalink / raw) To: Qian, Jianhua, gcc, richard.sandiford On Mon, Sep 14, 2020 at 08:35:44PM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> Although this looks/sounds complicated, the advantage is that everything > >> remains up-to-date. If we instead added a second attribute and only > >> defined it for instructions like *add_<shift>_<mode>, other instructions > >> (including config/arm instructions) would still have type alu_shift_imm > >> but would have a shift_imm_value of "none". > > > > I would make an attribute for the mode (or data size really), and one > > that says the insn uses shifted data (since many arm insns have that, > > just like record form on PowerPC is everywhere). And you can have that > > one filled "by magic" usually! > > I don't think this is really answering my point above though. > What I meant is: we currently have several instructions in config/arm > and config/aarch64 that have type alu_shift_imm. If we add some new > on-the-side attributes A, but only update *some* of the alu_shift_imm > instructions to define A (either directly or indirectly), then the other > alu_shift_imm instructions will have the default values for A. > This probably isn't the intended effect. Ideally, every alu_shift_imm > instruction would specify correct attribute values for A (specifically, > to indicate whether the shift value is in [1, 4] or not). > > In contrast, one advantage of replacing the existing alu_shift_imm type > with two new types is that any existing reference to the old type will > cause a build failure. So keeping everything in a single type > attributes gives us static type checking that the information for each > (former) alu_shift_imm instruction is complete. Similarly for any other > type that needs to be split in the same way. Removing the old types causes build failures as well if you forget some cases. > I realise this won't convince you, and I'm not trying to. :-) And you don't have to :-) I am telling you about a scheme that worked really well for us; I am not trying to convince you to do that in your own target, that decision is all your own. Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-07 6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua 2020-09-07 7:40 ` Richard Biener 2020-09-07 20:20 ` Richard Sandiford @ 2020-09-11 13:30 ` Richard Earnshaw 2020-09-14 2:53 ` Qian, Jianhua 2 siblings, 1 reply; 20+ messages in thread From: Richard Earnshaw @ 2020-09-11 13:30 UTC (permalink / raw) To: Qian, Jianhua, gcc On 07/09/2020 07:08, Qian, Jianhua wrote: > Hi > > I'm adding a new machine model. I have a problem when writing the "define_insn_reservation" for instruction scheduling. > How to write the "define_insn_reservation" for one instruction that there are different latencies and pipelines according to parameter. > > For example, the ADD (shifted register) instruction in a64fx > > Instruction Option Latency Pipeline > ADD (shifted register) <amount> = 0 1 EX* | EAG* > <amount> = [1-4] && <shift>=LSL 1+1 (EXA + EXA) | (EXB + EXB) > 2+1 (EXA + EXA) | (EXB + EXB) > A shift by immediate zero isn't a shift, so should never use this RTL pattern. We can ignore that case. > In aarch64.md ADD (shifted register) instruction is defined as following. > (define_insn "*add_<shift>_<mode>" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > "" > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > [(set_attr "type" "alu_shift_imm")] > ) You might consider using a define_bypass to adjust the cost - the matcher rule takes a producer and consumer RTL - you don't care about the consumer, but you can use the bypass to reduce the cost if the producer uses an immediate in the 'low latency' range. This would avoid having to make a load of whole-sale changes to the main parts of the machine description. > > It could not be distinguished by the type "alu_shift_imm" when writing "define_insn_reservation" for ADD (shifted register). > What should I do? > > Regards > Qian > > > R. ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: A problem with one instruction multiple latencies and pipelines 2020-09-11 13:30 ` Richard Earnshaw @ 2020-09-14 2:53 ` Qian, Jianhua 2020-09-14 9:08 ` Richard Earnshaw 0 siblings, 1 reply; 20+ messages in thread From: Qian, Jianhua @ 2020-09-14 2:53 UTC (permalink / raw) To: Richard Earnshaw, gcc > -----Original Message----- > From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com> > Sent: Friday, September 11, 2020 9:30 PM > To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org > Subject: Re: A problem with one instruction multiple latencies and pipelines > > On 07/09/2020 07:08, Qian, Jianhua wrote: > > Hi > > > > I'm adding a new machine model. I have a problem when writing the > "define_insn_reservation" for instruction scheduling. > > How to write the "define_insn_reservation" for one instruction that there are > different latencies and pipelines according to parameter. > > > > For example, the ADD (shifted register) instruction in a64fx > > > > Instruction Option Latency > Pipeline > > ADD (shifted register) <amount> = 0 1 EX* > | EAG* > > <amount> = [1-4] && <shift>=LSL 1+1 > (EXA + EXA) | (EXB + EXB) > > 2+1 (EXA > + EXA) | (EXB + EXB) > > > > A shift by immediate zero isn't a shift, so should never use this RTL pattern. > We can ignore that case. > > > In aarch64.md ADD (shifted register) instruction is defined as following. > > (define_insn "*add_<shift>_<mode>" > > [(set (match_operand:GPI 0 "register_operand" "=r") > > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" > "r") > > (match_operand:QI 2 > "aarch64_shift_imm_<mode>" "n")) > > (match_operand:GPI 3 "register_operand" "r")))] > > "" > > "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" > > [(set_attr "type" "alu_shift_imm")] > > ) > > You might consider using a define_bypass to adjust the cost - the matcher rule > takes a producer and consumer RTL - you don't care about the consumer, but > you can use the bypass to reduce the cost if the producer uses an immediate in > the 'low latency' range. This would avoid having to make a load of whole-sale > changes to the main parts of the machine description. Thanks for your comment. But I think the define_bypass can only change the latency for special instruction. Pipeline also could be changed by define_bypass? Regards Qian > > > > It could not be distinguished by the type "alu_shift_imm" when writing > "define_insn_reservation" for ADD (shifted register). > > What should I do? > > > > Regards > > Qian > > > > > > > > R. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: A problem with one instruction multiple latencies and pipelines 2020-09-14 2:53 ` Qian, Jianhua @ 2020-09-14 9:08 ` Richard Earnshaw 0 siblings, 0 replies; 20+ messages in thread From: Richard Earnshaw @ 2020-09-14 9:08 UTC (permalink / raw) To: Qian, Jianhua, gcc On 14/09/2020 03:53, Qian, Jianhua wrote: >> -----Original Message----- >> From: Richard Earnshaw <Richard.Earnshaw@foss.arm.com> >> Sent: Friday, September 11, 2020 9:30 PM >> To: Qian, Jianhua/钱 建华 <qianjh@cn.fujitsu.com>; gcc@gcc.gnu.org >> Subject: Re: A problem with one instruction multiple latencies and pipelines >> >> On 07/09/2020 07:08, Qian, Jianhua wrote: >>> Hi >>> >>> I'm adding a new machine model. I have a problem when writing the >> "define_insn_reservation" for instruction scheduling. >>> How to write the "define_insn_reservation" for one instruction that there are >> different latencies and pipelines according to parameter. >>> >>> For example, the ADD (shifted register) instruction in a64fx >>> >>> Instruction Option Latency >> Pipeline >>> ADD (shifted register) <amount> = 0 1 EX* >> | EAG* >>> <amount> = [1-4] && <shift>=LSL 1+1 >> (EXA + EXA) | (EXB + EXB) >>> 2+1 (EXA >> + EXA) | (EXB + EXB) >>> >> >> A shift by immediate zero isn't a shift, so should never use this RTL pattern. >> We can ignore that case. >> >>> In aarch64.md ADD (shifted register) instruction is defined as following. >>> (define_insn "*add_<shift>_<mode>" >>> [(set (match_operand:GPI 0 "register_operand" "=r") >>> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" >> "r") >>> (match_operand:QI 2 >> "aarch64_shift_imm_<mode>" "n")) >>> (match_operand:GPI 3 "register_operand" "r")))] >>> "" >>> "add\\t%<w>0, %<w>3, %<w>1, <shift> %2" >>> [(set_attr "type" "alu_shift_imm")] >>> ) >> >> You might consider using a define_bypass to adjust the cost - the matcher rule >> takes a producer and consumer RTL - you don't care about the consumer, but >> you can use the bypass to reduce the cost if the producer uses an immediate in >> the 'low latency' range. This would avoid having to make a load of whole-sale >> changes to the main parts of the machine description. > > Thanks for your comment. > But I think the define_bypass can only change the latency for special instruction. > Pipeline also could be changed by define_bypass? > Possibly, but if this is part of the out-of-order units of the pipe, I really don't think it will matter. In fact, I'm not even convinced that trying to model the out-of-order stages is worthwhile - let the CPU handle that: any long-latency instruction, such as a memory access that misses the L1 cache will completely mess up the compiler's understanding of the pipeline state anyway. What I think is more important is to get a good model for the in-order bits at the front of the pipe accurately modelled so that you can maximize the throughput of those stages. Try to get a mix of instructions so that a single issue unit in the core doesn't get clogged up and block further decode. R. > Regards > Qian > >>> >>> It could not be distinguished by the type "alu_shift_imm" when writing >> "define_insn_reservation" for ADD (shifted register). >>> What should I do? >>> >>> Regards >>> Qian >>> >>> >>> >> >> R. >> > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2020-09-14 22:15 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-07 6:08 A problem with one instruction multiple latencies and pipelines Qian, Jianhua 2020-09-07 7:40 ` Richard Biener 2020-09-07 8:45 ` Qian, Jianhua 2020-09-07 11:58 ` Richard Biener 2020-09-07 20:20 ` Richard Sandiford 2020-09-08 5:34 ` Qian, Jianhua 2020-09-09 21:22 ` Segher Boessenkool 2020-09-10 5:01 ` Qian, Jianhua 2020-09-10 10:04 ` Richard Sandiford 2020-09-10 23:00 ` Segher Boessenkool 2020-09-11 7:44 ` Richard Sandiford 2020-09-11 13:58 ` Segher Boessenkool 2020-09-14 5:41 ` Qian, Jianhua 2020-09-14 9:55 ` Richard Sandiford 2020-09-14 18:41 ` Segher Boessenkool 2020-09-14 19:35 ` Richard Sandiford 2020-09-14 22:14 ` Segher Boessenkool 2020-09-11 13:30 ` Richard Earnshaw 2020-09-14 2:53 ` Qian, Jianhua 2020-09-14 9:08 ` Richard Earnshaw
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).