* Re: [PATCH] i386: correct division modeling in lujiazui.md
@ 2022-12-30 2:21 Mayshao-oc
2022-12-30 8:26 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Mayshao-oc @ 2022-12-30 2:21 UTC
To: Alexander Monakov, gcc-patches
Cc: Uros Bizjak, Jan Hubicka, Louis Qi(BJ-RD), Hawk Wang(BJ-RD)
>Ping. If there are any questions or concerns about the patch, please let me
>know: I'm interested in continuing this cleanup at least for older AMD models.
>
Hi Alexander:
According to the SPEC CPU 2017 benchmark results, the patch looks good on lujiazui.
BR
Mayshao
>I noticed I had an extra line in my Changelog:
>
>> (lua_sseicvt_si): Ditto.
>
>It got there accidentally and I will drop it.
>
>Alexander
>
>On Fri, 9 Dec 2022, Alexander Monakov wrote:
>
>> Model the divider in Lujiazui processors as a separate automaton to
>> significantly reduce the overall model size. This should also result
>> in improved accuracy, as pipe 0 should be able to accept new
>> instructions while the divider is occupied.
>>
>> It is unclear why integer divisions are modeled as if pipes 0-3 are
>> all occupied. I've opted to keep a single-cycle reservation of all
>> four pipes together, so GCC should continue trying to pack
>> instructions around a division accordingly.
>>
>> Currently the top three symbols in insn-automata.o are:
>>
>> 106102 r lujiazui_core_check
>> 106102 r lujiazui_core_transitions
>> 196123 r lujiazui_core_min_issue_delay
>>
>> This patch shrinks all lujiazui tables to:
>>
>> 3 r lujiazui_decoder_min_issue_delay
>> 20 r lujiazui_decoder_transitions
>> 32 r lujiazui_agu_min_issue_delay
>> 126 r lujiazui_agu_transitions
>> 304 r lujiazui_div_base
>> 352 r lujiazui_div_check
>> 352 r lujiazui_div_transitions
>> 1152 r lujiazui_core_min_issue_delay
>> 1592 r lujiazui_agu_translate
>> 1592 r lujiazui_core_translate
>> 1592 r lujiazui_decoder_translate
>> 1592 r lujiazui_div_translate
>> 3952 r lujiazui_div_min_issue_delay
>> 9216 r lujiazui_core_transitions
>>
>> This continues the work on reducing i386 insn-automata.o size started
>> with similar fixes for division and multiplication instructions in
>> znver.md [1][2]. I plan to submit corresponding fixes for
>> b[td]ver[123].md as well.
>>
>> [1] https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
>> [2] https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/
>>
>> gcc/ChangeLog:
>>
>> PR target/87832
>> * config/i386/lujiazui.md (lujiazui_div): New automaton.
>> (lua_div): New unit.
>> (lua_idiv_qi): Correct unit in the reservation.
>> (lua_idiv_qi_load): Ditto.
>> (lua_idiv_hi): Ditto.
>> (lua_idiv_hi_load): Ditto.
>> (lua_idiv_si): Ditto.
>> (lua_idiv_si_load): Ditto.
>> (lua_idiv_di): Ditto.
>> (lua_idiv_di_load): Ditto.
>> (lua_fdiv_SF): Ditto.
>> (lua_fdiv_SF_load): Ditto.
>> (lua_fdiv_DF): Ditto.
>> (lua_fdiv_DF_load): Ditto.
>> (lua_fdiv_XF): Ditto.
>> (lua_fdiv_XF_load): Ditto.
>> (lua_ssediv_SF): Ditto.
>> (lua_ssediv_load_SF): Ditto.
>> (lua_ssediv_V4SF): Ditto.
>> (lua_ssediv_load_V4SF): Ditto.
>> (lua_ssediv_V8SF): Ditto.
>> (lua_ssediv_load_V8SF): Ditto.
>> (lua_ssediv_SD): Ditto.
>> (lua_ssediv_load_SD): Ditto.
>> (lua_ssediv_V2DF): Ditto.
>> (lua_ssediv_load_V2DF): Ditto.
>> (lua_ssediv_V4DF): Ditto.
>> (lua_ssediv_load_V4DF): Ditto.
>> (lua_sseicvt_si): Ditto.
>> ---
>> gcc/config/i386/lujiazui.md | 58 +++++++++++++++++++------------------
>> 1 file changed, 30 insertions(+), 28 deletions(-)
>>
>> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md
>> index 9046c09f2..58a230c70 100644
>> --- a/gcc/config/i386/lujiazui.md
>> +++ b/gcc/config/i386/lujiazui.md
>> @@ -19,8 +19,8 @@
>>
>> ;; Scheduling for ZHAOXIN lujiazui processor.
>>
>> -;; Modeling automatons for decoders, execution pipes and AGU pipes.
>> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
>> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
>> +(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
>>
>> ;; The rules for the decoder are simple:
>> ;; - an instruction with 1 uop can be decoded by any of the three
>> @@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
>> (define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core")
>> (define_cpu_unit "lua_p4,lua_p5" "lujiazui_agu")
>>
>> +(define_cpu_unit "lua_div" "lujiazui_div")
>> +
>> (define_reservation "lua_p03" "lua_p0|lua_p3")
>> (define_reservation "lua_p12" "lua_p1|lua_p2")
>> (define_reservation "lua_p1p2" "lua_p1+lua_p2")
>> @@ -229,56 +231,56 @@ (define_insn_reservation "lua_idiv_qi" 21
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "QI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p0p1p2p3*21")
>> + "lua_decoder0,lua_p0p1p2p3,lua_div*21")
>>
>> (define_insn_reservation "lua_idiv_qi_load" 25
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "QI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p45,lua_p0p1p2p3*21")
>> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21")
>>
>> (define_insn_reservation "lua_idiv_hi" 22
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "HI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p0p1p2p3*22")
>> + "lua_decoder0,lua_p0p1p2p3,lua_div*22")
>>
>> (define_insn_reservation "lua_idiv_hi_load" 26
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "HI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p45,lua_p0p1p2p3*22")
>> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22")
>>
>> (define_insn_reservation "lua_idiv_si" 20
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "SI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p0p1p2p3*20")
>> + "lua_decoder0,lua_p0p1p2p3,lua_div*20")
>>
>> (define_insn_reservation "lua_idiv_si_load" 24
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "SI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p45,lua_p0p1p2p3*20")
>> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20")
>>
>> (define_insn_reservation "lua_idiv_di" 150
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "DI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p0p1p2p3*150")
>> + "lua_decoder0,lua_p0p1p2p3,lua_div*150")
>>
>> (define_insn_reservation "lua_idiv_di_load" 154
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "DI")
>> (eq_attr "type" "idiv"))))
>> - "lua_decoder0,lua_p45,lua_p0p1p2p3*150")
>> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150")
>>
>> ;; x87 floating point operations.
>>
>> @@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "SF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decodern,lua_p0*15")
>> + "lua_decodern,lua_p0,lua_div*15")
>>
>> (define_insn_reservation "lua_fdiv_SF_load" 19
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "SF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decoder01,lua_p45,lua_p0*15")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*15")
>>
>> (define_insn_reservation "lua_fdiv_DF" 18
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "DF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decodern,lua_p0*18")
>> + "lua_decodern,lua_p0,lua_div*18")
>>
>> (define_insn_reservation "lua_fdiv_DF_load" 22
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "DF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decoder01,lua_p45,lua_p0*18")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*18")
>>
>> (define_insn_reservation "lua_fdiv_XF" 22
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "XF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decoder0,lua_p0*22")
>> + "lua_decoder0,lua_p0,lua_div*22")
>>
>> (define_insn_reservation "lua_fdiv_XF_load" 26
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "XF")
>> (eq_attr "type" "fdiv,fpspc"))))
>> - "lua_decoder0,lua_p45,lua_p0*22")
>> + "lua_decoder0,lua_p45,lua_p0,lua_div*22")
>>
>> ;; MMX instructions.
>>
>> @@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decodern,lua_p0*13")
>> + "lua_decodern,lua_p0,lua_div*13")
>>
>> (define_insn_reservation "lua_ssediv_load_SF" 17
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder01,lua_p45,lua_p0*13")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*13")
>>
>> (define_insn_reservation "lua_ssediv_V4SF" 23
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "V4SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decodern,lua_p0*23")
>> + "lua_decodern,lua_p0,lua_div*23")
>>
>> (define_insn_reservation "lua_ssediv_load_V4SF" 27
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "V4SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder01,lua_p45,lua_p0*23")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*23")
>>
>> (define_insn_reservation "lua_ssediv_V8SF" 47
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "V8SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder0,lua_p0*47")
>> + "lua_decoder0,lua_p0,lua_div*47")
>>
>> (define_insn_reservation "lua_ssediv_load_V8SF" 51
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "V8SF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder0,lua_p45,lua_p0*47")
>> + "lua_decoder0,lua_p45,lua_p0,lua_div*47")
>>
>> (define_insn_reservation "lua_ssediv_SD" 17
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decodern,lua_p0*17")
>> + "lua_decodern,lua_p0,lua_div*17")
>>
>> (define_insn_reservation "lua_ssediv_load_SD" 21
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder01,lua_p45,lua_p0*17")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*17")
>>
>> (define_insn_reservation "lua_ssediv_V2DF" 30
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "V2DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decodern,lua_p0*30")
>> + "lua_decodern,lua_p0,lua_div*30")
>>
>> (define_insn_reservation "lua_ssediv_load_V2DF" 34
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "V2DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder01,lua_p45,lua_p0*30")
>> + "lua_decoder01,lua_p45,lua_p0,lua_div*30")
>>
>> (define_insn_reservation "lua_ssediv_V4DF" 56
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "none")
>> (and (eq_attr "mode" "V4DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder0,lua_p0*56")
>> + "lua_decoder0,lua_p0,lua_div*56")
>>
>> (define_insn_reservation "lua_ssediv_load_V4DF" 60
>> (and (eq_attr "cpu" "lujiazui")
>> (and (eq_attr "memory" "load")
>> (and (eq_attr "mode" "V4DF")
>> (eq_attr "type" "ssediv"))))
>> - "lua_decoder0,lua_p4p5,lua_p0*56")
>> + "lua_decoder0,lua_p4p5,lua_p0,lua_div*56")
>>
>>
>>
>
* Re: [PATCH] i386: correct division modeling in lujiazui.md
2022-12-30 2:21 [PATCH] i386: correct division modeling in lujiazui.md Mayshao-oc
@ 2022-12-30 8:26 ` Uros Bizjak
0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2022-12-30 8:26 UTC
To: Mayshao-oc
Cc: Alexander Monakov, gcc-patches, Jan Hubicka, Louis Qi(BJ-RD),
Hawk Wang(BJ-RD)
On Fri, Dec 30, 2022 at 3:21 AM Mayshao-oc <Mayshao-oc@zhaoxin.com> wrote:
>
> >Ping. If there are any questions or concerns about the patch, please let me
> >know: I'm interested in continuing this cleanup at least for older AMD models.
> >
> Hi Alexander:
> According to the SPEC CPU 2017 benchmark results, the patch looks good on lujiazui.
The patch is OK then.
Thanks,
Uros.
> BR
> Mayshao
> >I noticed I had an extra line in my Changelog:
> >
> >> (lua_sseicvt_si): Ditto.
> >
> >It got there accidentally and I will drop it.
> >
> >Alexander
> >
> >On Fri, 9 Dec 2022, Alexander Monakov wrote:
> >
> >> Model the divider in Lujiazui processors as a separate automaton to
> >> significantly reduce the overall model size. This should also result
> >> in improved accuracy, as pipe 0 should be able to accept new
> >> instructions while the divider is occupied.
> >>
> >> It is unclear why integer divisions are modeled as if pipes 0-3 are
> >> all occupied. I've opted to keep a single-cycle reservation of all
> >> four pipes together, so GCC should continue trying to pack
> >> instructions around a division accordingly.
> >>
> >> Currently the top three symbols in insn-automata.o are:
> >>
> >> 106102 r lujiazui_core_check
> >> 106102 r lujiazui_core_transitions
> >> 196123 r lujiazui_core_min_issue_delay
> >>
> >> This patch shrinks all lujiazui tables to:
> >>
> >> 3 r lujiazui_decoder_min_issue_delay
> >> 20 r lujiazui_decoder_transitions
> >> 32 r lujiazui_agu_min_issue_delay
> >> 126 r lujiazui_agu_transitions
> >> 304 r lujiazui_div_base
> >> 352 r lujiazui_div_check
> >> 352 r lujiazui_div_transitions
> >> 1152 r lujiazui_core_min_issue_delay
> >> 1592 r lujiazui_agu_translate
> >> 1592 r lujiazui_core_translate
> >> 1592 r lujiazui_decoder_translate
> >> 1592 r lujiazui_div_translate
> >> 3952 r lujiazui_div_min_issue_delay
> >> 9216 r lujiazui_core_transitions
> >>
> >> This continues the work on reducing i386 insn-automata.o size started
> >> with similar fixes for division and multiplication instructions in
> >> znver.md [1][2]. I plan to submit corresponding fixes for
> >> b[td]ver[123].md as well.
> >>
> >> [1] https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
> >> [2] https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/87832
> >> * config/i386/lujiazui.md (lujiazui_div): New automaton.
> >> (lua_div): New unit.
> >> (lua_idiv_qi): Correct unit in the reservation.
> >> (lua_idiv_qi_load): Ditto.
> >> (lua_idiv_hi): Ditto.
> >> (lua_idiv_hi_load): Ditto.
> >> (lua_idiv_si): Ditto.
> >> (lua_idiv_si_load): Ditto.
> >> (lua_idiv_di): Ditto.
> >> (lua_idiv_di_load): Ditto.
> >> (lua_fdiv_SF): Ditto.
> >> (lua_fdiv_SF_load): Ditto.
> >> (lua_fdiv_DF): Ditto.
> >> (lua_fdiv_DF_load): Ditto.
> >> (lua_fdiv_XF): Ditto.
> >> (lua_fdiv_XF_load): Ditto.
> >> (lua_ssediv_SF): Ditto.
> >> (lua_ssediv_load_SF): Ditto.
> >> (lua_ssediv_V4SF): Ditto.
> >> (lua_ssediv_load_V4SF): Ditto.
> >> (lua_ssediv_V8SF): Ditto.
> >> (lua_ssediv_load_V8SF): Ditto.
> >> (lua_ssediv_SD): Ditto.
> >> (lua_ssediv_load_SD): Ditto.
> >> (lua_ssediv_V2DF): Ditto.
> >> (lua_ssediv_load_V2DF): Ditto.
> >> (lua_ssediv_V4DF): Ditto.
> >> (lua_ssediv_load_V4DF): Ditto.
> >> (lua_sseicvt_si): Ditto.
> >> ---
> >> gcc/config/i386/lujiazui.md | 58 +++++++++++++++++++------------------
> >> 1 file changed, 30 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md
> >> index 9046c09f2..58a230c70 100644
> >> --- a/gcc/config/i386/lujiazui.md
> >> +++ b/gcc/config/i386/lujiazui.md
> >> @@ -19,8 +19,8 @@
> >>
> >> ;; Scheduling for ZHAOXIN lujiazui processor.
> >>
> >> -;; Modeling automatons for decoders, execution pipes and AGU pipes.
> >> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
> >> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
> >> +(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
> >>
> >> ;; The rules for the decoder are simple:
> >> ;; - an instruction with 1 uop can be decoded by any of the three
> >> @@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
> >> (define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core")
> >> (define_cpu_unit "lua_p4,lua_p5" "lujiazui_agu")
> >>
> >> +(define_cpu_unit "lua_div" "lujiazui_div")
> >> +
> >> (define_reservation "lua_p03" "lua_p0|lua_p3")
> >> (define_reservation "lua_p12" "lua_p1|lua_p2")
> >> (define_reservation "lua_p1p2" "lua_p1+lua_p2")
> >> @@ -229,56 +231,56 @@ (define_insn_reservation "lua_idiv_qi" 21
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "QI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p0p1p2p3*21")
> >> + "lua_decoder0,lua_p0p1p2p3,lua_div*21")
> >>
> >> (define_insn_reservation "lua_idiv_qi_load" 25
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "QI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*21")
> >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21")
> >>
> >> (define_insn_reservation "lua_idiv_hi" 22
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "HI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p0p1p2p3*22")
> >> + "lua_decoder0,lua_p0p1p2p3,lua_div*22")
> >>
> >> (define_insn_reservation "lua_idiv_hi_load" 26
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "HI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*22")
> >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22")
> >>
> >> (define_insn_reservation "lua_idiv_si" 20
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "SI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p0p1p2p3*20")
> >> + "lua_decoder0,lua_p0p1p2p3,lua_div*20")
> >>
> >> (define_insn_reservation "lua_idiv_si_load" 24
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "SI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*20")
> >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20")
> >>
> >> (define_insn_reservation "lua_idiv_di" 150
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "DI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p0p1p2p3*150")
> >> + "lua_decoder0,lua_p0p1p2p3,lua_div*150")
> >>
> >> (define_insn_reservation "lua_idiv_di_load" 154
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "DI")
> >> (eq_attr "type" "idiv"))))
> >> - "lua_decoder0,lua_p45,lua_p0p1p2p3*150")
> >> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150")
> >>
> >> ;; x87 floating point operations.
> >>
> >> @@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "SF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decodern,lua_p0*15")
> >> + "lua_decodern,lua_p0,lua_div*15")
> >>
> >> (define_insn_reservation "lua_fdiv_SF_load" 19
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "SF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decoder01,lua_p45,lua_p0*15")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*15")
> >>
> >> (define_insn_reservation "lua_fdiv_DF" 18
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "DF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decodern,lua_p0*18")
> >> + "lua_decodern,lua_p0,lua_div*18")
> >>
> >> (define_insn_reservation "lua_fdiv_DF_load" 22
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "DF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decoder01,lua_p45,lua_p0*18")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*18")
> >>
> >> (define_insn_reservation "lua_fdiv_XF" 22
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "XF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decoder0,lua_p0*22")
> >> + "lua_decoder0,lua_p0,lua_div*22")
> >>
> >> (define_insn_reservation "lua_fdiv_XF_load" 26
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "XF")
> >> (eq_attr "type" "fdiv,fpspc"))))
> >> - "lua_decoder0,lua_p45,lua_p0*22")
> >> + "lua_decoder0,lua_p45,lua_p0,lua_div*22")
> >>
> >> ;; MMX instructions.
> >>
> >> @@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decodern,lua_p0*13")
> >> + "lua_decodern,lua_p0,lua_div*13")
> >>
> >> (define_insn_reservation "lua_ssediv_load_SF" 17
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder01,lua_p45,lua_p0*13")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*13")
> >>
> >> (define_insn_reservation "lua_ssediv_V4SF" 23
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "V4SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decodern,lua_p0*23")
> >> + "lua_decodern,lua_p0,lua_div*23")
> >>
> >> (define_insn_reservation "lua_ssediv_load_V4SF" 27
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "V4SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder01,lua_p45,lua_p0*23")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*23")
> >>
> >> (define_insn_reservation "lua_ssediv_V8SF" 47
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "V8SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder0,lua_p0*47")
> >> + "lua_decoder0,lua_p0,lua_div*47")
> >>
> >> (define_insn_reservation "lua_ssediv_load_V8SF" 51
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "V8SF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder0,lua_p45,lua_p0*47")
> >> + "lua_decoder0,lua_p45,lua_p0,lua_div*47")
> >>
> >> (define_insn_reservation "lua_ssediv_SD" 17
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decodern,lua_p0*17")
> >> + "lua_decodern,lua_p0,lua_div*17")
> >>
> >> (define_insn_reservation "lua_ssediv_load_SD" 21
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder01,lua_p45,lua_p0*17")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*17")
> >>
> >> (define_insn_reservation "lua_ssediv_V2DF" 30
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "V2DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decodern,lua_p0*30")
> >> + "lua_decodern,lua_p0,lua_div*30")
> >>
> >> (define_insn_reservation "lua_ssediv_load_V2DF" 34
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "V2DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder01,lua_p45,lua_p0*30")
> >> + "lua_decoder01,lua_p45,lua_p0,lua_div*30")
> >>
> >> (define_insn_reservation "lua_ssediv_V4DF" 56
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "none")
> >> (and (eq_attr "mode" "V4DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder0,lua_p0*56")
> >> + "lua_decoder0,lua_p0,lua_div*56")
> >>
> >> (define_insn_reservation "lua_ssediv_load_V4DF" 60
> >> (and (eq_attr "cpu" "lujiazui")
> >> (and (eq_attr "memory" "load")
> >> (and (eq_attr "mode" "V4DF")
> >> (eq_attr "type" "ssediv"))))
> >> - "lua_decoder0,lua_p4p5,lua_p0*56")
> >> + "lua_decoder0,lua_p4p5,lua_p0,lua_div*56")
> >>
> >>
> >>
> >
>
>
* Re: [PATCH] i386: correct division modeling in lujiazui.md
2022-12-09 18:19 Alexander Monakov
@ 2022-12-19 16:06 ` Alexander Monakov
0 siblings, 0 replies; 4+ messages in thread
From: Alexander Monakov @ 2022-12-19 16:06 UTC
To: gcc-patches; +Cc: Mayshao-oc, Uros Bizjak, Jan Hubicka
Ping. If there are any questions or concerns about the patch, please let me
know: I'm interested in continuing this cleanup at least for older AMD models.
I noticed I had an extra line in my Changelog:
> (lua_sseicvt_si): Ditto.
It got there accidentally and I will drop it.
Alexander
On Fri, 9 Dec 2022, Alexander Monakov wrote:
> Model the divider in Lujiazui processors as a separate automaton to
> significantly reduce the overall model size. This should also result
> in improved accuracy, as pipe 0 should be able to accept new
> instructions while the divider is occupied.
>
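To illustrate why moving the long reservation into its own automaton can shrink the tables, here is a rough toy model in Python. It is only a sketch under stated assumptions and not what genautomata actually does: a state is one bitmask of future reserved cycles per unit, the unit and instruction names (`p0`..`p3`, `div`, `alu0`..`alu3`) and the latency `N = 20` are made up for the example, and states are counted by a plain breadth-first search.

```python
from collections import deque


def reachable_states(units, insns):
    """Count reachable states of a toy hazard automaton.

    A state is a tuple of bitmasks, one per unit; bit k set means the unit
    is reserved k cycles from now. `insns` maps a name to {unit: mask}.
    """
    index = {u: i for i, u in enumerate(units)}
    start = tuple(0 for _ in units)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Advancing one cycle shifts every reservation one step closer.
        successors = [tuple(m >> 1 for m in state)]
        for reservation in insns.values():
            # Project the reservation onto the units this automaton owns.
            wanted = [(index[u], m) for u, m in reservation.items() if u in index]
            if all(state[i] & m == 0 for i, m in wanted):
                nxt = list(state)
                for i, m in wanted:
                    nxt[i] |= m
                successors.append(tuple(nxt))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def busy(cycles, start=0):
    """Bitmask reserving `cycles` consecutive cycles beginning at `start`."""
    return ((1 << cycles) - 1) << start


N = 20  # hypothetical division latency for the toy model
pipes = ["p0", "p1", "p2", "p3"]
alu = {f"alu{i}": {p: busy(1)} for i, p in enumerate(pipes)}

# Old style: the division holds pipe 0 for all N cycles.
old = dict(alu, div={"p0": busy(N)})
# New style: pipe 0 for one cycle, then a dedicated divider unit for N cycles.
new = dict(alu, div={"p0": busy(1), "div": busy(N, start=1)})

old_states = reachable_states(pipes, old)
new_states = reachable_states(pipes, new) + reachable_states(["div"], new)
print(old_states, new_states)
```

In this toy, the old model multiplies the divider countdown by the states of the other pipes, while the split model only adds the two automata together. The real tables also scale with the number of instruction classes, so the numbers here are illustrative only.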
> It is unclear why integer divisions are modeled as if pipes 0-3 are all
> occupied. I've opted to keep a single-cycle reservation of all four
> pipes together, so GCC should continue trying to pack instructions
> around a division accordingly.
>
> Currently the top three symbols in insn-automata.o are:
>
> 106102 r lujiazui_core_check
> 106102 r lujiazui_core_transitions
> 196123 r lujiazui_core_min_issue_delay
>
> This patch shrinks all lujiazui tables to:
>
> 3 r lujiazui_decoder_min_issue_delay
> 20 r lujiazui_decoder_transitions
> 32 r lujiazui_agu_min_issue_delay
> 126 r lujiazui_agu_transitions
> 304 r lujiazui_div_base
> 352 r lujiazui_div_check
> 352 r lujiazui_div_transitions
> 1152 r lujiazui_core_min_issue_delay
> 1592 r lujiazui_agu_translate
> 1592 r lujiazui_core_translate
> 1592 r lujiazui_decoder_translate
> 1592 r lujiazui_div_translate
> 3952 r lujiazui_div_min_issue_delay
> 9216 r lujiazui_core_transitions
>
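For a quick sense of scale, the sizes listed above can be totalled with a few lines of Python. This assumes the numbers are decimal byte counts, and note that the "before" figure covers only the three largest symbols quoted above, not every lujiazui table.

```python
# Sizes copied from the two listings; assumed to be decimal byte counts.
before = {
    "lujiazui_core_check": 106102,
    "lujiazui_core_transitions": 106102,
    "lujiazui_core_min_issue_delay": 196123,
}
after = {
    "lujiazui_decoder_min_issue_delay": 3,
    "lujiazui_decoder_transitions": 20,
    "lujiazui_agu_min_issue_delay": 32,
    "lujiazui_agu_transitions": 126,
    "lujiazui_div_base": 304,
    "lujiazui_div_check": 352,
    "lujiazui_div_transitions": 352,
    "lujiazui_core_min_issue_delay": 1152,
    "lujiazui_agu_translate": 1592,
    "lujiazui_core_translate": 1592,
    "lujiazui_decoder_translate": 1592,
    "lujiazui_div_translate": 1592,
    "lujiazui_div_min_issue_delay": 3952,
    "lujiazui_core_transitions": 9216,
}

total_before = sum(before.values())
total_after = sum(after.values())
print(total_before, total_after)
```

Under that assumption the three largest old tables add up to 408327, against 21877 for all the tables after the patch.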
> This continues the work on reducing i386 insn-automata.o size started
> with similar fixes for division and multiplication instructions in
> znver.md [1][2]. I plan to submit corresponding fixes for
> b[td]ver[123].md as well.
>
> [1] https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
> [2] https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/
>
> gcc/ChangeLog:
>
> PR target/87832
> * config/i386/lujiazui.md (lujiazui_div): New automaton.
> (lua_div): New unit.
> (lua_idiv_qi): Correct unit in the reservation.
> (lua_idiv_qi_load): Ditto.
> (lua_idiv_hi): Ditto.
> (lua_idiv_hi_load): Ditto.
> (lua_idiv_si): Ditto.
> (lua_idiv_si_load): Ditto.
> (lua_idiv_di): Ditto.
> (lua_idiv_di_load): Ditto.
> (lua_fdiv_SF): Ditto.
> (lua_fdiv_SF_load): Ditto.
> (lua_fdiv_DF): Ditto.
> (lua_fdiv_DF_load): Ditto.
> (lua_fdiv_XF): Ditto.
> (lua_fdiv_XF_load): Ditto.
> (lua_ssediv_SF): Ditto.
> (lua_ssediv_load_SF): Ditto.
> (lua_ssediv_V4SF): Ditto.
> (lua_ssediv_load_V4SF): Ditto.
> (lua_ssediv_V8SF): Ditto.
> (lua_ssediv_load_V8SF): Ditto.
> (lua_ssediv_SD): Ditto.
> (lua_ssediv_load_SD): Ditto.
> (lua_ssediv_V2DF): Ditto.
> (lua_ssediv_load_V2DF): Ditto.
> (lua_ssediv_V4DF): Ditto.
> (lua_ssediv_load_V4DF): Ditto.
> (lua_sseicvt_si): Ditto.
> ---
> gcc/config/i386/lujiazui.md | 58 +++++++++++++++++++------------------
> 1 file changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md
> index 9046c09f2..58a230c70 100644
> --- a/gcc/config/i386/lujiazui.md
> +++ b/gcc/config/i386/lujiazui.md
> @@ -19,8 +19,8 @@
>
> ;; Scheduling for ZHAOXIN lujiazui processor.
>
> -;; Modeling automatons for decoders, execution pipes and AGU pipes.
> -(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
> +;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
> +(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
>
> ;; The rules for the decoder are simple:
> ;; - an instruction with 1 uop can be decoded by any of the three
> @@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
> (define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core")
> (define_cpu_unit "lua_p4,lua_p5" "lujiazui_agu")
>
> +(define_cpu_unit "lua_div" "lujiazui_div")
> +
> (define_reservation "lua_p03" "lua_p0|lua_p3")
> (define_reservation "lua_p12" "lua_p1|lua_p2")
> (define_reservation "lua_p1p2" "lua_p1+lua_p2")
> @@ -229,56 +231,56 @@ (define_insn_reservation "lua_idiv_qi" 21
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "QI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p0p1p2p3*21")
> + "lua_decoder0,lua_p0p1p2p3,lua_div*21")
>
> (define_insn_reservation "lua_idiv_qi_load" 25
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "QI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p45,lua_p0p1p2p3*21")
> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21")
>
> (define_insn_reservation "lua_idiv_hi" 22
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "HI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p0p1p2p3*22")
> + "lua_decoder0,lua_p0p1p2p3,lua_div*22")
>
> (define_insn_reservation "lua_idiv_hi_load" 26
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "HI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p45,lua_p0p1p2p3*22")
> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22")
>
> (define_insn_reservation "lua_idiv_si" 20
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "SI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p0p1p2p3*20")
> + "lua_decoder0,lua_p0p1p2p3,lua_div*20")
>
> (define_insn_reservation "lua_idiv_si_load" 24
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "SI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p45,lua_p0p1p2p3*20")
> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20")
>
> (define_insn_reservation "lua_idiv_di" 150
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "DI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p0p1p2p3*150")
> + "lua_decoder0,lua_p0p1p2p3,lua_div*150")
>
> (define_insn_reservation "lua_idiv_di_load" 154
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "DI")
> (eq_attr "type" "idiv"))))
> - "lua_decoder0,lua_p45,lua_p0p1p2p3*150")
> + "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150")
>
> ;; x87 floating point operations.
>
> @@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "SF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decodern,lua_p0*15")
> + "lua_decodern,lua_p0,lua_div*15")
>
> (define_insn_reservation "lua_fdiv_SF_load" 19
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "SF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decoder01,lua_p45,lua_p0*15")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*15")
>
> (define_insn_reservation "lua_fdiv_DF" 18
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "DF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decodern,lua_p0*18")
> + "lua_decodern,lua_p0,lua_div*18")
>
> (define_insn_reservation "lua_fdiv_DF_load" 22
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "DF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decoder01,lua_p45,lua_p0*18")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*18")
>
> (define_insn_reservation "lua_fdiv_XF" 22
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "XF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decoder0,lua_p0*22")
> + "lua_decoder0,lua_p0,lua_div*22")
>
> (define_insn_reservation "lua_fdiv_XF_load" 26
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "XF")
> (eq_attr "type" "fdiv,fpspc"))))
> - "lua_decoder0,lua_p45,lua_p0*22")
> + "lua_decoder0,lua_p45,lua_p0,lua_div*22")
>
> ;; MMX instructions.
>
> @@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decodern,lua_p0*13")
> + "lua_decodern,lua_p0,lua_div*13")
>
> (define_insn_reservation "lua_ssediv_load_SF" 17
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder01,lua_p45,lua_p0*13")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*13")
>
> (define_insn_reservation "lua_ssediv_V4SF" 23
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "V4SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decodern,lua_p0*23")
> + "lua_decodern,lua_p0,lua_div*23")
>
> (define_insn_reservation "lua_ssediv_load_V4SF" 27
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "V4SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder01,lua_p45,lua_p0*23")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*23")
>
> (define_insn_reservation "lua_ssediv_V8SF" 47
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "V8SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder0,lua_p0*47")
> + "lua_decoder0,lua_p0,lua_div*47")
>
> (define_insn_reservation "lua_ssediv_load_V8SF" 51
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "V8SF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder0,lua_p45,lua_p0*47")
> + "lua_decoder0,lua_p45,lua_p0,lua_div*47")
>
> (define_insn_reservation "lua_ssediv_SD" 17
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decodern,lua_p0*17")
> + "lua_decodern,lua_p0,lua_div*17")
>
> (define_insn_reservation "lua_ssediv_load_SD" 21
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder01,lua_p45,lua_p0*17")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*17")
>
> (define_insn_reservation "lua_ssediv_V2DF" 30
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "V2DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decodern,lua_p0*30")
> + "lua_decodern,lua_p0,lua_div*30")
>
> (define_insn_reservation "lua_ssediv_load_V2DF" 34
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "V2DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder01,lua_p45,lua_p0*30")
> + "lua_decoder01,lua_p45,lua_p0,lua_div*30")
>
> (define_insn_reservation "lua_ssediv_V4DF" 56
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "none")
> (and (eq_attr "mode" "V4DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder0,lua_p0*56")
> + "lua_decoder0,lua_p0,lua_div*56")
>
> (define_insn_reservation "lua_ssediv_load_V4DF" 60
> (and (eq_attr "cpu" "lujiazui")
> (and (eq_attr "memory" "load")
> (and (eq_attr "mode" "V4DF")
> (eq_attr "type" "ssediv"))))
> - "lua_decoder0,lua_p4p5,lua_p0*56")
> + "lua_decoder0,lua_p4p5,lua_p0,lua_div*56")
>
>
> (define_insn_reservation "lua_sseicvt_si" 2
>
* [PATCH] i386: correct division modeling in lujiazui.md
@ 2022-12-09 18:19 Alexander Monakov
2022-12-19 16:06 ` Alexander Monakov
0 siblings, 1 reply; 4+ messages in thread
From: Alexander Monakov @ 2022-12-09 18:19 UTC (permalink / raw)
To: gcc-patches; +Cc: Mayshao-oc, Uros Bizjak, Jan Hubicka, Alexander Monakov

Model the divider in Lujiazui processors as a separate automaton to
significantly reduce the overall model size. This should also result
in improved accuracy, as pipe 0 should be able to accept new
instructions while the divider is occupied.

It is unclear why integer divisions are modeled as if pipes 0-3 are all
occupied. I've opted to keep a single-cycle reservation of all four
pipes together, so GCC should continue trying to pack instructions
around a division accordingly.
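
As a rough illustration of why moving the divider into its own automaton can shrink the model, here is a toy counting sketch in Python. It is not how genautomata builds its tables: the state encoding (a divider countdown plus a per-cycle occupancy bit for each pipe) and the function names `combined_states` and `split_states` are assumptions made purely for illustration. The figures 150 and 4 come from the longest division reservation and the four core pipes in this patch.

```python
from itertools import product


def combined_states(div_cycles: int, pipes: int) -> int:
    """Count toy states when the divider countdown shares one automaton
    with the per-cycle occupancy of the execution pipes."""
    countdown = range(div_cycles + 1)  # 0 means the divider is idle
    occupancy = product((False, True), repeat=pipes)
    # Every (countdown, occupancy) pair is a distinct state in this toy model.
    return sum(1 for _ in product(countdown, occupancy))


def split_states(div_cycles: int, pipes: int) -> int:
    """Count toy states when the divider has its own automaton:
    the two state sets are added instead of multiplied."""
    return (div_cycles + 1) + 2 ** pipes


if __name__ == "__main__":
    print(combined_states(150, 4))  # 2416
    print(split_states(150, 4))     # 167
```

The point of the sketch is only the multiplicative versus additive growth; the real table sizes depend on the full set of reservations and on how the generator encodes and minimizes states.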

Currently top three symbols in insn-automata.o are:

106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay

This patch shrinks all lujiazui tables to:

     3 r lujiazui_decoder_min_issue_delay
    20 r lujiazui_decoder_transitions
    32 r lujiazui_agu_min_issue_delay
   126 r lujiazui_agu_transitions
   304 r lujiazui_div_base
   352 r lujiazui_div_check
   352 r lujiazui_div_transitions
  1152 r lujiazui_core_min_issue_delay
  1592 r lujiazui_agu_translate
  1592 r lujiazui_core_translate
  1592 r lujiazui_decoder_translate
  1592 r lujiazui_div_translate
  3952 r lujiazui_div_min_issue_delay
  9216 r lujiazui_core_transitions
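
To put the two listings side by side, the sizes can simply be summed. The sketch below assumes the figures are symbol sizes in bytes (the listings do not state a unit) and uses hypothetical helper names. Because the "before" listing covers only the top three symbols, its total is a lower bound on the old footprint.

```python
# Sizes copied from the two listings above; the unit is assumed to be bytes.
BEFORE_TOP3 = {
    "lujiazui_core_check": 106102,
    "lujiazui_core_transitions": 106102,
    "lujiazui_core_min_issue_delay": 196123,
}

AFTER_ALL = {
    "lujiazui_decoder_min_issue_delay": 3,
    "lujiazui_decoder_transitions": 20,
    "lujiazui_agu_min_issue_delay": 32,
    "lujiazui_agu_transitions": 126,
    "lujiazui_div_base": 304,
    "lujiazui_div_check": 352,
    "lujiazui_div_transitions": 352,
    "lujiazui_core_min_issue_delay": 1152,
    "lujiazui_agu_translate": 1592,
    "lujiazui_core_translate": 1592,
    "lujiazui_decoder_translate": 1592,
    "lujiazui_div_translate": 1592,
    "lujiazui_div_min_issue_delay": 3952,
    "lujiazui_core_transitions": 9216,
}


def total_size(table: dict[str, int]) -> int:
    """Sum the listed symbol sizes."""
    return sum(table.values())


if __name__ == "__main__":
    before = total_size(BEFORE_TOP3)  # 408327
    after = total_size(AFTER_ALL)     # 21877
    print(before, after, f"{before / after:.1f}x")
```

Under that assumption, the three largest old tables alone are more than 18 times the size of all the new lujiazui tables combined.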

This continues the work on reducing i386 insn-automata.o size started
with similar fixes for division and multiplication instructions in
znver.md [1][2]. I plan to submit corresponding fixes for
b[td]ver[123].md as well.

[1] https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
[2] https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/

gcc/ChangeLog:
PR target/87832
* config/i386/lujiazui.md (lujiazui_div): New automaton.
(lua_div): New unit.
(lua_idiv_qi): Correct unit in the reservation.
(lua_idiv_qi_load): Ditto.
(lua_idiv_hi): Ditto.
(lua_idiv_hi_load): Ditto.
(lua_idiv_si): Ditto.
(lua_idiv_si_load): Ditto.
(lua_idiv_di): Ditto.
(lua_idiv_di_load): Ditto.
(lua_fdiv_SF): Ditto.
(lua_fdiv_SF_load): Ditto.
(lua_fdiv_DF): Ditto.
(lua_fdiv_DF_load): Ditto.
(lua_fdiv_XF): Ditto.
(lua_fdiv_XF_load): Ditto.
(lua_ssediv_SF): Ditto.
(lua_ssediv_load_SF): Ditto.
(lua_ssediv_V4SF): Ditto.
(lua_ssediv_load_V4SF): Ditto.
(lua_ssediv_V8SF): Ditto.
(lua_ssediv_load_V8SF): Ditto.
(lua_ssediv_SD): Ditto.
(lua_ssediv_load_SD): Ditto.
(lua_ssediv_V2DF): Ditto.
(lua_ssediv_load_V2DF): Ditto.
(lua_ssediv_V4DF): Ditto.
(lua_ssediv_load_V4DF): Ditto.
---
gcc/config/i386/lujiazui.md | 58 +++++++++++++++++++------------------
1 file changed, 30 insertions(+), 28 deletions(-)
diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md
index 9046c09f2..58a230c70 100644
--- a/gcc/config/i386/lujiazui.md
+++ b/gcc/config/i386/lujiazui.md
@@ -19,8 +19,8 @@
;; Scheduling for ZHAOXIN lujiazui processor.
-;; Modeling automatons for decoders, execution pipes and AGU pipes.
-(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
+;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
+(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")
;; The rules for the decoder are simple:
;; - an instruction with 1 uop can be decoded by any of the three
@@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")
(define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core")
(define_cpu_unit "lua_p4,lua_p5" "lujiazui_agu")
+(define_cpu_unit "lua_div" "lujiazui_div")
+
(define_reservation "lua_p03" "lua_p0|lua_p3")
(define_reservation "lua_p12" "lua_p1|lua_p2")
(define_reservation "lua_p1p2" "lua_p1+lua_p2")
@@ -229,56 +231,56 @@ (define_insn_reservation "lua_idiv_qi" 21
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "QI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p0p1p2p3*21")
+ "lua_decoder0,lua_p0p1p2p3,lua_div*21")
(define_insn_reservation "lua_idiv_qi_load" 25
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "QI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p45,lua_p0p1p2p3*21")
+ "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21")
(define_insn_reservation "lua_idiv_hi" 22
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "HI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p0p1p2p3*22")
+ "lua_decoder0,lua_p0p1p2p3,lua_div*22")
(define_insn_reservation "lua_idiv_hi_load" 26
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "HI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p45,lua_p0p1p2p3*22")
+ "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22")
(define_insn_reservation "lua_idiv_si" 20
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "SI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p0p1p2p3*20")
+ "lua_decoder0,lua_p0p1p2p3,lua_div*20")
(define_insn_reservation "lua_idiv_si_load" 24
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "SI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p45,lua_p0p1p2p3*20")
+ "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20")
(define_insn_reservation "lua_idiv_di" 150
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "DI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p0p1p2p3*150")
+ "lua_decoder0,lua_p0p1p2p3,lua_div*150")
(define_insn_reservation "lua_idiv_di_load" 154
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "DI")
(eq_attr "type" "idiv"))))
- "lua_decoder0,lua_p45,lua_p0p1p2p3*150")
+ "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150")
;; x87 floating point operations.
@@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "SF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decodern,lua_p0*15")
+ "lua_decodern,lua_p0,lua_div*15")
(define_insn_reservation "lua_fdiv_SF_load" 19
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "SF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decoder01,lua_p45,lua_p0*15")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*15")
(define_insn_reservation "lua_fdiv_DF" 18
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "DF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decodern,lua_p0*18")
+ "lua_decodern,lua_p0,lua_div*18")
(define_insn_reservation "lua_fdiv_DF_load" 22
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "DF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decoder01,lua_p45,lua_p0*18")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*18")
(define_insn_reservation "lua_fdiv_XF" 22
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "XF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decoder0,lua_p0*22")
+ "lua_decoder0,lua_p0,lua_div*22")
(define_insn_reservation "lua_fdiv_XF_load" 26
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "XF")
(eq_attr "type" "fdiv,fpspc"))))
- "lua_decoder0,lua_p45,lua_p0*22")
+ "lua_decoder0,lua_p45,lua_p0,lua_div*22")
;; MMX instructions.
@@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "SF")
(eq_attr "type" "ssediv"))))
- "lua_decodern,lua_p0*13")
+ "lua_decodern,lua_p0,lua_div*13")
(define_insn_reservation "lua_ssediv_load_SF" 17
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "SF")
(eq_attr "type" "ssediv"))))
- "lua_decoder01,lua_p45,lua_p0*13")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*13")
(define_insn_reservation "lua_ssediv_V4SF" 23
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "V4SF")
(eq_attr "type" "ssediv"))))
- "lua_decodern,lua_p0*23")
+ "lua_decodern,lua_p0,lua_div*23")
(define_insn_reservation "lua_ssediv_load_V4SF" 27
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "V4SF")
(eq_attr "type" "ssediv"))))
- "lua_decoder01,lua_p45,lua_p0*23")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*23")
(define_insn_reservation "lua_ssediv_V8SF" 47
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "V8SF")
(eq_attr "type" "ssediv"))))
- "lua_decoder0,lua_p0*47")
+ "lua_decoder0,lua_p0,lua_div*47")
(define_insn_reservation "lua_ssediv_load_V8SF" 51
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "V8SF")
(eq_attr "type" "ssediv"))))
- "lua_decoder0,lua_p45,lua_p0*47")
+ "lua_decoder0,lua_p45,lua_p0,lua_div*47")
(define_insn_reservation "lua_ssediv_SD" 17
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "DF")
(eq_attr "type" "ssediv"))))
- "lua_decodern,lua_p0*17")
+ "lua_decodern,lua_p0,lua_div*17")
(define_insn_reservation "lua_ssediv_load_SD" 21
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "DF")
(eq_attr "type" "ssediv"))))
- "lua_decoder01,lua_p45,lua_p0*17")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*17")
(define_insn_reservation "lua_ssediv_V2DF" 30
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "V2DF")
(eq_attr "type" "ssediv"))))
- "lua_decodern,lua_p0*30")
+ "lua_decodern,lua_p0,lua_div*30")
(define_insn_reservation "lua_ssediv_load_V2DF" 34
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "V2DF")
(eq_attr "type" "ssediv"))))
- "lua_decoder01,lua_p45,lua_p0*30")
+ "lua_decoder01,lua_p45,lua_p0,lua_div*30")
(define_insn_reservation "lua_ssediv_V4DF" 56
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "none")
(and (eq_attr "mode" "V4DF")
(eq_attr "type" "ssediv"))))
- "lua_decoder0,lua_p0*56")
+ "lua_decoder0,lua_p0,lua_div*56")
(define_insn_reservation "lua_ssediv_load_V4DF" 60
(and (eq_attr "cpu" "lujiazui")
(and (eq_attr "memory" "load")
(and (eq_attr "mode" "V4DF")
(eq_attr "type" "ssediv"))))
- "lua_decoder0,lua_p4p5,lua_p0*56")
+ "lua_decoder0,lua_p4p5,lua_p0,lua_div*56")
(define_insn_reservation "lua_sseicvt_si" 2
--
2.37.2