public inbox for gcc-bugs@sourceware.org
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
@ 2022-10-24 18:48 ` amonakov at gcc dot gnu.org
2022-11-01 12:21 ` cvs-commit at gcc dot gnu.org
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-10-24 18:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Suggested partial fix for the integer-pipe side of the blowup:
https://inbox.sourceware.org/gcc-patches/4549f27b-238a-7d77-f72b-cc77df8ae36e@ispras.ru/
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
2022-10-24 18:48 ` [Bug target/87832] AMD pipeline models are very costly size-wise amonakov at gcc dot gnu.org
@ 2022-11-01 12:21 ` cvs-commit at gcc dot gnu.org
2022-11-07 11:23 ` amonakov at gcc dot gnu.org
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-01 12:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:
https://gcc.gnu.org/g:5cee5f94000ee5eabce9b223c44c7923c1c69f61
commit r13-3589-g5cee5f94000ee5eabce9b223c44c7923c1c69f61
Author: Alexander Monakov <amonakov@ispras.ru>
Date: Mon Oct 31 17:35:57 2022 +0300
i386: correct integer division modeling in znver.md
In znver.md, division instructions have descriptions like
    (define_insn_reservation "znver1_idiv_DI" 41
                             (and (eq_attr "cpu" "znver1,znver2")
                                  (and (eq_attr "type" "idiv")
                                       (and (eq_attr "mode" "DI")
                                            (eq_attr "memory" "none"))))
                             "znver1-double,znver1-ieu2*41")
which says that DImode idiv has latency 41 (which is correct) and that
it occupies the 2nd integer execution unit for 41 consecutive cycles, but
that is not correct:
1) the division instruction is partially pipelined, and has throughput
1/14, not 1/41;
2) for the most part it occupies a separate division unit, not the
general arithmetic unit.
Evidently, the interaction of such 41-cycle paths with the rest of the
reservations causes a combinatorial explosion in the automaton.
Fix this by modeling the integer division unit properly, and correcting
reservations to use the measured reciprocal throughput of those
instructions (available from uops.info). A similar correction for
floating-point divisions is left for a followup patch.
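To illustrate the throughput point above, here is a toy Python sketch of when back-to-back independent divisions could issue if each one reserves its unit for a fixed number of cycles. It is not how GCC's scheduler is implemented, and the function name is hypothetical; only the figures 41 and 14 come from the message.

```python
def issue_cycles(num_insns, reserved_cycles):
    """Cycle at which each of num_insns independent divisions can issue
    when every division reserves the same unit for reserved_cycles cycles."""
    return [i * reserved_cycles for i in range(num_insns)]

# Old model: the unit is held for the full 41-cycle latency.
old = issue_cycles(4, 41)
# Corrected idea: the divider is held only for the reciprocal throughput (14).
new = issue_cycles(4, 14)

print(old)  # [0, 41, 82, 123]
print(new)  # [0, 14, 28, 42]
```

Under the old reservation the model would consider the fourth division issuable only at cycle 123, whereas a 14-cycle reservation lets it issue at cycle 42.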
Top 5 znver table sizes, before:
68692 r znver1_ieu_check
68692 r znver1_ieu_transitions
99792 r znver1_ieu_min_issue_delay
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions
After:
1454 r znver1_ieu_translate
1454 r znver1_translate
2304 r znver1_ieu_transitions
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions
gcc/ChangeLog:
PR target/87832
* config/i386/znver.md (znver1_idiv): New automaton.
(znver1-idiv): New unit.
(znver1_idiv_DI): Correct unit and cycles in the reservation.
(znver1_idiv_SI): Ditto.
(znver1_idiv_HI): Ditto.
(znver1_idiv_QI): Ditto.
(znver1_idiv_mem_DI): Ditto.
(znver1_idiv_mem_SI): Ditto.
(znver1_idiv_mem_HI): Ditto.
(znver1_idiv_mem_QI): Ditto.
(znver3_idiv_DI): Ditto.
(znver3_idiv_SI): Ditto.
(znver3_idiv_HI): Ditto.
(znver3_idiv_QI): Ditto.
(znver3_idiv_mem_DI): Ditto.
(znver3_idiv_mem_SI): Ditto.
(znver3_idiv_mem_HI): Ditto.
(znver3_idiv_mem_QI): Ditto.
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
2022-10-24 18:48 ` [Bug target/87832] AMD pipeline models are very costly size-wise amonakov at gcc dot gnu.org
2022-11-01 12:21 ` cvs-commit at gcc dot gnu.org
@ 2022-11-07 11:23 ` amonakov at gcc dot gnu.org
2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-07 11:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Followup patches have been posted at
https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2022-11-07 11:23 ` amonakov at gcc dot gnu.org
@ 2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-16 13:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:
https://gcc.gnu.org/g:dd744f06c9952f92738b0860630085f0f0b99574
commit r13-4092-gdd744f06c9952f92738b0860630085f0f0b99574
Author: Alexander Monakov <amonakov@ispras.ru>
Date: Tue Nov 1 17:04:25 2022 +0300
i386: correct x87&SSE division modeling in znver.md
Correct modeling of division instructions in the SIMD/FP domain for
AMD Zen architectures and avoid combinatorial explosion of automaton
tables by modeling the separate floating-point division unit and
correcting reservations to reflect reciprocal throughput of the
corresponding instructions, similar to earlier commit
5cee5f94000 ("i386: correct integer division modeling in znver.md").
Division is partially pipelined and some instructions have fractional
reciprocal throughput (e.g. Zen 3 can issue divss and divsd every 3.5
and 4.5 cycles on average, respectively). Considering these CPUs implement
out-of-order execution, the model doesn't need to be exact to the last
cycle, so simplify it by using 4/5 cycles for SF/DF modes, and not
modeling the fact that the FP3 pipe is occupied for one cycle.
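As a small worked example of the simplification above, the following Python sketch rounds the fractional figures up to whole cycles. Reading "4/5 cycles" as a round-up of 3.5 and 4.5 is an assumption made here; the dictionary names are hypothetical.

```python
import math

# Average issue intervals quoted above for Zen 3 (cycles per instruction).
measured = {"divss": 3.5, "divsd": 4.5}

# Assumed simplification: reserve the divider for the next whole cycle count.
modeled = {insn: math.ceil(cycles) for insn, cycles in measured.items()}

print(modeled)  # {'divss': 4, 'divsd': 5}
```

Note that Python's built-in round() would not reproduce this: it rounds halves to the nearest even integer, so round(4.5) yields 4.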
Top znver table sizes in insn-automata.o:
Before:
428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions
After:
30056 r znver1_fp_min_issue_delay
120224 r znver1_fp_transitions
gcc/ChangeLog:
PR target/87832
* config/i386/znver.md (znver1_fdiv): New automaton.
(znver1-fdiv): New unit.
(znver1_fp_op_div): Correct unit and cycles in the reservation.
(znver1_fp_op_div_load): Ditto.
(znver1_fp_op_idiv_load): Ditto.
(znver2_fp_op_idiv_load): Ditto.
(znver1_ssediv_ss_ps): Ditto.
(znver1_ssediv_ss_ps_load): Ditto.
(znver1_ssediv_sd_pd): Ditto.
(znver1_ssediv_sd_pd_load): Ditto.
(znver1_ssediv_avx256_ps): Ditto.
(znver1_ssediv_avx256_ps_load): Ditto.
(znver1_ssediv_avx256_pd): Ditto.
(znver1_ssediv_avx256_pd_load): Ditto.
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
@ 2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
2022-11-16 13:48 ` amonakov at gcc dot gnu.org
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-16 13:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:
https://gcc.gnu.org/g:d4cc7a8c4a623b62dd0d486d7780d91b58eb6f1f
commit r13-4093-gd4cc7a8c4a623b62dd0d486d7780d91b58eb6f1f
Author: Alexander Monakov <amonakov@ispras.ru>
Date: Tue Nov 1 17:53:13 2022 +0300
i386: correct x87&SSE multiplication modeling in znver.md
All multiplication instructions are fully pipelined, except AVX256
instructions on Zen 1, which issue over two cycles on a 128-bit unit.
Correct the model accordingly to reduce combinatorial explosion in
automaton tables.
Top znver table sizes in insn-automata.o:
Before:
30056 r znver1_fp_min_issue_delay
120224 r znver1_fp_transitions
After:
6720 r znver1_fp_min_issue_delay
53760 r znver1_fp_transitions
gcc/ChangeLog:
PR target/87832
* config/i386/znver.md (znver1_fp_op_mul): Correct cycles in
the reservation.
(znver1_fp_op_mul_load): Ditto.
(znver1_mmx_mul): Ditto.
(znver1_mmx_load): Ditto.
(znver1_ssemul_ss_ps): Ditto.
(znver1_ssemul_ss_ps_load): Ditto.
(znver1_ssemul_avx256_ps): Ditto.
(znver1_ssemul_avx256_ps_load): Ditto.
(znver1_ssemul_sd_pd): Ditto.
(znver1_ssemul_sd_pd_load): Ditto.
(znver2_ssemul_sd_pd): Ditto.
(znver2_ssemul_sd_pd_load): Ditto.
(znver1_ssemul_avx256_pd): Ditto.
(znver1_ssemul_avx256_pd_load): Ditto.
(znver1_sseimul): Ditto.
(znver1_sseimul_avx256): Ditto.
(znver1_sseimul_load): Ditto.
(znver1_sseimul_avx256_load): Ditto.
(znver1_sseimul_di): Ditto.
(znver1_sseimul_load_di): Ditto.
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2022-11-16 13:41 ` cvs-commit at gcc dot gnu.org
@ 2022-11-16 13:48 ` amonakov at gcc dot gnu.org
2022-11-16 14:16 ` hubicka at ucw dot cz
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-16 13:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
With these patches on trunk, the current situation is:
nm -CS -t d --defined-only gcc/insn-automata.o | sed 's/^[0-9]* 0*//' | sort -n | tail -40
2496 r slm_base
2527 r bdver3_load_min_issue_delay
2746 r glm_base
3892 r bdver1_fp_base
4444 r bdver1_ieu_min_issue_delay
4492 r geode_base
4608 r bdver3_ieu_transitions
6402 r bdver1_load_transitions
6720 r znver1_fp_min_issue_delay
7862 r athlon_fp_check
7862 r athlon_fp_transitions
9122 r lujiazui_core_base
9997 t internal_insn_latency(int, int, rtx_insn*, rtx_insn*)
10108 r bdver3_load_transitions
10498 r geode_check
10498 r geode_transitions
11632 r print_reservation(_IO_FILE*, rtx_insn*)::reservation_names
12575 r athlon_fp_min_issue_delay
12742 r btver2_fp_check
12742 r btver2_fp_transitions
13896 r slm_check
13896 r slm_transitions
17149 t internal_min_issue_delay(int, DFA_chip*)
17349 t internal_state_transition(int, DFA_chip*)
17776 r bdver1_ieu_transitions
20068 r bdver1_fp_check
20068 r bdver1_fp_transitions
26208 r slm_min_issue_delay
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
93960 r bdver3_fp_transitions
106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay
What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
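For anyone who wants to post-process such a listing, here is a minimal Python sketch of the same ranking as the nm pipeline in this comment. It assumes the usual four-column `nm -S -t d` layout (address, size, type, name); the function name and the sample addresses are made up for illustration.

```python
def largest_symbols(nm_output, count=40):
    """Parse `nm -S -t d` output and return the `count` largest symbols
    as (size, type, name) tuples, smallest first, like `sort -n | tail`."""
    rows = []
    for line in nm_output.splitlines():
        # Demangled names may contain spaces, so split at most three times.
        fields = line.split(maxsplit=3)
        # Expect: address, size, type letter, name.
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # symbol without a recorded size
        rows.append((int(fields[1]), fields[2], fields[3]))
    rows.sort(key=lambda row: row[0])
    return rows[-count:]

# Sizes and names are taken from the listing above; addresses are invented.
sample = """\
0000000000000000 0000000000053760 r znver1_fp_transitions
0000000000053760 0000000000002496 r slm_base
0000000000056256 0000000000009997 t internal_insn_latency(int, int, rtx_insn*, rtx_insn*)
0000000000066253 t no_size_symbol
"""

for size, kind, name in largest_symbols(sample, count=2):
    print(size, kind, name)
```

In practice the input string would come from running nm on gcc/insn-automata.o, for example via subprocess.run with capture_output=True and text=True.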
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2022-11-16 13:48 ` amonakov at gcc dot gnu.org
@ 2022-11-16 14:16 ` hubicka at ucw dot cz
2022-11-16 14:30 ` amonakov at gcc dot gnu.org
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: hubicka at ucw dot cz @ 2022-11-16 14:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> ---
> 53730 r btver2_fp_min_issue_delay
> 53760 r znver1_fp_transitions
> 93960 r bdver3_fp_transitions
> 106102 r lujiazui_core_check
> 106102 r lujiazui_core_transitions
> 196123 r lujiazui_core_min_issue_delay
>
> What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
Yes, I think that makes sense...
Honza
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2022-11-16 14:16 ` hubicka at ucw dot cz
@ 2022-11-16 14:30 ` amonakov at gcc dot gnu.org
2022-11-16 15:33 ` Jan Hubicka
2022-11-16 15:34 ` hubicka at ucw dot cz
` (4 subsequent siblings)
12 siblings, 1 reply; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-16 14:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #7)
> > 53730 r btver2_fp_min_issue_delay
> > 53760 r znver1_fp_transitions
> > 93960 r bdver3_fp_transitions
> > 106102 r lujiazui_core_check
> > 106102 r lujiazui_core_transitions
> > 196123 r lujiazui_core_min_issue_delay
> >
> > What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
> Yes, I think that makes sense...
Do you mean we should fix modeling of divisions there as well? I don't have
latency/throughput measurements for those CPUs, nor access to the hardware to
run experiments myself, unfortunately.
I guess you mean just making a patch to model division units separately,
leaving latency/throughput as in the current incorrect models, and leaving it
to manufacturers to correct it? Alternatively, for AMD Bobcat and Bulldozer we
might be able to crowd-source it eventually.
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2022-11-16 14:30 ` amonakov at gcc dot gnu.org
@ 2022-11-16 15:34 ` hubicka at ucw dot cz
2022-11-16 17:15 ` amonakov at gcc dot gnu.org
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: hubicka at ucw dot cz @ 2022-11-16 15:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
>
> Do you mean we should fix modeling of divisions there as well? I don't have
> latency/throughput measurements for those CPUs, nor access so I can run
> experiments myself, unfortunately.
>
> I guess you mean just making a patch to model division units separately,
> leaving latency/throughput as in current incorrect models, and leave it to
> manufacturers to correct it? Alternatively, for AMD Bobcat and Bulldozer we
> might be able to crowd-source it eventually.
Actually, for older cores I think the manufacturers do not care much. I
still have a working Bulldozer machine and I can do some testing.
I think in the Bulldozer case I was basing the latency/throughput on data in
Agner Fog's manuals. How do you test it?
Honza
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2022-11-16 15:34 ` hubicka at ucw dot cz
@ 2022-11-16 17:15 ` amonakov at gcc dot gnu.org
2022-12-07 15:23 ` amonakov at gcc dot gnu.org
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-16 17:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #9)
> Actually for older cores I think the manufacturers do not care much. I
> still have a working Bulldozer machine and I can do some testing.
> I think in the Bulldozer case I was basing the latency/throughput on data in
> Agner Fog's manuals.
Ahhh, how could I forget that his manuals have data for those cores too. Thanks
for the reminder! This solves the conundrum nicely:
AMD Jaguar ('btver2' in GCC): int/fp division is not pipelined, separate int/fp
dividers;
AMD Bulldozer, Steamroller ('bdver1', 'bdver3'): int division is not pipelined
(one divider), fp division is slightly pipelined (two independent dividers);
Zhaoxin Lujiazui appears to use the same divider as VIA Nano 3000, which is not
pipelined.
So it's already enough to produce a decent patch.
> How do you test it?
For AMD Zen patches I was using measurements by Andreas Abel (
https://uops.info/table_overview.html ) and running a few experiments myself by
coding loops in NASM and timing them with 'perf stat' on a Zen 2 CPU.
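The arithmetic behind such timing experiments can be sketched in Python as follows. This is an assumed workflow, not the exact procedure used for the patches: a loop of dependent divisions gives latency, a loop of independent ones gives reciprocal throughput. The cycle counts below are illustrative inputs chosen to reproduce the 41/14 figures quoted earlier in the thread; they are not perf output.

```python
def cycles_per_insn(total_cycles, iterations, insns_per_iteration):
    """Average cycles per instruction for a timed loop."""
    return total_cycles / (iterations * insns_per_iteration)

ITERATIONS = 1_000_000  # hypothetical loop trip count

# Each division depends on the previous result: measures latency.
latency = cycles_per_insn(41_000_000, ITERATIONS, 1)
# Divisions are independent: measures reciprocal throughput.
recip_throughput = cycles_per_insn(14_000_000, ITERATIONS, 1)

print(latency)           # 41.0
print(recip_throughput)  # 14.0
```

The total cycle count would be read from the "cycles" line that 'perf stat' reports for the benchmark binary.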
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2022-11-16 17:15 ` amonakov at gcc dot gnu.org
@ 2022-12-07 15:23 ` amonakov at gcc dot gnu.org
2022-12-08 9:48 ` marxin at gcc dot gnu.org
2023-01-02 16:39 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-12-07 15:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Factoring out the Lujiazui divider shrinks its tables by almost 20x:
3 r lujiazui_decoder_min_issue_delay
20 r lujiazui_decoder_transitions
32 r lujiazui_agu_min_issue_delay
126 r lujiazui_agu_transitions
304 r lujiazui_div_base
352 r lujiazui_div_check
352 r lujiazui_div_transitions
1152 r lujiazui_core_min_issue_delay
1592 r lujiazui_agu_translate
1592 r lujiazui_core_translate
1592 r lujiazui_decoder_translate
1592 r lujiazui_div_translate
3952 r lujiazui_div_min_issue_delay
9216 r lujiazui_core_transitions
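The "almost 20x" figure can be checked with a few lines of Python. The "before" sizes are the three lujiazui tables listed in comment #6; any smaller lujiazui tables that existed before the change are not in that listing, so this is only an approximate comparison.

```python
# Three largest lujiazui tables before the change (from comment #6).
before = [106102, 106102, 196123]

# All lujiazui tables after factoring out the divider (listing above).
after = [3, 20, 32, 126, 304, 352, 352, 1152,
         1592, 1592, 1592, 1592, 3952, 9216]

total_before = sum(before)
total_after = sum(after)
ratio = total_before / total_after

print(total_before, total_after)  # 408327 21877
print(f"{ratio:.1f}x")            # 18.7x
```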
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2022-12-07 15:23 ` amonakov at gcc dot gnu.org
@ 2022-12-08 9:48 ` marxin at gcc dot gnu.org
2023-01-02 16:39 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-12-08 9:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #12 from Martin Liška <marxin at gcc dot gnu.org> ---
Nice work Alexander!
* [Bug target/87832] AMD pipeline models are very costly size-wise
[not found] <bug-87832-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2022-12-08 9:48 ` marxin at gcc dot gnu.org
@ 2023-01-02 16:39 ` cvs-commit at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-01-02 16:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:
https://gcc.gnu.org/g:ec1db9017939bb8289c9bd63aace66c0f3957ecd
commit r13-4956-gec1db9017939bb8289c9bd63aace66c0f3957ecd
Author: Alexander Monakov <amonakov@ispras.ru>
Date: Fri Dec 9 20:47:55 2022 +0300
i386: correct division modeling in lujiazui.md
Model the divider in Lujiazui processors as a separate automaton to
significantly reduce the overall model size. This should also result
in improved accuracy, as pipe 0 should be able to accept new
instructions while the divider is occupied.
It is unclear why integer divisions are modeled as if pipes 0-3 are all
occupied. I've opted to keep a single-cycle reservation of all four
pipes together, so GCC should continue trying to pack instructions
around a division accordingly.
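A rough way to see why a separate automaton shrinks the tables is to count states. The Python sketch below uses a crude upper bound (each unit is either idle or has 1..n busy cycles left, and one automaton tracks every combination of its units); the real generator minimizes states, and the unit counts and the 40-cycle figure are hypothetical, not taken from lujiazui.md.

```python
from math import prod

def total_states(automata):
    """Crude upper bound on the total number of states.

    `automata` is a list of automata, each given as a list of the maximum
    number of cycles its units can stay reserved. States multiply within
    one automaton and add up across separate automata."""
    return sum(prod(n + 1 for n in units) for units in automata)

# Hypothetical: four single-cycle pipes plus a divider busy up to 40 cycles.
combined = total_states([[1, 1, 1, 1, 40]])   # everything in one automaton
split = total_states([[1, 1, 1, 1], [40]])    # divider in its own automaton

print(combined)  # 656
print(split)     # 57
```

The product grows with every long reservation added to a shared automaton, while the split form only grows by the size of the new, small automaton.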
Currently the top three symbols in insn-automata.o are:
106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay
This patch shrinks all lujiazui tables to:
3 r lujiazui_decoder_min_issue_delay
20 r lujiazui_decoder_transitions
32 r lujiazui_agu_min_issue_delay
126 r lujiazui_agu_transitions
304 r lujiazui_div_base
352 r lujiazui_div_check
352 r lujiazui_div_transitions
1152 r lujiazui_core_min_issue_delay
1592 r lujiazui_agu_translate
1592 r lujiazui_core_translate
1592 r lujiazui_decoder_translate
1592 r lujiazui_div_translate
3952 r lujiazui_div_min_issue_delay
9216 r lujiazui_core_transitions
This continues the work on reducing i386 insn-automata.o size started
with similar fixes for division and multiplication instructions in
znver.md.
gcc/ChangeLog:
PR target/87832
* config/i386/lujiazui.md (lujiazui_div): New automaton.
(lua_div): New unit.
(lua_idiv_qi): Correct unit in the reservation.
(lua_idiv_qi_load): Ditto.
(lua_idiv_hi): Ditto.
(lua_idiv_hi_load): Ditto.
(lua_idiv_si): Ditto.
(lua_idiv_si_load): Ditto.
(lua_idiv_di): Ditto.
(lua_idiv_di_load): Ditto.
(lua_fdiv_SF): Ditto.
(lua_fdiv_SF_load): Ditto.
(lua_fdiv_DF): Ditto.
(lua_fdiv_DF_load): Ditto.
(lua_fdiv_XF): Ditto.
(lua_fdiv_XF_load): Ditto.
(lua_ssediv_SF): Ditto.
(lua_ssediv_load_SF): Ditto.
(lua_ssediv_V4SF): Ditto.
(lua_ssediv_load_V4SF): Ditto.
(lua_ssediv_V8SF): Ditto.
(lua_ssediv_load_V8SF): Ditto.
(lua_ssediv_SD): Ditto.
(lua_ssediv_load_SD): Ditto.
(lua_ssediv_V2DF): Ditto.
(lua_ssediv_load_V2DF): Ditto.
(lua_ssediv_V4DF): Ditto.
(lua_ssediv_load_V4DF): Ditto.