[Bug target/87832] AMD pipeline models are very costly size-wise

public inbox for gcc-bugs@sourceware.org

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-10-24 18:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Suggested partial fix for the integer-pipe side of the blowup:
https://inbox.sourceware.org/gcc-patches/4549f27b-238a-7d77-f72b-cc77df8ae36e@ispras.ru/

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: cvs-commit at gcc dot gnu.org @ 2022-11-01 12:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:

https://gcc.gnu.org/g:5cee5f94000ee5eabce9b223c44c7923c1c69f61

commit r13-3589-g5cee5f94000ee5eabce9b223c44c7923c1c69f61
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Mon Oct 31 17:35:57 2022 +0300

    i386: correct integer division modeling in znver.md

    In znver.md, division instructions have descriptions like

    (define_insn_reservation "znver1_idiv_DI" 41
                            (and (eq_attr "cpu" "znver1,znver2")
                                 (and (eq_attr "type" "idiv")
                                      (and (eq_attr "mode" "DI")
                                           (eq_attr "memory" "none"))))
                            "znver1-double,znver1-ieu2*41")

    which says that DImode idiv has latency 41 (which is correct) and that
    it occupies the 2nd integer execution unit for 41 consecutive cycles, but
    that is not correct:

    1) the division instruction is partially pipelined, and has throughput
       1/14, not 1/41;

    2) for the most part it occupies a separate division unit, not the
       general arithmetic unit.

    Evidently, the interaction of such 41-cycle paths with the rest of the
    reservations causes a combinatorial explosion in the automaton.

    Fix this by modeling the integer division unit properly, and correcting
    reservations to use the measured reciprocal throughput of those
    instructions (available from uops.info). A similar correction for
    floating-point divisions is left for a followup patch.

    Top 5 znver table sizes, before:

    68692 r znver1_ieu_check
    68692 r znver1_ieu_transitions
    99792 r znver1_ieu_min_issue_delay
    428108 r znver1_fp_min_issue_delay
    856216 r znver1_fp_transitions

    After:

    1454 r znver1_ieu_translate
    1454 r znver1_translate
    2304 r znver1_ieu_transitions
    428108 r znver1_fp_min_issue_delay
    856216 r znver1_fp_transitions

    gcc/ChangeLog:

            PR target/87832
            * config/i386/znver.md (znver1_idiv): New automaton.
            (znver1-idiv): New unit.
            (znver1_idiv_DI): Correct unit and cycles in the reservation.
            (znver1_idiv_SI): Ditto.
            (znver1_idiv_HI): Ditto.
            (znver1_idiv_QI): Ditto.
            (znver1_idiv_mem_DI): Ditto.
            (znver1_idiv_mem_SI): Ditto.
            (znver1_idiv_mem_HI): Ditto.
            (znver1_idiv_mem_QI): Ditto.
            (znver3_idiv_DI): Ditto.
            (znver3_idiv_SI): Ditto.
            (znver3_idiv_HI): Ditto.
            (znver3_idiv_QI): Ditto.
            (znver3_idiv_mem_DI): Ditto.
            (znver3_idiv_mem_SI): Ditto.
            (znver3_idiv_mem_HI): Ditto.
            (znver3_idiv_mem_QI): Ditto.
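
The blowup described in the commit message can be illustrated with a toy
model. The sketch below is not genautomata's algorithm; it only counts
reachable reservation states, assuming a state records how many more cycles
each unit stays busy. The two-unit layout and the 4-cycle operation are
assumptions made for illustration; only the 41-cycle figure comes from the
reservation quoted above.

```python
from collections import deque

def count_states(insns):
    """Count reachable states of a toy reservation automaton.

    insns: list of (unit_index, busy_cycles) pairs.
    A state is a tuple holding, per unit, the remaining busy cycles.
    """
    n_units = max(unit for unit, _ in insns) + 1
    start = (0,) * n_units
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Advancing the clock removes one busy cycle from every busy unit.
        successors = [tuple(max(r - 1, 0) for r in state)]
        # An instruction can issue only when its unit is free.
        for unit, cycles in insns:
            if state[unit] == 0:
                nxt = list(state)
                nxt[unit] = cycles
                successors.append(tuple(nxt))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

# Hypothetical units: unit 0 runs a 4-cycle operation, unit 1 a 41-cycle division.
combined = count_states([(0, 4), (1, 41)])                   # one shared automaton
separate = count_states([(0, 4)]) + count_states([(0, 41)])  # two automata

print(combined, separate)
```

In this toy model the shared automaton needs 210 states (the product of the
per-unit counts), while two separate automata need 47 in total (their sum).
That is presumably the effect that moving division into its own automaton
exploits, in addition to shortening the reservation itself.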

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-11-07 11:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Followup patches have been posted at
https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: cvs-commit at gcc dot gnu.org @ 2022-11-16 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:

https://gcc.gnu.org/g:dd744f06c9952f92738b0860630085f0f0b99574

commit r13-4092-gdd744f06c9952f92738b0860630085f0f0b99574
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Tue Nov 1 17:04:25 2022 +0300

    i386: correct x87&SSE division modeling in znver.md

    Correct modeling of division instructions in the SIMD/FP domain for
    AMD Zen architectures and avoid combinatorial explosion of automaton
    tables by modeling the separate floating-point division unit and
    correcting reservations to reflect reciprocal throughput of the
    corresponding instructions, similar to earlier commit
    5cee5f94000 ("i386: correct integer division modeling in znver.md").

    Division is partially pipelined and some instructions have fractional
    reciprocal throughput (e.g. Zen 3 can issue divss and divsd once every
    3.5 and 4.5 cycles on average, respectively). Since these CPUs implement
    out-of-order execution, the model doesn't need to be exact to the last
    cycle, so simplify it by using 4/5 cycles for SF/DF modes, and by not
    modeling the fact that the FP3 pipe is occupied for one cycle.

    Top znver table sizes in insn-automata.o:

    Before:

    428108 r znver1_fp_min_issue_delay
    856216 r znver1_fp_transitions

    After:

    30056 r znver1_fp_min_issue_delay
    120224 r znver1_fp_transitions

    gcc/ChangeLog:

            PR target/87832
            * config/i386/znver.md (znver1_fdiv): New automaton.
            (znver1-fdiv): New unit.
            (znver1_fp_op_div): Correct unit and cycles in the reservation.
            (znver1_fp_op_div_load): Ditto.
            (znver1_fp_op_idiv_load): Ditto.
            (znver2_fp_op_idiv_load): Ditto.
            (znver1_ssediv_ss_ps): Ditto.
            (znver1_ssediv_ss_ps_load): Ditto.
            (znver1_ssediv_sd_pd): Ditto.
            (znver1_ssediv_sd_pd_load): Ditto.
            (znver1_ssediv_avx256_ps): Ditto.
            (znver1_ssediv_avx256_ps_load): Ditto.
            (znver1_ssediv_avx256_pd): Ditto.
            (znver1_ssediv_avx256_pd_load): Ditto.
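
As a small worked example of the simplification described in the commit
message: a reservation can only hold a unit for a whole number of cycles, so
the fractional figures have to be rounded. The commit message only states the
resulting 4/5 split; rounding up to the next whole cycle is an assumption
here that happens to reproduce it.

```python
import math

# Reciprocal throughput in cycles, as quoted in the commit message for Zen 3.
measured = {"divss": 3.5, "divsd": 4.5}

# Assumption: round up to the next whole cycle for the reservation length.
reserved = {insn: math.ceil(cycles) for insn, cycles in measured.items()}

print(reserved)
```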

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: cvs-commit at gcc dot gnu.org @ 2022-11-16 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:

https://gcc.gnu.org/g:d4cc7a8c4a623b62dd0d486d7780d91b58eb6f1f

commit r13-4093-gd4cc7a8c4a623b62dd0d486d7780d91b58eb6f1f
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Tue Nov 1 17:53:13 2022 +0300

    i386: correct x87&SSE multiplication modeling in znver.md

    All multiplication instructions are fully pipelined, except AVX256
    instructions on Zen 1, which issue over two cycles on a 128-bit unit.
    Correct the model accordingly to reduce combinatorial explosion in
    automaton tables.

    Top znver table sizes in insn-automata.o:

    Before:

    30056 r znver1_fp_min_issue_delay
    120224 r znver1_fp_transitions

    After:

    6720 r znver1_fp_min_issue_delay
    53760 r znver1_fp_transitions

    gcc/ChangeLog:

            PR target/87832
            * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in
            the reservation.
            (znver1_fp_op_mul_load): Ditto.
            (znver1_mmx_mul): Ditto.
            (znver1_mmx_load): Ditto.
            (znver1_ssemul_ss_ps): Ditto.
            (znver1_ssemul_ss_ps_load): Ditto.
            (znver1_ssemul_avx256_ps): Ditto.
            (znver1_ssemul_avx256_ps_load): Ditto.
            (znver1_ssemul_sd_pd): Ditto.
            (znver1_ssemul_sd_pd_load): Ditto.
            (znver2_ssemul_sd_pd): Ditto.
            (znver2_ssemul_sd_pd_load): Ditto.
            (znver1_ssemul_avx256_pd): Ditto.
            (znver1_ssemul_avx256_pd_load): Ditto.
            (znver1_sseimul): Ditto.
            (znver1_sseimul_avx256): Ditto.
            (znver1_sseimul_load): Ditto.
            (znver1_sseimul_avx256_load): Ditto.
            (znver1_sseimul_di): Ditto.
            (znver1_sseimul_load_di): Ditto.
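
One way to read the rule in the commit message is sketched below: a fully
pipelined unit is reserved for a single cycle per operation, unless the
operation is wider than the unit and has to be issued in several passes. The
helper name and the 128-bit default are assumptions for illustration, not
something taken from znver.md.

```python
def issue_cycles(op_bits, unit_bits=128):
    """Cycles a fully pipelined unit is reserved when issuing one operation."""
    # Ceiling division: an operation wider than the unit is split into passes.
    return max(1, -(-op_bits // unit_bits))

# A 128-bit multiply versus an AVX256 multiply on a 128-bit unit.
print(issue_cycles(128), issue_cycles(256))
```

Under these assumptions only the AVX256 case on a 128-bit unit yields a
two-cycle reservation; everything else stays at one cycle.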

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-11-16 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
With these patches on trunk, the current situation is:

nm -CS -t d --defined-only gcc/insn-automata.o | sed 's/^[0-9]* 0*//' | sort -n | tail -40
2496 r slm_base
2527 r bdver3_load_min_issue_delay
2746 r glm_base
3892 r bdver1_fp_base
4444 r bdver1_ieu_min_issue_delay
4492 r geode_base
4608 r bdver3_ieu_transitions
6402 r bdver1_load_transitions
6720 r znver1_fp_min_issue_delay
7862 r athlon_fp_check
7862 r athlon_fp_transitions
9122 r lujiazui_core_base
9997 t internal_insn_latency(int, int, rtx_insn*, rtx_insn*)
10108 r bdver3_load_transitions
10498 r geode_check
10498 r geode_transitions
11632 r print_reservation(_IO_FILE*, rtx_insn*)::reservation_names
12575 r athlon_fp_min_issue_delay
12742 r btver2_fp_check
12742 r btver2_fp_transitions
13896 r slm_check
13896 r slm_transitions
17149 t internal_min_issue_delay(int, DFA_chip*)
17349 t internal_state_transition(int, DFA_chip*)
17776 r bdver1_ieu_transitions
20068 r bdver1_fp_check
20068 r bdver1_fp_transitions
26208 r slm_min_issue_delay
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
93960 r bdver3_fp_transitions
106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay

What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
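
To see which pipeline model dominates, a listing in this "size type symbol"
form can be summed per model. The following is a minimal sketch, assuming
the model name is the part of the symbol before the first underscore and
that only read-only data symbols (type 'r') are of interest; the function
name is hypothetical and the sample lines are copied from the listing above.

```python
from collections import defaultdict

# A few lines copied from the listing: "<size> <type> <symbol>".
listing = """\
6720 r znver1_fp_min_issue_delay
9997 t internal_insn_latency(int, int, rtx_insn*, rtx_insn*)
53760 r znver1_fp_transitions
106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay
"""

def table_bytes_by_model(text):
    """Sum the sizes of read-only data symbols ('r') per name prefix."""
    totals = defaultdict(int)
    for line in text.splitlines():
        # The symbol may contain spaces (C++ signatures), so split only twice.
        size, kind, name = line.split(maxsplit=2)
        if kind != "r":
            continue  # skip functions such as internal_insn_latency
        model = name.split("_", 1)[0]  # e.g. "znver1", "lujiazui"
        totals[model] += int(size)
    return dict(totals)

totals = table_bytes_by_model(listing)
print(totals)
```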

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: hubicka at ucw dot cz @ 2022-11-16 14:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> ---
> 53730 r btver2_fp_min_issue_delay
> 53760 r znver1_fp_transitions
> 93960 r bdver3_fp_transitions
> 106102 r lujiazui_core_check
> 106102 r lujiazui_core_transitions
> 196123 r lujiazui_core_min_issue_delay
> 
> What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
Yes, I think that makes sense...

Honza

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-11-16 14:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #7)
> > 53730 r btver2_fp_min_issue_delay
> > 53760 r znver1_fp_transitions
> > 93960 r bdver3_fp_transitions
> > 106102 r lujiazui_core_check
> > 106102 r lujiazui_core_transitions
> > 196123 r lujiazui_core_min_issue_delay
> > 
> > What shall we do with similar blowups in lujiazui and b[dt]ver[123] models?
> Yes, I think that makes sense...

Do you mean we should fix modeling of divisions there as well? I don't have
latency/throughput measurements for those CPUs, nor access so I can run
experiments myself, unfortunately.

I guess you mean just making a patch to model division units separately,
leaving latency/throughput as in current incorrect models, and leave it to
manufacturers to correct it? Alternatively, for AMD Bobcat and Bulldozer we
might be able to crowd-source it eventually.

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: hubicka at ucw dot cz @ 2022-11-16 15:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #9 from Jan Hubicka <hubicka at ucw dot cz> ---
> 
> Do you mean we should fix modeling of divisions there as well? I don't have
> latency/throughput measurements for those CPUs, nor access so I can run
> experiments myself, unfortunately.
> 
> I guess you mean just making a patch to model division units separately,
> leaving latency/throughput as in current incorrect models, and leave it to
> manufacturers to correct it? Alternatively, for AMD Bobcat and Bulldozer we
> might be able to crowd-source it eventually.
Actually for older cores I think the manufacturers do not care much.  I
still have a working Bulldozer machine and I can do some testing.
I think in the Bulldozer case I was basing the latency/throughput on data in
Agner Fog's manuals.  How do you test it?
Honza

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-11-16 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #9)
> Actually for older cores I think the manufacturers do not care much.  I
> still have a working Bulldozer machine and I can do some testing.
> I think in the Bulldozer case I was basing the latency/throughput on data in
> Agner Fog's manuals.

Ahhh, how could I forget that his manuals have data for those cores too. Thanks
for the reminder! This solves the conundrum nicely:

AMD Jaguar ('btver2' in GCC): int/fp division is not pipelined, separate int/fp
dividers;

AMD Bulldozer, Steamroller ('bdver1', 'bdver3'): int division is not pipelined
(one divider), fp division is slightly pipelined (two independent dividers);

Zhaoxin Lujiazui appears to use the same divider as VIA Nano 3000, which is not
pipelined.

So it's already enough to produce a decent patch.

> How do you test it?

For AMD Zen patches I was using measurements by Andreas Abel
(https://uops.info/table_overview.html) and running a few experiments myself by
coding loops in NASM and timing them with 'perf stat' on a Zen 2 CPU.
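
For reference, turning such a timing run into a figure usable in a
reservation is simple arithmetic. The sketch below assumes a loop of
independent divisions (so that throughput rather than latency is exposed)
and a cycle count read from 'perf stat'; the function name and the counter
values are hypothetical, not measurements from this thread.

```python
def reciprocal_throughput(cycles, iterations, insns_per_iteration=1):
    """Average number of cycles per independent instruction in a timed loop."""
    return cycles / (iterations * insns_per_iteration)

# Hypothetical reading: 14e9 cycles for 1e9 iterations of one division each.
print(reciprocal_throughput(14_000_000_000, 1_000_000_000))
```

Running the same loop with each division depending on the previous result
would measure latency instead.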

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: amonakov at gcc dot gnu.org @ 2022-12-07 15:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Factoring out the Lujiazui divider shrinks its tables by almost 20x:

3 r lujiazui_decoder_min_issue_delay
20 r lujiazui_decoder_transitions
32 r lujiazui_agu_min_issue_delay
126 r lujiazui_agu_transitions
304 r lujiazui_div_base
352 r lujiazui_div_check
352 r lujiazui_div_transitions
1152 r lujiazui_core_min_issue_delay
1592 r lujiazui_agu_translate
1592 r lujiazui_core_translate
1592 r lujiazui_decoder_translate
1592 r lujiazui_div_translate
3952 r lujiazui_div_min_issue_delay
9216 r lujiazui_core_transitions
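
The "almost 20x" figure can be checked against the numbers in this thread.
The sketch below assumes the "before" total consists only of the four
lujiazui symbols visible in the earlier 'tail -40' listing (smaller tables
cut off by 'tail' are not counted), so the result is approximate.

```python
# Lujiazui symbols visible in the earlier `tail -40` listing (bytes).
before = [9122, 106102, 106102, 196123]
# All lujiazui tables listed above after factoring out the divider (bytes).
after = [3, 20, 32, 126, 304, 352, 352, 1152,
         1592, 1592, 1592, 1592, 3952, 9216]

ratio = sum(before) / sum(after)
print(sum(before), sum(after), round(ratio, 1))
```

Under that assumption the tables go from 417449 to 21877 bytes, a reduction
of roughly 19x.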

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: marxin at gcc dot gnu.org @ 2022-12-08  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #12 from Martin Liška <marxin at gcc dot gnu.org> ---
Nice work Alexander!

* [Bug target/87832] AMD pipeline models are very costly size-wise
From: cvs-commit at gcc dot gnu.org @ 2023-01-02 16:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Alexander Monakov <amonakov@gcc.gnu.org>:

https://gcc.gnu.org/g:ec1db9017939bb8289c9bd63aace66c0f3957ecd

commit r13-4956-gec1db9017939bb8289c9bd63aace66c0f3957ecd
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Fri Dec 9 20:47:55 2022 +0300

    i386: correct division modeling in lujiazui.md

    Model the divider in Lujiazui processors as a separate automaton to
    significantly reduce the overall model size. This should also result
    in improved accuracy, as pipe 0 should be able to accept new
    instructions while the divider is occupied.

    It is unclear why integer divisions are modeled as if pipes 0-3 are all
    occupied. I've opted to keep a single-cycle reservation of all four
    pipes together, so GCC should continue trying to pack instructions
    around a division accordingly.

    Currently the top three symbols in insn-automata.o are:

    106102 r lujiazui_core_check
    106102 r lujiazui_core_transitions
    196123 r lujiazui_core_min_issue_delay

    This patch shrinks all lujiazui tables to:

    3 r lujiazui_decoder_min_issue_delay
    20 r lujiazui_decoder_transitions
    32 r lujiazui_agu_min_issue_delay
    126 r lujiazui_agu_transitions
    304 r lujiazui_div_base
    352 r lujiazui_div_check
    352 r lujiazui_div_transitions
    1152 r lujiazui_core_min_issue_delay
    1592 r lujiazui_agu_translate
    1592 r lujiazui_core_translate
    1592 r lujiazui_decoder_translate
    1592 r lujiazui_div_translate
    3952 r lujiazui_div_min_issue_delay
    9216 r lujiazui_core_transitions

    This continues the work on reducing i386 insn-automata.o size started
    with similar fixes for division and multiplication instructions in
    znver.md.

    gcc/ChangeLog:

            PR target/87832
            * config/i386/lujiazui.md (lujiazui_div): New automaton.
            (lua_div): New unit.
            (lua_idiv_qi): Correct unit in the reservation.
            (lua_idiv_qi_load): Ditto.
            (lua_idiv_hi): Ditto.
            (lua_idiv_hi_load): Ditto.
            (lua_idiv_si): Ditto.
            (lua_idiv_si_load): Ditto.
            (lua_idiv_di): Ditto.
            (lua_idiv_di_load): Ditto.
            (lua_fdiv_SF): Ditto.
            (lua_fdiv_SF_load): Ditto.
            (lua_fdiv_DF): Ditto.
            (lua_fdiv_DF_load): Ditto.
            (lua_fdiv_XF): Ditto.
            (lua_fdiv_XF_load): Ditto.
            (lua_ssediv_SF): Ditto.
            (lua_ssediv_load_SF): Ditto.
            (lua_ssediv_V4SF): Ditto.
            (lua_ssediv_load_V4SF): Ditto.
            (lua_ssediv_V8SF): Ditto.
            (lua_ssediv_load_V8SF): Ditto.
            (lua_ssediv_SD): Ditto.
            (lua_ssediv_load_SD): Ditto.
            (lua_ssediv_V2DF): Ditto.
            (lua_ssediv_load_V2DF): Ditto.
            (lua_ssediv_V4DF): Ditto.
            (lua_ssediv_load_V4DF): Ditto.
