* [PATCH 0/2] i386: slim down insn-automata [PR 87832] @ 2022-11-01 16:26 Alexander Monakov 2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw) To: gcc-patches Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan, Alexander Monakov Hi, I'm sending followup fixes for combinatorial explosion of znver scheduling automaton tables as described in the earlier thread: https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543 I think lujiazui.md and b[dt]ver[123].md have similar issues. Alexander Monakov (2): i386: correct x87&SSE division modeling in znver.md i386: correct x87&SSE multiplication modeling in znver.md gcc/config/i386/znver.md | 67 ++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 33 deletions(-) -- 2.37.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
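A rough illustration of the "combinatorial explosion" mentioned in the cover letter, as a toy model only. It is not the algorithm genautomata uses (the real generator also handles alternatives and minimizes states). The helper name `reachable_states` is hypothetical, and the unit names and cycle counts are sample inputs in the style of znver.md reservations. A state records how many busy cycles remain on each unit, so long reservations on several units in one automaton multiply the number of states.

```python
from collections import deque


def reachable_states(reservations):
    """Count states of a toy scheduler automaton.

    reservations: list of (unit, cycles) pairs, meaning "this instruction
    class keeps `unit` busy for `cycles` consecutive cycles".
    A state records, for every unit, how many busy cycles remain.
    """
    units = sorted({unit for unit, _ in reservations})
    index = {unit: i for i, unit in enumerate(units)}
    start = tuple(0 for _ in units)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Transition 1: the clock advances, every busy counter drops by one.
        successors = [tuple(max(0, busy - 1) for busy in state)]
        # Transition 2: issue any instruction whose unit is currently free.
        for unit, cycles in reservations:
            i = index[unit]
            if state[i] == 0:
                successors.append(state[:i] + (cycles,) + state[i + 1:])
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


# Long reservations on shared pipes, in the style of the old model.
before = reachable_states([("fp0", 5), ("fp0", 3), ("fp3", 15), ("fp3", 10)])
# One-cycle pipe occupancy plus a short reservation on a dedicated divider.
after = reachable_states([("fp0", 1), ("fdiv", 6), ("fdiv", 4)])
print(before, after)  # 96 14
```

In this toy model the state count is the product of (longest reservation + 1) over the units, which is why shortening reservations to the reciprocal throughput shrinks the tables so sharply.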
* [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md 2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov @ 2022-11-01 16:26 ` Alexander Monakov 2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov 2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov 2 siblings, 0 replies; 9+ messages in thread From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw) To: gcc-patches Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan, Alexander Monakov Correct modeling of division instructions in the SIMD/FP domain for AMD Zen architectures and avoid combinatorial explosion of automaton tables by modeling the separate floating-point division unit and correcting reservations to reflect reciprocal throughput of the corresponding instructions, similar to earlier commit 5cee5f94000 ("i386: correct integer division modeling in znver.md"). Division is partially pipelined and some instructions have fractional throughput (e.g. Zen 3 can issue divss and divsd each 3.5 and 4.5 cycles on average, respectively). Considering these CPUs implement out-of-order execution, the model doesn't need to be exact to the last cycle, so simplify it by using 4/5 cycles for SF/DF modes, and not modeling the fact that FP3 pipe is occupied for one cycle. Top znver table sizes in insn-automata.o: Before: 428108 r znver1_fp_min_issue_delay 856216 r znver1_fp_transitions After: 30056 r znver1_fp_min_issue_delay 120224 r znver1_fp_transitions gcc/ChangeLog: PR target/87832 * config/i386/znver.md (znver1_fdiv): New automaton. (znver1-fdiv): New unit. (znver1_fp_op_div): Correct unit and cycles in the reservation. (znver1_fp_op_div_load): Ditto. (znver1_fp_op_idiv_load): Ditto. (znver2_fp_op_idiv_load): Ditto. (znver1_ssediv_ss_ps): Ditto. (znver1_ssediv_ss_ps_load): Ditto. (znver1_ssediv_sd_pd): Ditto. (znver1_ssediv_sd_pd_load): Ditto. (znver1_ssediv_avx256_ps): Ditto. 
(znver1_ssediv_avx256_ps_load): Ditto. (znver1_ssediv_avx256_pd): Ditto. (znver1_ssediv_avx256_pd_load): Ditto. --- gcc/config/i386/znver.md | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md index 4aa098fd8..c52f8b532 100644 --- a/gcc/config/i386/znver.md +++ b/gcc/config/i386/znver.md @@ -24,7 +24,7 @@ (define_attr "znver1_decode" "direct,vector,double" ;; AMD znver1, znver2 and znver3 Scheduling ;; Modeling automatons for zen decoders, integer execution pipes, ;; SIMD/FP domain, AGU pipes, and dividers. -(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv") +(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv, znver1_fdiv") ;; Decoders unit has 4 decoders and all of them can decode fast path ;; and vector type instructions. @@ -95,6 +95,7 @@ (define_reservation "znver2-fvector" "znver1-fp0+znver1-fp1 ;; Dividers (define_cpu_unit "znver1-idiv" "znver1_idiv") +(define_cpu_unit "znver1-fdiv" "znver1_fdiv") ;; Call instruction (define_insn_reservation "znver1_call" 1 @@ -591,27 +592,27 @@ (define_insn_reservation "znver1_fp_op_div" 15 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "fdiv") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp3*15") + "znver1-direct,znver1-fdiv*6") (define_insn_reservation "znver1_fp_op_div_load" 22 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "fdiv") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp3*15") + "znver1-direct,znver1-load,znver1-fdiv*6") (define_insn_reservation "znver1_fp_op_idiv_load" 27 (and (eq_attr "cpu" "znver1") (and (eq_attr "type" "fdiv") (and (eq_attr "fp_int_src" "true") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,znver1-fp3*19") + "znver1-double,znver1-load,znver1-fdiv*6") (define_insn_reservation "znver2_fp_op_idiv_load" 26 (and (eq_attr "cpu" "znver2,znver3") (and (eq_attr "type" "fdiv") (and (eq_attr 
"fp_int_src" "true") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,znver1-fp3*19") + "znver1-double,znver1-load,znver1-fdiv*6") ;; MMX, SSE, SSEn.n, AVX, AVX2 instructions @@ -1088,7 +1089,7 @@ (define_insn_reservation "znver1_ssediv_ss_ps" 10 (eq_attr "mode" "V8SF,V4SF,SF"))) (and (eq_attr "type" "ssediv") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp3*10") + "znver1-direct,znver1-fdiv*4") (define_insn_reservation "znver1_ssediv_ss_ps_load" 17 (and (ior (and (eq_attr "cpu" "znver1") @@ -1099,7 +1100,7 @@ (define_insn_reservation "znver1_ssediv_ss_ps_load" 17 (eq_attr "mode" "V8SF,V4SF,SF"))) (and (eq_attr "type" "ssediv") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp3*10") + "znver1-direct,znver1-load,znver1-fdiv*4") (define_insn_reservation "znver1_ssediv_sd_pd" 13 (and (ior (and (eq_attr "cpu" "znver1") @@ -1110,7 +1111,7 @@ (define_insn_reservation "znver1_ssediv_sd_pd" 13 (eq_attr "mode" "V4DF,V2DF,DF"))) (and (eq_attr "type" "ssediv") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp3*13") + "znver1-direct,znver1-fdiv*5") (define_insn_reservation "znver1_ssediv_sd_pd_load" 20 (and (ior (and (eq_attr "cpu" "znver1") @@ -1121,35 +1122,35 @@ (define_insn_reservation "znver1_ssediv_sd_pd_load" 20 (eq_attr "mode" "V4DF,V2DF,DF"))) (and (eq_attr "type" "ssediv") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp3*13") + "znver1-direct,znver1-load,znver1-fdiv*5") (define_insn_reservation "znver1_ssediv_avx256_ps" 12 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V8SF") (and (eq_attr "memory" "none") (eq_attr "type" "ssediv")))) - "znver1-double,znver1-fp3*12") + "znver1-double,znver1-fdiv*8") (define_insn_reservation "znver1_ssediv_avx256_ps_load" 19 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V8SF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,znver1-fp3*12") + "znver1-double,znver1-load,znver1-fdiv*8") (define_insn_reservation 
"znver1_ssediv_avx256_pd" 15 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "none")))) - "znver1-double,znver1-fp3*15") + "znver1-double,znver1-fdiv*10") (define_insn_reservation "znver1_ssediv_avx256_pd_load" 22 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,znver1-fp3*15") + "znver1-double,znver1-load,znver1-fdiv*10") ;; SSE MUL (define_insn_reservation "znver1_ssemul_ss_ps" 3 (and (ior (and (eq_attr "cpu" "znver1") -- 2.37.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md 2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov 2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov @ 2022-11-01 16:26 ` Alexander Monakov 2022-11-16 11:53 ` Kumar, Venkataramanan 2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov 2 siblings, 1 reply; 9+ messages in thread From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw) To: gcc-patches Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan, Alexander Monakov All multiplication instructions are fully pipelined, except AVX256 instructions on Zen 1, which issue over two cycles on a 128-bit unit. Correct the model accordingly to reduce combinatorial explosion in automaton tables. Top znver table sizes in insn-automata.o: Before: 30056 r znver1_fp_min_issue_delay 120224 r znver1_fp_transitions After: 6720 r znver1_fp_min_issue_delay 53760 r znver1_fp_transitions gcc/ChangeLog: PR target/87832 * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in the reservation. (znver1_fp_op_mul_load): Ditto. (znver1_mmx_mul): Ditto. (znver1_mmx_load): Ditto. (znver1_ssemul_ss_ps): Ditto. (znver1_ssemul_ss_ps_load): Ditto. (znver1_ssemul_avx256_ps): Ditto. (znver1_ssemul_avx256_ps_load): Ditto. (znver1_ssemul_sd_pd): Ditto. (znver1_ssemul_sd_pd_load): Ditto. (znver2_ssemul_sd_pd): Ditto. (znver2_ssemul_sd_pd_load): Ditto. (znver1_ssemul_avx256_pd): Ditto. (znver1_ssemul_avx256_pd_load): Ditto. (znver1_sseimul): Ditto. (znver1_sseimul_avx256): Ditto. (znver1_sseimul_load): Ditto. (znver1_sseimul_avx256_load): Ditto. (znver1_sseimul_di): Ditto. (znver1_sseimul_load_di): Ditto. 
--- gcc/config/i386/znver.md | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md index c52f8b532..882f250f1 100644 --- a/gcc/config/i386/znver.md +++ b/gcc/config/i386/znver.md @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "fop,fmul") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp0*5") + "znver1-direct,znver1-fp0") (define_insn_reservation "znver1_fp_op_mul_load" 12 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "fop,fmul") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp0*5") + "znver1-direct,znver1-load,znver1-fp0") (define_insn_reservation "znver1_fp_op_imul_load" 16 (and (eq_attr "cpu" "znver1,znver2,znver3") @@ -684,13 +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "mmxmul") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp0*3") + "znver1-direct,znver1-fp0") (define_insn_reservation "znver1_mmx_load" 10 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "type" "mmxmul") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp0*3") + "znver1-direct,znver1-load,znver1-fp0") ;; TODO (define_insn_reservation "znver1_avx256_log" 1 @@ -1161,7 +1161,7 @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3 (eq_attr "mode" "V8SF,V4SF,SF,V4DF,V2DF,DF"))) (and (eq_attr "type" "ssemul") (eq_attr "memory" "none"))) - "znver1-direct,(znver1-fp0|znver1-fp1)*3") + "znver1-direct,znver1-fp0|znver1-fp1") (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 (and (ior (and (eq_attr "cpu" "znver1") @@ -1172,47 +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 (eq_attr "mode" "V8SF,V4SF,SF"))) (and (eq_attr "type" "ssemul") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") + 
"znver1-direct,znver1-load,znver1-fp0|znver1-fp1") (define_insn_reservation "znver1_ssemul_avx256_ps" 3 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V8SF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none")))) - "znver1-double,(znver1-fp0|znver1-fp1)*3") + "znver1-double,znver1-fp0*2|znver1-fp1*2") (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V8SF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3") + "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2") (define_insn_reservation "znver1_ssemul_sd_pd" 4 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V2DF,DF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none")))) - "znver1-direct,(znver1-fp0|znver1-fp1)*4") + "znver1-direct,znver1-fp0|znver1-fp1") (define_insn_reservation "znver1_ssemul_sd_pd_load" 11 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V2DF,DF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "load")))) - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4") + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") (define_insn_reservation "znver2_ssemul_sd_pd" 3 (and (eq_attr "cpu" "znver2,znver3") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none"))) - "znver1-direct,(znver1-fp0|znver1-fp1)*3") + "znver1-direct,znver1-fp0|znver1-fp1") (define_insn_reservation "znver2_ssemul_sd_pd_load" 10 (and (eq_attr "cpu" "znver2,znver3") (and (eq_attr "type" "ssemul") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") (define_insn_reservation "znver1_ssemul_avx256_pd" 5 @@ -1220,14 +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5 (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none")))) - "znver1-double,(znver1-fp0|znver1-fp1)*4") + "znver1-double,znver1-fp0*2|znver1-fp1*2") (define_insn_reservation 
"znver1_ssemul_avx256_pd_load" 12 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*4") + "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2") ;;SSE imul (define_insn_reservation "znver1_sseimul" 3 @@ -1239,14 +1239,14 @@ (define_insn_reservation "znver1_sseimul" 3 (eq_attr "mode" "TI,OI"))) (and (eq_attr "type" "sseimul") (eq_attr "memory" "none"))) - "znver1-direct,znver1-fp0*3") + "znver1-direct,znver1-fp0") (define_insn_reservation "znver1_sseimul_avx256" 4 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "mode" "OI") (and (eq_attr "type" "sseimul") (eq_attr "memory" "none")))) - "znver1-double,znver1-fp0*4") + "znver1-double,znver1-fp0*2") (define_insn_reservation "znver1_sseimul_load" 10 (and (ior (and (eq_attr "cpu" "znver1") @@ -1257,28 +1257,28 @@ (define_insn_reservation "znver1_sseimul_load" 10 (eq_attr "mode" "TI,OI"))) (and (eq_attr "type" "sseimul") (eq_attr "memory" "load"))) - "znver1-direct,znver1-load,znver1-fp0*3") + "znver1-direct,znver1-load,znver1-fp0") (define_insn_reservation "znver1_sseimul_avx256_load" 11 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "mode" "OI") (and (eq_attr "type" "sseimul") (eq_attr "memory" "load")))) - "znver1-double,znver1-load,znver1-fp0*4") + "znver1-double,znver1-load,znver1-fp0*2") (define_insn_reservation "znver1_sseimul_di" 3 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "mode" "DI") (and (eq_attr "memory" "none") (eq_attr "type" "sseimul")))) - "znver1-direct,znver1-fp0*3") + "znver1-direct,znver1-fp0") (define_insn_reservation "znver1_sseimul_load_di" 10 (and (eq_attr "cpu" "znver1,znver2,znver3") (and (eq_attr "mode" "DI") (and (eq_attr "type" "sseimul") (eq_attr "memory" "load")))) - "znver1-direct,znver1-load,znver1-fp0*3") + "znver1-direct,znver1-load,znver1-fp0") ;; SSE compares (define_insn_reservation "znver1_sse_cmp" 1 -- 2.37.2 ^ permalink raw 
reply [flat|nested] 9+ messages in thread
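The cycle counts used in the two patches can be reproduced with a small sketch. It assumes each reservation is the reciprocal throughput rounded up to a whole cycle, multiplied by the number of uops; the series does not state this as a formula, and `reserved_cycles` is a hypothetical helper. The only throughput figures taken from the patch description are the Zen 3 values for divss (3.5) and divsd (4.5).

```python
import math


def reserved_cycles(recip_throughput, uops=1):
    """Cycles to reserve a unit: reciprocal throughput rounded up, per uop."""
    return math.ceil(recip_throughput) * uops


# Division: Zen 3 figures quoted in patch 1/2 (divss 3.5, divsd 4.5 cycles).
print(reserved_cycles(3.5), reserved_cycles(4.5))                  # 4 5
# Assumption: a 256-bit operation on Zen 1 is two 128-bit uops.
print(reserved_cycles(3.5, uops=2), reserved_cycles(4.5, uops=2))  # 8 10
# Fully pipelined multiplication: one cycle, or two for AVX256 on Zen 1.
print(reserved_cycles(1), reserved_cycles(1, uops=2))              # 1 2
```

These values line up with the `znver1-fdiv*4`, `*5`, `*8`, `*10` and `znver1-fp0*2` reservations in the diffs above.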
* RE: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md 2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov @ 2022-11-16 11:53 ` Kumar, Venkataramanan 2022-11-16 12:21 ` Jan Hubička 0 siblings, 1 reply; 9+ messages in thread From: Kumar, Venkataramanan @ 2022-11-16 11:53 UTC (permalink / raw) To: Alexander Monakov, gcc-patches; +Cc: Jan Hubička, Joshi, Tejas Sanjay [AMD Official Use Only - General] Hi, Thank you for fixing this. > -----Original Message----- > From: Alexander Monakov <amonakov@ispras.ru> > Sent: Tuesday, November 1, 2022 9:57 PM > To: gcc-patches@gcc.gnu.org > Cc: Jan Hubička <honza.hubicka@gmail.com>; Joshi, Tejas Sanjay > <TejasSanjay.Joshi@amd.com>; Kumar, Venkataramanan > <Venkataramanan.Kumar@amd.com>; Alexander Monakov > <amonakov@ispras.ru> > Subject: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in > znver.md > > Caution: This message originated from an External Source. Use proper > caution when opening attachments, clicking links, or responding. > > > All multiplication instructions are fully pipelined, except AVX256 > instructions on Zen 1, which issue over two cycles on a 128-bit unit. > Correct the model accordingly to reduce combinatorial explosion in > automaton tables. > > Top znver table sizes in insn-automata.o: > > Before: > > 30056 r znver1_fp_min_issue_delay > 120224 r znver1_fp_transitions > > After: > > 6720 r znver1_fp_min_issue_delay > 53760 r znver1_fp_transitions > > gcc/ChangeLog: > > PR target/87832 > * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in > the reservation. > (znver1_fp_op_mul_load): Ditto. > (znver1_mmx_mul): Ditto. > (znver1_mmx_load): Ditto. > (znver1_ssemul_ss_ps): Ditto. > (znver1_ssemul_ss_ps_load): Ditto. > (znver1_ssemul_avx256_ps): Ditto. > (znver1_ssemul_avx256_ps_load): Ditto. > (znver1_ssemul_sd_pd): Ditto. > (znver1_ssemul_sd_pd_load): Ditto. > (znver2_ssemul_sd_pd): Ditto. > (znver2_ssemul_sd_pd_load): Ditto. 
> (znver1_ssemul_avx256_pd): Ditto. > (znver1_ssemul_avx256_pd_load): Ditto. > (znver1_sseimul): Ditto. > (znver1_sseimul_avx256): Ditto. > (znver1_sseimul_load): Ditto. > (znver1_sseimul_avx256_load): Ditto. > (znver1_sseimul_di): Ditto. > (znver1_sseimul_load_di): Ditto. > --- > gcc/config/i386/znver.md | 40 ++++++++++++++++++++-------------------- > 1 file changed, 20 insertions(+), 20 deletions(-) > > diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md index > c52f8b532..882f250f1 100644 > --- a/gcc/config/i386/znver.md > +++ b/gcc/config/i386/znver.md > @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "type" "fop,fmul") > (eq_attr "memory" "none"))) > - "znver1-direct,znver1-fp0*5") > + "znver1-direct,znver1-fp0") > > (define_insn_reservation "znver1_fp_op_mul_load" 12 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "type" "fop,fmul") > (eq_attr "memory" "load"))) > - "znver1-direct,znver1-load,znver1-fp0*5") > + "znver1-direct,znver1-load,znver1-fp0") > > (define_insn_reservation "znver1_fp_op_imul_load" 16 > (and (eq_attr "cpu" "znver1,znver2,znver3") @@ -684,13 > +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "type" "mmxmul") > (eq_attr "memory" "none"))) > - "znver1-direct,znver1-fp0*3") > + "znver1-direct,znver1-fp0") > > (define_insn_reservation "znver1_mmx_load" 10 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "type" "mmxmul") > (eq_attr "memory" "load"))) > - "znver1-direct,znver1-load,znver1-fp0*3") > + "znver1-direct,znver1-load,znver1-fp0") > > ;; TODO > (define_insn_reservation "znver1_avx256_log" 1 @@ -1161,7 +1161,7 > @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3 > (eq_attr "mode" > "V8SF,V4SF,SF,V4DF,V2DF,DF"))) > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "none"))) > - "znver1-direct,(znver1-fp0|znver1-fp1)*3") > + 
"znver1-direct,znver1-fp0|znver1-fp1") > > (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 > (and (ior (and (eq_attr "cpu" "znver1") @@ -1172,47 > +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 > (eq_attr "mode" "V8SF,V4SF,SF"))) > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "load"))) > - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") > + > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > (define_insn_reservation "znver1_ssemul_avx256_ps" 3 > (and (eq_attr "cpu" "znver1") > (and (eq_attr "mode" "V8SF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "none")))) > - "znver1-double,(znver1-fp0|znver1-fp1)*3") > + "znver1-double,znver1-fp0*2|znver1-fp1*2") > > (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10 > (and (eq_attr "cpu" "znver1") > (and (eq_attr "mode" "V8SF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "load")))) > - "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3") > + > + "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2") > > (define_insn_reservation "znver1_ssemul_sd_pd" 4 > (and (eq_attr "cpu" "znver1") > (and (eq_attr "mode" "V2DF,DF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "none")))) > - "znver1-direct,(znver1-fp0|znver1-fp1)*4") > + "znver1-direct,znver1-fp0|znver1-fp1") > > (define_insn_reservation "znver1_ssemul_sd_pd_load" 11 > (and (eq_attr "cpu" "znver1") > (and (eq_attr "mode" "V2DF,DF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "load")))) > - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4") > + > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > (define_insn_reservation "znver2_ssemul_sd_pd" 3 > (and (eq_attr "cpu" "znver2,znver3") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "none"))) > - "znver1-direct,(znver1-fp0|znver1-fp1)*3") > + "znver1-direct,znver1-fp0|znver1-fp1") > > (define_insn_reservation "znver2_ssemul_sd_pd_load" 10 > (and (eq_attr "cpu" "znver2,znver3") > (and (eq_attr "type" "ssemul") > (eq_attr 
"memory" "load"))) > - "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") > + > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > > (define_insn_reservation "znver1_ssemul_avx256_pd" 5 @@ -1220,14 > +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5 > (and (eq_attr "mode" "V4DF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "none")))) > - "znver1-double,(znver1-fp0|znver1-fp1)*4") > + "znver1-double,znver1-fp0*2|znver1-fp1*2") Do we need to include "znver1" check here? > > (define_insn_reservation "znver1_ssemul_avx256_pd_load" 12 > (and (eq_attr "cpu" "znver1") > (and (eq_attr "mode" "V4DF") > (and (eq_attr "type" "ssemul") > (eq_attr "memory" "load")))) > - "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*4") > + > + "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2") > > ;;SSE imul > (define_insn_reservation "znver1_sseimul" 3 @@ -1239,14 +1239,14 @@ > (define_insn_reservation "znver1_sseimul" 3 > (eq_attr "mode" "TI,OI"))) > (and (eq_attr "type" "sseimul") > (eq_attr "memory" "none"))) > - "znver1-direct,znver1-fp0*3") > + "znver1-direct,znver1-fp0") > > (define_insn_reservation "znver1_sseimul_avx256" 4 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "mode" "OI") > (and (eq_attr "type" "sseimul") > (eq_attr "memory" "none")))) > - "znver1-double,znver1-fp0*4") > + "znver1-double,znver1-fp0*2") znver1 native path is 128 and znver2/3 has 256 bit paths. We need to split this into two reservations. One for znver1 and the other for znver2/3. 
> > (define_insn_reservation "znver1_sseimul_load" 10 > (and (ior (and (eq_attr "cpu" "znver1") @@ -1257,28 > +1257,28 @@ (define_insn_reservation "znver1_sseimul_load" 10 > (eq_attr "mode" "TI,OI"))) > (and (eq_attr "type" "sseimul") > (eq_attr "memory" "load"))) > - "znver1-direct,znver1-load,znver1-fp0*3") > + "znver1-direct,znver1-load,znver1-fp0") > > (define_insn_reservation "znver1_sseimul_avx256_load" 11 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "mode" "OI") > (and (eq_attr "type" "sseimul") > (eq_attr "memory" "load")))) > - "znver1-double,znver1-load,znver1-fp0*4") > + "znver1-double,znver1-load,znver1-fp0*2") We need to split this into two reservations. One for znver1 and the other for znver2/3. > > (define_insn_reservation "znver1_sseimul_di" 3 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "mode" "DI") > (and (eq_attr "memory" "none") > (eq_attr "type" "sseimul")))) > - "znver1-direct,znver1-fp0*3") > + "znver1-direct,znver1-fp0") > > (define_insn_reservation "znver1_sseimul_load_di" 10 > (and (eq_attr "cpu" "znver1,znver2,znver3") > (and (eq_attr "mode" "DI") > (and (eq_attr "type" "sseimul") > (eq_attr "memory" "load")))) > - "znver1-direct,znver1-load,znver1-fp0*3") > + "znver1-direct,znver1-load,znver1-fp0") > > ;; SSE compares > (define_insn_reservation "znver1_sse_cmp" 1 > -- > 2.37.2 The patch looks good. Regards, Venkat. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md 2022-11-16 11:53 ` Kumar, Venkataramanan @ 2022-11-16 12:21 ` Jan Hubička 2022-11-16 13:13 ` Alexander Monakov 0 siblings, 1 reply; 9+ messages in thread From: Jan Hubička @ 2022-11-16 12:21 UTC (permalink / raw) To: Kumar, Venkataramanan; +Cc: Alexander Monakov, gcc-patches, Joshi, Tejas Sanjay Hello, On Wed, Nov 16, 2022 at 12:53 PM Kumar, Venkataramanan < Venkataramanan.Kumar@amd.com> wrote: > [AMD Official Use Only - General] > > Hi, > > > > Top znver table sizes in insn-automata.o: > > > > Before: > > > > 30056 r znver1_fp_min_issue_delay > > 120224 r znver1_fp_transitions > > > > After: > > > > 6720 r znver1_fp_min_issue_delay > > 53760 r znver1_fp_transitions > This looks really promising. I will experiment with the patch for separate znver3 model, but I think we should be able to keep them unified and hopefully get both less code duplication and table sizes. > > > > gcc/ChangeLog: > > > > PR target/87832 > > * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in > > the reservation. > > (znver1_fp_op_mul_load): Ditto. > > (znver1_mmx_mul): Ditto. > > (znver1_mmx_load): Ditto. > > (znver1_ssemul_ss_ps): Ditto. > > (znver1_ssemul_ss_ps_load): Ditto. > > (znver1_ssemul_avx256_ps): Ditto. > > (znver1_ssemul_avx256_ps_load): Ditto. > > (znver1_ssemul_sd_pd): Ditto. > > (znver1_ssemul_sd_pd_load): Ditto. > > (znver2_ssemul_sd_pd): Ditto. > > (znver2_ssemul_sd_pd_load): Ditto. > > (znver1_ssemul_avx256_pd): Ditto. > > (znver1_ssemul_avx256_pd_load): Ditto. > > (znver1_sseimul): Ditto. > > (znver1_sseimul_avx256): Ditto. > > (znver1_sseimul_load): Ditto. > > (znver1_sseimul_avx256_load): Ditto. > > (znver1_sseimul_di): Ditto. > > (znver1_sseimul_load_di): Ditto.
> > --- > > gcc/config/i386/znver.md | 40 ++++++++++++++++++++-------------------- > > 1 file changed, 20 insertions(+), 20 deletions(-) > > > > diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md index > > c52f8b532..882f250f1 100644 > > --- a/gcc/config/i386/znver.md > > +++ b/gcc/config/i386/znver.md > > @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5 > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > (and (eq_attr "type" "fop,fmul") > > (eq_attr "memory" "none"))) > > - "znver1-direct,znver1-fp0*5") > > + "znver1-direct,znver1-fp0") > > > > (define_insn_reservation "znver1_fp_op_mul_load" 12 > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > (and (eq_attr "type" "fop,fmul") > > (eq_attr "memory" "load"))) > > - "znver1-direct,znver1-load,znver1-fp0*5") > > + "znver1-direct,znver1-load,znver1-fp0") > > > > (define_insn_reservation "znver1_fp_op_imul_load" 16 > > (and (eq_attr "cpu" "znver1,znver2,znver3") @@ > -684,13 > > +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3 > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > (and (eq_attr "type" "mmxmul") > > (eq_attr "memory" "none"))) > > - "znver1-direct,znver1-fp0*3") > > + "znver1-direct,znver1-fp0") > > > > (define_insn_reservation "znver1_mmx_load" 10 > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > (and (eq_attr "type" "mmxmul") > > (eq_attr "memory" "load"))) > > - "znver1-direct,znver1-load,znver1-fp0*3") > > + "znver1-direct,znver1-load,znver1-fp0") > > > > ;; TODO > > (define_insn_reservation "znver1_avx256_log" 1 @@ -1161,7 +1161,7 > > @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3 > > (eq_attr "mode" > > "V8SF,V4SF,SF,V4DF,V2DF,DF"))) > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "none"))) > > - "znver1-direct,(znver1-fp0|znver1-fp1)*3") > > + "znver1-direct,znver1-fp0|znver1-fp1") > > > > (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 > > (and (ior (and (eq_attr "cpu" "znver1") @@ > -1172,47 > > +1172,47 @@ (define_insn_reservation 
"znver1_ssemul_ss_ps_load" 10 > > (eq_attr "mode" > "V8SF,V4SF,SF"))) > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "load"))) > > - > "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") > > + > > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > > > (define_insn_reservation "znver1_ssemul_avx256_ps" 3 > > (and (eq_attr "cpu" "znver1") > > (and (eq_attr "mode" "V8SF") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "none")))) > > - "znver1-double,(znver1-fp0|znver1-fp1)*3") > > + "znver1-double,znver1-fp0*2|znver1-fp1*2") > > > > (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10 > > (and (eq_attr "cpu" "znver1") > > (and (eq_attr "mode" "V8SF") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "load")))) > > - > "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3") > > + > > + "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2") > > > > (define_insn_reservation "znver1_ssemul_sd_pd" 4 > > (and (eq_attr "cpu" "znver1") > > (and (eq_attr "mode" "V2DF,DF") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "none")))) > > - "znver1-direct,(znver1-fp0|znver1-fp1)*4") > > + "znver1-direct,znver1-fp0|znver1-fp1") > > > > (define_insn_reservation "znver1_ssemul_sd_pd_load" 11 > > (and (eq_attr "cpu" "znver1") > > (and (eq_attr "mode" "V2DF,DF") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "load")))) > > - > "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4") > > + > > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > > > (define_insn_reservation "znver2_ssemul_sd_pd" 3 > > (and (eq_attr "cpu" "znver2,znver3") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "none"))) > > - "znver1-direct,(znver1-fp0|znver1-fp1)*3") > > + "znver1-direct,znver1-fp0|znver1-fp1") > > > > (define_insn_reservation "znver2_ssemul_sd_pd_load" 10 > > (and (eq_attr "cpu" "znver2,znver3") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "load"))) > > - > "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3") > 
> + > > + "znver1-direct,znver1-load,znver1-fp0|znver1-fp1") > > > > > > (define_insn_reservation "znver1_ssemul_avx256_pd" 5 @@ -1220,14 > > +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5 > > (and (eq_attr "mode" "V4DF") > > (and (eq_attr "type" "ssemul") > > (eq_attr "memory" "none")))) > > - "znver1-double,(znver1-fp0|znver1-fp1)*4") > > + "znver1-double,znver1-fp0*2|znver1-fp1*2") > > Do we need to include "znver1" check here? > If people use nonsential combinations like -mtune=znver1 -march=znver2 this may help a bit. I do it from time to time to see differences between pipelilne models, but it is not too important. > > > > (define_insn_reservation "znver1_sseimul_avx256" 4 > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > (and (eq_attr "mode" "OI") > > (and (eq_attr "type" "sseimul") > > (eq_attr "memory" "none")))) > > - "znver1-double,znver1-fp0*4") > > + "znver1-double,znver1-fp0*2") > > znver1 native path is 128 and znver2/3 has 256 bit paths. > We need to split this into two reservations. One for znver1 and the other > for znver2/3. > isn't it znver2 for 128 and znver3 for 256? The patch looks good. > Patch is OK then :) thanks a lot! Honza > > Regards, > Venkat. > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md 2022-11-16 12:21 ` Jan Hubička @ 2022-11-16 13:13 ` Alexander Monakov 2022-11-16 13:28 ` Jan Hubička 0 siblings, 1 reply; 9+ messages in thread From: Alexander Monakov @ 2022-11-16 13:13 UTC (permalink / raw) To: Jan Hubička; +Cc: Kumar, Venkataramanan, gcc-patches, Joshi, Tejas Sanjay On Wed, 16 Nov 2022, Jan Hubička wrote: > This looks really promising. I will experiment with the patch for separate > znver3 model, but I think we should be able to keep > them unified and hopefully get both less code duplication and table sizes. Do you mean separate znver4 (not '3') model (i.e. the recent patch by AMD)? > > > (define_insn_reservation "znver1_ssemul_avx256_pd" 5 @@ -1220,14 > > > +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5 > > > (and (eq_attr "mode" "V4DF") > > > (and (eq_attr "type" "ssemul") > > > (eq_attr "memory" "none")))) > > > - "znver1-double,(znver1-fp0|znver1-fp1)*4") > > > + "znver1-double,znver1-fp0*2|znver1-fp1*2") > > > > Do we need to include "znver1" check here? > > > > If people use nonsensical combinations like -mtune=znver1 -march=znver2 this > may help a bit. > I do it from time to time to see differences between pipeline models, but > it is not too important. Actually, no change is needed: the reservation already includes a check for znver1; it's just cut off in the patch context.
Here's the full context: (define_insn_reservation "znver1_ssemul_avx256_pd" 5 (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none")))) "znver1-double,znver1-fp0*2|znver1-fp1*2") > > > (define_insn_reservation "znver1_sseimul_avx256" 4 > > > (and (eq_attr "cpu" "znver1,znver2,znver3") > > > (and (eq_attr "mode" "OI") > > > (and (eq_attr "type" "sseimul") > > > (eq_attr "memory" "none")))) > > > - "znver1-double,znver1-fp0*4") > > > + "znver1-double,znver1-fp0*2") > > > > znver1 native path is 128 and znver2/3 has 256 bit paths. > > We need to split this into two reservations. One for znver1 and the other > > for znver2/3. > > > > isn't it znver2 for 128 and znver3 for 256? No, Zen 1 splits AVX256 instructions into pairs of 128-bit uops, Zen 2 and Zen 3 have native 256-bit units. Zen 4 again executes AVX512 instructions on 256-bit units. I think a split is not needed because the preceding reservation already handles znver2 and znver3, we just need to remove them here, like this: diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md index 882f250f1..16b5afa5d 100644 --- a/gcc/config/i386/znver.md +++ b/gcc/config/i386/znver.md @@ -1242,7 +1242,7 @@ (define_insn_reservation "znver1_sseimul" 3 "znver1-direct,znver1-fp0") (define_insn_reservation "znver1_sseimul_avx256" 4 - (and (eq_attr "cpu" "znver1,znver2,znver3") + (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "OI") (and (eq_attr "type" "sseimul") (eq_attr "memory" "none")))) @@ -1260,7 +1260,7 @@ (define_insn_reservation "znver1_sseimul_load" 10 "znver1-direct,znver1-load,znver1-fp0") (define_insn_reservation "znver1_sseimul_avx256_load" 11 - (and (eq_attr "cpu" "znver1,znver2,znver3") + (and (eq_attr "cpu" "znver1") (and (eq_attr "mode" "OI") (and (eq_attr "type" "sseimul") (eq_attr "memory" "load")))) > The patch looks good. > > > Patch is OK then :) For *both* patches in the series? 
Alexander ^ permalink raw reply [flat|nested] 9+ messages in thread
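
[Editorial illustration, not part of the original message.] The two
reservation strings discussed above differ in how many unit-occupancy
alternatives they describe. The sketch below is only a rough model, assuming
the documented reading of the reservation syntax ("a*n" repeats "a" over n
consecutive cycles, "a|b" picks one alternative, and "*" binds tighter than
"|"). The helper names unit, oneof and repeat are hypothetical, the leading
"znver1-double," step is left out, and the count is of alternatives only, not
of the states genautomata actually emits.

```python
from itertools import product

# A reservation is modelled as a list of alternatives; each alternative is a
# tuple naming the unit held on each successive cycle.

def unit(name):
    """A single unit held for one cycle."""
    return [(name,)]

def oneof(*regexps):
    """`a|b`: any alternative of any operand."""
    return [alt for r in regexps for alt in r]

def repeat(regexp, n):
    """`a*n`: the regexp n times in a row, choosing independently each time."""
    return [sum(combo, ()) for combo in product(regexp, repeat=n)]

fp0, fp1 = unit("znver1-fp0"), unit("znver1-fp1")

old = repeat(oneof(fp0, fp1), 4)             # (znver1-fp0|znver1-fp1)*4
new = oneof(repeat(fp0, 2), repeat(fp1, 2))  # znver1-fp0*2|znver1-fp1*2

print(len(old), len(new))
```

Under these assumptions the old form lets the pipe be re-chosen on every one
of the four cycles, while the new form commits to one pipe for two cycles,
which is closer to a Zen 1 AVX256 multiply issuing as a pair of 128-bit uops.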
* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-16 13:13       ` Alexander Monakov
@ 2022-11-16 13:28         ` Jan Hubička
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Hubička @ 2022-11-16 13:28 UTC
To: Alexander Monakov; +Cc: Kumar, Venkataramanan, gcc-patches, Joshi, Tejas Sanjay

On Wed, Nov 16, 2022 at 2:13 PM Alexander Monakov <amonakov@ispras.ru> wrote:

>
> On Wed, 16 Nov 2022, Jan Hubička wrote:
>
> > This looks really promising. I will experiment with the patch for separate
> > znver3 model, but I think we should be able to keep
> > them unified and hopefully get both less code duplication and table sizes.
>
> Do you mean separate znver4 (not '3') model (i.e. the recent patch by AMD)?
>

Yes. I guess we want to check which variant leads to a smaller automaton. I
would somewhat prefer to keep the models unified since they are quite similar.

> > > znver1 native path is 128 and znver2/3 has 256 bit paths.
> > > We need to split this into two reservations. One for znver1 and the other
> > > for znver2/3.
> > >
> >
>
> > isn't it znver2 for 128 and znver3 for 256?
>
> No, Zen 1 splits AVX256 instructions into pairs of 128-bit uops, Zen 2 and
> Zen 3 have native 256-bit units. Zen 4 again executes AVX512 instructions
> on 256-bit units.
>

Ah, of course. I mixed things up in my memory. Sorry for that.

>
> I think a split is not needed because the preceding reservation already handles
> znver2 and znver3, we just need to remove them here, like this:
>
> diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
> index 882f250f1..16b5afa5d 100644
> --- a/gcc/config/i386/znver.md
> +++ b/gcc/config/i386/znver.md
> @@ -1242,7 +1242,7 @@ (define_insn_reservation "znver1_sseimul" 3
>                           "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256" 4
> -                         (and (eq_attr "cpu" "znver1,znver2,znver3")
> +                         (and (eq_attr "cpu" "znver1")
>

It should work even without the removal, since the first reservation matches,
but this is much less confusing indeed.

>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "none"))))
> @@ -1260,7 +1260,7 @@ (define_insn_reservation "znver1_sseimul_load" 10
>                           "znver1-direct,znver1-load,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256_load" 11
> -                         (and (eq_attr "cpu" "znver1,znver2,znver3")
> +                         (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "load"))))
>
> > The patch looks good.
> > >
> > Patch is OK then :)
>
> For *both* patches in the series?
>

Yes, thanks a lot for looking into this!
Honza

>
> Alexander
* Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]
  2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
  2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
  2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
@ 2022-11-07 11:27 ` Alexander Monakov
  2022-11-14 11:19   ` Alexander Monakov
  2 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-07 11:27 UTC
To: gcc-patches; +Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan

On Tue, 1 Nov 2022, Alexander Monakov wrote:

> Hi,
>
> I'm sending followup fixes for combinatorial explosion of znver scheduling
> automaton tables as described in the earlier thread:
>
> https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543

AMD folks, do you have any feedback?

What is the way forward for this patchset?

Alexander

>
> I think lujiazui.md and b[dt]ver[123].md have similar issues.
>
> Alexander Monakov (2):
>   i386: correct x87&SSE division modeling in znver.md
>   i386: correct x87&SSE multiplication modeling in znver.md
>
>  gcc/config/i386/znver.md | 67 ++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 33 deletions(-)
>
>
* Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]
  2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
@ 2022-11-14 11:19   ` Alexander Monakov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Monakov @ 2022-11-14 11:19 UTC
To: gcc-patches
Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan, Jan Hubicka, Uros Bizjak

On Mon, 7 Nov 2022, Alexander Monakov wrote:

>
> On Tue, 1 Nov 2022, Alexander Monakov wrote:
>
> > Hi,
> >
> > I'm sending followup fixes for combinatorial explosion of znver scheduling
> > automaton tables as described in the earlier thread:
> >
> > https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
>
> AMD folks, do you have any feedback?
>
> What is the way forward for this patchset?

Ping?

> Alexander
>
> >
> > I think lujiazui.md and b[dt]ver[123].md have similar issues.
> >
> > Alexander Monakov (2):
> >   i386: correct x87&SSE division modeling in znver.md
> >   i386: correct x87&SSE multiplication modeling in znver.md
> >
> >  gcc/config/i386/znver.md | 67 ++++++++++++++++++++--------------------
> >  1 file changed, 34 insertions(+), 33 deletions(-)
> >
> >
>
end of thread, newest message: 2022-11-16 13:28 UTC

Thread overview: 9+ messages
2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
2022-11-16 11:53   ` Kumar, Venkataramanan
2022-11-16 12:21     ` Jan Hubička
2022-11-16 13:13       ` Alexander Monakov
2022-11-16 13:28         ` Jan Hubička
2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
2022-11-14 11:19   ` Alexander Monakov