public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/2] i386: slim down insn-automata [PR 87832]
@ 2022-11-01 16:26 Alexander Monakov
  2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan,
	Alexander Monakov

Hi,

I'm sending followup fixes for combinatorial explosion of znver scheduling
automaton tables as described in the earlier thread:

https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543

I think lujiazui.md and b[dt]ver[123].md have similar issues.

Alexander Monakov (2):
  i386: correct x87&SSE division modeling in znver.md
  i386: correct x87&SSE multiplication modeling in znver.md

 gcc/config/i386/znver.md | 67 ++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 33 deletions(-)

-- 
2.37.2



* [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md
  2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
@ 2022-11-01 16:26 ` Alexander Monakov
  2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
  2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
  2 siblings, 0 replies; 9+ messages in thread
From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan,
	Alexander Monakov

Correct modeling of division instructions in the SIMD/FP domain for
AMD Zen architectures and avoid combinatorial explosion of automaton
tables by modeling the separate floating-point division unit and
correcting reservations to reflect reciprocal throughput of the
corresponding instructions, similar to earlier commit
5cee5f94000 ("i386: correct integer division modeling in znver.md").

Division is partially pipelined and some instructions have fractional
reciprocal throughput (e.g. Zen 3 can issue divss and divsd once every
3.5 and 4.5 cycles on average, respectively). Since these CPUs implement
out-of-order execution, the model doesn't need to be exact to the last
cycle, so simplify it by using 4/5 cycles for SF/DF modes, and not
modeling the fact that the FP3 pipe is occupied for one cycle.
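
As a rough illustration of the rounding described above, the sketch below
turns a fractional reciprocal throughput into a whole-cycle reservation.
The helper name and the "passes" parameter are hypothetical; doubling the
reservation for 256-bit operations on Zen 1 is an assumption read off the
fdiv*8 and fdiv*10 reservations in the diff, not something this message
states for division. The x87 fdiv*6 reservations are not covered.

```python
import math

# Average reciprocal throughput in cycles for Zen 3, as quoted above.
ZEN3_RECIPROCAL_THROUGHPUT = {"divss": 3.5, "divsd": 4.5}


def reservation_cycles(avg_cycles, passes=1):
    """Whole-cycle divider reservation (hypothetical helper, not GCC code).

    avg_cycles: average reciprocal throughput, possibly fractional.
    passes: number of passes through the unit; 2 is an assumption for
            256-bit operations split over a 128-bit unit on Zen 1.
    """
    # Round up so the model never claims more throughput than the hardware.
    return math.ceil(avg_cycles) * passes


for insn, cycles in ZEN3_RECIPROCAL_THROUGHPUT.items():
    # Columns: instruction, 128-bit reservation, assumed 256-bit reservation.
    print(insn, reservation_cycles(cycles), reservation_cycles(cycles, passes=2))
```

Rounding up is only one way to arrive at the 4/5 figures; it is chosen
here because it errs on the side of a slightly pessimistic model.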

Top znver table sizes in insn-automata.o:

Before:

428108 r znver1_fp_min_issue_delay
856216 r znver1_fp_transitions

After:

30056 r znver1_fp_min_issue_delay
120224 r znver1_fp_transitions

gcc/ChangeLog:

	PR target/87832
	* config/i386/znver.md (znver1_fdiv): New automaton.
	(znver1-fdiv): New unit.
	(znver1_fp_op_div): Correct unit and cycles in the reservation.
	(znver1_fp_op_div_load): Ditto.
	(znver1_fp_op_idiv_load): Ditto.
	(znver2_fp_op_idiv_load): Ditto.
	(znver1_ssediv_ss_ps): Ditto.
	(znver1_ssediv_ss_ps_load): Ditto.
	(znver1_ssediv_sd_pd): Ditto.
	(znver1_ssediv_sd_pd_load): Ditto.
	(znver1_ssediv_avx256_ps): Ditto.
	(znver1_ssediv_avx256_ps_load): Ditto.
	(znver1_ssediv_avx256_pd): Ditto.
	(znver1_ssediv_avx256_pd_load): Ditto.
---
 gcc/config/i386/znver.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
index 4aa098fd8..c52f8b532 100644
--- a/gcc/config/i386/znver.md
+++ b/gcc/config/i386/znver.md
@@ -24,7 +24,7 @@ (define_attr "znver1_decode" "direct,vector,double"
 ;; AMD znver1, znver2 and znver3 Scheduling
 ;; Modeling automatons for zen decoders, integer execution pipes,
 ;; SIMD/FP domain, AGU pipes, and dividers.
-(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv")
+(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu, znver1_idiv, znver1_fdiv")
 
 ;; Decoders unit has 4 decoders and all of them can decode fast path
 ;; and vector type instructions.
@@ -95,6 +95,7 @@ (define_reservation "znver2-fvector" "znver1-fp0+znver1-fp1
 
 ;; Dividers
 (define_cpu_unit "znver1-idiv" "znver1_idiv")
+(define_cpu_unit "znver1-fdiv" "znver1_fdiv")
 
 ;; Call instruction
 (define_insn_reservation "znver1_call" 1
@@ -591,27 +592,27 @@ (define_insn_reservation "znver1_fp_op_div" 15
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "fdiv")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,znver1-fp3*15")
+			 "znver1-direct,znver1-fdiv*6")
 
 (define_insn_reservation "znver1_fp_op_div_load" 22
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "fdiv")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp3*15")
+			 "znver1-direct,znver1-load,znver1-fdiv*6")
 
 (define_insn_reservation "znver1_fp_op_idiv_load" 27
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "type" "fdiv")
 				   (and (eq_attr "fp_int_src" "true")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,znver1-fp3*19")
+			 "znver1-double,znver1-load,znver1-fdiv*6")
 
 (define_insn_reservation "znver2_fp_op_idiv_load" 26
 			 (and (eq_attr "cpu" "znver2,znver3")
 			      (and (eq_attr "type" "fdiv")
 				   (and (eq_attr "fp_int_src" "true")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,znver1-fp3*19")
+			 "znver1-double,znver1-load,znver1-fdiv*6")
 
 
 ;; MMX, SSE, SSEn.n, AVX, AVX2 instructions
@@ -1088,7 +1089,7 @@ (define_insn_reservation "znver1_ssediv_ss_ps" 10
 					      (eq_attr "mode" "V8SF,V4SF,SF")))
 			      (and (eq_attr "type" "ssediv")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,znver1-fp3*10")
+			 "znver1-direct,znver1-fdiv*4")
 
 (define_insn_reservation "znver1_ssediv_ss_ps_load" 17
 			 (and (ior (and (eq_attr "cpu" "znver1")
@@ -1099,7 +1100,7 @@ (define_insn_reservation "znver1_ssediv_ss_ps_load" 17
 					      (eq_attr "mode" "V8SF,V4SF,SF")))
 			      (and (eq_attr "type" "ssediv")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp3*10")
+			 "znver1-direct,znver1-load,znver1-fdiv*4")
 
 (define_insn_reservation "znver1_ssediv_sd_pd" 13
 			 (and (ior (and (eq_attr "cpu" "znver1")
@@ -1110,7 +1111,7 @@ (define_insn_reservation "znver1_ssediv_sd_pd" 13
 					      (eq_attr "mode" "V4DF,V2DF,DF")))
 			      (and (eq_attr "type" "ssediv")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,znver1-fp3*13")
+			 "znver1-direct,znver1-fdiv*5")
 
 (define_insn_reservation "znver1_ssediv_sd_pd_load" 20
 			 (and (ior (and (eq_attr "cpu" "znver1")
@@ -1121,35 +1122,35 @@ (define_insn_reservation "znver1_ssediv_sd_pd_load" 20
 					      (eq_attr "mode" "V4DF,V2DF,DF")))
 			      (and (eq_attr "type" "ssediv")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp3*13")
+			 "znver1-direct,znver1-load,znver1-fdiv*5")
 
 (define_insn_reservation "znver1_ssediv_avx256_ps" 12
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V8SF")
 				   (and (eq_attr "memory" "none")
 					(eq_attr "type" "ssediv"))))
-			 "znver1-double,znver1-fp3*12")
+			 "znver1-double,znver1-fdiv*8")
 
 (define_insn_reservation "znver1_ssediv_avx256_ps_load" 19
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V8SF")
 				   (and (eq_attr "type" "ssediv")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,znver1-fp3*12")
+			 "znver1-double,znver1-load,znver1-fdiv*8")
 
 (define_insn_reservation "znver1_ssediv_avx256_pd" 15
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V4DF")
 				   (and (eq_attr "type" "ssediv")
 					(eq_attr "memory" "none"))))
-			 "znver1-double,znver1-fp3*15")
+			 "znver1-double,znver1-fdiv*10")
 
 (define_insn_reservation "znver1_ssediv_avx256_pd_load" 22 
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V4DF")
 				   (and (eq_attr "type" "ssediv")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,znver1-fp3*15")
+			 "znver1-double,znver1-load,znver1-fdiv*10")
 ;; SSE MUL
 (define_insn_reservation "znver1_ssemul_ss_ps" 3
 			 (and (ior (and (eq_attr "cpu" "znver1")
-- 
2.37.2



* [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
  2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
@ 2022-11-01 16:26 ` Alexander Monakov
  2022-11-16 11:53   ` Kumar, Venkataramanan
  2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
  2 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-01 16:26 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan,
	Alexander Monakov

All multiplication instructions are fully pipelined, except AVX256
instructions on Zen 1, which issue over two cycles on a 128-bit unit.
Correct the model accordingly to reduce combinatorial explosion in
automaton tables.
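
To give a feel for why multi-cycle reservations inflate the tables, here
is a toy model. It is not genautomata's actual algorithm: each unit just
carries a "remaining busy cycles" counter, and the function name and
interface are made up for illustration. The state count grows with the
longest reservation raised to the number of interchangeable units.

```python
from collections import deque


def count_states(num_units, durations):
    """Count reachable states of a toy reservation automaton.

    A state is a tuple holding the remaining busy cycles of each unit.
    From any state we can either advance one cycle or issue an
    instruction with one of the given durations on a free unit.
    """
    start = (0,) * num_units
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Advancing the clock decrements every non-zero counter.
        successors = [tuple(max(busy - 1, 0) for busy in state)]
        # Issuing reserves one currently free unit for `cycles` cycles.
        for unit, busy in enumerate(state):
            if busy == 0:
                for cycles in durations:
                    nxt = list(state)
                    nxt[unit] = cycles
                    successors.append(tuple(nxt))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


# Two interchangeable pipes, reservations of 3 and 4 cycles (old model)
# versus 1 and 2 cycles (new model).
print(count_states(2, [3, 4]), count_states(2, [1, 2]))
```

In this toy model the old reservations give 25 states and the new ones 9.
The real automaton also tracks decoders and other units, so the effect
multiplies across them.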

Top znver table sizes in insn-automata.o:

Before:

30056 r znver1_fp_min_issue_delay
120224 r znver1_fp_transitions

After:

6720 r znver1_fp_min_issue_delay
53760 r znver1_fp_transitions
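
For reference, the combined effect of both patches can be worked out from
the sizes quoted in the two commit messages. The snippet below is only
arithmetic on those numbers; the dictionary layout and function name are
made up for illustration.

```python
# (before patch 1, after patch 1, after patch 2), as quoted in the
# commit messages of this series.
TABLE_SIZES = {
    "znver1_fp_min_issue_delay": (428108, 30056, 6720),
    "znver1_fp_transitions": (856216, 120224, 53760),
}


def reduction_factors(sizes):
    """Return per-table shrink factors: (patch 1, patch 2, combined)."""
    return {
        name: (orig / mid, mid / final, orig / final)
        for name, (orig, mid, final) in sizes.items()
    }


for name, (p1, p2, total) in reduction_factors(TABLE_SIZES).items():
    print(f"{name}: {p1:.1f}x, then {p2:.1f}x, {total:.1f}x overall")
```

Overall that is roughly a 64x reduction for the min_issue_delay table and
roughly 16x for the transitions table.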

gcc/ChangeLog:

	PR target/87832
	* config/i386/znver.md (znver1_fp_op_mul): Correct cycles in
	the reservation.
	(znver1_fp_op_mul_load): Ditto.
	(znver1_mmx_mul): Ditto.
	(znver1_mmx_load): Ditto.
	(znver1_ssemul_ss_ps): Ditto.
	(znver1_ssemul_ss_ps_load): Ditto.
	(znver1_ssemul_avx256_ps): Ditto.
	(znver1_ssemul_avx256_ps_load): Ditto.
	(znver1_ssemul_sd_pd): Ditto.
	(znver1_ssemul_sd_pd_load): Ditto.
	(znver2_ssemul_sd_pd): Ditto.
	(znver2_ssemul_sd_pd_load): Ditto.
	(znver1_ssemul_avx256_pd): Ditto.
	(znver1_ssemul_avx256_pd_load): Ditto.
	(znver1_sseimul): Ditto.
	(znver1_sseimul_avx256): Ditto.
	(znver1_sseimul_load): Ditto.
	(znver1_sseimul_avx256_load): Ditto.
	(znver1_sseimul_di): Ditto.
	(znver1_sseimul_load_di): Ditto.
---
 gcc/config/i386/znver.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
index c52f8b532..882f250f1 100644
--- a/gcc/config/i386/znver.md
+++ b/gcc/config/i386/znver.md
@@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "fop,fmul")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,znver1-fp0*5")
+			 "znver1-direct,znver1-fp0")
 
 (define_insn_reservation "znver1_fp_op_mul_load" 12 
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "fop,fmul")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp0*5")
+			 "znver1-direct,znver1-load,znver1-fp0")
 
 (define_insn_reservation "znver1_fp_op_imul_load" 16
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
@@ -684,13 +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "mmxmul")
 				   (eq_attr "memory" "none")))
-			  "znver1-direct,znver1-fp0*3")
+			  "znver1-direct,znver1-fp0")
 
 (define_insn_reservation "znver1_mmx_load" 10
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "type" "mmxmul")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp0*3")
+			 "znver1-direct,znver1-load,znver1-fp0")
 
 ;; TODO
 (define_insn_reservation "znver1_avx256_log" 1
@@ -1161,7 +1161,7 @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3
 					      (eq_attr "mode" "V8SF,V4SF,SF,V4DF,V2DF,DF")))
 			      (and (eq_attr "type" "ssemul")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-direct,znver1-fp0|znver1-fp1")
 
 (define_insn_reservation "znver1_ssemul_ss_ps_load" 10 
 			 (and (ior (and (eq_attr "cpu" "znver1")
@@ -1172,47 +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
 					      (eq_attr "mode" "V8SF,V4SF,SF")))
 			      (and (eq_attr "type" "ssemul")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
 
 (define_insn_reservation "znver1_ssemul_avx256_ps" 3
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V8SF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "none"))))
-			 "znver1-double,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-double,znver1-fp0*2|znver1-fp1*2")
 
 (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V8SF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2")
 
 (define_insn_reservation "znver1_ssemul_sd_pd" 4
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V2DF,DF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "none"))))
-			 "znver1-direct,(znver1-fp0|znver1-fp1)*4")
+			 "znver1-direct,znver1-fp0|znver1-fp1")
 
 (define_insn_reservation "znver1_ssemul_sd_pd_load" 11
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V2DF,DF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "load"))))
-			 "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4")
+			 "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
 
 (define_insn_reservation "znver2_ssemul_sd_pd" 3
 			 (and (eq_attr "cpu" "znver2,znver3")
 			      (and (eq_attr "type" "ssemul")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-direct,znver1-fp0|znver1-fp1")
 
 (define_insn_reservation "znver2_ssemul_sd_pd_load" 10
 			 (and (eq_attr "cpu" "znver2,znver3")
 			      (and (eq_attr "type" "ssemul")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
+			 "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
 
 
 (define_insn_reservation "znver1_ssemul_avx256_pd" 5
@@ -1220,14 +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5
 			      (and (eq_attr "mode" "V4DF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "none"))))
-			 "znver1-double,(znver1-fp0|znver1-fp1)*4")
+			 "znver1-double,znver1-fp0*2|znver1-fp1*2")
 
 (define_insn_reservation "znver1_ssemul_avx256_pd_load" 12
 			 (and (eq_attr "cpu" "znver1")
 			      (and (eq_attr "mode" "V4DF")
 				   (and (eq_attr "type" "ssemul")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*4")
+			 "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2")
 
 ;;SSE imul
 (define_insn_reservation "znver1_sseimul" 3
@@ -1239,14 +1239,14 @@ (define_insn_reservation "znver1_sseimul" 3
 					      (eq_attr "mode" "TI,OI")))
 			      (and (eq_attr "type" "sseimul")
 				   (eq_attr "memory" "none")))
-			 "znver1-direct,znver1-fp0*3")
+			 "znver1-direct,znver1-fp0")
 
 (define_insn_reservation "znver1_sseimul_avx256" 4
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "mode" "OI")
 				   (and (eq_attr "type" "sseimul")
 					(eq_attr "memory" "none"))))
-			 "znver1-double,znver1-fp0*4")
+			 "znver1-double,znver1-fp0*2")
 
 (define_insn_reservation "znver1_sseimul_load" 10
 			 (and (ior (and (eq_attr "cpu" "znver1")
@@ -1257,28 +1257,28 @@ (define_insn_reservation "znver1_sseimul_load" 10
 			                (eq_attr "mode" "TI,OI")))
 			      (and (eq_attr "type" "sseimul")
 				   (eq_attr "memory" "load")))
-			 "znver1-direct,znver1-load,znver1-fp0*3")
+			 "znver1-direct,znver1-load,znver1-fp0")
 
 (define_insn_reservation "znver1_sseimul_avx256_load" 11
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "mode" "OI")
 				   (and (eq_attr "type" "sseimul")
 					(eq_attr "memory" "load"))))
-			 "znver1-double,znver1-load,znver1-fp0*4")
+			 "znver1-double,znver1-load,znver1-fp0*2")
 
 (define_insn_reservation "znver1_sseimul_di" 3 
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "mode" "DI")
 				   (and (eq_attr "memory" "none")
 					(eq_attr "type" "sseimul"))))
-			 "znver1-direct,znver1-fp0*3")
+			 "znver1-direct,znver1-fp0")
 
 (define_insn_reservation "znver1_sseimul_load_di" 10 
 			 (and (eq_attr "cpu" "znver1,znver2,znver3")
 			      (and (eq_attr "mode" "DI")
 				   (and (eq_attr "type" "sseimul")
 					(eq_attr "memory" "load"))))
-			 "znver1-direct,znver1-load,znver1-fp0*3")
+			 "znver1-direct,znver1-load,znver1-fp0")
 
 ;; SSE compares
 (define_insn_reservation "znver1_sse_cmp" 1
-- 
2.37.2



* Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]
  2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
  2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
  2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
@ 2022-11-07 11:27 ` Alexander Monakov
  2022-11-14 11:19   ` Alexander Monakov
  2 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-07 11:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan


On Tue, 1 Nov 2022, Alexander Monakov wrote:

> Hi,
> 
> I'm sending followup fixes for combinatorial explosion of znver scheduling
> automaton tables as described in the earlier thread:
> 
> https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543

AMD folks, do you have any feedback?

What is the way forward for this patchset?

Alexander

> 
> I think lujiazui.md and b[dt]ver[123].md have similar issues.
> 
> Alexander Monakov (2):
>   i386: correct x87&SSE division modeling in znver.md
>   i386: correct x87&SSE multiplication modeling in znver.md
> 
>  gcc/config/i386/znver.md | 67 ++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 33 deletions(-)
> 
> 


* Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]
  2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
@ 2022-11-14 11:19   ` Alexander Monakov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Monakov @ 2022-11-14 11:19 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jan Hubička, Joshi, Tejas Sanjay, Kumar, Venkataramanan,
	Jan Hubicka, Uros Bizjak


On Mon, 7 Nov 2022, Alexander Monakov wrote:

> 
> On Tue, 1 Nov 2022, Alexander Monakov wrote:
> 
> > Hi,
> > 
> > I'm sending followup fixes for combinatorial explosion of znver scheduling
> > automaton tables as described in the earlier thread:
> > 
> > https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
> 
> AMD folks, do you have any feedback?
> 
> What is the way forward for this patchset?

Ping?

> Alexander
> 
> > 
> > I think lujiazui.md and b[dt]ver[123].md have similar issues.
> > 
> > Alexander Monakov (2):
> >   i386: correct x87&SSE division modeling in znver.md
> >   i386: correct x87&SSE multiplication modeling in znver.md
> > 
> >  gcc/config/i386/znver.md | 67 ++++++++++++++++++++--------------------
> >  1 file changed, 34 insertions(+), 33 deletions(-)
> > 
> > 
> 


* RE: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
@ 2022-11-16 11:53   ` Kumar, Venkataramanan
  2022-11-16 12:21     ` Jan Hubička
  0 siblings, 1 reply; 9+ messages in thread
From: Kumar, Venkataramanan @ 2022-11-16 11:53 UTC (permalink / raw)
  To: Alexander Monakov, gcc-patches; +Cc: Jan Hubička, Joshi, Tejas Sanjay

[AMD Official Use Only - General]

Hi,

Thank you for fixing this.

> -----Original Message-----
> From: Alexander Monakov <amonakov@ispras.ru>
> Sent: Tuesday, November 1, 2022 9:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jan Hubička <honza.hubicka@gmail.com>; Joshi, Tejas Sanjay
> <TejasSanjay.Joshi@amd.com>; Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com>; Alexander Monakov
> <amonakov@ispras.ru>
> Subject: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in
> znver.md
>
> All multiplication instructions are fully pipelined, except AVX256
> instructions on Zen 1, which issue over two cycles on a 128-bit unit.
> Correct the model accordingly to reduce combinatorial explosion in
> automaton tables.
>
> Top znver table sizes in insn-automata.o:
>
> Before:
>
> 30056 r znver1_fp_min_issue_delay
> 120224 r znver1_fp_transitions
>
> After:
>
> 6720 r znver1_fp_min_issue_delay
> 53760 r znver1_fp_transitions
>
> gcc/ChangeLog:
>
>         PR target/87832
>         * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in
>         the reservation.
>         (znver1_fp_op_mul_load): Ditto.
>         (znver1_mmx_mul): Ditto.
>         (znver1_mmx_load): Ditto.
>         (znver1_ssemul_ss_ps): Ditto.
>         (znver1_ssemul_ss_ps_load): Ditto.
>         (znver1_ssemul_avx256_ps): Ditto.
>         (znver1_ssemul_avx256_ps_load): Ditto.
>         (znver1_ssemul_sd_pd): Ditto.
>         (znver1_ssemul_sd_pd_load): Ditto.
>         (znver2_ssemul_sd_pd): Ditto.
>         (znver2_ssemul_sd_pd_load): Ditto.
>         (znver1_ssemul_avx256_pd): Ditto.
>         (znver1_ssemul_avx256_pd_load): Ditto.
>         (znver1_sseimul): Ditto.
>         (znver1_sseimul_avx256): Ditto.
>         (znver1_sseimul_load): Ditto.
>         (znver1_sseimul_avx256_load): Ditto.
>         (znver1_sseimul_di): Ditto.
>         (znver1_sseimul_load_di): Ditto.
> ---
>  gcc/config/i386/znver.md | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
> index c52f8b532..882f250f1 100644
> --- a/gcc/config/i386/znver.md
> +++ b/gcc/config/i386/znver.md
> @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "type" "fop,fmul")
>                                    (eq_attr "memory" "none")))
> -                        "znver1-direct,znver1-fp0*5")
> +                        "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_fp_op_mul_load" 12
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "type" "fop,fmul")
>                                    (eq_attr "memory" "load")))
> -                        "znver1-direct,znver1-load,znver1-fp0*5")
> +                        "znver1-direct,znver1-load,znver1-fp0")
>
>  (define_insn_reservation "znver1_fp_op_imul_load" 16
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> @@ -684,13 +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "type" "mmxmul")
>                                    (eq_attr "memory" "none")))
> -                         "znver1-direct,znver1-fp0*3")
> +                         "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_mmx_load" 10
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "type" "mmxmul")
>                                    (eq_attr "memory" "load")))
> -                        "znver1-direct,znver1-load,znver1-fp0*3")
> +                        "znver1-direct,znver1-load,znver1-fp0")
>
>  ;; TODO
>  (define_insn_reservation "znver1_avx256_log" 1
> @@ -1161,7 +1161,7 @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3
>                                               (eq_attr "mode" "V8SF,V4SF,SF,V4DF,V2DF,DF")))
>                               (and (eq_attr "type" "ssemul")
>                                    (eq_attr "memory" "none")))
> -                        "znver1-direct,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-direct,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
>                          (and (ior (and (eq_attr "cpu" "znver1")
> @@ -1172,47 +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
>                                               (eq_attr "mode" "V8SF,V4SF,SF")))
>                               (and (eq_attr "type" "ssemul")
>                                    (eq_attr "memory" "load")))
> -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver1_ssemul_avx256_ps" 3
>                          (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "V8SF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-double,znver1-fp0*2|znver1-fp1*2")
>
>  (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10
>                          (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "V8SF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "load"))))
> -                        "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2")
>
>  (define_insn_reservation "znver1_ssemul_sd_pd" 4
>                          (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "V2DF,DF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-direct,(znver1-fp0|znver1-fp1)*4")
> +                        "znver1-direct,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver1_ssemul_sd_pd_load" 11
>                          (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "V2DF,DF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "load"))))
> -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4")
> +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver2_ssemul_sd_pd" 3
>                          (and (eq_attr "cpu" "znver2,znver3")
>                               (and (eq_attr "type" "ssemul")
>                                    (eq_attr "memory" "none")))
> -                        "znver1-direct,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-direct,znver1-fp0|znver1-fp1")
>
>  (define_insn_reservation "znver2_ssemul_sd_pd_load" 10
>                          (and (eq_attr "cpu" "znver2,znver3")
>                               (and (eq_attr "type" "ssemul")
>                                    (eq_attr "memory" "load")))
> -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
> +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
>
>
>  (define_insn_reservation "znver1_ssemul_avx256_pd" 5
> @@ -1220,14 +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5
>                               (and (eq_attr "mode" "V4DF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,(znver1-fp0|znver1-fp1)*4")
> +                        "znver1-double,znver1-fp0*2|znver1-fp1*2")

Do we need to include "znver1" check here?

>
>  (define_insn_reservation "znver1_ssemul_avx256_pd_load" 12
>                          (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "V4DF")
>                                    (and (eq_attr "type" "ssemul")
>                                         (eq_attr "memory" "load"))))
> -                        "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*4")
> +                        "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2")
>
>  ;;SSE imul
>  (define_insn_reservation "znver1_sseimul" 3
> @@ -1239,14 +1239,14 @@ (define_insn_reservation "znver1_sseimul" 3
>                                               (eq_attr "mode" "TI,OI")))
>                               (and (eq_attr "type" "sseimul")
>                                    (eq_attr "memory" "none")))
> -                        "znver1-direct,znver1-fp0*3")
> +                        "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256" 4
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-fp0*4")
> +                        "znver1-double,znver1-fp0*2")

The znver1 native path is 128 bits wide, while znver2/3 have 256-bit paths.
We need to split this into two reservations: one for znver1 and the other for znver2/3.

>
>  (define_insn_reservation "znver1_sseimul_load" 10
>                          (and (ior (and (eq_attr "cpu" "znver1")
> @@ -1257,28 +1257,28 @@ (define_insn_reservation "znver1_sseimul_load" 10
>                                         (eq_attr "mode" "TI,OI")))
>                               (and (eq_attr "type" "sseimul")
>                                    (eq_attr "memory" "load")))
> -                        "znver1-direct,znver1-load,znver1-fp0*3")
> +                        "znver1-direct,znver1-load,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256_load" 11
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "load"))))
> -                        "znver1-double,znver1-load,znver1-fp0*4")
> +                        "znver1-double,znver1-load,znver1-fp0*2")

We need to split this into two reservations. One for znver1 and the other for znver2/3.

>
>  (define_insn_reservation "znver1_sseimul_di" 3
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "mode" "DI")
>                                    (and (eq_attr "memory" "none")
>                                         (eq_attr "type" "sseimul"))))
> -                        "znver1-direct,znver1-fp0*3")
> +                        "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_load_di" 10
>                          (and (eq_attr "cpu" "znver1,znver2,znver3")
>                               (and (eq_attr "mode" "DI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "load"))))
> -                        "znver1-direct,znver1-load,znver1-fp0*3")
> +                        "znver1-direct,znver1-load,znver1-fp0")
>
>  ;; SSE compares
>  (define_insn_reservation "znver1_sse_cmp" 1
> --
> 2.37.2

The patch looks good.

Regards,
Venkat.



* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-16 11:53   ` Kumar, Venkataramanan
@ 2022-11-16 12:21     ` Jan Hubička
  2022-11-16 13:13       ` Alexander Monakov
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubička @ 2022-11-16 12:21 UTC (permalink / raw)
  To: Kumar, Venkataramanan; +Cc: Alexander Monakov, gcc-patches, Joshi, Tejas Sanjay


Hello,


On Wed, Nov 16, 2022 at 12:53 PM Kumar, Venkataramanan <
Venkataramanan.Kumar@amd.com> wrote:

> [AMD Official Use Only - General]
>
> Hi,
>
>
> > Top znver table sizes in insn-automata.o:
> >
> > Before:
> >
> > 30056 r znver1_fp_min_issue_delay
> > 120224 r znver1_fp_transitions
>
>
> > After:
> >
> > 6720 r znver1_fp_min_issue_delay
> > 53760 r znver1_fp_transitions
>

This looks really promising.  I will experiment with the patch for a separate
znver3 model, but I think we should be able to keep
them unified and hopefully get both less code duplication and smaller table sizes.
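
For scale, the reduction in the two tables quoted above can be worked out
directly. This is a minimal Python sketch that uses only the sizes listed in
the quoted message; the variable and function names are illustrative.

```python
# Sizes of the two largest znver tables as quoted above (before/after the patch).
before = {"znver1_fp_min_issue_delay": 30056, "znver1_fp_transitions": 120224}
after = {"znver1_fp_min_issue_delay": 6720, "znver1_fp_transitions": 53760}


def shrink(old, new):
    """Fraction by which a table got smaller."""
    return (old - new) / old


for name in before:
    print(f"{name}: {before[name]} -> {after[name]} "
          f"({shrink(before[name], after[name]):.1%} smaller)")

total_before = sum(before.values())
total_after = sum(after.values())
print(f"total: {total_before} -> {total_after} "
      f"({shrink(total_before, total_after):.1%} smaller)")
```

Taken together, the two tables shrink by roughly 60%.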

> >
> > gcc/ChangeLog:
> >
> >         PR target/87832
> >         * config/i386/znver.md: (znver1_fp_op_mul): Correct cycles in
> >         the reservation.
> >         (znver1_fp_op_mul_load): Ditto.
> >         (znver1_mmx_mul): Ditto.
> >         (znver1_mmx_load): Ditto.
> >         (znver1_ssemul_ss_ps): Ditto.
> >         (znver1_ssemul_ss_ps_load): Ditto.
> >         (znver1_ssemul_avx256_ps): Ditto.
> >         (znver1_ssemul_avx256_ps_load): Ditto.
> >         (znver1_ssemul_sd_pd): Ditto.
> >         (znver1_ssemul_sd_pd_load): Ditto.
> >         (znver2_ssemul_sd_pd): Ditto.
> >         (znver2_ssemul_sd_pd_load): Ditto.
> >         (znver1_ssemul_avx256_pd): Ditto.
> >         (znver1_ssemul_avx256_pd_load): Ditto.
> >         (znver1_sseimul): Ditto.
> >         (znver1_sseimul_avx256): Ditto.
> >         (znver1_sseimul_load): Ditto.
> >         (znver1_sseimul_avx256_load): Ditto.
> >         (znver1_sseimul_di): Ditto.
> >         (znver1_sseimul_load_di): Ditto.
> > ---
> >  gcc/config/i386/znver.md | 40 ++++++++++++++++++++--------------------
> >  1 file changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
> > index c52f8b532..882f250f1 100644
> > --- a/gcc/config/i386/znver.md
> > +++ b/gcc/config/i386/znver.md
> > @@ -573,13 +573,13 @@ (define_insn_reservation "znver1_fp_op_mul" 5
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> >                               (and (eq_attr "type" "fop,fmul")
> >                                    (eq_attr "memory" "none")))
> > -                        "znver1-direct,znver1-fp0*5")
> > +                        "znver1-direct,znver1-fp0")
> >
> >  (define_insn_reservation "znver1_fp_op_mul_load" 12
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> >                               (and (eq_attr "type" "fop,fmul")
> >                                    (eq_attr "memory" "load")))
> > -                        "znver1-direct,znver1-load,znver1-fp0*5")
> > +                        "znver1-direct,znver1-load,znver1-fp0")
> >
> >  (define_insn_reservation "znver1_fp_op_imul_load" 16
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> > @@ -684,13 +684,13 @@ (define_insn_reservation "znver1_mmx_mul" 3
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> >                               (and (eq_attr "type" "mmxmul")
> >                                    (eq_attr "memory" "none")))
> > -                         "znver1-direct,znver1-fp0*3")
> > +                         "znver1-direct,znver1-fp0")
> >
> >  (define_insn_reservation "znver1_mmx_load" 10
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> >                               (and (eq_attr "type" "mmxmul")
> >                                    (eq_attr "memory" "load")))
> > -                        "znver1-direct,znver1-load,znver1-fp0*3")
> > +                        "znver1-direct,znver1-load,znver1-fp0")
> >
> >  ;; TODO
> >  (define_insn_reservation "znver1_avx256_log" 1
> > @@ -1161,7 +1161,7 @@ (define_insn_reservation "znver1_ssemul_ss_ps" 3
> >                                               (eq_attr "mode" "V8SF,V4SF,SF,V4DF,V2DF,DF")))
> >                               (and (eq_attr "type" "ssemul")
> >                                    (eq_attr "memory" "none")))
> > -                        "znver1-direct,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-direct,znver1-fp0|znver1-fp1")
> >
> >  (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
> >                          (and (ior (and (eq_attr "cpu" "znver1")
> > @@ -1172,47 +1172,47 @@ (define_insn_reservation "znver1_ssemul_ss_ps_load" 10
> >                                               (eq_attr "mode" "V8SF,V4SF,SF")))
> >                               (and (eq_attr "type" "ssemul")
> >                                    (eq_attr "memory" "load")))
> > -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
> >
> >  (define_insn_reservation "znver1_ssemul_avx256_ps" 3
> >                          (and (eq_attr "cpu" "znver1")
> >                               (and (eq_attr "mode" "V8SF")
> >                                    (and (eq_attr "type" "ssemul")
> >                                         (eq_attr "memory" "none"))))
> > -                        "znver1-double,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-double,znver1-fp0*2|znver1-fp1*2")
> >
> >  (define_insn_reservation "znver1_ssemul_avx256_ps_load" 10
> >                          (and (eq_attr "cpu" "znver1")
> >                               (and (eq_attr "mode" "V8SF")
> >                                    (and (eq_attr "type" "ssemul")
> >                                         (eq_attr "memory" "load"))))
> > -                        "znver1-double,znver1-load,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-double,znver1-load,znver1-fp0*2|znver1-fp1*2")
> >
> >  (define_insn_reservation "znver1_ssemul_sd_pd" 4
> >                          (and (eq_attr "cpu" "znver1")
> >                               (and (eq_attr "mode" "V2DF,DF")
> >                                    (and (eq_attr "type" "ssemul")
> >                                         (eq_attr "memory" "none"))))
> > -                        "znver1-direct,(znver1-fp0|znver1-fp1)*4")
> > +                        "znver1-direct,znver1-fp0|znver1-fp1")
> >
> >  (define_insn_reservation "znver1_ssemul_sd_pd_load" 11
> >                          (and (eq_attr "cpu" "znver1")
> >                               (and (eq_attr "mode" "V2DF,DF")
> >                                    (and (eq_attr "type" "ssemul")
> >                                         (eq_attr "memory" "load"))))
> > -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*4")
> > +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
> >
> >  (define_insn_reservation "znver2_ssemul_sd_pd" 3
> >                          (and (eq_attr "cpu" "znver2,znver3")
> >                               (and (eq_attr "type" "ssemul")
> >                                    (eq_attr "memory" "none")))
> > -                        "znver1-direct,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-direct,znver1-fp0|znver1-fp1")
> >
> >  (define_insn_reservation "znver2_ssemul_sd_pd_load" 10
> >                          (and (eq_attr "cpu" "znver2,znver3")
> >                               (and (eq_attr "type" "ssemul")
> >                                    (eq_attr "memory" "load")))
> > -                        "znver1-direct,znver1-load,(znver1-fp0|znver1-fp1)*3")
> > +                        "znver1-direct,znver1-load,znver1-fp0|znver1-fp1")
> >
> >
> >  (define_insn_reservation "znver1_ssemul_avx256_pd" 5
> > @@ -1220,14 +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5
> >                               (and (eq_attr "mode" "V4DF")
> >                                    (and (eq_attr "type" "ssemul")
> >                                         (eq_attr "memory" "none"))))
> > -                        "znver1-double,(znver1-fp0|znver1-fp1)*4")
> > +                        "znver1-double,znver1-fp0*2|znver1-fp1*2")
>
> Do we need to include "znver1" check here?
>

If people use nonsensical combinations like -mtune=znver1 -march=znver2, this
may help a bit.
I do it from time to time to see differences between pipeline models, but
it is not too important.

> >
> >  (define_insn_reservation "znver1_sseimul_avx256" 4
> >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> >                               (and (eq_attr "mode" "OI")
> >                                    (and (eq_attr "type" "sseimul")
> >                                         (eq_attr "memory" "none"))))
> > -                        "znver1-double,znver1-fp0*4")
> > +                        "znver1-double,znver1-fp0*2")
>
> The znver1 native path is 128 bits wide, and znver2/3 have 256-bit paths.
> We need to split this into two reservations. One for znver1 and the other
> for znver2/3.
>

isn't it znver2 for 128 and znver3 for 256?

> The patch looks good.
>
Patch is OK then :)

thanks a lot!
Honza

>
> Regards,
> Venkat.
>
>


* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-16 12:21     ` Jan Hubička
@ 2022-11-16 13:13       ` Alexander Monakov
  2022-11-16 13:28         ` Jan Hubička
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2022-11-16 13:13 UTC (permalink / raw)
  To: Jan Hubička; +Cc: Kumar, Venkataramanan, gcc-patches, Joshi, Tejas Sanjay



On Wed, 16 Nov 2022, Jan Hubička wrote:

> This looks really promising.  I will experiment with the patch for separate
> znver3 model, but I think we should be able to keep
> them unified and hopefully get both less code duplication and table sizes.

Do you mean separate znver4 (not '3') model (i.e. the recent patch by AMD)?

> > >  (define_insn_reservation "znver1_ssemul_avx256_pd" 5
> > > @@ -1220,14 +1220,14 @@ (define_insn_reservation "znver1_ssemul_avx256_pd" 5
> > >                               (and (eq_attr "mode" "V4DF")
> > >                                    (and (eq_attr "type" "ssemul")
> > >                                         (eq_attr "memory" "none"))))
> > > -                        "znver1-double,(znver1-fp0|znver1-fp1)*4")
> > > +                        "znver1-double,znver1-fp0*2|znver1-fp1*2")
> >
> > Do we need to include "znver1" check here?
> >
> 
> If people use nonsensical combinations like -mtune=znver1 -march=znver2, this
> may help a bit.
> I do it from time to time to see differences between pipeline models, but
> it is not too important.

Actually no change is needed, the reservation already includes a check for
znver1, it's just cut off in the patch context. Here's the full context:

(define_insn_reservation "znver1_ssemul_avx256_pd" 5
                         (and (eq_attr "cpu" "znver1")
                              (and (eq_attr "mode" "V4DF")
                                   (and (eq_attr "type" "ssemul")
                                        (eq_attr "memory" "none"))))
                         "znver1-double,znver1-fp0*2|znver1-fp1*2")

> > >  (define_insn_reservation "znver1_sseimul_avx256" 4
> > >                          (and (eq_attr "cpu" "znver1,znver2,znver3")
> > >                               (and (eq_attr "mode" "OI")
> > >                                    (and (eq_attr "type" "sseimul")
> > >                                         (eq_attr "memory" "none"))))
> > > -                        "znver1-double,znver1-fp0*4")
> > > +                        "znver1-double,znver1-fp0*2")
> >
> > > The znver1 native path is 128 bits wide, and znver2/3 have 256-bit paths.
> > We need to split this into two reservations. One for znver1 and the other
> > for znver2/3.
> >
> 
> isn't it znver2 for 128 and znver3 for 256?

No, Zen 1 splits AVX256 instructions into pairs of 128-bit uops, Zen 2 and
Zen 3 have native 256-bit units. Zen 4 again executes AVX512 instructions
on 256-bit units.
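
As a rough illustration of the widths described above, here is a minimal
Python sketch. The width table is an assumption taken from this paragraph,
not something read from znver.md, and the function name is hypothetical.

```python
import math

# Native SIMD datapath width in bits, as described above (an assumed table,
# not something read from znver.md).
NATIVE_BITS = {"znver1": 128, "znver2": 256, "znver3": 256, "znver4": 256}


def passes_needed(cpu, vector_bits):
    """How many native-width passes one vector operation of this width takes."""
    return math.ceil(vector_bits / NATIVE_BITS[cpu])


for cpu, bits in [("znver1", 256), ("znver2", 256), ("znver3", 256), ("znver4", 512)]:
    print(cpu, bits, passes_needed(cpu, bits))
```

In this simplified view, a 256-bit operation needs two passes on znver1 and
one on znver2/znver3, which matches the znver1-fp0*2 occupancy used in the
znver1 AVX256 reservations.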

I think a split is not needed because the preceding reservation already handles
znver2 and znver3, we just need to remove them here, like this:

diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
index 882f250f1..16b5afa5d 100644
--- a/gcc/config/i386/znver.md
+++ b/gcc/config/i386/znver.md
@@ -1242,7 +1242,7 @@ (define_insn_reservation "znver1_sseimul" 3
                         "znver1-direct,znver1-fp0")

 (define_insn_reservation "znver1_sseimul_avx256" 4
-                        (and (eq_attr "cpu" "znver1,znver2,znver3")
+                        (and (eq_attr "cpu" "znver1")
                              (and (eq_attr "mode" "OI")
                                   (and (eq_attr "type" "sseimul")
                                        (eq_attr "memory" "none"))))
@@ -1260,7 +1260,7 @@ (define_insn_reservation "znver1_sseimul_load" 10
                         "znver1-direct,znver1-load,znver1-fp0")

 (define_insn_reservation "znver1_sseimul_avx256_load" 11
-                        (and (eq_attr "cpu" "znver1,znver2,znver3")
+                        (and (eq_attr "cpu" "znver1")
                              (and (eq_attr "mode" "OI")
                                   (and (eq_attr "type" "sseimul")
                                        (eq_attr "memory" "load"))))

> The patch looks good.
> >
> Patch is OK then :)

For *both* patches in the series?

Alexander


* Re: [PATCH 2/2] i386: correct x87&SSE multiplication modeling in znver.md
  2022-11-16 13:13       ` Alexander Monakov
@ 2022-11-16 13:28         ` Jan Hubička
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Hubička @ 2022-11-16 13:28 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Kumar, Venkataramanan, gcc-patches, Joshi, Tejas Sanjay


On Wed, Nov 16, 2022 at 2:13 PM Alexander Monakov <amonakov@ispras.ru>
wrote:

>
> On Wed, 16 Nov 2022, Jan Hubička wrote:
>
> > This looks really promising.  I will experiment with the patch for separate
> > znver3 model, but I think we should be able to keep
> > them unified and hopefully get both less code duplication and table sizes.
>
> Do you mean separate znver4 (not '3') model (i.e. the recent patch by AMD)?
>

Yes. I guess we want to check which variant leads to a smaller automaton.  I
would somewhat prefer to keep the models unified since they are quite
similar.

> > > The znver1 native path is 128 bits wide, and znver2/3 have 256-bit paths.
> > > We need to split this into two reservations. One for znver1 and the other
> > > for znver2/3.
> > >
> >
> > isn't it znver2 for 128 and znver3 for 256?
>
> No, Zen 1 splits AVX256 instructions into pairs of 128-bit uops, Zen 2 and
> Zen 3 have native 256-bit units. Zen 4 again executes AVX512 instructions
> on 256-bit units.
>

Ah, of course.  I mixed things up in my memory. Sorry for that.

>
> I think a split is not needed because the preceding reservation already handles
> znver2 and znver3, we just need to remove them here, like this:
>
> diff --git a/gcc/config/i386/znver.md b/gcc/config/i386/znver.md
> index 882f250f1..16b5afa5d 100644
> --- a/gcc/config/i386/znver.md
> +++ b/gcc/config/i386/znver.md
> @@ -1242,7 +1242,7 @@ (define_insn_reservation "znver1_sseimul" 3
>                          "znver1-direct,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256" 4
> -                        (and (eq_attr "cpu" "znver1,znver2,znver3")
> +                        (and (eq_attr "cpu" "znver1")
>

It should work even without the removal since the first reservation matches, but
this is indeed much less confusing.
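
The first-match behaviour can be sketched in Python. This is a toy model, not
how genautomata is implemented: it assumes the first matching reservation
wins, as stated above, and the predicate for the preceding znver1_sseimul
reservation is an assumption because its condition is only partly visible in
the quoted context.

```python
def znver1_sseimul(insn):
    # Assumed condition: znver1 takes TI only, znver2/znver3 take TI and OI.
    return (insn["type"] == "sseimul" and insn["memory"] == "none"
            and ((insn["cpu"] == "znver1" and insn["mode"] == "TI")
                 or (insn["cpu"] in ("znver2", "znver3")
                     and insn["mode"] in ("TI", "OI"))))


def znver1_sseimul_avx256(insn, cpus):
    # `cpus` is the cpu list in the reservation, before or after the change.
    return (insn["cpu"] in cpus and insn["mode"] == "OI"
            and insn["type"] == "sseimul" and insn["memory"] == "none")


def build(avx256_cpus):
    """Reservations in file order: znver1_sseimul precedes the avx256 one."""
    return [("znver1_sseimul", znver1_sseimul),
            ("znver1_sseimul_avx256",
             lambda insn: znver1_sseimul_avx256(insn, avx256_cpus))]


def select(insn, reservations):
    """Return the name of the first reservation whose condition matches."""
    for name, matches in reservations:
        if matches(insn):
            return name
    return None


old = build(("znver1", "znver2", "znver3"))
new = build(("znver1",))
zen2_oi = {"cpu": "znver2", "mode": "OI", "type": "sseimul", "memory": "none"}
zen1_oi = {"cpu": "znver1", "mode": "OI", "type": "sseimul", "memory": "none"}
print(select(zen2_oi, old), select(zen2_oi, new))
print(select(zen1_oi, old), select(zen1_oi, new))
```

Under these assumptions the selection is the same with either cpu list, so
narrowing the list to znver1 only removes the overlap.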

>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "none"))))
> @@ -1260,7 +1260,7 @@ (define_insn_reservation "znver1_sseimul_load" 10
>                          "znver1-direct,znver1-load,znver1-fp0")
>
>  (define_insn_reservation "znver1_sseimul_avx256_load" 11
> -                        (and (eq_attr "cpu" "znver1,znver2,znver3")
> +                        (and (eq_attr "cpu" "znver1")
>                               (and (eq_attr "mode" "OI")
>                                    (and (eq_attr "type" "sseimul")
>                                         (eq_attr "memory" "load"))))
>
> > The patch looks good.
> > >
> > Patch is OK then :)
>
> For *both* patches in the series?
>

Yes, thanks a lot for looking into this!
Honza

>
> Alexander


end of thread, other threads:[~2022-11-16 13:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2022-11-01 16:26 [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
2022-11-01 16:26 ` [PATCH 1/2] i386: correct x87&SSE division modeling in znver.md Alexander Monakov
2022-11-01 16:26 ` [PATCH 2/2] i386: correct x87&SSE multiplication " Alexander Monakov
2022-11-16 11:53   ` Kumar, Venkataramanan
2022-11-16 12:21     ` Jan Hubička
2022-11-16 13:13       ` Alexander Monakov
2022-11-16 13:28         ` Jan Hubička
2022-11-07 11:27 ` [PATCH 0/2] i386: slim down insn-automata [PR 87832] Alexander Monakov
2022-11-14 11:19   ` Alexander Monakov
