[PATCH 0/3] LoongArch: Fix instruction costs

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/3] LoongArch: Fix instruction costs
@ 2023-12-09 17:03 Xi Ruoyao
  2023-12-09 17:03 ` [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own Xi Ruoyao
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao

Update LoongArch instruction costs based on micro-benchmark results on
LA464 and LA664.  In particular, this allows generating alsl/slli or
alsl/slli + add pairs when multiplying by some constants, because on
LA464/LA664 a mul instruction is 4x slower than an alsl, slli, or add
instruction.
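
As a rough illustration of the constant multiplications in question, here
is a sketch in plain C.  It is not code from the patches, and the function
names are made up for this example: x * 17 has the shape of one
shift-and-add (a single alsl), and x * 68 the shape of a shift-and-add
followed by a shift (alsl + slli).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x * 17 as one shift-and-add, the shape of a single
   alsl.w (shift left by 4, then add).  Unsigned arithmetic keeps the
   wrap-around well defined in C.  */
static uint32_t
mul17_shift_add (uint32_t x)
{
  return (x << 4) + x;
}

/* Hypothetical helper: x * 68 = (x * 17) * 4, i.e. a shift-and-add
   followed by a plain shift (alsl.w + slli.w).  */
static uint32_t
mul68_shift_add (uint32_t x)
{
  return mul17_shift_add (x) << 2;
}

/* Self-check against the ordinary multiplication.  */
static void
check_shift_add (void)
{
  for (uint32_t x = 0; x < 1000; x++)
    {
      assert (mul17_shift_add (x) == x * 17u);
      assert (mul68_shift_add (x) == x * 68u);
    }
}
```

Calling check_shift_add () exercises both helpers; whether the compiler
actually picks these sequences depends on the cost tables changed by this
series.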

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (3):
  LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our
    own
  LoongArch: Fix instruction costs [PR112936]
  LoongArch: Add alslsi3_extend

 gcc/config/loongarch/loongarch-def.cc         | 42 ++++++++++---------
 gcc/config/loongarch/loongarch.cc             | 22 +++++-----
 gcc/config/loongarch/loongarch.md             | 12 ++++++
 .../loongarch/mul-const-reduction.c           | 11 +++++
 4 files changed, 56 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c

-- 
2.43.0



* [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
  2023-12-09 17:03 [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
@ 2023-12-09 17:03 ` Xi Ruoyao
  2023-12-13 12:57   ` chenglulu
  2023-12-09 17:03 ` [PATCH 2/3] LoongArch: Fix instruction costs [PR112936] Xi Ruoyao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao

With loongarch-def.cc switched from C to C++, we can include rtl.h for
COSTS_N_INSNS, instead of hard coding our own.

This is a non-functional change for now, but it will make the code more
future-proof in case COSTS_N_INSNS in rtl.h is changed.

gcc/ChangeLog:

	* config/loongarch/loongarch-def.cc (rtl.h): Include.
	(COSTS_N_INSNS): Remove the macro definition.
---
 gcc/config/loongarch/loongarch-def.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index c41804a180e..6217b19268c 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include "rtl.h"
 
 #include "loongarch-def.h"
 #include "loongarch-str.h"
@@ -89,8 +90,6 @@ array_tune<loongarch_align> loongarch_cpu_align =
     .set (CPU_LA464, la464_align ())
     .set (CPU_LA664, la464_align ());
 
-#define COSTS_N_INSNS(N) ((N) * 4)
-
 /* Default RTX cost initializer.  */
 loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
   : fp_add (COSTS_N_INSNS (1)),
-- 
2.43.0



* [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-09 17:03 [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
  2023-12-09 17:03 ` [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own Xi Ruoyao
@ 2023-12-09 17:03 ` Xi Ruoyao
  2023-12-13 12:22   ` chenglulu
  2023-12-09 17:03 ` [PATCH 3/3] LoongArch: Add alslsi3_extend Xi Ruoyao
  2023-12-17 15:22 ` Pushed: [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
  3 siblings, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao

Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.

This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
and slli.
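
To see why the new numbers change the outcome, the arithmetic can be
sketched as below.  This is a simplified model under stated assumptions,
not GCC's actual logic: the "strictly cheaper" rule and the names
SKETCH_COSTS_N_INSNS and prefer_shift_add are hypothetical, and the real
multiplication-synthesis code weighs more factors (for example loading the
constant).

```c
#include <assert.h>
#include <stdbool.h>

/* Same scaling as the COSTS_N_INSNS macro removed in patch 1/3; renamed
   here so the sketch does not pretend to be GCC's rtl.h.  */
#define SKETCH_COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, simplified decision rule: replace a multiplication by
   N_SIMPLE simple instructions (alsl, slli, add), each costing
   SKETCH_COSTS_N_INSNS (1), only if that sequence is strictly cheaper
   than the multiplication itself.  */
static bool
prefer_shift_add (int mult_cost, int n_simple)
{
  return n_simple * SKETCH_COSTS_N_INSNS (1) < mult_cost;
}

static void
check_costs (void)
{
  int old_mult = SKETCH_COSTS_N_INSNS (1);  /* int_mult_si before.  */
  int new_mult = SKETCH_COSTS_N_INSNS (4);  /* int_mult_si after.  */

  /* x * 17 needs one alsl; x * 68 needs alsl + slli.  */
  assert (!prefer_shift_add (old_mult, 1));
  assert (!prefer_shift_add (old_mult, 2));
  assert (prefer_shift_add (new_mult, 1));
  assert (prefer_shift_add (new_mult, 2));
}
```

With the old cost (4 units) neither sequence beats the multiplication in
this model; with the new cost (16 units) both do.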

gcc/ChangeLog:

	PR target/112936
	* config/loongarch/loongarch-def.cc
	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
	instruction costs per micro-benchmark results.
	(loongarch_rtx_cost_optimize_size): Set all instruction costs
	to (COSTS_N_INSNS (1) + 1).
	* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
	special case for multiplication when optimizing for size.
	Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
	Account the extra cost when TARGET_CHECK_ZERO_DIV and
	optimizing for speed.

gcc/testsuite/ChangeLog

	PR target/112936
	* gcc.target/loongarch/mul-const-reduction.c: New test.
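
The new UDIV/UMOD cost computation in loongarch.cc can be read as the
following stand-alone sketch.  The function and parameter names are
hypothetical; the boolean parameters stand in for the mode check and the
TARGET_* macros, and the cost fields are passed in as plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scaling as COSTS_N_INSNS; renamed to mark it as part of this
   sketch rather than GCC's rtl.h.  */
#define SKETCH_COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of the UDIV/UMOD branch after the patch.  */
static int
udiv_cost_sketch (bool is_dimode, bool target_64bit, bool target_div32,
                  bool check_zero_div, int int_div_si, int int_div_di)
{
  int total;

  if (is_dimode)
    total = int_div_di;
  else
    {
      total = int_div_si;
      /* The patch adds two instructions' worth of cost for a 32-bit
         division on a 64-bit target without DIV32.  */
      if (target_64bit && !target_div32)
        total += SKETCH_COSTS_N_INSNS (2);
    }

  /* Two more instructions' worth when the divide-by-zero check is on.  */
  if (check_zero_div)
    total += SKETCH_COSTS_N_INSNS (2);

  return total;
}
```

With the default int_div_si of COSTS_N_INSNS (5), i.e. 20 units, a 32-bit
division on a 64-bit target without DIV32 and with the zero check enabled
comes out at 36 units in this model.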
---
 gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
 gcc/config/loongarch/loongarch.cc             | 22 +++++------
 .../loongarch/mul-const-reduction.c           | 11 ++++++
 3 files changed, 43 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 6217b19268c..4a8885e8343 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -92,15 +92,15 @@ array_tune<loongarch_align> loongarch_cpu_align =
 
 /* Default RTX cost initializer.  */
 loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
-  : fp_add (COSTS_N_INSNS (1)),
-    fp_mult_sf (COSTS_N_INSNS (2)),
-    fp_mult_df (COSTS_N_INSNS (4)),
-    fp_div_sf (COSTS_N_INSNS (6)),
+  : fp_add (COSTS_N_INSNS (5)),
+    fp_mult_sf (COSTS_N_INSNS (5)),
+    fp_mult_df (COSTS_N_INSNS (5)),
+    fp_div_sf (COSTS_N_INSNS (8)),
     fp_div_df (COSTS_N_INSNS (8)),
-    int_mult_si (COSTS_N_INSNS (1)),
-    int_mult_di (COSTS_N_INSNS (1)),
-    int_div_si (COSTS_N_INSNS (4)),
-    int_div_di (COSTS_N_INSNS (6)),
+    int_mult_si (COSTS_N_INSNS (4)),
+    int_mult_di (COSTS_N_INSNS (4)),
+    int_div_si (COSTS_N_INSNS (5)),
+    int_div_di (COSTS_N_INSNS (5)),
     branch_cost (6),
     memory_latency (4) {}
 
@@ -111,18 +111,21 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
 array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
   array_tune<loongarch_rtx_cost_data> ();
 
-/* RTX costs to use when optimizing for size.  */
+/* RTX costs to use when optimizing for size.
+   We use a value slightly larger than COSTS_N_INSNS (1) for all of them
+   because they are slower than simple instructions.  */
+#define COST_COMPLEX_INSN (COSTS_N_INSNS (1) + 1)
 const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size =
   loongarch_rtx_cost_data ()
-    .fp_add_ (4)
-    .fp_mult_sf_ (4)
-    .fp_mult_df_ (4)
-    .fp_div_sf_ (4)
-    .fp_div_df_ (4)
-    .int_mult_si_ (4)
-    .int_mult_di_ (4)
-    .int_div_si_ (4)
-    .int_div_di_ (4);
+    .fp_add_ (COST_COMPLEX_INSN)
+    .fp_mult_sf_ (COST_COMPLEX_INSN)
+    .fp_mult_df_ (COST_COMPLEX_INSN)
+    .fp_div_sf_ (COST_COMPLEX_INSN)
+    .fp_div_df_ (COST_COMPLEX_INSN)
+    .int_mult_si_ (COST_COMPLEX_INSN)
+    .int_mult_di_ (COST_COMPLEX_INSN)
+    .int_div_si_ (COST_COMPLEX_INSN)
+    .int_div_di_ (COST_COMPLEX_INSN);
 
 array_tune<int> loongarch_cpu_issue_rate = array_tune<int> ()
   .set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 754aeb8bfb7..f04b5798f39 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3787,8 +3787,6 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	*total = (speed
 		  ? loongarch_cost->int_mult_si * 3 + 6
 		  : COSTS_N_INSNS (7));
-      else if (!speed)
-	*total = COSTS_N_INSNS (1) + 1;
       else if (mode == DImode)
 	*total = loongarch_cost->int_mult_di;
       else
@@ -3823,14 +3821,18 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
     case UDIV:
     case UMOD:
-      if (!speed)
-	{
-	  *total = COSTS_N_INSNS (loongarch_idiv_insns (mode));
-	}
-      else if (mode == DImode)
+      if (mode == DImode)
 	*total = loongarch_cost->int_div_di;
       else
-	*total = loongarch_cost->int_div_si;
+	{
+	  *total = loongarch_cost->int_div_si;
+	  if (TARGET_64BIT && !TARGET_DIV32)
+	    *total += COSTS_N_INSNS (2);
+	}
+
+      if (TARGET_CHECK_ZERO_DIV)
+	*total += COSTS_N_INSNS (2);
+
       return false;
 
     case SIGN_EXTEND:
@@ -3862,9 +3864,7 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  && (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1))
 		      == ZERO_EXTEND))))
 	{
-	  if (!speed)
-	    *total = COSTS_N_INSNS (1) + 1;
-	  else if (mode == DImode)
+	  if (mode == DImode)
 	    *total = loongarch_cost->int_mult_di;
 	  else
 	    *total = loongarch_cost->int_mult_si;
diff --git a/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c b/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
new file mode 100644
index 00000000000..02d9a4876d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler "alsl\.w" } } */
+/* { dg-final { scan-assembler "slli\.w" } } */
+/* { dg-final { scan-assembler-not "mul\.w" } } */
+
+int
+test (int a)
+{
+  return a * 68;
+}
-- 
2.43.0



* [PATCH 3/3] LoongArch: Add alslsi3_extend
  2023-12-09 17:03 [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
  2023-12-09 17:03 ` [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own Xi Ruoyao
  2023-12-09 17:03 ` [PATCH 2/3] LoongArch: Fix instruction costs [PR112936] Xi Ruoyao
@ 2023-12-09 17:03 ` Xi Ruoyao
  2023-12-13 12:58   ` chenglulu
  2023-12-17 15:22 ` Pushed: [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
  3 siblings, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao

Following the instruction cost fix, we are generating

    alsl.w $a0, $a0, $a0, 4

instead of

    li.w  $t0, 17
    mul.w $a0, $a0, $t0

for "x * 17", because alsl.w is 4 times faster than mul.w.  But we didn't
have a sign-extending pattern for alsl.w, so an extra slli.w instruction
was generated to sign-extend $a0.  Add the pattern to remove the
redundant extension.
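
The value the new pattern describes can be modelled in C roughly as
follows.  This is a sketch, not part of the patch: alsl_w_model is a
made-up name, and the shift range 1..4 is an assumption about what
const_immalsl_operand accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of "alsl.w rd, rj, rk, sa" (sa assumed in 1..4):
   the low 32 bits of (rj << sa) + rk, sign-extended to 64 bits.  This
   mirrors the sign_extend:DI of a plus:SI in the new pattern.  */
static int64_t
alsl_w_model (uint32_t rj, uint32_t rk, unsigned sa)
{
  assert (sa >= 1 && sa <= 4);
  uint32_t low = (rj << sa) + rk;  /* Wraps modulo 2^32.  */

  /* Manual sign extension, avoiding implementation-defined casts.  */
  return low < 0x80000000u ? (int64_t) low : (int64_t) low - 4294967296LL;
}
```

Because the result already has this sign-extended form, a separate
extension of the 32-bit result is redundant, which is what the pattern
lets the compiler recognize.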

gcc/ChangeLog:

	* config/loongarch/loongarch.md (alslsi3_extend): New
	define_insn.
---
 gcc/config/loongarch/loongarch.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index afbf201d4d0..7b26d15aa4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3"
   [(set_attr "type" "arith")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "alslsi3_extend"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(sign_extend:DI
+	  (plus:SI
+	    (ashift:SI (match_operand:SI 1 "register_operand" "r")
+		       (match_operand 2 "const_immalsl_operand" ""))
+	    (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "alsl.w\t%0,%1,%3,%2"
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")])
+
 \f
 
 ;; Reverse the order of bytes of operand 1 and store the result in operand 0.
-- 
2.43.0



* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-09 17:03 ` [PATCH 2/3] LoongArch: Fix instruction costs [PR112936] Xi Ruoyao
@ 2023-12-13 12:22   ` chenglulu
  2023-12-13 13:20     ` Xi Ruoyao
  0 siblings, 1 reply; 12+ messages in thread
From: chenglulu @ 2023-12-13 12:22 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c


On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Replace the instruction costs in loongarch_rtx_cost_data constructor
> based on micro-benchmark results on LA464 and LA664.
>
> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> and slli.
>
> gcc/ChangeLog:
>
> 	PR target/112936
> 	* config/loongarch/loongarch-def.cc
> 	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
> 	instruction costs per micro-benchmark results.
> 	(loongarch_rtx_cost_optimize_size): Set all instruction costs
> 	to (COSTS_N_INSNS (1) + 1).
> 	* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
> 	special case for multiplication when optimizing for size.
> 	Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
> 	Account the extra cost when TARGET_CHECK_ZERO_DIV and
> 	optimizing for speed.
>
> gcc/testsuite/ChangeLog
>
> 	PR target/112936
> 	* gcc.target/loongarch/mul-const-reduction.c: New test.
> ---
>   gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
>   gcc/config/loongarch/loongarch.cc             | 22 +++++------
>   .../loongarch/mul-const-reduction.c           | 11 ++++++
>   3 files changed, 43 insertions(+), 29 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>
Well, I'm curious about how the value of this cost is obtained.



* Re: [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
  2023-12-09 17:03 ` [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own Xi Ruoyao
@ 2023-12-13 12:57   ` chenglulu
  0 siblings, 0 replies; 12+ messages in thread
From: chenglulu @ 2023-12-13 12:57 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c

LGTM!

Thanks.

On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> With loongarch-def.cc switched from C to C++, we can include rtl.h for
> COSTS_N_INSNS, instead of hard coding our own.
>
> This is a non-functional change for now, but it will make the code more
> future-proof in case COSTS_N_INSNS in rtl.h is changed.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/loongarch-def.cc (rtl.h): Include.
> 	(COSTS_N_INSNS): Remove the macro definition.
> ---
>   gcc/config/loongarch/loongarch-def.cc | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
> index c41804a180e..6217b19268c 100644
> --- a/gcc/config/loongarch/loongarch-def.cc
> +++ b/gcc/config/loongarch/loongarch-def.cc
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "system.h"
>   #include "coretypes.h"
>   #include "tm.h"
> +#include "rtl.h"
>   
>   #include "loongarch-def.h"
>   #include "loongarch-str.h"
> @@ -89,8 +90,6 @@ array_tune<loongarch_align> loongarch_cpu_align =
>       .set (CPU_LA464, la464_align ())
>       .set (CPU_LA664, la464_align ());
>   
> -#define COSTS_N_INSNS(N) ((N) * 4)
> -
>   /* Default RTX cost initializer.  */
>   loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
>     : fp_add (COSTS_N_INSNS (1)),



* Re: [PATCH 3/3] LoongArch: Add alslsi3_extend
  2023-12-09 17:03 ` [PATCH 3/3] LoongArch: Add alslsi3_extend Xi Ruoyao
@ 2023-12-13 12:58   ` chenglulu
  0 siblings, 0 replies; 12+ messages in thread
From: chenglulu @ 2023-12-13 12:58 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c

LGTM!

Thanks!

On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Following the instruction cost fix, we are generating
>
>      alsl.w $a0, $a0, $a0, 4
>
> instead of
>
>      li.w  $t0, 17
>      mul.w $a0, $a0, $t0
>
> for "x * 17", because alsl.w is 4 times faster than mul.w.  But we didn't
> have a sign-extending pattern for alsl.w, so an extra slli.w instruction
> was generated to sign-extend $a0.  Add the pattern to remove the
> redundant extension.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/loongarch.md (alslsi3_extend): New
> 	define_insn.
> ---
>   gcc/config/loongarch/loongarch.md | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index afbf201d4d0..7b26d15aa4e 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3"
>     [(set_attr "type" "arith")
>      (set_attr "mode" "<MODE>")])
>   
> +(define_insn "alslsi3_extend"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(sign_extend:DI
> +	  (plus:SI
> +	    (ashift:SI (match_operand:SI 1 "register_operand" "r")
> +		       (match_operand 2 "const_immalsl_operand" ""))
> +	    (match_operand:SI 3 "register_operand" "r"))))]
> +  ""
> +  "alsl.w\t%0,%1,%3,%2"
> +  [(set_attr "type" "arith")
> +   (set_attr "mode" "SI")])
> +
>   \f
>   
>   ;; Reverse the order of bytes of operand 1 and store the result in operand 0.



* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-13 12:22   ` chenglulu
@ 2023-12-13 13:20     ` Xi Ruoyao
  2023-12-14  1:16       ` chenglulu
  0 siblings, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-13 13:20 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua, c

On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> > Replace the instruction costs in loongarch_rtx_cost_data constructor
> > based on micro-benchmark results on LA464 and LA664.
> >
> > This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> > and slli.
> >
> > gcc/ChangeLog:
> >
> > 	PR target/112936
> > 	* config/loongarch/loongarch-def.cc
> > 	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
> > 	instruction costs per micro-benchmark results.
> > 	(loongarch_rtx_cost_optimize_size): Set all instruction costs
> > 	to (COSTS_N_INSNS (1) + 1).
> > 	* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
> > 	special case for multiplication when optimizing for size.
> > 	Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
> > 	Account the extra cost when TARGET_CHECK_ZERO_DIV and
> > 	optimizing for speed.
> >
> > gcc/testsuite/ChangeLog
> >
> > 	PR target/112936
> > 	* gcc.target/loongarch/mul-const-reduction.c: New test.
> > ---
> >   gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
> >   gcc/config/loongarch/loongarch.cc             | 22 +++++------
> >   .../loongarch/mul-const-reduction.c           | 11 ++++++
> >   3 files changed, 43 insertions(+), 29 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>
> Well, I'm curious about how the value of this cost is obtained.

I just make a loop containing 1000 mul.w instructions, then run the loop
1000000 times and compare the time usage with running another loop
containing 1000 addi.w instructions iterated 1000000 times too.

Likewise for other instructions...
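
A portable approximation of the measurement described above might look
like the sketch below.  It is an assumption-laden illustration, not the
benchmark actually used: the names are hypothetical, it relies on the
GCC/Clang empty inline asm extension to keep the optimizer from folding
the chain, it uses a register-register add where the description above
uses addi.w, and it times dependent chains (so it approximates latency).
Build it with optimization enabled so that each step becomes a single
instruction.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* For clock_gettime.  */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Paste a statement sequence 1000 times into the loop body.  */
#define REP10(s) s s s s s s s s s s
#define REP1000(s) REP10 (REP10 (REP10 (s)))

/* GCC/Clang extension: hide the value from the optimizer so the
   operations are neither folded nor removed.  */
#define OPAQUE(v) __asm__ volatile ("" : "+r" (v))

static double
now_seconds (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Time ITERATIONS runs of a loop body holding 1000 dependent multiplies.  */
static double
time_mul_chain (uint32_t iterations)
{
  uint32_t x = 3, y = 5;
  OPAQUE (y);
  double start = now_seconds ();
  for (uint32_t i = 0; i < iterations; i++)
    {
      REP1000 (x *= y; OPAQUE (x);)
    }
  return now_seconds () - start;
}

/* Same loop shape with 1000 dependent additions as the baseline.  */
static double
time_add_chain (uint32_t iterations)
{
  uint32_t x = 3, y = 5;
  OPAQUE (y);
  double start = now_seconds ();
  for (uint32_t i = 0; i < iterations; i++)
    {
      REP1000 (x += y; OPAQUE (x);)
    }
  return now_seconds () - start;
}
```

Dividing time_mul_chain (1000000) by time_add_chain (1000000) gives a
rough relative cost of a multiply versus an add on the machine at hand;
the exact ratio depends on the compiler, flags, and CPU.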

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-13 13:20     ` Xi Ruoyao
@ 2023-12-14  1:16       ` chenglulu
  2023-12-15  7:56         ` chenglulu
  0 siblings, 1 reply; 12+ messages in thread
From: chenglulu @ 2023-12-14  1:16 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c


On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>
> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Replace the instruction costs in loongarch_rtx_cost_data constructor
> based on micro-benchmark results on LA464 and LA664.
>
> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> and slli.
>
> gcc/ChangeLog:
>
>   	PR target/112936
>   	* config/loongarch/loongarch-def.cc
>   	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
>   	instruction costs per micro-benchmark results.
>   	(loongarch_rtx_cost_optimize_size): Set all instruction costs
>   	to (COSTS_N_INSNS (1) + 1).
>   	* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
>   	special case for multiplication when optimizing for size.
>   	Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
>   	Account the extra cost when TARGET_CHECK_ZERO_DIV and
>   	optimizing for speed.
>
> gcc/testsuite/ChangeLog
>
>   	PR target/112936
>   	* gcc.target/loongarch/mul-const-reduction.c: New test.
> ---
>     gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
>     gcc/config/loongarch/loongarch.cc             | 22 +++++------
>     .../loongarch/mul-const-reduction.c           | 11 ++++++
>     3 files changed, 43 insertions(+), 29 deletions(-)
>     create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>
> Well, I'm curious about how the value of this cost is obtained.
>
> I just make a loop containing 1000 mul.w instructions, then run the loop
> 1000000 times and compare the time usage with running another loop
> containing 1000 addi.w instructions iterated 1000000 times too.
> Likewise for other instructions...
>
Ok. I need to do a performance comparison of the spec here. Probably 
tomorrow the results will be available.

Thanks!




* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-14  1:16       ` chenglulu
@ 2023-12-15  7:56         ` chenglulu
  2023-12-17  3:02           ` chenglulu
  0 siblings, 1 reply; 12+ messages in thread
From: chenglulu @ 2023-12-15  7:56 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c


On 2023/12/14 at 9:16 AM, chenglulu wrote:
>
> On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
>> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>>
>> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
>> Replace the instruction costs in loongarch_rtx_cost_data constructor
>> based on micro-benchmark results on LA464 and LA664.
>>
>> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
>> and slli.
>>
>> gcc/ChangeLog:
>>
>>       PR target/112936
>>       * config/loongarch/loongarch-def.cc
>>       (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
>>       instruction costs per micro-benchmark results.
>>       (loongarch_rtx_cost_optimize_size): Set all instruction costs
>>       to (COSTS_N_INSNS (1) + 1).
>>       * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
>>       special case for multiplication when optimizing for size.
>>       Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
>>       Account the extra cost when TARGET_CHECK_ZERO_DIV and
>>       optimizing for speed.
>>
>> gcc/testsuite/ChangeLog
>>
>>       PR target/112936
>>       * gcc.target/loongarch/mul-const-reduction.c: New test.
>> ---
>>     gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
>>     gcc/config/loongarch/loongarch.cc             | 22 +++++------
>>     .../loongarch/mul-const-reduction.c           | 11 ++++++
>>     3 files changed, 43 insertions(+), 29 deletions(-)
>>     create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>>
>> Well, I'm curious about how the value of this cost is obtained.
>>
>> I just make a loop containing 1000 mul.w instructions, then run the loop
>> 1000000 times and compare the time usage with running another loop
>> containing 1000 addi.w instructions iterated 1000000 times too.
>> Likewise for other instructions...
>>
> Ok. I need to do a performance comparison of the spec here. Probably 
> tomorrow the results will be available.
>
> Thanks!
>
Sorry, there is a problem with my test environment, so the results may 
not be available until tomorrow.



* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
  2023-12-15  7:56         ` chenglulu
@ 2023-12-17  3:02           ` chenglulu
  0 siblings, 0 replies; 12+ messages in thread
From: chenglulu @ 2023-12-17  3:02 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c


On 2023/12/15 at 3:56 PM, chenglulu wrote:
>
> On 2023/12/14 at 9:16 AM, chenglulu wrote:
>>
>> On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
>>> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>>>
>>> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
>>> Replace the instruction costs in loongarch_rtx_cost_data constructor
>>> based on micro-benchmark results on LA464 and LA664.
>>>
>>> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
>>> and slli.
>>>
>>> gcc/ChangeLog:
>>>
>>>       PR target/112936
>>>       * config/loongarch/loongarch-def.cc
>>>       (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
>>>       instruction costs per micro-benchmark results.
>>>       (loongarch_rtx_cost_optimize_size): Set all instruction costs
>>>       to (COSTS_N_INSNS (1) + 1).
>>>       * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
>>>       special case for multiplication when optimizing for size.
>>>       Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
>>>       Account the extra cost when TARGET_CHECK_ZERO_DIV and
>>>       optimizing for speed.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>>       PR target/112936
>>>       * gcc.target/loongarch/mul-const-reduction.c: New test.
>>> ---
>>>     gcc/config/loongarch/loongarch-def.cc         | 39 ++++++++++---------
>>>     gcc/config/loongarch/loongarch.cc             | 22 +++++------
>>>     .../loongarch/mul-const-reduction.c           | 11 ++++++
>>>     3 files changed, 43 insertions(+), 29 deletions(-)
>>>     create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>>>
>>> Well, I'm curious about how the value of this cost is obtained.
>>>
>>> I just make a loop containing 1000 mul.w instructions, then run the loop
>>> 1000000 times and compare the time usage with running another loop
>>> containing 1000 addi.w instructions iterated 1000000 times too.
>>> Likewise for other instructions...
>>>
>> Ok. I need to do a performance comparison of the spec here. Probably 
>> tomorrow the results will be available.
>>
>> Thanks!
>>
> Sorry, there is a problem with my test environment, so the results may 
> not be available until tomorrow.

The SPEC2006 test completed without problems, and 483.xalancbmk showed a
2.7 percent performance improvement.

Thanks!



* Pushed: [PATCH 0/3] LoongArch: Fix instruction costs
  2023-12-09 17:03 [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
                   ` (2 preceding siblings ...)
  2023-12-09 17:03 ` [PATCH 3/3] LoongArch: Add alslsi3_extend Xi Ruoyao
@ 2023-12-17 15:22 ` Xi Ruoyao
  3 siblings, 0 replies; 12+ messages in thread
From: Xi Ruoyao @ 2023-12-17 15:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c

On Sun, 2023-12-10 at 01:03 +0800, Xi Ruoyao wrote:
> Update LoongArch instruction costs based on micro-benchmark results on
> LA464 and LA664.  In particular, this allows generating alsl/slli or
> alsl/slli + add pairs when multiplying by some constants, because on
> LA464/LA664 a mul instruction is 4x slower than an alsl, slli, or add
> instruction.
> 
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> Xi Ruoyao (3):
>   LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our
>     own
>   LoongArch: Fix instruction costs [PR112936]
>   LoongArch: Add alslsi3_extend
> 
>  gcc/config/loongarch/loongarch-def.cc         | 42 ++++++++++---------
>  gcc/config/loongarch/loongarch.cc             | 22 +++++-----
>  gcc/config/loongarch/loongarch.md             | 12 ++++++
>  .../loongarch/mul-const-reduction.c           | 11 +++++
>  4 files changed, 56 insertions(+), 31 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c

Pushed to r14-664{1,2,3} as all 3 patches are approved.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


Thread overview: 12+ messages
2023-12-09 17:03 [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
2023-12-09 17:03 ` [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own Xi Ruoyao
2023-12-13 12:57   ` chenglulu
2023-12-09 17:03 ` [PATCH 2/3] LoongArch: Fix instruction costs [PR112936] Xi Ruoyao
2023-12-13 12:22   ` chenglulu
2023-12-13 13:20     ` Xi Ruoyao
2023-12-14  1:16       ` chenglulu
2023-12-15  7:56         ` chenglulu
2023-12-17  3:02           ` chenglulu
2023-12-09 17:03 ` [PATCH 3/3] LoongArch: Add alslsi3_extend Xi Ruoyao
2023-12-13 12:58   ` chenglulu
2023-12-17 15:22 ` Pushed: [PATCH 0/3] LoongArch: Fix instruction costs Xi Ruoyao
