* [PATCH 0/3] LoongArch: Fix instruction costs
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao
Update the LoongArch instruction costs based on micro-benchmark results
on LA464 and LA664. In particular, this allows multiplication by some
constants to be expanded into alsl/slli or alsl/slli + add pairs, because
on LA464/LA664 a mul instruction is 4x slower than an alsl, slli, or add
instruction.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
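As an illustration of the expansion described above, here is a minimal C
sketch of the arithmetic identities involved. The helper names (model_alsl_w,
model_slli_w, mul17, mul68) are hypothetical and only model the instructions
on unsigned 32-bit values; they are not GCC or LoongArch APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Model of alsl.w as written in assembly: (rj << sa) + rk, sa in 1..4.
   Unsigned arithmetic keeps the 32-bit wraparound well defined.  */
uint32_t
model_alsl_w (uint32_t rj, uint32_t rk, unsigned sa)
{
  assert (sa >= 1 && sa <= 4);
  return (rj << sa) + rk;
}

/* Model of slli.w: left shift by an immediate in 0..31.  */
uint32_t
model_slli_w (uint32_t rj, unsigned imm)
{
  assert (imm < 32);
  return rj << imm;
}

/* x * 17 == (x << 4) + x: one alsl.w.  */
uint32_t
mul17 (uint32_t x)
{
  return model_alsl_w (x, x, 4);
}

/* x * 68 == ((x << 4) + x) << 2: alsl.w followed by slli.w.  */
uint32_t
mul68 (uint32_t x)
{
  return model_slli_w (model_alsl_w (x, x, 4), 2);
}
```

Both helpers agree with an ordinary 32-bit multiplication for every input,
including inputs where the product wraps around.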
Xi Ruoyao (3):
LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our
own
LoongArch: Fix instruction costs [PR112936]
LoongArch: Add alslsi3_extend
gcc/config/loongarch/loongarch-def.cc | 42 ++++++++++---------
gcc/config/loongarch/loongarch.cc | 22 +++++-----
gcc/config/loongarch/loongarch.md | 12 ++++++
.../loongarch/mul-const-reduction.c | 11 +++++
4 files changed, 56 insertions(+), 31 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
--
2.43.0
* [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao
With loongarch-def.cc switched from C to C++, we can include rtl.h for
COSTS_N_INSNS, instead of hard coding our own.
This is a non-functional change for now, but it will make the code more
future-proof in case COSTS_N_INSNS in rtl.h is ever changed.
gcc/ChangeLog:
* config/loongarch/loongarch-def.cc (rtl.h): Include.
(COSTS_N_INSNS): Remove the macro definition.
---
gcc/config/loongarch/loongarch-def.cc | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index c41804a180e..6217b19268c 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see
#include "system.h"
#include "coretypes.h"
#include "tm.h"
+#include "rtl.h"
#include "loongarch-def.h"
#include "loongarch-str.h"
@@ -89,8 +90,6 @@ array_tune<loongarch_align> loongarch_cpu_align =
.set (CPU_LA464, la464_align ())
.set (CPU_LA664, la464_align ());
-#define COSTS_N_INSNS(N) ((N) * 4)
-
/* Default RTX cost initializer. */
loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
: fp_add (COSTS_N_INSNS (1)),
--
2.43.0
* [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao
Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.
This allows optimizations such as turning "x * 17" into a single alsl,
and "x * 68" into an alsl followed by an slli.
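To see why the new costs change the outcome, here is a simplified C model of
the decision. It is only a sketch: the struct and function names are
hypothetical, the cost of materializing the constant is ignored, and the
strict "cheaper than" comparison is an assumption about how the middle end
picks between a multiply and a shift-add sequence. COSTS_N_INSNS uses the
definition shown in the removed line of patch 1.

```c
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, simplified cost table: only the entries needed here.  */
struct mul_costs
{
  int simple_insn;  /* alsl, slli, add */
  int int_mult_si;  /* mul.w */
};

/* Cost of a shift-add sequence made of N_INSNS simple instructions.  */
int
cost_via_shift_add (const struct mul_costs *c, int n_insns)
{
  return n_insns * c->simple_insn;
}

/* Nonzero if the shift-add sequence is strictly cheaper than mul.w.  */
int
prefers_shift_add (const struct mul_costs *c, int n_insns)
{
  return cost_via_shift_add (c, n_insns) < c->int_mult_si;
}
```

With the old int_mult_si of COSTS_N_INSNS (1), even a one-instruction
sequence only ties with mul.w, so the multiply is kept. With the new value of
COSTS_N_INSNS (4), sequences of up to three simple instructions win in this
model, which covers both "x * 17" (one alsl) and "x * 68" (alsl plus slli).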
gcc/ChangeLog:
PR target/112936
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
instruction costs per micro-benchmark results.
(loongarch_rtx_cost_optimize_size): Set all instruction costs
to (COSTS_N_INSNS (1) + 1).
* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
special case for multiplication when optimizing for size.
Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
Account for the extra cost when TARGET_CHECK_ZERO_DIV and
optimizing for speed.
gcc/testsuite/ChangeLog:
PR target/112936
* gcc.target/loongarch/mul-const-reduction.c: New test.
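The division cost adjustment listed above can be read as the following
standalone C sketch. It mirrors the DIV/MOD/UDIV/UMOD hunk as posted in this
patch (which does not test `speed` for the zero-check term); the struct
div_cost_config and its fields are hypothetical stand-ins for the target
flags and the cost table, not real GCC declarations.

```c
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-ins for the cost table and the target flags.  */
struct div_cost_config
{
  int int_div_si;       /* models loongarch_cost->int_div_si */
  int int_div_di;       /* models loongarch_cost->int_div_di */
  bool target_64bit;    /* models TARGET_64BIT */
  bool target_div32;    /* models TARGET_DIV32 */
  bool check_zero_div;  /* models TARGET_CHECK_ZERO_DIV */
};

/* Cost of one integer division or modulo in DImode or SImode.  */
int
div_cost (const struct div_cost_config *cfg, bool is_di_mode)
{
  int total;

  if (is_di_mode)
    total = cfg->int_div_di;
  else
    {
      total = cfg->int_div_si;
      /* 32-bit division on a 64-bit target without DIV32 is charged
         two extra instructions.  */
      if (cfg->target_64bit && !cfg->target_div32)
        total += COSTS_N_INSNS (2);
    }

  /* The divide-by-zero check is charged two more instructions.  */
  if (cfg->check_zero_div)
    total += COSTS_N_INSNS (2);

  return total;
}
```

For example, with both division costs set to COSTS_N_INSNS (5), a 64-bit
target without DIV32 and with the zero check enabled yields 36 for SImode
and 28 for DImode in this model.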
---
gcc/config/loongarch/loongarch-def.cc | 39 ++++++++++---------
gcc/config/loongarch/loongarch.cc | 22 +++++------
.../loongarch/mul-const-reduction.c | 11 ++++++
3 files changed, 43 insertions(+), 29 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 6217b19268c..4a8885e8343 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -92,15 +92,15 @@ array_tune<loongarch_align> loongarch_cpu_align =
/* Default RTX cost initializer. */
loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
- : fp_add (COSTS_N_INSNS (1)),
- fp_mult_sf (COSTS_N_INSNS (2)),
- fp_mult_df (COSTS_N_INSNS (4)),
- fp_div_sf (COSTS_N_INSNS (6)),
+ : fp_add (COSTS_N_INSNS (5)),
+ fp_mult_sf (COSTS_N_INSNS (5)),
+ fp_mult_df (COSTS_N_INSNS (5)),
+ fp_div_sf (COSTS_N_INSNS (8)),
fp_div_df (COSTS_N_INSNS (8)),
- int_mult_si (COSTS_N_INSNS (1)),
- int_mult_di (COSTS_N_INSNS (1)),
- int_div_si (COSTS_N_INSNS (4)),
- int_div_di (COSTS_N_INSNS (6)),
+ int_mult_si (COSTS_N_INSNS (4)),
+ int_mult_di (COSTS_N_INSNS (4)),
+ int_div_si (COSTS_N_INSNS (5)),
+ int_div_di (COSTS_N_INSNS (5)),
branch_cost (6),
memory_latency (4) {}
@@ -111,18 +111,21 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
array_tune<loongarch_rtx_cost_data> ();
-/* RTX costs to use when optimizing for size. */
+/* RTX costs to use when optimizing for size.
+ We use a value slightly larger than COSTS_N_INSNS (1) for all of them
+ because they are slower than simple instructions. */
+#define COST_COMPLEX_INSN (COSTS_N_INSNS (1) + 1)
const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size =
loongarch_rtx_cost_data ()
- .fp_add_ (4)
- .fp_mult_sf_ (4)
- .fp_mult_df_ (4)
- .fp_div_sf_ (4)
- .fp_div_df_ (4)
- .int_mult_si_ (4)
- .int_mult_di_ (4)
- .int_div_si_ (4)
- .int_div_di_ (4);
+ .fp_add_ (COST_COMPLEX_INSN)
+ .fp_mult_sf_ (COST_COMPLEX_INSN)
+ .fp_mult_df_ (COST_COMPLEX_INSN)
+ .fp_div_sf_ (COST_COMPLEX_INSN)
+ .fp_div_df_ (COST_COMPLEX_INSN)
+ .int_mult_si_ (COST_COMPLEX_INSN)
+ .int_mult_di_ (COST_COMPLEX_INSN)
+ .int_div_si_ (COST_COMPLEX_INSN)
+ .int_div_di_ (COST_COMPLEX_INSN);
array_tune<int> loongarch_cpu_issue_rate = array_tune<int> ()
.set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 754aeb8bfb7..f04b5798f39 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3787,8 +3787,6 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
*total = (speed
? loongarch_cost->int_mult_si * 3 + 6
: COSTS_N_INSNS (7));
- else if (!speed)
- *total = COSTS_N_INSNS (1) + 1;
else if (mode == DImode)
*total = loongarch_cost->int_mult_di;
else
@@ -3823,14 +3821,18 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
case UDIV:
case UMOD:
- if (!speed)
- {
- *total = COSTS_N_INSNS (loongarch_idiv_insns (mode));
- }
- else if (mode == DImode)
+ if (mode == DImode)
*total = loongarch_cost->int_div_di;
else
- *total = loongarch_cost->int_div_si;
+ {
+ *total = loongarch_cost->int_div_si;
+ if (TARGET_64BIT && !TARGET_DIV32)
+ *total += COSTS_N_INSNS (2);
+ }
+
+ if (TARGET_CHECK_ZERO_DIV)
+ *total += COSTS_N_INSNS (2);
+
return false;
case SIGN_EXTEND:
@@ -3862,9 +3864,7 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
&& (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1))
== ZERO_EXTEND))))
{
- if (!speed)
- *total = COSTS_N_INSNS (1) + 1;
- else if (mode == DImode)
+ if (mode == DImode)
*total = loongarch_cost->int_mult_di;
else
*total = loongarch_cost->int_mult_si;
diff --git a/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c b/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
new file mode 100644
index 00000000000..02d9a4876d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler "alsl\.w" } } */
+/* { dg-final { scan-assembler "slli\.w" } } */
+/* { dg-final { scan-assembler-not "mul\.w" } } */
+
+int
+test (int a)
+{
+ return a * 68;
+}
--
2.43.0
* [PATCH 3/3] LoongArch: Add alslsi3_extend
From: Xi Ruoyao @ 2023-12-09 17:03 UTC (permalink / raw)
To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c, Xi Ruoyao
Following the instruction cost fix, we are generating
alsl.w $a0, $a0, $a0, 4
instead of
li.w $t0, 17
mul.w $a0, $a0, $t0
for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't
have a sign-extending pattern for alsl.w, causing an extra slli.w
instruction to be generated to sign-extend $a0. Add the pattern to remove
the redundant extension.
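The reason the extra instruction is redundant can be sketched in C by
modeling 64-bit registers. This is an illustration only: sext32, alsl_w_reg
and slli_w_zero are hypothetical helper names, and the narrowing cast relies
on the usual two's-complement conversion (implementation-defined in ISO C,
defined as modulo 2^32 by GCC).

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of V to 64 bits.  */
int64_t
sext32 (uint64_t v)
{
  return (int64_t) (int32_t) (uint32_t) v;
}

/* Model of alsl.w on 64-bit registers: the 32-bit sum (rj << sa) + rk
   is written back sign-extended.  */
int64_t
alsl_w_reg (uint64_t rj, uint64_t rk, unsigned sa)
{
  uint32_t sum = ((uint32_t) rj << sa) + (uint32_t) rk;
  return sext32 (sum);
}

/* Model of "slli.w rd, rj, 0", used here as a plain sign extension.  */
int64_t
slli_w_zero (uint64_t rj)
{
  return sext32 (rj);
}
```

Applying slli_w_zero to any value returned by alsl_w_reg gives the same value
back, which is what the new alslsi3_extend pattern lets the compiler know.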
gcc/ChangeLog:
* config/loongarch/loongarch.md (alslsi3_extend): New
define_insn.
---
gcc/config/loongarch/loongarch.md | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index afbf201d4d0..7b26d15aa4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3"
[(set_attr "type" "arith")
(set_attr "mode" "<MODE>")])
+(define_insn "alslsi3_extend"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (sign_extend:DI
+ (plus:SI
+ (ashift:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand 2 "const_immalsl_operand" ""))
+ (match_operand:SI 3 "register_operand" "r"))))]
+ ""
+ "alsl.w\t%0,%1,%3,%2"
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")])
+
\f
;; Reverse the order of bytes of operand 1 and store the result in operand 0.
--
2.43.0
* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: chenglulu @ 2023-12-13 12:22 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Replace the instruction costs in loongarch_rtx_cost_data constructor
> based on micro-benchmark results on LA464 and LA664.
>
> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> and slli.
>
> gcc/ChangeLog:
>
> PR target/112936
> * config/loongarch/loongarch-def.cc
> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
> instruction costs per micro-benchmark results.
> (loongarch_rtx_cost_optimize_size): Set all instruction costs
> to (COSTS_N_INSNS (1) + 1).
> * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
> special case for multiplication when optimizing for size.
> Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
> Account the extra cost when TARGET_CHECK_ZERO_DIV and
> optimizing for speed.
>
> gcc/testsuite/ChangeLog
>
> PR target/112936
> * gcc.target/loongarch/mul-const-reduction.c: New test.
> ---
> gcc/config/loongarch/loongarch-def.cc | 39 ++++++++++---------
> gcc/config/loongarch/loongarch.cc | 22 +++++------
> .../loongarch/mul-const-reduction.c | 11 ++++++
> 3 files changed, 43 insertions(+), 29 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>
Well, I'm curious about how the value of this cost is obtained.
* Re: [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
From: chenglulu @ 2023-12-13 12:57 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
LGTM!
Thanks.
On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> With loongarch-def.cc switched from C to C++, we can include rtl.h for
> COSTS_N_INSNS, instead of hard coding our own.
>
> This is a non-functional change for now, but it will make the code more
> future-proof in case COSTS_N_INSNS in rtl.h is ever changed.
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch-def.cc (rtl.h): Include.
> (COSTS_N_INSNS): Remove the macro definition.
> ---
> gcc/config/loongarch/loongarch-def.cc | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
> index c41804a180e..6217b19268c 100644
> --- a/gcc/config/loongarch/loongarch-def.cc
> +++ b/gcc/config/loongarch/loongarch-def.cc
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see
> #include "system.h"
> #include "coretypes.h"
> #include "tm.h"
> +#include "rtl.h"
>
> #include "loongarch-def.h"
> #include "loongarch-str.h"
> @@ -89,8 +90,6 @@ array_tune<loongarch_align> loongarch_cpu_align =
> .set (CPU_LA464, la464_align ())
> .set (CPU_LA664, la464_align ());
>
> -#define COSTS_N_INSNS(N) ((N) * 4)
> -
> /* Default RTX cost initializer. */
> loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
> : fp_add (COSTS_N_INSNS (1)),
* Re: [PATCH 3/3] LoongArch: Add alslsi3_extend
From: chenglulu @ 2023-12-13 12:58 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
LGTM!
Thanks!
On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Following the instruction cost fix, we are generating
>
> alsl.w $a0, $a0, $a0, 4
>
> instead of
>
> li.w $t0, 17
> mul.w $a0, $a0, $t0
>
> for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't
> have a sign-extending pattern for alsl.w, causing an extra slli.w
> instruction to be generated to sign-extend $a0. Add the pattern to remove
> the redundant extension.
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch.md (alslsi3_extend): New
> define_insn.
> ---
> gcc/config/loongarch/loongarch.md | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index afbf201d4d0..7b26d15aa4e 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3"
> [(set_attr "type" "arith")
> (set_attr "mode" "<MODE>")])
>
> +(define_insn "alslsi3_extend"
> + [(set (match_operand:DI 0 "register_operand" "=r")
> + (sign_extend:DI
> + (plus:SI
> + (ashift:SI (match_operand:SI 1 "register_operand" "r")
> + (match_operand 2 "const_immalsl_operand" ""))
> + (match_operand:SI 3 "register_operand" "r"))))]
> + ""
> + "alsl.w\t%0,%1,%3,%2"
> + [(set_attr "type" "arith")
> + (set_attr "mode" "SI")])
> +
> \f
>
> ;; Reverse the order of bytes of operand 1 and store the result in operand 0.
* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: Xi Ruoyao @ 2023-12-13 13:20 UTC (permalink / raw)
To: chenglulu, gcc-patches; +Cc: i, xuchenghua, c
On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> > Replace the instruction costs in loongarch_rtx_cost_data constructor
> > based on micro-benchmark results on LA464 and LA664.
> >
> > This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> > and slli.
> >
> > gcc/ChangeLog:
> >
> > PR target/112936
> > * config/loongarch/loongarch-def.cc
> > (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
> > instruction costs per micro-benchmark results.
> > (loongarch_rtx_cost_optimize_size): Set all instruction costs
> > to (COSTS_N_INSNS (1) + 1).
> > * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
> > special case for multiplication when optimizing for size.
> > Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
> > Account the extra cost when TARGET_CHECK_ZERO_DIV and
> > optimizing for speed.
> >
> > gcc/testsuite/ChangeLog
> >
> > PR target/112936
> > * gcc.target/loongarch/mul-const-reduction.c: New test.
> > ---
> > gcc/config/loongarch/loongarch-def.cc | 39 ++++++++++---------
> > gcc/config/loongarch/loongarch.cc | 22 +++++------
> > .../loongarch/mul-const-reduction.c | 11 ++++++
> > 3 files changed, 43 insertions(+), 29 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
> >
> Well, I'm curious about how the value of this cost is obtained.
I just make a loop containing 1000 mul.w instructions, then run the loop
1000000 times and compare the time usage with running another loop
containing 1000 addi.w instructions iterated 1000000 times too.
Likewise for other instructions...
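For readers who want to try something similar, a rough portable sketch of
this kind of comparison is shown below. It is not the program used for the
measurements in this thread: it times a chain of dependent operations in C
rather than 1000 hand-written instructions, it assumes a GCC-compatible
compiler for the empty inline asm, and all names (OPAQUE, time_chain,
relative_cost) are made up for the illustration.

```c
#include <stdint.h>
#include <time.h>

/* Empty asm that makes X opaque to the optimizer (GCC/Clang extension),
   so the chains below are neither folded nor strength-reduced.  */
#define OPAQUE(x) __asm__ volatile ("" : "+r" (x))

/* Return the CPU time in seconds spent on ITERS dependent
   multiplications (USE_MUL != 0) or additions (USE_MUL == 0).  */
double
time_chain (long iters, int use_mul)
{
  uint32_t x = 3, k = 5;
  OPAQUE (k);

  clock_t start = clock ();
  if (use_mul)
    for (long i = 0; i < iters; i++)
      {
        x *= k;
        OPAQUE (x);
      }
  else
    for (long i = 0; i < iters; i++)
      {
        x += k;
        OPAQUE (x);
      }
  clock_t end = clock ();

  return (double) (end - start) / CLOCKS_PER_SEC;
}

/* Round the time ratio to a whole number of "simple instructions".  */
int
relative_cost (double t_op, double t_base)
{
  return (int) (t_op / t_base + 0.5);
}
```

A caller would run both chains with a large iteration count (for example
time_chain (1000000000L, 1) and time_chain (1000000000L, 0)) and pass the two
times to relative_cost; the result is only meaningful on an otherwise idle
machine and reflects latency, since each operation depends on the previous
one.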
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: chenglulu @ 2023-12-14 1:16 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>
> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
> Replace the instruction costs in loongarch_rtx_cost_data constructor
> based on micro-benchmark results on LA464 and LA664.
>
> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
> and slli.
>
> gcc/ChangeLog:
>
> PR target/112936
> * config/loongarch/loongarch-def.cc
> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
> instruction costs per micro-benchmark results.
> (loongarch_rtx_cost_optimize_size): Set all instruction costs
> to (COSTS_N_INSNS (1) + 1).
> * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
> special case for multiplication when optimizing for size.
> Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
> Account the extra cost when TARGET_CHECK_ZERO_DIV and
> optimizing for speed.
>
> gcc/testsuite/ChangeLog
>
> PR target/112936
> * gcc.target/loongarch/mul-const-reduction.c: New test.
> ---
> gcc/config/loongarch/loongarch-def.cc | 39 ++++++++++---------
> gcc/config/loongarch/loongarch.cc | 22 +++++------
> .../loongarch/mul-const-reduction.c | 11 ++++++
> 3 files changed, 43 insertions(+), 29 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>
> Well, I'm curious about how the value of this cost is obtained.
>
> I just make a loop containing 1000 mul.w instructions, then run the loop
> 1000000 times and compare the time usage with running another loop
> containing 1000 addi.w instructions iterated 1000000 times too.
> Likewise for other instructions...
>
OK. I need to run a SPEC performance comparison here. The results will
probably be available tomorrow.
Thanks!
* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: chenglulu @ 2023-12-15 7:56 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
On 2023/12/14 at 9:16 AM, chenglulu wrote:
>
> On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
>> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>>
>> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
>> Replace the instruction costs in loongarch_rtx_cost_data constructor
>> based on micro-benchmark results on LA464 and LA664.
>>
>> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
>> and slli.
>>
>> gcc/ChangeLog:
>>
>> PR target/112936
>> * config/loongarch/loongarch-def.cc
>> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
>> instruction costs per micro-benchmark results.
>> (loongarch_rtx_cost_optimize_size): Set all instruction costs
>> to (COSTS_N_INSNS (1) + 1).
>> * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
>> special case for multiplication when optimizing for size.
>> Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
>> Account the extra cost when TARGET_CHECK_ZERO_DIV and
>> optimizing for speed.
>>
>> gcc/testsuite/ChangeLog
>>
>> PR target/112936
>> * gcc.target/loongarch/mul-const-reduction.c: New test.
>> ---
>> gcc/config/loongarch/loongarch-def.cc | 39
>> ++++++++++---------
>> gcc/config/loongarch/loongarch.cc | 22 +++++------
>> .../loongarch/mul-const-reduction.c | 11 ++++++
>> 3 files changed, 43 insertions(+), 29 deletions(-)
>> create mode 100644
>> gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>>
>> Well, I'm curious about how the value of this cost is obtained.
>>
>> I just make a loop containing 1000 mul.w instructions, then run the loop
>> 1000000 times and compare the time usage with running another loop
>> containing 1000 addi.w instructions iterated 1000000 times too.
>> Likewise for other instructions...
>>
> Ok. I need to do a performance comparison of the spec here. Probably
> tomorrow the results will be available.
>
> Thanks!
>
Sorry, there is a problem with my test environment, so the results may
not be available until tomorrow.
* Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
From: chenglulu @ 2023-12-17 3:02 UTC (permalink / raw)
To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua, c
On 2023/12/15 at 3:56 PM, chenglulu wrote:
>
> On 2023/12/14 at 9:16 AM, chenglulu wrote:
>>
>> On 2023/12/13 at 9:20 PM, Xi Ruoyao wrote:
>>> On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:
>>>
>>> On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:
>>> Replace the instruction costs in loongarch_rtx_cost_data constructor
>>> based on micro-benchmark results on LA464 and LA664.
>>>
>>> This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
>>> and slli.
>>>
>>> gcc/ChangeLog:
>>>
>>> PR target/112936
>>> * config/loongarch/loongarch-def.cc
>>> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
>>> instruction costs per micro-benchmark results.
>>> (loongarch_rtx_cost_optimize_size): Set all instruction costs
>>> to (COSTS_N_INSNS (1) + 1).
>>> * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
>>> special case for multiplication when optimizing for size.
>>> Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
>>> Account the extra cost when TARGET_CHECK_ZERO_DIV and
>>> optimizing for speed.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> PR target/112936
>>> * gcc.target/loongarch/mul-const-reduction.c: New test.
>>> ---
>>> gcc/config/loongarch/loongarch-def.cc | 39
>>> ++++++++++---------
>>> gcc/config/loongarch/loongarch.cc | 22 +++++------
>>> .../loongarch/mul-const-reduction.c | 11 ++++++
>>> 3 files changed, 43 insertions(+), 29 deletions(-)
>>> create mode 100644
>>> gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
>>>
>>> Well, I'm curious about how the value of this cost is obtained.
>>>
>>> I just make a loop containing 1000 mul.w instructions, then run the
>>> loop
>>> 1000000 times and compare the time usage with running another loop
>>> containing 1000 addi.w instructions iterated 1000000 times too.
>>> Likewise for other instructions...
>>>
>> Ok. I need to do a performance comparison of the spec here. Probably
>> tomorrow the results will be available.
>>
>> Thanks!
>>
> Sorry, there is a problem with my test environment, so the results may
> not be available until tomorrow.
The SPEC2006 run showed no problems, and the 483 benchmark improved by
2.7 percent.
Thanks!
* Pushed: [PATCH 0/3] LoongArch: Fix instruction costs
From: Xi Ruoyao @ 2023-12-17 15:22 UTC (permalink / raw)
To: gcc-patches; +Cc: chenglulu, i, xuchenghua, c
On Sun, 2023-12-10 at 01:03 +0800, Xi Ruoyao wrote:
> Update LoongArch instruction costs based on the micro-benchmark results
> on LA464 and LA664. In particular, this allows generating alsl/slli or
> alsl/slli + add pairs for multiplying some constants as on LA464/LA664
> a mul instruction is 4x slower than alsl, slli, or add instructions.
>
> Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
>
> Xi Ruoyao (3):
> LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our
> own
> LoongArch: Fix instruction costs [PR112936]
> LoongArch: Add alslsi3_extend
>
> gcc/config/loongarch/loongarch-def.cc | 42 ++++++++++---------
> gcc/config/loongarch/loongarch.cc | 22 +++++-----
> gcc/config/loongarch/loongarch.md | 12 ++++++
> .../loongarch/mul-const-reduction.c | 11 +++++
> 4 files changed, 56 insertions(+), 31 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c
Pushed to r14-664{1,2,3} as all 3 patches are approved.
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University